=====================================
The PDB TPI and IPI Streams
=====================================

.. contents::
   :local:

.. _tpi_intro:

Introduction
============

The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about
all types used in the program.  Each stream is organized as a
:ref:`header <tpi_header>` followed by a list of
:doc:`CodeView Type Records <CodeViewTypes>`.  Types are
referenced from various streams and records throughout the PDB by their
:ref:`type index <type_indices>`.  In general, the sequence of type records
following the :ref:`header <tpi_header>` forms a topologically sorted DAG
(directed acyclic graph), which means that a type record B can only refer to
a type A if ``A.TypeIndex < B.TypeIndex``.  While there are rare cases where
this property will not hold (particularly when dealing with object files
compiled with MASM), an implementation should try very hard to make this
property hold, as it means the entire type graph can be constructed in a single
pass.

.. important::
   Type records form a topologically sorted DAG (directed acyclic graph).

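The ordering property can be checked in one forward pass over the records.
The following is a minimal sketch, not LLVM's implementation: the
``TypeRecordRefs`` struct and the ``isTopologicallySorted`` function are
hypothetical names, and each record is reduced to the list of type indices it
refers to.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Hypothetical, simplified view of a type record: only the type indices
  // that the record refers to.  Real records are CodeView structures.
  struct TypeRecordRefs {
    std::vector<uint32_t> ReferencedTypes;
  };

  // Returns true if every reference made by a record points either at a
  // reserved (simple) type index or at a record that appears earlier in the
  // list, i.e. ``A.TypeIndex < B.TypeIndex`` for every edge from B to A.
  inline bool isTopologicallySorted(const std::vector<TypeRecordRefs> &Records,
                                    uint32_t TypeIndexBegin) {
    for (std::size_t I = 0; I < Records.size(); ++I) {
      uint32_t Self = TypeIndexBegin + static_cast<uint32_t>(I);
      for (uint32_t Ref : Records[I].ReferencedTypes) {
        if (Ref < TypeIndexBegin)
          continue; // Reserved index: refers to a built-in type.
        if (Ref >= Self)
          return false; // Forward reference or self reference.
      }
    }
    return true;
  }

A consumer that relies on single-pass construction could run a check like this
first and fall back to a slower multi-pass strategy when it fails.
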
.. _tpi_ipi:

TPI vs IPI Stream
=================

Recent versions of the PDB format (that is, all versions covered by this
document) have two streams with an identical layout, henceforth referred to as
the TPI stream and the IPI stream.  The rest of this document, which describes
the on-disk format, applies equally to the TPI Stream and the IPI Stream.  The
only difference between the two is in *which* CodeView records are allowed to
appear in each one, summarized by the following table:

+----------------------+---------------------+
|    TPI Stream        |    IPI Stream       |
+======================+=====================+
|  LF_POINTER          | LF_FUNC_ID          |
+----------------------+---------------------+
|  LF_MODIFIER         | LF_MFUNC_ID         |
+----------------------+---------------------+
|  LF_PROCEDURE        | LF_BUILDINFO        |
+----------------------+---------------------+
|  LF_MFUNCTION        | LF_SUBSTR_LIST      |
+----------------------+---------------------+
|  LF_LABEL            | LF_STRING_ID        |
+----------------------+---------------------+
|  LF_ARGLIST          | LF_UDT_SRC_LINE     |
+----------------------+---------------------+
|  LF_FIELDLIST        | LF_UDT_MOD_SRC_LINE |
+----------------------+---------------------+
|  LF_ARRAY            |                     |
+----------------------+---------------------+
|  LF_CLASS            |                     |
+----------------------+---------------------+
|  LF_STRUCTURE        |                     |
+----------------------+---------------------+
|  LF_INTERFACE        |                     |
+----------------------+---------------------+
|  LF_UNION            |                     |
+----------------------+---------------------+
|  LF_ENUM             |                     |
+----------------------+---------------------+
|  LF_TYPESERVER2      |                     |
+----------------------+---------------------+
|  LF_VFTABLE          |                     |
+----------------------+---------------------+
|  LF_VTSHAPE          |                     |
+----------------------+---------------------+
|  LF_BITFIELD         |                     |
+----------------------+---------------------+
|  LF_METHODLIST       |                     |
+----------------------+---------------------+
|  LF_PRECOMP          |                     |
+----------------------+---------------------+
|  LF_ENDPRECOMP       |                     |
+----------------------+---------------------+

The usage of these records is described in more detail in
:doc:`CodeView Type Records <CodeViewTypes>`.

.. _type_indices:

Type Indices
============

A type index is a 32-bit integer that uniquely identifies a type inside of an
object file's ``.debug$T`` section or a PDB file's TPI or IPI stream.  The
value of the type index for the first type record from the TPI stream is given
by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`,
although in practice this value is always equal to 0x1000 (4096).

Any type index with the high bit set is considered to come from the IPI stream,
although this appears to be more of a hack, and LLVM does not generate type
indices of this nature.  They can, however, be observed in Microsoft PDBs
occasionally, so one should be prepared to handle them.  Note that a set high
bit is a sufficient condition for a type index to come from the IPI stream,
but not a necessary one.

Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed
to come from the appropriate stream, and any type index less than this is a
bitmask which can be decomposed as follows:

.. code-block:: none

  .---------------------------.------.----------.
  |           Unused          | Mode |   Kind   |
  '---------------------------'------'----------'
  |+32                        |+12   |+8        |+0


- **Kind** - A value from the following enum:

.. code-block:: c++

  enum class SimpleTypeKind : uint32_t {
    None = 0x0000,          // uncharacterized type (no type)
    Void = 0x0003,          // void
    NotTranslated = 0x0007, // type not translated by cvpack
    HResult = 0x0008,       // OLE/COM HRESULT

    SignedCharacter = 0x0010,   // 8 bit signed
    UnsignedCharacter = 0x0020, // 8 bit unsigned
    NarrowCharacter = 0x0070,   // really a char
    WideCharacter = 0x0071,     // wide char
    Character16 = 0x007a,       // char16_t
    Character32 = 0x007b,       // char32_t

    SByte = 0x0068,       // 8 bit signed int
    Byte = 0x0069,        // 8 bit unsigned int
    Int16Short = 0x0011,  // 16 bit signed
    UInt16Short = 0x0021, // 16 bit unsigned
    Int16 = 0x0072,       // 16 bit signed int
    UInt16 = 0x0073,      // 16 bit unsigned int
    Int32Long = 0x0012,   // 32 bit signed
    UInt32Long = 0x0022,  // 32 bit unsigned
    Int32 = 0x0074,       // 32 bit signed int
    UInt32 = 0x0075,      // 32 bit unsigned int
    Int64Quad = 0x0013,   // 64 bit signed
    UInt64Quad = 0x0023,  // 64 bit unsigned
    Int64 = 0x0076,       // 64 bit signed int
    UInt64 = 0x0077,      // 64 bit unsigned int
    Int128Oct = 0x0014,   // 128 bit signed int
    UInt128Oct = 0x0024,  // 128 bit unsigned int
    Int128 = 0x0078,      // 128 bit signed int
    UInt128 = 0x0079,     // 128 bit unsigned int

    Float16 = 0x0046,                 // 16 bit real
    Float32 = 0x0040,                 // 32 bit real
    Float32PartialPrecision = 0x0045, // 32 bit PP real
    Float48 = 0x0044,                 // 48 bit real
    Float64 = 0x0041,                 // 64 bit real
    Float80 = 0x0042,                 // 80 bit real
    Float128 = 0x0043,                // 128 bit real

    Complex16 = 0x0056,                 // 16 bit complex
    Complex32 = 0x0050,                 // 32 bit complex
    Complex32PartialPrecision = 0x0055, // 32 bit PP complex
    Complex48 = 0x0054,                 // 48 bit complex
    Complex64 = 0x0051,                 // 64 bit complex
    Complex80 = 0x0052,                 // 80 bit complex
    Complex128 = 0x0053,                // 128 bit complex

    Boolean8 = 0x0030,   // 8 bit boolean
    Boolean16 = 0x0031,  // 16 bit boolean
    Boolean32 = 0x0032,  // 32 bit boolean
    Boolean64 = 0x0033,  // 64 bit boolean
    Boolean128 = 0x0034, // 128 bit boolean
  };

- **Mode** - A value from the following enum:

.. code-block:: c++

  enum class SimpleTypeMode : uint32_t {
    Direct = 0,        // Not a pointer
    NearPointer = 1,   // Near pointer
    FarPointer = 2,    // Far pointer
    HugePointer = 3,   // Huge pointer
    NearPointer32 = 4, // 32 bit near pointer
    FarPointer32 = 5,  // 32 bit far pointer
    NearPointer64 = 6, // 64 bit near pointer
    NearPointer128 = 7 // 128 bit near pointer
  };

Note that for pointers, the bitness is represented in the mode.  So a ``void*``
has a type index with ``Mode=NearPointer32, Kind=Void`` when built for a 32-bit
target, but a type index with ``Mode=NearPointer64, Kind=Void`` when built for
a 64-bit target.

By convention, the type index for ``std::nullptr_t`` is constructed the same way
as the type index for ``void*``, but using the bitless enumeration value
``NearPointer``.

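The decomposition above can be sketched with a few masks and shifts.  This is
an illustrative sketch only: the helper names (``isSimpleTypeIndex``,
``simpleKind``, ``simpleMode``, ``makeSimpleTypeIndex``) are hypothetical, the
masks are derived from the bit layout in the diagram (Kind in bits 0-7, Mode in
bits 8-11), and ``TypeIndexBegin`` is assumed to default to 0x1000.

.. code-block:: c++

  #include <cstdint>

  // Masks derived from the diagram: Kind occupies bits 0-7, Mode bits 8-11.
  constexpr uint32_t KindMask = 0x000000ff;
  constexpr uint32_t ModeMask = 0x00000f00;
  constexpr uint32_t ModeShift = 8;
  constexpr uint32_t HighBit = 0x80000000;

  // True if, after clearing the high bit, the index falls in the reserved
  // range below TypeIndexBegin and is therefore a Mode/Kind bitmask.
  constexpr bool isSimpleTypeIndex(uint32_t TI,
                                   uint32_t TypeIndexBegin = 0x1000) {
    return (TI & ~HighBit) < TypeIndexBegin;
  }

  constexpr uint32_t simpleKind(uint32_t TI) { return TI & KindMask; }

  constexpr uint32_t simpleMode(uint32_t TI) {
    return (TI & ModeMask) >> ModeShift;
  }

  constexpr uint32_t makeSimpleTypeIndex(uint32_t Mode, uint32_t Kind) {
    return (Mode << ModeShift) | Kind;
  }

  // Void = 0x03, NearPointer32 = 4, NearPointer64 = 6, NearPointer = 1.
  static_assert(makeSimpleTypeIndex(4, 0x03) == 0x0403, "void* (32-bit)");
  static_assert(makeSimpleTypeIndex(6, 0x03) == 0x0603, "void* (64-bit)");
  static_assert(makeSimpleTypeIndex(1, 0x03) == 0x0103, "std::nullptr_t");

The ``static_assert`` lines encode the three pointer examples from the text:
``void*`` for 32-bit and 64-bit targets, and ``std::nullptr_t`` using the
bitless ``NearPointer`` mode.
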
.. _tpi_header:

Stream Header
=============

At offset 0 of the TPI Stream is a header with the following layout:

.. code-block:: c++

  struct TpiStreamHeader {
    uint32_t Version;
    uint32_t HeaderSize;
    uint32_t TypeIndexBegin;
    uint32_t TypeIndexEnd;
    uint32_t TypeRecordBytes;

    uint16_t HashStreamIndex;
    uint16_t HashAuxStreamIndex;
    uint32_t HashKeySize;
    uint32_t NumHashBuckets;

    int32_t HashValueBufferOffset;
    uint32_t HashValueBufferLength;

    int32_t IndexOffsetBufferOffset;
    uint32_t IndexOffsetBufferLength;

    int32_t HashAdjBufferOffset;
    uint32_t HashAdjBufferLength;
  };

- **Version** - A value from the following enum.

.. code-block:: c++

  enum class TpiStreamVersion : uint32_t {
    V40 = 19950410,
    V41 = 19951122,
    V50 = 19961031,
    V70 = 19990903,
    V80 = 20040203,
  };

Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
``V80``, and no other values have been observed.  It is assumed that should
another value be observed, the layout described by this document may not be
accurate.

- **HeaderSize** - ``sizeof(TpiStreamHeader)``

- **TypeIndexBegin** - The numeric value of the type index representing the
  first type record in the TPI stream.  This is usually the value 0x1000 as type
  indices lower than this are reserved (see :ref:`Type Indices <type_indices>` for
  a discussion of reserved type indices).

- **TypeIndexEnd** - One greater than the numeric value of the type index
  representing the last type record in the TPI stream.  The total number of type
  records in the TPI stream can be computed as ``TypeIndexEnd - TypeIndexBegin``.

- **TypeRecordBytes** - The number of bytes of type record data following the header.

- **HashStreamIndex** - The index of a stream which contains a list of hashes for
  every type record.  This value may be -1 (``0xFFFF`` when read as an unsigned
  16-bit value), indicating that hash information is not present.  In practice a
  valid stream index is always observed, so any producer implementation should be
  prepared to emit this stream to ensure compatibility with tools which may
  expect it to be present.

- **HashAuxStreamIndex** - Presumably the index of a stream which contains a separate
  hash table, although this has not been observed in practice and it's unclear what it
  might be used for.

- **HashKeySize** - The size of a hash value (usually 4 bytes).

- **NumHashBuckets** - The number of buckets used to generate the hash values in the
  aforementioned hash streams.

- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within
  the TPI Hash Stream of the list of hash values.  It should be assumed that there
  are either 0 hash values, or a number equal to the number of type records in the
  TPI stream (``TypeIndexEnd - TypeIndexBegin``).  Thus, if ``HashValueBufferLength``
  is neither 0 nor equal to ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize``, we
  can consider the PDB malformed.

- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size
  within the TPI Hash Stream of the Type Index Offsets Buffer.  This is a list of
  pairs of ``uint32_t`` values where the first value is a
  :ref:`Type Index <type_indices>` and the second value is the offset in the type
  record data of the type with this index.  This can be used to do a binary
  search for a nearby entry followed by a linear search through the records to
  get O(log n) lookup by type index.

- **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within
  the TPI hash stream of a serialized hash table whose keys are the hash values
  in the hash value buffer and whose values are type indices.  This appears to
  be useful in incremental linking scenarios: if a type is modified, an entry
  can be created mapping the old hash value to the new type index.  A PDB file
  consumer can then always find the most up-to-date version of the type, without
  forcing the incremental linker to garbage collect and update references that
  point to the old version so that they point to the new version.
  The layout of this hash table is described in :doc:`HashTable`.

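The consistency rules in the field descriptions above can be turned into a
small validation routine.  This is a sketch under stated assumptions rather
than a reference implementation: ``readTpiHeader`` is a hypothetical name, the
host is assumed to be little-endian so the on-disk bytes can be copied directly
into the struct, and the check that ``IndexOffsetBufferLength`` is a multiple
of 8 is inferred from the buffer being a list of ``uint32_t`` pairs.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <cstring>

  struct TpiStreamHeader {
    uint32_t Version;
    uint32_t HeaderSize;
    uint32_t TypeIndexBegin;
    uint32_t TypeIndexEnd;
    uint32_t TypeRecordBytes;
    uint16_t HashStreamIndex;
    uint16_t HashAuxStreamIndex;
    uint32_t HashKeySize;
    uint32_t NumHashBuckets;
    int32_t HashValueBufferOffset;
    uint32_t HashValueBufferLength;
    int32_t IndexOffsetBufferOffset;
    uint32_t IndexOffsetBufferLength;
    int32_t HashAdjBufferOffset;
    uint32_t HashAdjBufferLength;
  };

  // The fields are naturally aligned, so no padding is expected.
  static_assert(sizeof(TpiStreamHeader) == 56, "unexpected header padding");

  // Copies the header out of the stream data and applies the checks from the
  // field descriptions.  Returns false if the header looks malformed.
  inline bool readTpiHeader(const uint8_t *Data, std::size_t Size,
                            TpiStreamHeader &Out) {
    if (Size < sizeof(TpiStreamHeader))
      return false;
    std::memcpy(&Out, Data, sizeof(TpiStreamHeader)); // little-endian host
    if (Out.HeaderSize != sizeof(TpiStreamHeader))
      return false;
    if (Out.TypeIndexEnd < Out.TypeIndexBegin)
      return false;
    uint64_t NumRecords = Out.TypeIndexEnd - Out.TypeIndexBegin;
    // Either no hash values at all, or exactly one per type record.
    if (Out.HashValueBufferLength != 0 &&
        Out.HashValueBufferLength != NumRecords * Out.HashKeySize)
      return false;
    // Each Type Index Offsets entry is a pair of uint32_t values.
    if (Out.IndexOffsetBufferLength % 8 != 0)
      return false;
    return true;
  }

The sketch deliberately does not reject unknown ``Version`` values; a stricter
reader might do so, since the layout is only described for ``V80``.
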
| 
 | |
| .. _tpi_records:
 | |
| 
 | |
| CodeView Type Record List
 | |
| =========================
 | |
| Following the header, there are ``TypeRecordBytes`` bytes of data that represent a
 | |
| variable length array of :doc:`CodeView type records <CodeViewTypes>`.  The number
 | |
| of such records (e.g. the length of the array) can be determined by computing the
 | |
| value ``Header.TypeIndexEnd - Header.TypeIndexBegin``.
 | |
| 
 | |
| O(log(n)) access is provided by way of the Type Index Offsets array (if
 | |
| present) described previously.
 |
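
A lookup along these lines might be sketched as follows.  Several details are
assumptions that this document does not state: the entries of the Type Index
Offsets array are taken to be sorted by type index, each type record is taken
to begin with a 16-bit length that does not count the length field itself, and
the host is taken to be little-endian.  The names ``TypeIndexOffset`` and
``findRecordOffset`` are hypothetical.

.. code-block:: c++

  #include <algorithm>
  #include <cstdint>
  #include <cstring>
  #include <vector>

  // One entry of the Type Index Offsets array.
  struct TypeIndexOffset {
    uint32_t Index;  // Type index of a record.
    uint32_t Offset; // Byte offset of that record in the type record data.
  };

  // Returns the byte offset of the record with type index TI within the type
  // record data, or -1 if it cannot be located.
  inline int64_t findRecordOffset(const std::vector<TypeIndexOffset> &Offsets,
                                  const uint8_t *Records, uint32_t RecordBytes,
                                  uint32_t TI) {
    // Binary search for the first entry whose index is greater than TI, then
    // step back to the last entry at or before TI.
    auto It = std::upper_bound(
        Offsets.begin(), Offsets.end(), TI,
        [](uint32_t Value, const TypeIndexOffset &E) { return Value < E.Index; });
    if (It == Offsets.begin())
      return -1;
    --It;

    // Linear scan forward, skipping one record at a time.
    uint32_t Cur = It->Index;
    uint64_t Off = It->Offset;
    while (Cur < TI) {
      if (Off + 2 > RecordBytes)
        return -1;
      uint16_t Len;
      std::memcpy(&Len, Records + Off, sizeof(Len)); // assumed length prefix
      Off += 2u + Len;
      ++Cur;
    }
    return Off + 2 <= RecordBytes ? static_cast<int64_t>(Off) : -1;
  }

The cost of the linear scan depends on how densely the producer populated the
array, so the overall bound holds only when the gaps between entries are small.
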