=====================================
The PDB DBI (Debug Info) Stream
=====================================

.. contents::
   :local:

.. _dbi_intro:

Introduction
============

The PDB DBI Stream (Index 3) is one of the largest and most important streams
in a PDB file.  It contains information about how the program was compiled
(e.g. compilation flags), the compilands (e.g. object files) that were linked
together to form the program, the source files that were used to build the
program, and references to other streams that contain more detailed
information about each compiland, such as the CodeView symbol records
contained within each compiland and the source and line information for
functions and other symbols within each compiland.

.. _dbi_header:

Stream Header
=============
At offset 0 of the DBI Stream is a header with the following layout:

.. code-block:: c++

  struct DbiStreamHeader {
    int32_t VersionSignature;
    uint32_t VersionHeader;
    uint32_t Age;
    uint16_t GlobalStreamIndex;
    uint16_t BuildNumber;
    uint16_t PublicStreamIndex;
    uint16_t PdbDllVersion;
    uint16_t SymRecordStream;
    uint16_t PdbDllRbld;
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerMapSize;
    uint32_t MFCTypeServerIndex;
    int32_t OptionalDbgHeaderSize;
    int32_t ECSubstreamSize;
    uint16_t Flags;
    uint16_t Machine;
    uint32_t Padding;
  };

- **VersionSignature** - Unknown meaning.  Appears to always be ``-1``.

- **VersionHeader** - A value from the following enum.

.. code-block:: c++

  enum class DbiStreamVersion : uint32_t {
    VC41 = 930803,
    V50 = 19960307,
    V60 = 19970606,
    V70 = 19990903,
    V110 = 20091201
  };

Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
``V70``, and it is not clear what the other values are for.

- **Age** - The number of times the PDB has been written.  Equal to the same
  field from the :ref:`PDB Stream header <pdb_stream_header>`.

- **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream <GlobalStream>`,
  which contains CodeView symbol records for all global symbols.  Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **BuildNumber** - A bitfield containing values representing the major and minor
  version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the
  program, with the following layout:

.. code-block:: c++

  uint16_t MinorVersion : 8;
  uint16_t MajorVersion : 7;
  uint16_t NewVersionFormat : 1;

For the purposes of LLVM, we assume ``NewVersionFormat`` to be always ``true``.
If it is ``false``, the layout above does not apply and the reader should consult
the `Microsoft Source Code <https://github.com/Microsoft/microsoft-pdb>`__ for
further guidance.

- **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream <PublicStream>`,
  which contains CodeView symbol records for all public symbols.  Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this
  PDB.  This does not apply to LLVM, as LLVM does not use ``mspdbXXXX.dll``.

- **SymRecordStream** - The stream containing all CodeView symbol records used
  by the program.  This is used for deduplication, so that many different
  compilands can refer to the same symbols without having to include the full record
  content inside of each module stream.

- **PdbDllRbld** - Unknown.

- **MFCTypeServerIndex** - The index of the MFC type server in the
  :ref:`dbi_type_server_map_substream`.

- **Flags** - A bitfield with the following layout, containing various
  information about how the program was built:

.. code-block:: c++

  uint16_t WasIncrementallyLinked : 1;
  uint16_t ArePrivateSymbolsStripped : 1;
  uint16_t HasConflictingTypes : 1;
  uint16_t Reserved : 13;

The only one of these that is not self-explanatory is ``HasConflictingTypes``.
Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``.
If it is passed to ``link.exe``, this field will be set.  Otherwise it will
not be set.  It is unclear what this flag does, although it seems to have
subtle implications on the algorithm used to look up type records.

- **Machine** - A value from the `CV_CPU_TYPE_e <https://msdn.microsoft.com/en-us/library/b2fc64ek.aspx>`__
  enumeration.  Common values are ``0x8664`` (x86-64) and ``0x14C`` (x86).

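As an illustration of the ``BuildNumber`` layout described above, the following
is a minimal sketch of how a reader might unpack the field with shifts and
masks rather than bitfields.  The ``ToolchainVersion`` type and the
``decodeBuildNumber`` function are hypothetical names introduced for this
example.  The sketch assumes that bitfield members are allocated starting from
the least significant bit, and it handles only the case where
``NewVersionFormat`` is set.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Hypothetical helper type for this example; not part of any real API.
  struct ToolchainVersion {
    unsigned Major;
    unsigned Minor;
  };

  // Unpacks BuildNumber, assuming bitfields are allocated from the least
  // significant bit: MinorVersion in bits 0-7, MajorVersion in bits 8-14 and
  // NewVersionFormat in bit 15.
  inline ToolchainVersion decodeBuildNumber(uint16_t BuildNumber) {
    // The old format is not described here, so this sketch rejects it.
    assert((BuildNumber & 0x8000) != 0 && "old version format not handled");
    ToolchainVersion V;
    V.Minor = BuildNumber & 0xFF;
    V.Major = (BuildNumber >> 8) & 0x7F;
    return V;
  }

Under these assumptions, a ``BuildNumber`` of ``0x8C00`` decodes to version
12.0.
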
Immediately after the fixed-size DBI Stream header are ``7`` variable-length
`substreams`.  The following ``7`` fields of the DBI Stream header specify the
number of bytes of the corresponding substream.  Each substream's contents will
be described in detail :ref:`below <dbi_substreams>`.  The length of the entire
DBI Stream should equal ``64`` (the length of the header above) plus the value
of each of the following ``7`` fields.

- **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`.

- **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`.

- **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`.

- **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`.

- **TypeServerMapSize** - The length of the :ref:`dbi_type_server_map_substream`.

- **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`.

- **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`.

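Because the substreams are laid out back to back, their starting offsets can
be derived from the size fields alone.  The following sketch shows one way to
do so.  ``DbiSubstreamSizes``, ``DbiSubstreamOffsets`` and
``computeSubstreamOffsets`` are hypothetical names introduced for this example.
The substreams are placed in the order in which they are described under
:ref:`dbi_substreams`, which means the EC substream precedes the optional debug
header even though the header declares the two size fields in the opposite
order.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Substream sizes, copied from the DbiStreamHeader fields of the same names.
  struct DbiSubstreamSizes {
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerMapSize;
    int32_t ECSubstreamSize;
    int32_t OptionalDbgHeaderSize;
  };

  // Byte offsets from the start of the DBI Stream (hypothetical helper type).
  struct DbiSubstreamOffsets {
    uint32_t ModInfo;
    uint32_t SectionContribution;
    uint32_t SectionMap;
    uint32_t FileInfo;
    uint32_t TypeServerMap;
    uint32_t EC;
    uint32_t OptionalDbgHeader;
    uint32_t End; // Expected total length of the DBI Stream.
  };

  inline DbiSubstreamOffsets computeSubstreamOffsets(const DbiSubstreamSizes &S) {
    const int32_t Sizes[7] = {S.ModInfoSize,       S.SectionContributionSize,
                              S.SectionMapSize,    S.SourceInfoSize,
                              S.TypeServerMapSize, S.ECSubstreamSize,
                              S.OptionalDbgHeaderSize};
    uint32_t Starts[8];
    uint32_t Pos = 64; // The first substream follows the 64-byte header.
    for (int I = 0; I < 7; ++I) {
      assert(Sizes[I] >= 0 && "negative substream size");
      Starts[I] = Pos;
      Pos += static_cast<uint32_t>(Sizes[I]);
    }
    Starts[7] = Pos;
    return DbiSubstreamOffsets{Starts[0], Starts[1], Starts[2], Starts[3],
                               Starts[4], Starts[5], Starts[6], Starts[7]};
  }

A reader can compare ``End`` against the actual length of the DBI Stream as a
basic consistency check before parsing any substream.
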
.. _dbi_substreams:

Substreams
==========

.. _dbi_mod_info_substream:

Module Info Substream
^^^^^^^^^^^^^^^^^^^^^

Begins at offset ``0`` immediately after the :ref:`header <dbi_header>`.  The
module info substream is an array of variable-length records, each one
describing a single module (e.g. object file) linked into the program.  Each
record in the array has the format:

.. code-block:: c++

  struct ModInfo {
    uint32_t Unused1;
    struct SectionContribEntry {
      uint16_t Section;
      char Padding1[2];
      int32_t Offset;
      int32_t Size;
      uint32_t Characteristics;
      uint16_t ModuleIndex;
      char Padding2[2];
      uint32_t DataCrc;
      uint32_t RelocCrc;
    } SectionContr;
    uint16_t Flags;
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C11ByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    char Padding[2];
    uint32_t Unused2;
    uint32_t SourceFileNameIndex;
    uint32_t PdbFilePathNameIndex;
    char ModuleName[];
    char ObjFileName[];
  };

- **SectionContr** - Describes the properties of the section in the final binary
  which contains the code and data from this module.

  ``SectionContr.Characteristics`` corresponds to the ``Characteristics`` field
  of the `IMAGE_SECTION_HEADER <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341(v=vs.85).aspx>`__
  structure.

- **Flags** - A bitfield with the following format:

.. code-block:: c++

  // ``true`` if this ModInfo has been written since reading the PDB.  This is
  // likely used to support incremental linking, so that the linker can decide
  // if it needs to commit changes to disk.
  uint16_t Dirty : 1;
  // ``true`` if EC information is present for this module. EC is presumed to
  // stand for "Edit & Continue", which LLVM does not support.  So this flag
  // will always be false.
  uint16_t EC : 1;
  uint16_t Unused : 6;
  // Type Server Index for this module.  This is assumed to be related to /Zi,
  // but as LLVM treats /Zi as /Z7, this field will always be invalid for LLVM
  // generated PDBs.
  uint16_t TSM : 8;

- **ModuleSymStream** - The index of the stream that contains symbol information
  for this module.  This includes CodeView symbol information as well as source
  and line information.  If this field is -1, then no additional debug info will
  be present for this module (for example, this is what happens when you strip
  private symbols from a PDB).

- **SymByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent CodeView symbol records.

- **C11ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C11-style CodeView line information.

- **C13ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C13-style CodeView line information.  At
  most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero.  Modern PDBs
  always use C13 instead of C11.

- **SourceFileCount** - The number of source files that contributed to this
  module during compilation.

- **SourceFileNameIndex** - The offset in the names buffer of the primary
  translation unit used to build this module.  All PDB files observed to date
  always have this value equal to 0.

- **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file
  containing this module's symbol information.  This has only been observed
  to be non-zero for the special ``* Linker *`` module.

- **ModuleName** - The module name.  This is usually either a full path to an
  object file (either directly passed to ``link.exe`` or from an archive) or
  a string of the form ``Import:<dll name>``.

- **ObjFileName** - The object file name.  In the case of a module that is
  passed directly to ``link.exe``, this is the same as **ModuleName**.
  In the case of a module that comes from an archive, this is usually the full
  path to the archive.

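Since each ``ModInfo`` record ends with two null terminated strings, the
records cannot be indexed directly and must be walked one at a time.  The
sketch below shows one possible way to do this.  ``ModuleSummary``,
``readCString`` and ``readModules`` are hypothetical names introduced for this
example.  The sketch assumes little-endian data, and it assumes that each
record is padded so that the next one starts on a 4-byte boundary.  That
padding is not stated in the text above, so it is exposed as a parameter.  The
fixed-size portion of the record, as declared above, occupies ``64`` bytes, and
``ModuleSymStream`` sits at byte offset ``34`` within it.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <string>
  #include <vector>

  // Hypothetical summary of one ModInfo record.
  struct ModuleSummary {
    uint16_t ModuleSymStream;
    std::string ModuleName;
    std::string ObjFileName;
  };

  // Reads a null terminated string starting at Pos and advances Pos past the
  // terminator.  Returns false if no terminator is found before Size.
  inline bool readCString(const uint8_t *Data, size_t Size, size_t &Pos,
                          std::string &Out) {
    if (Pos >= Size)
      return false;
    const void *Nul = std::memchr(Data + Pos, 0, Size - Pos);
    if (!Nul)
      return false;
    size_t End = static_cast<size_t>(static_cast<const uint8_t *>(Nul) - Data);
    Out.assign(reinterpret_cast<const char *>(Data + Pos), End - Pos);
    Pos = End + 1;
    return true;
  }

  // Walks the substream and collects one summary per complete record.
  // Alignment is an assumption of this sketch (see the text above).
  inline std::vector<ModuleSummary> readModules(const uint8_t *Data, size_t Size,
                                                size_t Alignment = 4) {
    assert(Alignment != 0);
    const size_t FixedSize = 64;       // Bytes preceding ModuleName.
    const size_t SymStreamOffset = 34; // Unused1 + SectionContr + Flags.
    std::vector<ModuleSummary> Modules;
    size_t Pos = 0;
    while (Pos + FixedSize <= Size) {
      ModuleSummary M;
      M.ModuleSymStream = static_cast<uint16_t>(
          Data[Pos + SymStreamOffset] | (Data[Pos + SymStreamOffset + 1] << 8));
      size_t Cur = Pos + FixedSize;
      if (!readCString(Data, Size, Cur, M.ModuleName) ||
          !readCString(Data, Size, Cur, M.ObjFileName))
        break; // Truncated record; stop rather than read out of bounds.
      Modules.push_back(M);
      Pos = (Cur + Alignment - 1) / Alignment * Alignment;
    }
    return Modules;
  }

A ``ModuleSymStream`` value of ``0xFFFF`` in the result corresponds to the -1
case described above, where no additional debug info is present for the module.
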
.. _dbi_sec_contr_substream:

Section Contribution Substream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends,
and consumes ``Header->SectionContributionSize`` bytes.  This substream begins
with a single ``uint32_t`` which will be one of the following values:

.. code-block:: c++

  enum class SectionContrSubstreamVersion : uint32_t {
    Ver60 = 0xeffe0000 + 19970605,
    V2 = 0xeffe0000 + 20140516
  };

``Ver60`` is the only value which has been observed in a PDB so far.  Following
this is an array of fixed-length structures.  If the version is ``Ver60``,
it is an array of ``SectionContribEntry`` structures (this is the nested structure
from the ``ModInfo`` type).  If the version is ``V2``, it is an array of
``SectionContribEntry2`` structures, defined as follows:

.. code-block:: c++

  struct SectionContribEntry2 {
    SectionContribEntry SC;
    uint32_t ISectCoff;
  };

The purpose of the second field is not well understood.  The name implies that
it is the index of the COFF section, but this also describes the existing field
``SectionContribEntry::Section``.

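Because the entries are fixed-length, the number of entries follows from the
version and the substream size.  The following sketch illustrates this.
``countSectionContribs`` and the two constants are hypothetical names
introduced for this example, and the data is assumed to be little-endian.  The
entry sizes of ``28`` and ``32`` bytes are derived from the structure
definitions above.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>

  // Version values from the SectionContrSubstreamVersion enum above.
  const uint32_t SecContrVer60 = 0xeffe0000u + 19970605u;
  const uint32_t SecContrV2 = 0xeffe0000u + 20140516u;

  // Returns the number of entries in the substream, or -1 if the version is
  // not recognized or the payload is not a whole number of entries.
  inline int64_t countSectionContribs(const uint8_t *Data, size_t Size) {
    assert(Data != nullptr || Size == 0);
    if (Size < 4)
      return -1;
    uint32_t Version = uint32_t(Data[0]) | uint32_t(Data[1]) << 8 |
                       uint32_t(Data[2]) << 16 | uint32_t(Data[3]) << 24;
    size_t EntrySize;
    if (Version == SecContrVer60)
      EntrySize = 28; // sizeof(SectionContribEntry)
    else if (Version == SecContrV2)
      EntrySize = 32; // sizeof(SectionContribEntry2)
    else
      return -1;
    size_t Payload = Size - 4; // Bytes following the version field.
    if (Payload % EntrySize != 0)
      return -1;
    return static_cast<int64_t>(Payload / EntrySize);
  }
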
.. _dbi_section_map_substream:

Section Map Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_sec_contr_substream` ends,
and consumes ``Header->SectionMapSize`` bytes.  This substream begins with a
``4``-byte header followed by an array of fixed-length records.  The header and
records have the following layout:

.. code-block:: c++

  struct SectionMapHeader {
    uint16_t Count;    // Number of segment descriptors
    uint16_t LogCount; // Number of logical segment descriptors
  };

  struct SectionMapEntry {
    uint16_t Flags;         // See the SectionMapEntryFlags enum below.
    uint16_t Ovl;           // Logical overlay number
    uint16_t Group;         // Group index into descriptor array.
    uint16_t Frame;
    uint16_t SectionName;   // Byte index of segment / group name in string table, or 0xFFFF.
    uint16_t ClassName;     // Byte index of class in string table, or 0xFFFF.
    uint32_t Offset;        // Byte offset of the logical segment within physical segment.
                            // If group is set in flags, this is the offset of the group.
    uint32_t SectionLength; // Byte count of the segment or group.
  };

  enum class SectionMapEntryFlags : uint16_t {
    Read = 1 << 0,              // Segment is readable.
    Write = 1 << 1,             // Segment is writable.
    Execute = 1 << 2,           // Segment is executable.
    AddressIs32Bit = 1 << 3,    // Descriptor describes a 32-bit linear address.
    IsSelector = 1 << 8,        // Frame represents a selector.
    IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address.
    IsGroup = 1 << 10           // If set, descriptor represents a group.
  };

Many of these fields are not well understood, so will not be discussed further.

.. _dbi_file_info_substream:

File Info Substream
^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_section_map_substream` ends,
and consumes ``Header->SourceInfoSize`` bytes.  This substream defines the mapping
from module to the source files that contribute to that module.  Since multiple
modules can use the same source file (for example, a header file), this substream
uses a string table to store each unique file name only once, and then has each
module use offsets into the string table rather than embedding the string's value
directly.  The format of this substream is as follows:

.. code-block:: c++

  struct FileInfoSubstream {
    uint16_t NumModules;
    uint16_t NumSourceFiles;

    uint16_t ModIndices[NumModules];
    uint16_t ModFileCounts[NumModules];
    uint32_t FileNameOffsets[NumSourceFiles];
    char NamesBuffer[][NumSourceFiles];
  };

**NumModules** - The number of modules for which source file information is
contained within this substream.  Should match the corresponding value from the
:ref:`dbi_header`.

**NumSourceFiles** - In theory this is supposed to contain the number of source
files for which this substream contains information.  But that would present a
problem in that the width of this field being ``16``-bits would prevent one from
having more than 64K source files in a program.  In early versions of the file
format, this seems to have been the case.  In order to support more than this, this
field is simply ignored, and the count is computed dynamically by summing up the
values of the ``ModFileCounts`` array (discussed below).  In short, this value
should be ignored.

**ModIndices** - This array is present, but does not appear to be useful.

**ModFileCounts** - An array of ``NumModules`` integers, each one containing
the number of source files which contribute to the module at the specified index.
While each individual module is limited to 64K contributing source files, the
union of all modules' source files may be greater than 64K.  The real number of
source files is thus computed by summing this array.  Note that summing this array
does not give the number of `unique` source files, only the total number of source
file contributions to modules.

**FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles**
here refers to the 32-bit value obtained from summing **ModFileCounts**), where
each integer is an offset into **NamesBuffer** pointing to a null terminated string.

**NamesBuffer** - An array of null terminated strings containing the actual source
file names.

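The following sketch shows one way to recover the per-module list of source
file names from this substream, following the rules above: ``NumSourceFiles``
is ignored, and the real count is obtained by summing ``ModFileCounts``.
``ModuleFiles``, ``readU16``, ``readU32`` and ``readFileInfo`` are hypothetical
names introduced for this example.  The sketch assumes little-endian data and
assumes that the entries of ``FileNameOffsets`` are grouped by module, in module
order.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <string>
  #include <vector>

  // One list of source file names per module (hypothetical alias).
  typedef std::vector<std::vector<std::string> > ModuleFiles;

  inline uint16_t readU16(const uint8_t *P) {
    return static_cast<uint16_t>(P[0] | (P[1] << 8));
  }

  inline uint32_t readU32(const uint8_t *P) {
    return uint32_t(P[0]) | uint32_t(P[1]) << 8 | uint32_t(P[2]) << 16 |
           uint32_t(P[3]) << 24;
  }

  // Returns the file names for each module, or an empty result if the
  // substream is malformed.
  inline ModuleFiles readFileInfo(const uint8_t *Data, size_t Size) {
    assert(Data != nullptr || Size == 0);
    if (Size < 4)
      return ModuleFiles();
    size_t NumModules = readU16(Data);
    // The NumSourceFiles field at offset 2 is deliberately ignored.
    size_t CountsPos = 4 + 2 * NumModules;          // Skips ModIndices.
    size_t OffsetsPos = CountsPos + 2 * NumModules; // Skips ModFileCounts.
    if (OffsetsPos > Size)
      return ModuleFiles();
    size_t Total = 0; // Real number of file contributions.
    for (size_t I = 0; I < NumModules; ++I)
      Total += readU16(Data + CountsPos + 2 * I);
    size_t NamesPos = OffsetsPos + 4 * Total;
    if (NamesPos > Size)
      return ModuleFiles();

    ModuleFiles Result;
    size_t Next = 0; // Index of the next unread FileNameOffsets entry.
    for (size_t I = 0; I < NumModules; ++I) {
      std::vector<std::string> Files;
      size_t Count = readU16(Data + CountsPos + 2 * I);
      for (size_t J = 0; J < Count; ++J, ++Next) {
        uint32_t Off = readU32(Data + OffsetsPos + 4 * Next);
        if (Off >= Size - NamesPos)
          return ModuleFiles();
        const uint8_t *Str = Data + NamesPos + Off;
        // Make sure the string is terminated inside the buffer.
        if (!std::memchr(Str, 0, Size - NamesPos - Off))
          return ModuleFiles();
        Files.push_back(reinterpret_cast<const char *>(Str));
      }
      Result.push_back(Files);
    }
    return Result;
  }

Because names are shared through offsets, a header included by two modules
appears once in ``NamesBuffer`` but once in each module's list in the result.
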
.. _dbi_type_server_map_substream:

Type Server Map Substream
^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_file_info_substream`
ends, and consumes ``Header->TypeServerMapSize`` bytes.  Neither the purpose
nor the layout of this substream is understood, although it is assumed to
be related somehow to the usage of ``/Zi`` and ``mspdbsrv.exe``.  This substream
will not be discussed further.

.. _dbi_ec_substream:

EC Substream
^^^^^^^^^^^^
Begins at offset ``0`` immediately after the
:ref:`dbi_type_server_map_substream` ends, and consumes
``Header->ECSubstreamSize`` bytes.  This is presumed to be related to Edit &
Continue support in MSVC.  LLVM does not support Edit & Continue, so this
stream will not be discussed further.

.. _dbi_optional_dbg_stream:

Optional Debug Header Stream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_ec_substream` ends, and
consumes ``Header->OptionalDbgHeaderSize`` bytes.  This field is an array of
stream indices (i.e. ``uint16_t`` values), each of which identifies a stream
in the larger MSF file which contains some additional debug information.
Each position of this array has a special meaning, allowing one to determine
what kind of debug information is at the referenced stream.  ``11`` indices
are currently understood, although it's possible there may be more.  The
layout of each stream generally corresponds exactly to a particular type
of debug data directory from the PE/COFF file.  The format of these fields
can be found in the `Microsoft PE/COFF Specification <https://www.microsoft.com/en-us/download/details.aspx?id=19509>`__.
If any of these fields is -1, it means the corresponding type of debug info is
not present in the PDB.

**FPO Data** - ``DbgStreamArray[0]``.  The data in the referenced stream is an
array of ``FPO_DATA`` structures.  This contains the relocated contents of
any ``.debug$F`` section from any of the linker inputs.

**Exception Data** - ``DbgStreamArray[1]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``.

**Fixup Data** - ``DbgStreamArray[2]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``.

**Omap To Src Data** - ``DbgStreamArray[3]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``.  This
is used for mapping addresses between instrumented and uninstrumented code.

**Omap From Src Data** - ``DbgStreamArray[4]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``.  This
is used for mapping addresses between instrumented and uninstrumented code.

**Section Header Data** - ``DbgStreamArray[5]``.  A dump of all section headers from
the original executable.

**Token / RID Map** - ``DbgStreamArray[6]``.  The layout of this stream is not
understood, but it is assumed to be a mapping from ``CLR Token`` to
``CLR Record ID``.  Refer to `ECMA 335 <http://www.ecma-international.org/publications/standards/Ecma-335.htm>`__
for more information.

**Xdata** - ``DbgStreamArray[7]``.  A copy of the ``.xdata`` section from the
executable.

**Pdata** - ``DbgStreamArray[8]``.  This is assumed to be a copy of the ``.pdata``
section from the executable, but that would make it identical to
``DbgStreamArray[1]``.  The difference between these two indices is not well
understood.

**New FPO Data** - ``DbgStreamArray[9]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.  Note that this is different
from ``DbgStreamArray[0]`` in that ``.debug$F`` sections are only emitted by MASM.
Thus, it is possible for both to appear in the same PDB if both MASM object files
and cl object files are linked into the same program.

**Original Section Header Data** - ``DbgStreamArray[10]``.  Similar to
``DbgStreamArray[5]``, but contains the section headers before any binary translation
has been performed.  This can be used in conjunction with ``DbgStreamArray[3]``
and ``DbgStreamArray[4]`` to map instrumented and uninstrumented addresses.
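
The array positions listed above can be given names, as in the following
sketch of a lookup helper.  ``DbgHeaderType``, ``InvalidStreamIndex`` and
``getDbgStreamIndex`` are hypothetical names introduced for this example.  The
sketch assumes little-endian data, treats the -1 value as ``0xFFFF`` when read
as a ``uint16_t``, and treats a position beyond the end of the array as absent,
since a PDB may contain fewer than ``11`` entries.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>

  // Positions within DbgStreamArray, in the order listed above.
  enum class DbgHeaderType : uint16_t {
    FPO = 0,
    Exception = 1,
    Fixup = 2,
    OmapToSrc = 3,
    OmapFromSrc = 4,
    SectionHdr = 5,
    TokenRidMap = 6,
    Xdata = 7,
    Pdata = 8,
    NewFPO = 9,
    SectionHdrOrig = 10
  };

  // -1 interpreted as uint16_t: the debug info type is not present.
  const uint16_t InvalidStreamIndex = 0xFFFF;

  // Returns the stream index stored at the position for Type, or
  // InvalidStreamIndex if the array is too short to contain that position.
  inline uint16_t getDbgStreamIndex(const uint8_t *Data, size_t Size,
                                    DbgHeaderType Type) {
    assert(Data != nullptr || Size == 0);
    size_t Pos = 2 * static_cast<size_t>(Type);
    if (Pos + 2 > Size)
      return InvalidStreamIndex;
    return static_cast<uint16_t>(Data[Pos] | (Data[Pos + 1] << 8));
  }
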