========================================
The PDB Info Stream (aka the PDB Stream)
========================================

.. contents::
   :local:

.. _pdb_stream_header:

Stream Header
=============
At offset 0 of the PDB Stream is a header with the following layout:


.. code-block:: c++

  struct PdbStreamHeader {
    ulittle32_t Version;
    ulittle32_t Signature;
    ulittle32_t Age;
    Guid UniqueId;
  };

- **Version** - A value from the following enum:

.. code-block:: c++

  enum class PdbStreamVersion : uint32_t {
    VC2 = 19941610,
    VC4 = 19950623,
    VC41 = 19950814,
    VC50 = 19960307,
    VC98 = 19970604,
    VC70Dep = 19990604,
    VC70 = 20000404,
    VC80 = 20030901,
    VC110 = 20091201,
    VC140 = 20140508,
  };

While the meaning of this field appears to be obvious, in practice we have
never observed a value other than ``VC70``, even with modern versions of
the toolchain, and it is unclear why the other values exist.  It is assumed
that certain aspects of the PDB stream's layout, and perhaps even that of
the other streams, will change if the value is something other than ``VC70``.

- **Signature** - A 32-bit time-stamp generated with a call to ``time()`` at
  the time the PDB file is written.  Because a timestamp with 1-second
  granularity cannot guarantee uniqueness, this field does not really serve
  its intended purpose, and as such is typically ignored in favor of the
  ``Guid`` field, described below.

- **Age** - The number of times the PDB file has been written.  This can be used
  along with ``Guid`` to match the PDB to its corresponding executable.

- **Guid** - The ``UniqueId`` member of the header.  A 128-bit identifier
  guaranteed to be unique across space and time.  In general, this can be
  thought of as the result of calling the Win32 API
  `UuidCreate <https://msdn.microsoft.com/en-us/library/windows/desktop/aa379205(v=vs.85).aspx>`__,
  although LLVM cannot rely on that, as it must work on non-Windows platforms.

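
The header layout above can be decoded from a raw byte buffer as in the
following minimal sketch.  It does not use LLVM's own types:
``ParsedPdbStreamHeader``, ``readU32LE`` and ``parsePdbStreamHeader`` are
hypothetical names chosen for illustration, and the GUID is kept as 16 raw
bytes rather than being interpreted.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side copy of the header fields (not an LLVM type).
struct ParsedPdbStreamHeader {
  uint32_t Version = 0;
  uint32_t Signature = 0;
  uint32_t Age = 0;
  std::array<uint8_t, 16> Guid{}; // raw bytes of UniqueId, in on-disk order
};

// Three 32-bit fields followed by a 128-bit GUID.
constexpr std::size_t PdbStreamHeaderSize = 3 * 4 + 16;

// Decodes a little-endian 32-bit integer independent of host endianness.
inline uint32_t readU32LE(const uint8_t *P) {
  assert(P != nullptr);
  return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
         (uint32_t(P[3]) << 24);
}

// Fills Out from the first 28 bytes of the PDB Stream.
// Returns false if the buffer is too short to hold the header.
inline bool parsePdbStreamHeader(const std::vector<uint8_t> &Stream,
                                 ParsedPdbStreamHeader &Out) {
  if (Stream.size() < PdbStreamHeaderSize)
    return false;
  const uint8_t *P = Stream.data();
  Out.Version = readU32LE(P);
  Out.Signature = readU32LE(P + 4);
  Out.Age = readU32LE(P + 8);
  std::copy(P + 12, P + 28, Out.Guid.begin());
  return true;
}
```

A caller would typically compare ``Version`` against ``VC70`` (20000404)
before trusting the rest of the stream, since that is the only value the text
above reports having observed.
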
.. _pdb_named_stream_map:

Named Stream Map
================

Following the header is a serialized hash table whose key type is a string, and
whose value type is an integer.  The existence of a mapping ``X -> Y`` means
that the stream with the name ``X`` has stream index ``Y`` in the underlying MSF
file.  Note that not all streams are named (for example, the
:doc:`TPI Stream <TpiStream>` has a fixed index and as such there is no need to
look up its index by name).  In practice, there are usually only a small number
of named streams and these are enumerated in the table of streams in :doc:`index`.
A corollary of this is that if a stream does have a name (and as such is in the
named stream map), then consulting the Named Stream Map is likely to be the only
way to discover the stream's MSF stream index.  Several important streams (such
as the global string table, which is called ``/names``) can only be located this
way, so the map must be both produced and consumed correctly; tools will not
function correctly without it.

.. important::
   Some streams are located by fixed indices (e.g. the TPI Stream has index 2),
   but other streams are located by fixed names (e.g. the string table is called
   ``/names``) and can only be located by consulting the Named Stream Map.

The on-disk layout of the Named Stream Map consists of two components.  The first
is a buffer of string data prefixed by a 32-bit length.  The second is a serialized
hash table whose key and value types are both ``uint32_t``.  The key is the offset
of a null-terminated string in the string data buffer specifying the name of the
stream, and the value is the MSF stream index of the stream with said name.
Note that although the key is an integer, the hash function used to find the right
bucket hashes the string at the corresponding offset in the string data buffer.

The on-disk layout of the serialized hash table is described at :doc:`HashTable`.

Note that the entire Named Stream Map is not length-prefixed, so the only way to
get to the data following it is to de-serialize it in its entirety.

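
To illustrate how the two components relate, the sketch below resolves a
stream name to its MSF stream index.  It assumes the string data buffer has
already been read and that the hash table has already been de-serialized into
plain (key, value) pairs; the hash table's own on-disk format and hash function
are deliberately not reproduced here.  ``readNameAt`` and ``findNamedStream``
are hypothetical helper names, and the linear scan stands in for a real hashed
lookup.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Extracts the null-terminated string that starts at Offset in the string
// data buffer. Returns false if Offset is out of range or no terminator
// is found before the end of the buffer.
inline bool readNameAt(const std::vector<char> &StringData, uint32_t Offset,
                       std::string &Name) {
  if (Offset >= StringData.size())
    return false;
  const char *Begin = StringData.data() + Offset;
  const void *End = std::memchr(Begin, '\0', StringData.size() - Offset);
  if (End == nullptr)
    return false;
  Name.assign(Begin, static_cast<const char *>(End));
  return true;
}

// Entries holds (offset into StringData, MSF stream index) pairs taken from
// the de-serialized hash table. This is a linear scan for clarity only.
inline bool
findNamedStream(const std::vector<char> &StringData,
                const std::vector<std::pair<uint32_t, uint32_t>> &Entries,
                const std::string &Wanted, uint32_t &StreamIndex) {
  for (const auto &Entry : Entries) {
    std::string Name;
    if (readNameAt(StringData, Entry.first, Name) && Name == Wanted) {
      StreamIndex = Entry.second;
      return true;
    }
  }
  return false;
}
```

Note that the keys are only meaningful together with the string data buffer:
an entry whose offset does not land on a terminated string is skipped rather
than treated as a match.
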
.. _pdb_stream_features:

PDB Feature Codes
=================
Following the Named Stream Map, and consuming all remaining bytes of the PDB
Stream, is a list of values from the following enumeration:

.. code-block:: c++

  enum class PdbRaw_FeatureSig : uint32_t {
    VC110 = 20091201,
    VC140 = 20140508,
    NoTypeMerge = 0x4D544F4E,
    MinimalDebugInfo = 0x494E494D,
  };

The meaning of these values is summarized by the following table:

+------------------+-------------------------------------------------+
| Flag             | Meaning                                         |
+==================+=================================================+
| VC110            | - No other feature flags are present            |
|                  | - PDB contains an :doc:`IPI Stream <TpiStream>` |
+------------------+-------------------------------------------------+
| VC140            | - Other feature flags may be present            |
|                  | - PDB contains an :doc:`IPI Stream <TpiStream>` |
+------------------+-------------------------------------------------+
| NoTypeMerge      | - Presumably duplicate types can appear in the  |
|                  |   TPI Stream, although it's unclear why this    |
|                  |   might happen.                                 |
+------------------+-------------------------------------------------+
| MinimalDebugInfo | - Program was linked with /DEBUG:FASTLINK       |
|                  | - There is no TPI / IPI stream, all type info   |
|                  |   is contained in the original object files.    |
+------------------+-------------------------------------------------+

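
The table can be turned into a small decoder over the trailing bytes of the
PDB Stream, as in the sketch below.  ``PdbFeatures`` and ``parseFeatureCodes``
are hypothetical names.  Stopping at ``VC110`` is this sketch's reading of
"no other feature flags are present", and silently ignoring unknown values
and any trailing partial word is likewise an assumption rather than
documented behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class PdbRaw_FeatureSig : uint32_t {
  VC110 = 20091201,
  VC140 = 20140508,
  NoTypeMerge = 0x4D544F4E,
  MinimalDebugInfo = 0x494E494D,
};

// Hypothetical summary of the feature list.
struct PdbFeatures {
  bool ContainsIpiStream = false;
  bool NoTypeMerge = false;
  bool MinimalDebugInfo = false;
};

// Tail holds the bytes of the PDB Stream that follow the Named Stream Map.
inline PdbFeatures parseFeatureCodes(const std::vector<uint8_t> &Tail) {
  PdbFeatures Features;
  for (std::size_t Off = 0; Off + 4 <= Tail.size(); Off += 4) {
    // Each feature code is a little-endian 32-bit value.
    uint32_t Raw = uint32_t(Tail[Off]) | (uint32_t(Tail[Off + 1]) << 8) |
                   (uint32_t(Tail[Off + 2]) << 16) |
                   (uint32_t(Tail[Off + 3]) << 24);
    switch (static_cast<PdbRaw_FeatureSig>(Raw)) {
    case PdbRaw_FeatureSig::VC110:
      // Per the table, no other flags follow, so stop reading here.
      Features.ContainsIpiStream = true;
      return Features;
    case PdbRaw_FeatureSig::VC140:
      Features.ContainsIpiStream = true;
      break;
    case PdbRaw_FeatureSig::NoTypeMerge:
      Features.NoTypeMerge = true;
      break;
    case PdbRaw_FeatureSig::MinimalDebugInfo:
      Features.MinimalDebugInfo = true;
      break;
    default:
      break; // Unknown value: ignored in this sketch.
    }
  }
  return Features;
}
```

An empty feature list yields all flags ``false``, which a consumer would
presumably treat as a PDB without an IPI Stream.
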
Matching a PDB to its executable
================================
The linker is responsible for writing both the PDB and the final executable, and
as a result is the only entity capable of writing the information necessary to
match the PDB to the executable.

In order to accomplish this, the linker generates a GUID for the PDB (or
re-uses the existing GUID if it is linking incrementally) and increments the Age
field.

The executable is a PE/COFF file, and a PE/COFF file contains a number of
"directories".  For our purposes here, we are interested in the "debug
directory".  The exact format of a debug directory is described by the
`IMAGE_DEBUG_DIRECTORY structure <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680307(v=vs.85).aspx>`__.
For this particular case, the linker emits a debug directory of type
``IMAGE_DEBUG_TYPE_CODEVIEW``.  The format of this record is defined in
``llvm/DebugInfo/CodeView/CVDebugRecord.h``, but it suffices to say here only
that it includes the same ``Guid`` and ``Age`` fields.  At runtime, a
debugger or tool can scan the COFF executable image for the presence of
a debug directory of the correct type and verify that the Guid and Age match.
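
The final comparison amounts to an equality check on two (Guid, Age) pairs, as
in the sketch below.  ``PdbIdentity`` and ``pdbMatchesExecutable`` are
hypothetical names; extracting the pair from the CodeView debug record is
assumed to have happened elsewhere and is not shown.

```cpp
#include <array>
#include <cstdint>

// Hypothetical pair of identifying fields, filled once from the PDB Stream
// header and once from the executable's CodeView debug record.
struct PdbIdentity {
  std::array<uint8_t, 16> Guid{};
  uint32_t Age = 0;
};

// The PDB belongs to the executable only if both fields agree.
inline bool pdbMatchesExecutable(const PdbIdentity &FromPdb,
                                 const PdbIdentity &FromDebugDirectory) {
  return FromPdb.Guid == FromDebugDirectory.Guid &&
         FromPdb.Age == FromDebugDirectory.Age;
}
```
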