168 lines
		
	
	
		
			8.9 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
			
		
		
	
	
			168 lines
		
	
	
		
			8.9 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
| =====================================
 | |
| The PDB File Format
 | |
| =====================================
 | |
| 
 | |
| .. contents::
 | |
|    :local:
 | |
| 
 | |
| .. _pdb_intro:
 | |
| 
 | |
| Introduction
 | |
| ============
 | |
| 
 | |
| PDB (Program Database) is a file format invented by Microsoft and which contains
 | |
| debug information that can be consumed by debuggers and other tools.  Since
 | |
| officially supported APIs exist on Windows for querying debug information from
 | |
| PDBs even without the user understanding the internals of the file format, a
 | |
| large ecosystem of tools has been built for Windows to consume this format.  In
 | |
| order for Clang to be able to generate programs that can interoperate with these
 | |
| tools, it is necessary for us to generate PDB files ourselves.
 | |
| 
 | |
| At the same time, LLVM has a long history of being able to cross-compile from
 | |
| any platform to any platform, and we wish for the same to be true here.  So it
 | |
| is necessary for us to understand the PDB file format at the byte-level so that
 | |
| we can generate PDB files entirely on our own.
 | |
| 
 | |
| This manual describes what we know about the PDB file format today.  The layout
 | |
| of the file, the various streams contained within, the format of individual
 | |
| records within, and more.
 | |
| 
 | |
| We would like to extend our heartfelt gratitude to Microsoft, without whom we
 | |
| would not be where we are today.  Much of the knowledge contained within this
 | |
| manual was learned through reading code published by Microsoft on their `GitHub
 | |
| repo <https://github.com/Microsoft/microsoft-pdb>`__.
 | |
| 
 | |
| .. _pdb_layout:
 | |
| 
 | |
| File Layout
 | |
| ===========
 | |
| 
 | |
| .. important::
 | |
|    Unless otherwise specified, all numeric values are encoded in little endian.
 | |
|    If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
 | |
|    assume it is little endian!
 | |
| 
 | |
| .. toctree::
 | |
|    :hidden:
 | |
|    
 | |
|    MsfFile
 | |
|    PdbStream
 | |
|    TpiStream
 | |
|    DbiStream
 | |
|    ModiStream
 | |
|    PublicStream
 | |
|    GlobalStream
 | |
|    HashStream
 | |
|    CodeViewSymbols
 | |
|    CodeViewTypes
 | |
| 
 | |
| .. _msf:
 | |
| 
 | |
| The MSF Container
 | |
| -----------------
 | |
| A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
 | |
| An MSF file is actually a miniature "file system within a file".  It contains
 | |
| multiple streams (aka files) which can represent arbitrary data, and these
 | |
| streams are divided into blocks which may not necessarily be contiguously
 | |
| laid out within the file (aka fragmented).  Additionally, the MSF contains a
 | |
| stream directory (aka MFT) which describes how the streams (files) are laid
 | |
| out within the MSF.
 | |
| 
 | |
| For more information about the MSF container format, stream directory, and
 | |
| block layout, see :doc:`MsfFile`.
 | |
| 
 | |
| .. _streams:
 | |
| 
 | |
| Streams
 | |
| -------
 | |
| The PDB format contains a number of streams which describe various information
 | |
| such as the types, symbols, source files, and compilands (e.g. object files)
 | |
| of a program, as well as some additional streams containing hash tables that are
 | |
| used by debuggers and other tools to provide fast lookup of records and types
 | |
| by name, and various other information about how the program was compiled such
 | |
| as the specific toolchain used, and more.  A summary of streams contained in a
 | |
| PDB file is as follows:
 | |
| 
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| | Name               | Stream Index                 | Contents                                  |
 | |
| +====================+==============================+===========================================+
 | |
| | Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| | PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
 | |
| |                    |                              | - Fields to match EXE to this PDB         |
 | |
| |                    |                              | - Map of named streams to stream indices  |
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| | TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
 | |
| |                    |                              | - Index of TPI Hash Stream                |
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| | DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
 | |
| |                    |                              | - Indices of individual module streams    |
 | |
| |                    |                              | - Indices of public / global streams      |
 | |
| |                    |                              | - Section Contribution Information        |
 | |
| |                    |                              | - Source File Information                 |
 | |
| |                    |                              | - FPO / PGO Data                          |
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| | IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
 | |
| |                    |                              | - Index of IPI Hash Stream                |
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| | /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
 | |
| |                    |   Named Stream map           |                                           |
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| | /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |
 | |
| |                    |   Named Stream map           |                                           |
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| | /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
 | |
| |                    |   Named Stream map           |   string de-duplication                   |
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| | Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
 | |
| |                    | - One for each compiland     | - Line Number Information                 |
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| | Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
 | |
| |                    |                              | - Index of Public Hash Stream             |
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| | Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |
 | |
| |                    |                              | - Index of Global Hash Stream             |
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| | TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
 | |
| |                    |                              |   by name                                 |
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| | IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
 | |
| |                    |                              |   by name                                 |
 | |
| +--------------------+------------------------------+-------------------------------------------+
 | |
| 
 | |
| More information about the structure of each of these can be found on the
 | |
| following pages:
 | |
|    
 | |
| :doc:`PdbStream`
 | |
|    Information about the PDB Info Stream and how it is used to match PDBs to EXEs.
 | |
| 
 | |
| :doc:`TpiStream`
 | |
|    Information about the TPI stream and the CodeView records contained within.
 | |
| 
 | |
| :doc:`DbiStream`
 | |
|    Information about the DBI stream and relevant substreams including the Module Substreams,
 | |
|    source file information, and CodeView symbol records contained within.
 | |
| 
 | |
| :doc:`ModiStream`
 | |
|    Information about the Module Information Stream, of which there is one for each compilation
 | |
|    unit and the format of symbols contained within.
 | |
| 
 | |
| :doc:`PublicStream`
 | |
|    Information about the Public Symbol Stream.
 | |
| 
 | |
| :doc:`GlobalStream`
 | |
|    Information about the Global Symbol Stream.
 | |
| 
 | |
| :doc:`HashStream`
 | |
|    Information about the Hash Table stream, and how it can be used to quickly look up records
 | |
|    by name.
 | |
| 
 | |
| CodeView
 | |
| ========
 | |
| CodeView is another format which comes into the picture.  While MSF defines
 | |
| the structure of the overall file, and PDB defines the set of streams that
 | |
| appear within the MSF file and the format of those streams, CodeView defines
 | |
| the format of **symbol and type records** that appear within specific streams.
 | |
| Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
 | |
| more information about the CodeView format.
 |