=====================================
The PDB File Format
=====================================

.. contents::
   :local:

.. _pdb_intro:

Introduction
============

PDB (Program Database) is a file format invented by Microsoft that contains
debug information that can be consumed by debuggers and other tools.  Since
officially supported APIs exist on Windows for querying debug information from
PDBs even without the user understanding the internals of the file format, a
large ecosystem of tools has been built for Windows to consume this format.  In
order for Clang to be able to generate programs that can interoperate with these
tools, it is necessary for us to generate PDB files ourselves.

At the same time, LLVM has a long history of being able to cross-compile from
any platform to any platform, and we wish for the same to be true here.  So it
is necessary for us to understand the PDB file format at the byte level so that
we can generate PDB files entirely on our own.

This manual describes what we know about the PDB file format today: the layout
of the file, the various streams contained within, the format of individual
records within those streams, and more.

We would like to extend our heartfelt gratitude to Microsoft, without whom we
would not be where we are today.  Much of the knowledge contained within this
manual was learned through reading code published by Microsoft on their `GitHub
repo <https://github.com/Microsoft/microsoft-pdb>`__.

.. _pdb_layout:

File Layout
===========

.. important::
   Unless otherwise specified, all numeric values are encoded in little endian.
   If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
   assume it is little endian!

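As a brief illustration of the note above, the following Python sketch decodes
little-endian fields from a raw byte buffer using the standard ``struct``
module.  The helper names and the sample bytes are made up for this example and
do not correspond to any real PDB structure.

.. code-block:: python

   import struct

   def read_u16(buf: bytes, offset: int) -> int:
       """Decode a little-endian uint16_t at ``offset`` (hypothetical helper)."""
       return struct.unpack_from("<H", buf, offset)[0]

   def read_u64(buf: bytes, offset: int) -> int:
       """Decode a little-endian uint64_t at ``offset`` (hypothetical helper)."""
       return struct.unpack_from("<Q", buf, offset)[0]

   # Illustrative bytes only; not taken from a real PDB file.
   sample = bytes([0x34, 0x12]) + bytes([0x08, 0, 0, 0, 0, 0, 0, 0])
   assert read_u16(sample, 0) == 0x1234  # low byte is stored first
   assert read_u64(sample, 2) == 8

The ``<`` prefix in the format string selects little-endian byte order
regardless of the host, so the same code behaves identically on a big-endian
machine.
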
.. toctree::
   :hidden:

   MsfFile
   PdbStream
   TpiStream
   DbiStream
   ModiStream
   PublicStream
   GlobalStream
   HashStream
   CodeViewSymbols
   CodeViewTypes

.. _msf:

The MSF Container
-----------------
A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
An MSF file is actually a miniature "file system within a file".  It contains
multiple streams (aka files) which can represent arbitrary data, and these
streams are divided into blocks which may not necessarily be contiguously
laid out within the file (aka fragmented).  Additionally, the MSF contains a
stream directory (aka MFT) which describes how the streams (files) are laid
out within the MSF.

For more information about the MSF container format, stream directory, and
block layout, see :doc:`MsfFile`.

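To make the "file system within a file" idea concrete, here is a minimal Python
sketch of how a stream that is split across non-contiguous blocks could be
reassembled.  It assumes the block size, the ordered list of block indices, and
the stream length are already known; in a real file these would be derived from
the structures described in :doc:`MsfFile`.  The function name and parameters
are hypothetical.

.. code-block:: python

   def read_stream(file_bytes: bytes, block_size: int,
                   block_indices: list, stream_size: int) -> bytes:
       """Concatenate the listed blocks in order, then trim to the stream size.

       Hypothetical helper: the inputs are assumed to have been obtained from
       the MSF stream directory beforehand.
       """
       chunks = []
       for index in block_indices:
           start = index * block_size
           chunks.append(file_bytes[start:start + block_size])
       # The last block may be only partially used by the stream.
       return b"".join(chunks)[:stream_size]

   # Toy "file" with four 4-byte blocks; the stream lives in blocks 2 and 0.
   toy_file = b"AAAABBBBCCCCDDDD"
   assert read_stream(toy_file, 4, [2, 0], 6) == b"CCCCAA"

The stream's blocks are read in directory order rather than file order, which
is why the result starts with block 2 even though it appears later in the file.
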
.. _streams:

Streams
-------
The PDB format contains a number of streams which describe various information
such as the types, symbols, source files, and compilands (e.g. object files)
of a program.  Additional streams contain hash tables that debuggers and other
tools use for fast lookup of records and types by name, as well as other
information about how the program was compiled, such as the specific toolchain
used.  A summary of streams contained in a PDB file is as follows:

+--------------------+------------------------------+-------------------------------------------+
| Name               | Stream Index                 | Contents                                  |
+====================+==============================+===========================================+
| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
|                    |                              | - Fields to match EXE to this PDB         |
|                    |                              | - Map of named streams to stream indices  |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
|                    |                              | - Index of TPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
|                    |                              | - Indices of individual module streams    |
|                    |                              | - Indices of public / global streams      |
|                    |                              | - Section Contribution Information        |
|                    |                              | - Source File Information                 |
|                    |                              | - FPO / PGO Data                          |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
|                    |                              | - Index of IPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
|                    |   Named Stream map           |   string de-duplication                   |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
|                    | - One for each compiland     | - Line Number Information                 |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
|                    |                              | - Index of Public Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |
|                    |                              | - Index of Global Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+

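The fixed stream indices listed in the table can be captured in a small
enumeration, which a reader tool might use to tell the well-known streams apart
from those whose indices must be discovered through the PDB or DBI streams.
The following Python sketch is only illustrative; the class, member, and
function names are invented for this example, and only the index values come
from the table.

.. code-block:: python

   from enum import IntEnum

   class FixedStream(IntEnum):
       """Fixed stream indices from the summary table (names are hypothetical)."""
       OLD_DIRECTORY = 0
       PDB = 1
       TPI = 2
       DBI = 3
       IPI = 4

   def fixed_stream_name(index: int):
       """Return the enum member name for a fixed index, or None otherwise."""
       try:
           return FixedStream(index).name
       except ValueError:
           # Any other index must be looked up elsewhere, e.g. via the named
           # stream map in the PDB stream or the indices in the DBI stream.
           return None

   assert fixed_stream_name(3) == "DBI"
   assert fixed_stream_name(7) is None
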
More information about the structure of each of these can be found on the
following pages:

:doc:`PdbStream`
   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.

:doc:`TpiStream`
   Information about the TPI stream and the CodeView records contained within.

:doc:`DbiStream`
   Information about the DBI stream and relevant substreams including the Module Substreams,
   source file information, and CodeView symbol records contained within.

:doc:`ModiStream`
   Information about the Module Information Stream, of which there is one for each compilation
   unit, and the format of symbols contained within.

:doc:`PublicStream`
   Information about the Public Symbol Stream.

:doc:`GlobalStream`
   Information about the Global Symbol Stream.

:doc:`HashStream`
   Information about the Hash Table stream and how it can be used to quickly look up records
   by name.

CodeView
========
CodeView is another format which comes into the picture.  While MSF defines
the structure of the overall file, and PDB defines the set of streams that
appear within the MSF file and the format of those streams, CodeView defines
the format of **symbol and type records** that appear within specific streams.
Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
more information about the CodeView format.