The PDB Serialized Hash Table Format
====================================

.. contents::
   :local:

.. _hash_intro:

Introduction
============

One of the design goals of the PDB format is to provide accelerated access to
debug information. For this reason, hash tables are in several places
serialized and embedded directly in the file, rather than requiring a consumer
to read a list of values and reconstruct the hash table on the fly.

The serialization format supports hash tables of arbitrarily large size and
capacity, as well as arbitrary value types and hash functions.  The only
supported key type is uint32.  The only requirement is that the producer and
consumer agree on the hash function.  As such, the hash function is not
discussed further in this document; it is assumed that for a particular
instance of a PDB file hash table, the appropriate hash function is being used.

On-Disk Format
==============


.. code-block:: none

  .--------------------.-- +0
  |        Size        |
  .--------------------.-- +4
  |      Capacity      |
  .--------------------.-- +8
  | Present Bit Vector |
  .--------------------.-- +N
  | Deleted Bit Vector |
  .--------------------.-- +M                  ─╮
  |        Key         |                        │
  .--------------------.-- +M+4                 │
  |       Value        |                        │
  .--------------------.-- +M+4+sizeof(Value)   │
           ...                                  ├─ |Capacity| Bucket entries
  .--------------------.                        │
  |        Key         |                        │
  .--------------------.                        │
  |       Value        |                        │
  .--------------------.                       ─╯

- **Size** - The number of values contained in the hash table.

- **Capacity** - The number of buckets in the hash table.  Producers should
  keep the table's load, that is ``Size``, no greater than ``2/3*Capacity+1``.

- **Present Bit Vector** - A serialized bit vector which records which buckets
  have valid values.  If a bucket holds a value, the corresponding bit is set;
  if it does not (either because the bucket is empty or because it holds a
  tombstone value), the bit is unset.

- **Deleted Bit Vector** - A serialized bit vector which records which buckets
  have tombstone values.  If the entry in a bucket has been deleted, the
  corresponding bit is set; otherwise it is unset.

- **Keys and Values** - A list of ``Capacity`` hash buckets, where the first
  field of each bucket is the key (always a uint32) and the second field is
  the value.  The state of each bucket (valid, empty, deleted) can be
  determined by examining the present and deleted bit vectors.

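
The three bucket states and the load bound described above can be sketched in
Python as follows.  This is a minimal illustration rather than a reference
implementation: the names ``BucketState``, ``bucket_state``, ``max_load`` and
``load_is_acceptable`` are hypothetical, ``2/3*Capacity+1`` is read as integer
arithmetic, and rejecting a bucket whose present and deleted bits are both set
is an assumption of the sketch.

.. code-block:: python

  from enum import Enum


  class BucketState(Enum):
      VALID = "valid"
      EMPTY = "empty"
      DELETED = "deleted"


  def bucket_state(present: bool, deleted: bool) -> BucketState:
      """Classify one bucket from its present bit and its deleted bit."""
      if present and deleted:
          # Assumption of this sketch: a tombstone has its present bit
          # unset, so a bucket with both bits set is treated as malformed.
          raise ValueError("bucket is marked both present and deleted")
      if present:
          return BucketState.VALID
      if deleted:
          return BucketState.DELETED
      return BucketState.EMPTY


  def max_load(capacity: int) -> int:
      """Largest Size allowed for a given Capacity.

      Reads 2/3*Capacity+1 with integer division (an assumption; the
      text does not say how the fraction is rounded).
      """
      return capacity * 2 // 3 + 1


  def load_is_acceptable(size: int, capacity: int) -> bool:
      """Return True if Size stays within the recommended bound."""
      return size <= max_load(capacity)

Under these assumptions, a table with a ``Capacity`` of 8 may hold at most 6
values.
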

.. _hash_bit_vectors:

Present and Deleted Bit Vectors
===============================

The bit vectors indicating the status of each bucket are serialized as follows:

.. code-block:: none

  .--------------------.-- +0
  |     Word Count     |
  .--------------------.-- +4
  |        Word_0      |        ─╮
  .--------------------.-- +8    │
  |        Word_1      |         │
  .--------------------.-- +12   ├─ |Word Count| values
           ...                   │
  .--------------------.         │
  |       Word_N       |         │
  .--------------------.        ─╯

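
As a rough sketch, a serialized bit vector could be read in Python like this.
It assumes that the word count and every word are 32-bit little-endian
unsigned integers, which matches the 4-byte offsets in the diagram but is not
otherwise stated here; ``read_bit_vector`` is a hypothetical helper name.

.. code-block:: python

  import struct


  def read_bit_vector(data: bytes, offset: int = 0):
      """Read one serialized bit vector starting at ``offset``.

      Returns the list of words and the offset of the first byte after
      the bit vector, so that two vectors can be read back to back.
      """
      # Assumption: Word Count is a little-endian uint32.
      (word_count,) = struct.unpack_from("<I", data, offset)
      # Assumption: each word is a little-endian uint32 as well.
      words = struct.unpack_from(f"<{word_count}I", data, offset + 4)
      return list(words), offset + 4 + 4 * word_count

Returning the end offset lets a caller read the present bit vector and then
continue directly with the deleted bit vector that follows it.
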
The words, when viewed as a contiguous block of bytes, represent a bit vector with
the following layout:

.. code-block:: none

    .------------.         .------------.------------.
    |   Word_N   |   ...   |   Word_1   |   Word_0   |
    .------------.         .------------.------------.
    |            |         |            |            |
  +N*32      +(N-1)*32    +64          +32          +0

where the k'th bit of this bit vector represents the status of the k'th bucket
in the hash table.
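
Under the layout above, testing the k'th bit might look like the following
Python sketch.  It assumes that ``Word_0`` holds bits 0 through 31 with the
least significant bit first, as the offsets in the diagram suggest, and that
buckets beyond the serialized words count as unset; ``bit_is_set`` and
``set_bits`` are hypothetical names.

.. code-block:: python

  def bit_is_set(words, k):
      """Return True if bit k of the bit vector is set."""
      if k < 0:
          raise ValueError("bit index must be non-negative")
      # Assumption: bit k lives in word k // 32 at position k % 32,
      # counting from the least significant bit.
      word_index, bit_index = divmod(k, 32)
      if word_index >= len(words):
          # Assumption: bits past the last serialized word are unset.
          return False
      return bool((words[word_index] >> bit_index) & 1)


  def set_bits(words):
      """List every bucket index whose bit is set."""
      return [k for k in range(32 * len(words)) if bit_is_set(words, k)]

For instance, with the words ``[0b101, 0b1]`` this sketch reports buckets 0, 2
and 32 as set.
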