=====================================
The MSF File Format
=====================================

.. contents::
   :local:

.. _msf_superblock:

The Superblock
==============
At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
follows:

.. code-block:: c++

  struct SuperBlock {
    char FileMagic[sizeof(Magic)];  // 32-byte MSF signature (see below)
    ulittle32_t BlockSize;
    ulittle32_t FreeBlockMapBlock;
    ulittle32_t NumBlocks;
    ulittle32_t NumDirectoryBytes;
    ulittle32_t Unknown;
    ulittle32_t BlockMapAddr;
  };

- **FileMagic** - Must be equal to ``"Microsoft C/C++ MSF 7.00\r\n"``
  followed by the bytes ``1A 44 53 00 00 00``, for a total of 32 bytes.
- **BlockSize** - The block size of the internal file system.  Valid values are
  512, 1024, 2048, and 4096 bytes.  Certain aspects of the MSF file layout vary
  depending on the block size.  For the purposes of LLVM, we handle only block
  sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
- **FreeBlockMapBlock** - The index of a block within the file, at which begins
  a bitfield representing the set of all blocks within the file which are "free"
  (i.e. the data within that block is not used).  This bitfield is spread across
  the MSF file at ``BlockSize`` intervals.
  **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``!  This field
  is designed to support incremental and atomic updates of the underlying MSF
  file.  While writing to an MSF file, if the value of this field is ``1``, you
  can write your new modified bitfield to block 2, and vice versa.  Only when
  you commit the file to disk do you need to swap the value in the SuperBlock
  to point to the new ``FreeBlockMapBlock``.
- **NumBlocks** - The total number of blocks in the file.  ``NumBlocks * BlockSize``
  should equal the size of the file on disk.
- **NumDirectoryBytes** - The size of the stream directory, in bytes.  The stream
  directory contains information about each stream's size and the set of blocks
  that it occupies.  It is described in more detail below.
- **BlockMapAddr** - The index of a block within the MSF file.  At this block is
  an array of ``ulittle32_t`` values listing the blocks that the stream
  directory resides on.  For large MSF files, the stream directory (which
  describes the block layout of each stream) may not fit entirely on a single
  block.  As a result, this extra layer of indirection is introduced, whereby
  this block contains the list of blocks that the stream directory occupies,
  and the stream directory itself can be stitched together accordingly.  The
  number of ``ulittle32_t`` values in this array is given by
  ``ceil(NumDirectoryBytes / BlockSize)``.

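The SuperBlock rules above translate fairly directly into a small reader.  The
following is only a sketch, not LLVM's implementation: it assumes the whole
file has already been loaded into a ``std::vector<uint8_t>``, the helpers
``readLE32``, ``parseSuperBlock``, ``numDirectoryBlocks`` and ``readDirectory``
are hypothetical names, and it accepts any of the four documented block sizes.
It checks the 32-byte magic, validates ``FreeBlockMapBlock``, and stitches the
stream directory together from the block list stored at ``BlockMapAddr``.

.. code-block:: c++

  #include <algorithm>
  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <optional>
  #include <vector>

  // Hypothetical helper: decode a little-endian 32-bit value.
  inline uint32_t readLE32(const uint8_t *P) {
    return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
           (uint32_t(P[3]) << 24);
  }

  // "Microsoft C/C++ MSF 7.00\r\n" followed by 1A 44 53 00 00 00 (32 bytes).
  constexpr char kMsfMagic[32] = {
      'M',  'i',  'c',    'r', 'o', 's',  'o',  'f',
      't',  ' ',  'C',    '/', 'C', '+',  '+',  ' ',
      'M',  'S',  'F',    ' ', '7', '.',  '0',  '0',
      '\r', '\n', '\x1a', 'D', 'S', '\0', '\0', '\0'};

  // Host-endian copy of the SuperBlock fields that follow the magic.
  struct SuperBlock {
    uint32_t BlockSize;
    uint32_t FreeBlockMapBlock;
    uint32_t NumBlocks;
    uint32_t NumDirectoryBytes;
    uint32_t Unknown;
    uint32_t BlockMapAddr;
  };

  // Returns std::nullopt if the header is truncated or breaks the rules above.
  std::optional<SuperBlock> parseSuperBlock(const std::vector<uint8_t> &File) {
    constexpr size_t kHeaderSize = sizeof(kMsfMagic) + 6 * sizeof(uint32_t);
    if (File.size() < kHeaderSize ||
        std::memcmp(File.data(), kMsfMagic, sizeof(kMsfMagic)) != 0)
      return std::nullopt;
    const uint8_t *P = File.data() + sizeof(kMsfMagic);
    SuperBlock SB{readLE32(P),      readLE32(P + 4),  readLE32(P + 8),
                  readLE32(P + 12), readLE32(P + 16), readLE32(P + 20)};
    bool ValidBlockSize = SB.BlockSize == 512 || SB.BlockSize == 1024 ||
                          SB.BlockSize == 2048 || SB.BlockSize == 4096;
    bool ValidFpm = SB.FreeBlockMapBlock == 1 || SB.FreeBlockMapBlock == 2;
    if (!ValidBlockSize || !ValidFpm)
      return std::nullopt;
    return SB;
  }

  // Entries in the array at BlockMapAddr: ceil(NumDirectoryBytes / BlockSize).
  uint32_t numDirectoryBlocks(const SuperBlock &SB) {
    return uint32_t((uint64_t(SB.NumDirectoryBytes) + SB.BlockSize - 1) /
                    SB.BlockSize);
  }

  // Stitch the stream directory together from the blocks listed at BlockMapAddr.
  std::optional<std::vector<uint8_t>>
  readDirectory(const std::vector<uint8_t> &File, const SuperBlock &SB) {
    uint64_t MapOffset = uint64_t(SB.BlockMapAddr) * SB.BlockSize;
    uint32_t Count = numDirectoryBlocks(SB);
    if (MapOffset + uint64_t(Count) * 4 > File.size())
      return std::nullopt;
    std::vector<uint8_t> Dir;
    Dir.reserve(SB.NumDirectoryBytes);
    for (uint32_t I = 0; I < Count; ++I) {
      uint64_t Block = readLE32(File.data() + MapOffset + 4 * I);
      uint64_t Start = Block * SB.BlockSize;
      // Every directory block is full except possibly the last one.
      uint64_t Len =
          std::min<uint64_t>(SB.BlockSize, SB.NumDirectoryBytes - Dir.size());
      if (Start + Len > File.size())
        return std::nullopt;
      Dir.insert(Dir.end(), File.data() + Start, File.data() + Start + Len);
    }
    return Dir;
  }

A real reader would also want to cross-check ``NumBlocks * BlockSize`` against
the file size; that check is omitted here to keep the sketch short.
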
The Stream Directory
====================
The Stream Directory is the root of all access to the other streams in an MSF
file.  Beginning at byte 0 of the stream directory is the following structure:

.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams;
    ulittle32_t StreamSizes[NumStreams];     // size of each stream, in bytes
    ulittle32_t StreamBlocks[NumStreams][];  // block indices of each stream
  };

This structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
Note that each of the last two arrays is of variable length, and in particular
that the second array is jagged: stream ``i`` occupies
``ceil(StreamSizes[i] / BlockSize)`` blocks, so ``StreamBlocks[i]`` has that
many entries.

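Because the inner arrays carry no explicit length, a parser has to derive each
one from the corresponding stream size.  The sketch below does exactly that,
assuming its input is the already reassembled ``NumDirectoryBytes`` bytes of the
directory; ``ParsedDirectory`` and ``parseStreamDirectory`` are hypothetical
names, and every count is checked against the bytes that remain so that a
truncated directory is rejected rather than over-read.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <utility>
  #include <vector>

  // Hypothetical in-memory form of the stream directory.
  struct ParsedDirectory {
    std::vector<uint32_t> StreamSizes;               // bytes per stream
    std::vector<std::vector<uint32_t>> StreamBlocks; // jagged: blocks per stream
  };

  inline uint32_t readLE32(const uint8_t *P) {
    return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
           (uint32_t(P[3]) << 24);
  }

  std::optional<ParsedDirectory>
  parseStreamDirectory(const std::vector<uint8_t> &Dir, uint32_t BlockSize) {
    size_t Pos = 0;
    // Number of whole 32-bit words not yet consumed.
    auto Remaining = [&] { return (Dir.size() - Pos) / 4; };
    // Consume the next little-endian 32-bit word (caller checks Remaining()).
    auto Next = [&] {
      uint32_t V = readLE32(Dir.data() + Pos);
      Pos += 4;
      return V;
    };
    if (Remaining() < 1)
      return std::nullopt;
    uint32_t NumStreams = Next();
    if (Remaining() < NumStreams)
      return std::nullopt;
    ParsedDirectory D;
    for (uint32_t I = 0; I < NumStreams; ++I)
      D.StreamSizes.push_back(Next());
    for (uint32_t Size : D.StreamSizes) {
      // Inner array length is implied by the size: ceil(Size / BlockSize).
      uint64_t Count = (uint64_t(Size) + BlockSize - 1) / BlockSize;
      if (Remaining() < Count)
        return std::nullopt;
      std::vector<uint32_t> Blocks;
      for (uint64_t J = 0; J < Count; ++J)
        Blocks.push_back(Next());
      D.StreamBlocks.push_back(std::move(Blocks));
    }
    return D;
  }

Feeding it the 60-byte directory from the example that follows, with a block
size of 4096, yields block lists ``{4}``, ``{5, 6}``, ``{11, 9, 7, 8}`` and
``{10, 15, 12}``.
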
**Example:** Consider a hypothetical PDB file with a 4KiB block size and 4
streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}:

- Stream 0: ``ceil(1000 / 4096) = 1`` block
- Stream 1: ``ceil(8000 / 4096) = 2`` blocks
- Stream 2: ``ceil(16000 / 4096) = 4`` blocks
- Stream 3: ``ceil(9000 / 4096) = 3`` blocks

In total, 10 blocks are used.  Here is what the stream directory might look
like:

.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams = 4;
    ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
    ulittle32_t StreamBlocks[][] = {
      {4},            // stream 0: 1 block
      {5, 6},         // stream 1: 2 blocks
      {11, 9, 7, 8},  // stream 2: 4 blocks
      {10, 15, 12}    // stream 3: 3 blocks
    };
  };

In total, this occupies ``(1 + 4 + 10) * 4 = 60`` bytes (the ``NumStreams``
field, four stream sizes, and ten block indices), so
``SuperBlock->NumDirectoryBytes`` would equal ``60``, and the block at
``SuperBlock->BlockMapAddr`` would contain an array of just one
``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.

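The arithmetic in this example can be reproduced in a few lines.  This is only
an illustrative sketch; ``ceilDiv`` and ``directoryBytes`` are hypothetical
helpers, and the size formula simply follows the ``StreamDirectory`` layout
(one ``NumStreams`` word, one size per stream, then one block index per block).

.. code-block:: c++

  #include <cstdint>
  #include <vector>

  // ceil(A / B) for unsigned integers, as used throughout this document.
  uint64_t ceilDiv(uint64_t A, uint64_t B) { return (A + B - 1) / B; }

  // Bytes needed for a stream directory describing streams of these sizes.
  uint64_t directoryBytes(const std::vector<uint32_t> &Sizes,
                          uint32_t BlockSize) {
    uint64_t Words = 1 + Sizes.size(); // NumStreams + StreamSizes[]
    for (uint32_t Size : Sizes)
      Words += ceilDiv(Size, BlockSize); // one entry per block in StreamBlocks[i]
    return Words * 4;                    // every entry is a ulittle32_t
  }

For the example, ``directoryBytes({1000, 8000, 16000, 9000}, 4096)`` returns
``60``, and ``ceilDiv(60, 4096)`` returns ``1``, the number of entries stored in
the block at ``BlockMapAddr``.
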
Note also that the streams are discontiguous: block 10 of stream 3 sits in the
middle of stream 2, between its blocks 9 and 11, and stream 2's own blocks are
not even in ascending order.  You cannot assume anything about the layout of
the blocks!

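Since block order cannot be assumed, every access to stream data has to be
translated through the stream's block list.  A minimal sketch of that
translation follows; ``streamToFileOffset`` is a hypothetical helper, and it
assumes ``Blocks`` is the stream's entry from ``StreamBlocks``, in stream order.

.. code-block:: c++

  #include <cstdint>
  #include <optional>
  #include <vector>

  // Map a byte offset within a stream to an absolute offset within the file.
  std::optional<uint64_t> streamToFileOffset(const std::vector<uint32_t> &Blocks,
                                             uint32_t BlockSize,
                                             uint64_t StreamOffset) {
    uint64_t Index = StreamOffset / BlockSize; // which of the stream's blocks
    if (Index >= Blocks.size())
      return std::nullopt; // past the last block of the stream
    return uint64_t(Blocks[Index]) * BlockSize + StreamOffset % BlockSize;
  }

With stream 2 from the example (blocks ``{11, 9, 7, 8}``), stream offset
``5000`` falls in the stream's second block, block 9, so it maps to file offset
``9 * 4096 + 904 = 37768``.  The sketch only checks against the block list; a
reader would also compare the offset with the stream's size, since the last
block is usually only partly used.
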
Alignment and Block Boundaries
==============================
As may be clear by now, it is possible for a single field (whether it be a
high-level record, a long string field, or even a single ``uint16``) to begin
and end in separate blocks.  For example, if the block size is 4096 bytes and a
``uint16`` field begins at the last byte of the current block, then it would
need to end on the first byte of the next block.  Since blocks are not
necessarily laid out contiguously in the file, both the consumer and the
producer of an MSF file must be prepared to split data apart accordingly.  In
the aforementioned example, because MSF data is little-endian, the low byte of
the ``uint16`` would be written to the last byte of the current block, and the
high byte would be written to the first byte of the next block in the stream's
block list, which could be tens of thousands of bytes later (or even earlier!)
in the file, depending on what the stream directory says.
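One way for a consumer to cope with split fields is to gather a value's bytes
in chunks that never cross a block boundary, and decode only once all of them
have been collected.  The sketch below assumes the whole file is in memory and
that ``Blocks`` is the stream's block list from the stream directory;
``readStreamBytes`` and ``readStreamU16`` are hypothetical helpers, and values
are decoded as little-endian.

.. code-block:: c++

  #include <algorithm>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Copy Len bytes starting at StreamOffset, one block-bounded chunk at a time.
  std::optional<std::vector<uint8_t>>
  readStreamBytes(const std::vector<uint8_t> &File,
                  const std::vector<uint32_t> &Blocks, uint32_t BlockSize,
                  uint64_t StreamOffset, uint64_t Len) {
    std::vector<uint8_t> Out;
    while (Len > 0) {
      uint64_t Index = StreamOffset / BlockSize;
      uint64_t InBlock = StreamOffset % BlockSize;
      if (Index >= Blocks.size())
        return std::nullopt;
      // Never read past the end of the current block in one step.
      uint64_t Chunk = std::min<uint64_t>(Len, BlockSize - InBlock);
      uint64_t FileOffset = uint64_t(Blocks[Index]) * BlockSize + InBlock;
      if (FileOffset + Chunk > File.size())
        return std::nullopt;
      Out.insert(Out.end(), File.data() + FileOffset,
                 File.data() + FileOffset + Chunk);
      StreamOffset += Chunk;
      Len -= Chunk;
    }
    return Out;
  }

  // Read a little-endian uint16 that may straddle two blocks.
  std::optional<uint16_t> readStreamU16(const std::vector<uint8_t> &File,
                                        const std::vector<uint32_t> &Blocks,
                                        uint32_t BlockSize,
                                        uint64_t StreamOffset) {
    auto Bytes = readStreamBytes(File, Blocks, BlockSize, StreamOffset, 2);
    if (!Bytes)
      return std::nullopt;
    // Low byte first; the high byte may come from a different block.
    return uint16_t((*Bytes)[0] | ((*Bytes)[1] << 8));
  }

For instance, with a 4096-byte block size and a stream whose blocks are
``{3, 1}``, a ``uint16`` at stream offset 4095 is assembled from the last byte
of block 3 and the first byte of block 1, even though block 1 precedes block 3
in the file.  A producer has to perform the mirror-image split when writing.
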