forked from OSchip/llvm-project
				
			
		
			
				
	
	
		
			402 lines
		
	
	
		
			19 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
			
		
		
	
	
			402 lines
		
	
	
		
			19 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
======================================
 | 
						|
XRay Flight Data Recorder Trace Format
 | 
						|
======================================
 | 
						|
 | 
						|
:Version: 1 as of 2017-07-20
 | 
						|
 | 
						|
.. contents::
 | 
						|
   :local:
 | 
						|
 | 
						|
 | 
						|
Introduction
 | 
						|
============
 | 
						|
 | 
						|
When gathering XRay traces in Flight Data Recorder mode, each thread of an
 | 
						|
application will claim buffers to fill with trace data, which at some point
 | 
						|
is finalized and flushed.
 | 
						|
 | 
						|
A goal of the profiler is to minimize overhead, the flushed data directly
 | 
						|
corresponds to the buffer.
 | 
						|
 | 
						|
This document describes the format of a trace file.
 | 
						|
 | 
						|
 | 
						|
General
 | 
						|
=======
 | 
						|
 | 
						|
Each trace file corresponds to a sequence of events in a particular thread.
 | 
						|
 | 
						|
The file has a header followed by a sequence of discriminated record types.
 | 
						|
 | 
						|
The endianness of byte fields matches the endianess of the platform which
 | 
						|
produced the trace file.
 | 
						|
 | 
						|
 | 
						|
Header Section
 | 
						|
==============
 | 
						|
 | 
						|
A trace file begins with a 32 byte header.
 | 
						|
 | 
						|
+-------------------+-----------------+----------------------------------------+
 | 
						|
| Field             | Size (bytes)    | Description                            |
 | 
						|
+===================+=================+========================================+
 | 
						|
| version           | ``2``           | Anticipates versioned  readers. This   |
 | 
						|
|                   |                 | document describes the format when     |
 | 
						|
|                   |                 | version == 1                           |
 | 
						|
+-------------------+-----------------+----------------------------------------+
 | 
						|
| type              | ``2``           | An enumeration encoding the type of    |
 | 
						|
|                   |                 | trace. Flight Data Recorder mode       |
 | 
						|
|                   |                 | traces have type == 1                  |
 | 
						|
+-------------------+-----------------+----------------------------------------+
 | 
						|
| bitfield          | ``4``           | Holds parameters that are not aligned  |
 | 
						|
|                   |                 | to bytes. Further described below.     |
 | 
						|
+-------------------+-----------------+----------------------------------------+
 | 
						|
| cycle_frequency   | ``8``           | The frequency in hertz of the CPU      |
 | 
						|
|                   |                 | oscillator used to measure duration of |
 | 
						|
|                   |                 | events in ticks.                       |
 | 
						|
+-------------------+-----------------+----------------------------------------+
 | 
						|
| buffer_size       | ``8``           | The size in bytes of the data portion  |
 | 
						|
|                   |                 | of the trace following the header.     |
 | 
						|
+-------------------+-----------------+----------------------------------------+
 | 
						|
| reserved          | ``8``           | Reserved for future use.               |
 | 
						|
+-------------------+-----------------+----------------------------------------+
 | 
						|
 | 
						|
The bitfield parameter of the file header is composed of the following fields.
 | 
						|
 | 
						|
+-------------------+----------------+-----------------------------------------+
 | 
						|
| Field             | Size (bits)    | Description                             |
 | 
						|
+===================+================+=========================================+
 | 
						|
| constant_tsc      | ``1``          | Whether the platform's timestamp        |
 | 
						|
|                   |                | counter used to record ticks between    |
 | 
						|
|                   |                | events ticks at a constant frequency    |
 | 
						|
|                   |                | despite CPU frequency changes.          |
 | 
						|
|                   |                | 0 == non-constant. 1 == constant.       |
 | 
						|
+-------------------+----------------+-----------------------------------------+
 | 
						|
| nonstop_tsc       | ``1``          | Whether the tsc continues to count      |
 | 
						|
|                   |                | despite whether the CPU is in a low     |
 | 
						|
|                   |                | power state. 0 == stop. 1 == non-stop.  |
 | 
						|
+-------------------+----------------+-----------------------------------------+
 | 
						|
| reserved          | ``30``         | Not meaningful.                         |
 | 
						|
+-------------------+----------------+-----------------------------------------+
 | 
						|
 | 
						|
 | 
						|
Data Section
 | 
						|
============
 | 
						|
 | 
						|
Following the header in a trace is a data section with size matching the
 | 
						|
buffer_size field in the header.
 | 
						|
 | 
						|
The data section is a stream of elements of different types.
 | 
						|
 | 
						|
There are a few categories of data in the sequence.
 | 
						|
 | 
						|
- ``Function Records``: Function Records contain the timing of entry into and
 | 
						|
  exit from function execution. Function Records have 8 bytes each.
 | 
						|
 | 
						|
- ``Metadata Records``: Metadata records serve many purposes. Mostly, they
 | 
						|
  capture information that may be too costly to record for each function, but
 | 
						|
  that is required to contextualize the fine-grained timings. They also are used
 | 
						|
  as markers for user-defined Event Data payloads. Metadata records have 16
 | 
						|
  bytes each.
 | 
						|
 | 
						|
- ``Event Data``: Free form data may be associated with events that are traced
 | 
						|
  by the binary and encode data defined by a handler function. Event data is
 | 
						|
  always preceded with a marker record which indicates how large it is.
 | 
						|
 | 
						|
- ``Function Arguments``: The arguments to some functions are included in the
 | 
						|
  trace. These are either pointer addresses or primitives that are read and
 | 
						|
  logged independently of their types in a high level language. To the tracer,
 | 
						|
  they are all numbers. Function Records that have attached arguments will
 | 
						|
  indicate their presence on the function entry record. We only support logging
 | 
						|
  contiguous function argument sequences starting with argument zero, which will
 | 
						|
  be the "this" pointer for member function invocations. For example, we don't
 | 
						|
  support logging the first and third argument.
 | 
						|
 | 
						|
A reader of the memory format must maintain a state machine. The format makes no
 | 
						|
attempt to pad for alignment, and it is not seekable.
 | 
						|
 | 
						|
 | 
						|
Function Records
 | 
						|
----------------
 | 
						|
 | 
						|
Function Records have an 8 byte layout. This layout encodes information to
 | 
						|
reconstruct a call stack of instrumented function and their durations.
 | 
						|
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| Field         | Size (bits)  | Description                                   |
 | 
						|
+===============+==============+===============================================+
 | 
						|
| discriminant  | ``1``        | Indicates whether a reader should read a      |
 | 
						|
|               |              | Function or Metadata record. Set to ``0`` for |
 | 
						|
|               |              | Function records.                             |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| action        | ``3``        | Specifies whether the function is being       |
 | 
						|
|               |              | entered, exited, or is a non-standard entry   |
 | 
						|
|               |              | or exit produced by optimizations.            |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| function_id   | ``28``       | A numeric ID for the function. Resolved to a  |
 | 
						|
|               |              | name via the xray instrumentation map. The    |
 | 
						|
|               |              | instrumentation map is built by xray at       |
 | 
						|
|               |              | compile time into an object file and pairs    |
 | 
						|
|               |              | the function ids to addresses. It is used for |
 | 
						|
|               |              | patching and as a lookup into the binary's    |
 | 
						|
|               |              | symbols to obtain names.                      |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| tsc_delta     | ``32``       | The number of ticks of the timestamp counter  |
 | 
						|
|               |              | since a previous record recorded a delta or   |
 | 
						|
|               |              | other TSC resetting event.                    |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
 | 
						|
On little-endian machines, the bitfields are ordered from least significant bit
 | 
						|
bit to most significant bit. A reader can read an 8 bit value and apply the mask
 | 
						|
``0x01`` for the discriminant. Similarly, they can read 32 bits and unsigned
 | 
						|
shift right by ``0x04`` to obtain the function_id field.
 | 
						|
 | 
						|
On big-endian machine, the bitfields are written in order from most significant
 | 
						|
bit to least significant bit. A reader would read an 8 bit value and unsigned
 | 
						|
shift right by 7 bits for the discriminant. The function_id field could be
 | 
						|
obtained by reading a 32 bit value and applying the mask ``0x0FFFFFFF``.
 | 
						|
 | 
						|
Function action types are as follows.
 | 
						|
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| Type          | Number       | Description                                   |
 | 
						|
+===============+==============+===============================================+
 | 
						|
| Entry         | ``0``        | Typical function entry.                       |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| Exit          | ``1``        | Typical function exit.                        |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| Tail_Exit     | ``2``        | An exit from a function due to tail call      |
 | 
						|
|               |              | optimization.                                 |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| Entry_Args    | ``3``        | A function entry that records arguments.      |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
 | 
						|
Entry_Args records do not contain the arguments themselves. Instead, metadata
 | 
						|
records for each of the logged args follow the function record in the stream.
 | 
						|
 | 
						|
 | 
						|
Metadata Records
 | 
						|
----------------
 | 
						|
 | 
						|
Interspersed throughout the buffer are 16 byte Metadata records. For typically
 | 
						|
instrumented binaries, they will be sparser than Function records, and they
 | 
						|
provide a fuller picture of the binary execution state.
 | 
						|
 | 
						|
Metadata record layout is partially record dependent, but they share a common
 | 
						|
structure.
 | 
						|
 | 
						|
The same bit field rules described for function records apply to the first byte
 | 
						|
of MetadataRecords. Within this byte, little endian machines use lsb to msb
 | 
						|
ordering and big endian machines use msb to lsb ordering.
 | 
						|
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| Field         | Size         | Description                                   |
 | 
						|
+===============+==============+===============================================+
 | 
						|
| discriminant  | ``1 bit``    | Indicates whether a reader should read a      |
 | 
						|
|               |              | Function or Metadata record. Set to ``1`` for |
 | 
						|
|               |              | Metadata records.                             |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| record_kind   | ``7 bits``   | The type of Metadata record.                  |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| data          | ``15 bytes`` | A data field used differently for each record |
 | 
						|
|               |              | type.                                         |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
 | 
						|
Here is a table of the enumerated record kinds.
 | 
						|
 | 
						|
+--------+---------------------------+
 | 
						|
| Number | Type                      |
 | 
						|
+========+===========================+
 | 
						|
| 0      | NewBuffer                 |
 | 
						|
+--------+---------------------------+
 | 
						|
| 1      | EndOfBuffer               |
 | 
						|
+--------+---------------------------+
 | 
						|
| 2      | NewCPUId                  |
 | 
						|
+--------+---------------------------+
 | 
						|
| 3      | TSCWrap                   |
 | 
						|
+--------+---------------------------+
 | 
						|
| 4      | WallTimeMarker            |
 | 
						|
+--------+---------------------------+
 | 
						|
| 5      | CustomEventMarker         |
 | 
						|
+--------+---------------------------+
 | 
						|
| 6      | CallArgument              |
 | 
						|
+--------+---------------------------+
 | 
						|
 | 
						|
 | 
						|
NewBuffer Records
 | 
						|
-----------------
 | 
						|
 | 
						|
Each buffer begins with a NewBuffer record immediately after the header.
 | 
						|
It records the thread ID of the thread that the trace belongs to.
 | 
						|
 | 
						|
Its data segment is as follows.
 | 
						|
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| Field         | Size (bytes) | Description                                   |
 | 
						|
+===============+==============+===============================================+
 | 
						|
| thread_Id     | ``2``        | Thread ID for buffer.                         |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| reserved      | ``13``       | Unused.                                       |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
 | 
						|
 | 
						|
WallClockTime Records
 | 
						|
---------------------
 | 
						|
 | 
						|
Following the NewBuffer record, each buffer records an absolute time as a frame
 | 
						|
of reference for the durations recorded by timestamp counter deltas.
 | 
						|
 | 
						|
Its data segment is as follows.
 | 
						|
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| Field         | Size (bytes) | Description                                   |
 | 
						|
+===============+==============+===============================================+
 | 
						|
| seconds       | ``8``        | Seconds on absolute timescale. The starting   |
 | 
						|
|               |              | point is unspecified and depends on the       |
 | 
						|
|               |              | implementation and platform configured by the |
 | 
						|
|               |              | tracer.                                       |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| microseconds  | ``4``        | The microsecond component of the time.        |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| reserved      | ``3``        | Unused.                                       |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
 | 
						|
 | 
						|
NewCpuId Records
 | 
						|
----------------
 | 
						|
 | 
						|
Each function entry invokes a routine to determine what CPU is executing.
 | 
						|
Typically, this is done with readtscp, which reads the timestamp counter at the
 | 
						|
same time.
 | 
						|
 | 
						|
If the tracing detects that the execution has switched CPUs or if this is the
 | 
						|
first instrumented entry point, the tracer will output a NewCpuId record.
 | 
						|
 | 
						|
Its data segment is as follows.
 | 
						|
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| Field         | Size (bytes) | Description                                   |
 | 
						|
+===============+==============+===============================================+
 | 
						|
| cpu_id        | ``2``        | CPU Id.                                       |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| absolute_tsc  | ``8``        | The absolute value of the timestamp counter.  |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| reserved      | ``5``        | Unused.                                       |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
 | 
						|
 | 
						|
TSCWrap Records
 | 
						|
---------------
 | 
						|
 | 
						|
Since each function record uses a 32 bit value to represent the number of ticks
 | 
						|
of the timestamp counter since the last reference, it is possible for this value
 | 
						|
to overflow, particularly for sparsely instrumented binaries.
 | 
						|
 | 
						|
When this delta would not fit into a 32 bit representation, a reference absolute
 | 
						|
timestamp counter record is written in the form of a TSCWrap record.
 | 
						|
 | 
						|
Its data segment is as follows.
 | 
						|
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| Field         | Size (bytes) | Description                                   |
 | 
						|
+===============+==============+===============================================+
 | 
						|
| absolute_tsc  | ``8``        | Timestamp counter value.                      |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| reserved      | ``7``        | Unused.                                       |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
 | 
						|
 | 
						|
CallArgument Records
 | 
						|
--------------------
 | 
						|
 | 
						|
Immediately following an Entry_Args type function record, there may be one or
 | 
						|
more CallArgument records that contain the traced function's parameter values.
 | 
						|
 | 
						|
The order of the CallArgument Record sequency corresponds one to one with the
 | 
						|
order of the function parameters.
 | 
						|
 | 
						|
CallArgument data segment:
 | 
						|
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| Field         | Size (bytes) | Description                                   |
 | 
						|
+===============+==============+===============================================+
 | 
						|
| argument      | ``8``        | Numeric argument (may be pointer address).    |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| reserved      | ``7``        | Unused.                                       |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
 | 
						|
 | 
						|
CustomEventMarker Records
 | 
						|
-------------------------
 | 
						|
 | 
						|
XRay provides the feature of logging custom events. This may be leveraged to
 | 
						|
record tracing info for RPCs or similarly trace data that is application
 | 
						|
specific.
 | 
						|
 | 
						|
Custom Events themselves are an unstructured (application defined) segment of
 | 
						|
memory with arbitrary size within the buffer. They are preceded by
 | 
						|
CustomEventMarkers to indicate their presence and size.
 | 
						|
 | 
						|
CustomEventMarker data segment:
 | 
						|
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| Field         | Size (bytes) | Description                                   |
 | 
						|
+===============+==============+===============================================+
 | 
						|
| event_size    | ``4``        | Size of preceded event.                       |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| absolute_tsc  | ``8``        | A timestamp counter of the event.             |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
| reserved      | ``3``        | Unused.                                       |
 | 
						|
+---------------+--------------+-----------------------------------------------+
 | 
						|
 | 
						|
 | 
						|
EndOfBuffer Records
 | 
						|
-------------------
 | 
						|
 | 
						|
An EndOfBuffer record type indicates that there is no more trace data in this
 | 
						|
buffer. The reader is expected to seek past the remaining buffer_size expressed
 | 
						|
before the start of buffer and look for either another header or EOF.
 | 
						|
 | 
						|
 | 
						|
Format Grammar and Invariants
 | 
						|
=============================
 | 
						|
 | 
						|
Not all sequences of Metadata records and Function records are valid data. A
 | 
						|
sequence should be parsed as a state machine. The expectations for a valid
 | 
						|
format can be expressed as a context free grammar.
 | 
						|
 | 
						|
This is an attempt to explain the format with statements in EBNF format.
 | 
						|
 | 
						|
- Format := Header ThreadBuffer* EOF
 | 
						|
 | 
						|
- ThreadBuffer := NewBuffer WallClockTime NewCPUId BodySequence* End
 | 
						|
 | 
						|
- BodySequence := NewCPUId | TSCWrap | Function | CustomEvent
 | 
						|
 | 
						|
- Function := (Function_Entry_Args CallArgument*) | Function_Other_Type
 | 
						|
 | 
						|
- CustomEvent := CustomEventMarker CustomEventUnstructuredMemory
 | 
						|
 | 
						|
- End := EndOfBuffer RemainingBufferSizeToSkip
 | 
						|
 | 
						|
 | 
						|
Function Record Order
 | 
						|
---------------------
 | 
						|
 | 
						|
There are a few clarifications that may help understand what is expected of
 | 
						|
Function records.
 | 
						|
 | 
						|
- Functions with an Exit are expected to have a corresponding Entry or
 | 
						|
  Entry_Args function record precede them in the trace.
 | 
						|
 | 
						|
- Tail_Exit Function records record the Function ID of the function whose return
 | 
						|
  address the program counter will take. In other words, the final function that
 | 
						|
  would be popped off of the call stack if tail call optimization was not used.
 | 
						|
 | 
						|
- Not all functions marked for instrumentation are necessarily in the trace. The
 | 
						|
  tracer uses heuristics to preserve the trace for non-trivial functions.
 | 
						|
 | 
						|
- Not every entry must have a traced Exit or Tail Exit. The buffer may run out
 | 
						|
  of space or the program may request for the tracer to finalize toreturn the
 | 
						|
  buffer before an instrumented function exits.
 |