Pavel Labath taught me that clang-format sorts headers automatically
using llvm's rules, and it's better not to have spaces between
So in this diff I'm removing those spaces and formatting them as well.
I used `clang-format -i` to format these files.
Add a new "kernel" section with following schema.
```
"kernel": {
"loadAddress"?: decimal | hex string | string decimal
# This is optional. If it's not specified, use default address 0xffffffff81000000.
"file": string
# path to the kernel image
}
```
Here's more details of the diff:
- If "kernel" section exist, it means current tracing mode is //KernelMode//.
- If tracing mode is //KernelMode//, the "processes" section must be empty and the "kernel" and "cpus" section must be provided. This is tested with `TestTraceLoad`.
- "kernel" section is parsed and turned into a new process with a single module which is the kernel image. The kernel process has N fake threads, one for each cpu.
Reviewed By: wallace
Differential Revision: https://reviews.llvm.org/D130805
As previously discussed with @jj10306, we didn't really have a name for
the post-mortem (or offline) trace session representation, which is in
fact a folder with a bunch of files. We decided to call this folder
"trace bundle", and the main JSON file in it "trace bundle description
file". This naming is pretty decent, so I'm refactoring all the existing
code to account for that.
Differential Revision: https://reviews.llvm.org/D128484
llvm's JSON parser supports 64 bit integers, but other tools like the
ones written in JS don't support numbers that big, so we need to
represent these possibly big numbers as a string. This diff uses that to
represent addresses and tsc zero. The former is printed in hex for and
the latter in decimal string form. The schema was updated mentioning
that.
Besides that, I fixed some remaining issues and now all test pass. Before I wasn't running all tests because for some reason my computer reverted perf_paranoid to 1.
Differential Revision: https://reviews.llvm.org/D127819
As discusses offline with @jj10305, we are updating some naming used throughout the code, specially in the json schema
- traceBuffer -> iptTrace
- core -> cpu
Differential Revision: https://reviews.llvm.org/D127817
This applies the changes requested for diff 12.
- use DenseMap<ConstString, _> instead of std::unordered_map<ConstString, _>, which is more idiomatic and possibly performant.
- deduplicate some code in Trace.cpp by using helper functions for fetching in maps
- stop using size and offset when fetching binary data, because we in fact read the entire buffers all the time. If we ever need streaming, we can implement it then. Now, the size is used only to check that we are getting the correct amount of data. This is useful because in some cases determining the size doesn't involve fetching the actual data.
- added back the x86_64 macro to the perf tests
- added more documentation
- simplified some file handling
- fixed some comments
Differential Revision: https://reviews.llvm.org/D127752
This is the final functional patch to support intel pt decoding per cpu.
It works by doing the following:
- First, all context switches are split by tid and sorted in order. This produces a list of continuous executes per thread per core.
- Then, all intel pt subtraces are split by PSB boundaries and assigned to individual thread continuous executions on the same core by doing simple TSC-based comparisons.
- With this, we have, per thread, a sorted list of continuous executions each one with a list of intel pt subtraces. Up to this point, this is really fast because no instructions were actually decoded.
- Then, each thread can be decoded by traversing their continuous executions and intel pt subtraces. An advantage of having these continuous executions is that we can identify if a continuous exexecution doesn't have intel pt data, and thus has a gap in it. We can later to more sofisticated comparisons to identify if within a continuous execution there are gaps.
I'm adding a test as well.
Differential Revision: https://reviews.llvm.org/D126394
- Add the logic that parses all cpu context switch traces and produces blocks of continuous executions, which will be later used to assign intel pt subtraces to threads and to identify gaps. This logic can also identify if the context switch trace is malformed.
- The continuous executions blocks are able to indicate when there were some contention issues when producing the context switch trace. See the inline comments for more information.
- Update the 'dump info' command to show information and stats related to the multicore decoding flow, including timing about context switch decoding.
- Add the logic to conver nanoseconds to TSCs.
- Fix a bug when returning the context switches. Now they data returned makes sense and even empty traces can be returned from lldb-server.
- Finish the necessary bits for loading and saving a multi-core trace bundle from disk.
- Change some size_t to uint64_t for compatibility with 32 bit systems.
Tested by saving a trace session of a program that sleeps 100 times, it was able to produce the following 'dump info' text:
```
(lldb) trace load /tmp/trace3/trace.json (lldb) thread trace dump info Trace technology: intel-pt
thread #1: tid = 4192415
Total number of instructions: 1
Memory usage:
Total approximate memory usage (excluding raw trace): 2.51 KiB
Average memory usage per instruction (excluding raw trace): 2573.00 bytes
Timing for this thread:
Timing for global tasks:
Context switch trace decoding: 0.00s
Events:
Number of instructions with events: 0
Number of individual events: 0
Multi-core decoding:
Total number of continuous executions found: 2499
Number of continuous executions for this thread: 102
Errors:
Number of TSC decoding errors: 0
```
Differential Revision: https://reviews.llvm.org/D126267
:q!
This diff is massive, but it's because it connects the client with lldb-server
and also ensures that the postmortem case works.
- Flatten the postmortem trace schema. The reason is that the schema has become quite complex due to the new multicore case, which defeats the original purpose of having a schema that could work for every trace plug-in. At this point, it's better that each trace plug-in defines it's own full schema. This means that the only common field is "type".
-- Because of this new approach, I merged the "common" trace load and saving functionalities into the IntelPT one. This simplified the code quite a bit. If we eventually implement another trace plug-in, we can see then what we could reuse.
-- The new schema, which is flattened, has now better comments and is parsed better. A change I did was to disallow hex addresses, because they are a bit error prone. I'm asking now to print the address in decimal.
-- Renamed "intel" to "GenuineIntel" in the schema because that's what you see in /proc/cpuinfo.
- Implemented reading the context switch trace data buffer. I had to do
some refactors to do that cleanly.
-- A major change that I did here was to simplify the perf_event circular buffer reading logic. It was too complex. Maybe the original Intel author had something different in mind.
- Implemented all the necessary bits to read trace.json files with per-core data.
- Implemented all the necessary bits to save to disk per-core trace session.
- Added a test that ensures that parsing and saving to disk works.
Differential Revision: https://reviews.llvm.org/D126015
added new command "process trace save -d <directory>".
-it saves a JSON file as <directory>/trace.json, with the main properties of the trace session.
-it saves binary Intel-pt trace as <directory>/thread_id.trace; each file saves each thread.
-it saves modules to the directory <directory>/modules .
-it only works for live process and it only support Intel-pt right now.
Example:
```
b main
run
process trace start
n
process trace save -d /tmp/mytrace
```
A file named trace.json and xxx.trace should be generated in /tmp/mytrace. To load the trace that was just saved:
```
trace load /tmp/mytrace
thread trace dump instructions
```
You should see the instructions of the trace got printed.
To run a test:
```
cd ~/llvm-sand/build/Release/fbcode-x86_64/toolchain
ninja lldb-dotest
./bin/lldb-dotest -p TestTraceSave
```
Reviewed By: wallace
Differential Revision: https://reviews.llvm.org/D107669