Commit Graph

13 Commits

Author SHA1 Message Date
Walter Erquinigo e17cae076c [trace][intel pt] Fix per-psb packet decoding
The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need.

A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening

1. PSB A was processed all the way up to the end of the trace, which includes PSB B.
2. PSB B was then processed until the end of the trace.

The instructions emitted by step 2. were also emitted as part of step 1. so our trace had duplicated chunks. This problem becomes worse when you many PSBs.

As part of making sure this diff is correct, I added some other features that are very useful.

- Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted.
- Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths.
- Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information.
- Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding.
- Added many comments, asserts and improved error handling for clarity.
- Improved DecodeSystemWideTraceForThread so that a TSC is emitted always before a cpu change event. This was a bug that was annoying me before.
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs.
- Added an "only-events" option to the trace dumper to inspect only events.

I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task.

A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened!

```
thread #1: tid = 3578223
    0: (event) trace synchronization point [offset = 0x0xef0]
  a.out`main + 20 at main.cpp:29:20
    1: 0x0000000000402479    leaq   -0x1210(%rbp), %rax
    2: (event) software disabled tracing
    3: 0x0000000000402480    movq   %rax, %rdi
    4: (event) software disabled tracing
    5: (event) software disabled tracing
    6: 0x0000000000402483    callq  0x403bd4                  ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7
    7: (event) software disabled tracing
  a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7
    8: 0x0000000000403bd4    pushq  %rbp
    9: (event) software disabled tracing
    10: 0x0000000000403bd5    movq   %rsp, %rbp
    11: (event) software disabled tracing
```

This is another trace of a long program with a few PSBs.
```
(lldb) thread trace dump instructions -E -f                                                                                                         thread #1: tid = 3603082
    0: (event) trace synchronization point [offset = 0x0x80]
    47417: (event) software disabled tracing
    129231: (event) trace synchronization point [offset = 0x0x800]
    146747: (event) software disabled tracing
    246076: (event) software disabled tracing
    259068: (event) trace synchronization point [offset = 0x0xf78]
    259276: (event) software disabled tracing
    259278: (event) software disabled tracing
    no more data
```

Differential Revision: https://reviews.llvm.org/D131630
2022-08-12 15:13:48 -07:00
Jakob Johnson f9b4ea0ce9 [trace] Add SBTraceCursor bindings
Add bindings for the `TraceCursor` to allow for programatic traversal of
traces.
This diff adds bindings for all public `TraceCursor` methods except
`GetHwClock` and also adds `SBTrace::CreateNewCursor`. A new unittest
has been added to TestTraceLoad.py that uses the new `SBTraceCursor` API
to test that the sequential and random access APIs of the `TraceCursor`
are equivalent.

This diff depends on D130925.

Test Plan:
`ninja lldb-dotest && ./bin/lldb-dotest -p TestTraceLoad`

Differential Revision: https://reviews.llvm.org/D130930
2022-08-02 16:55:33 -07:00
Jakob Johnson 3bec33b16d [trace] Replace TraceCursorUP with TraceCursorSP
The use of `std::unique_ptr` with `TraceCursor` adds unnecessary complexity to adding `SBTraceCursor` bindings
Specifically, since `TraceCursor` is an abstract class there's no clean
way to provide "deep clone" semantics for `TraceCursorUP` short of
creating a pure virtual `clone()` method (afaict).

After discussing with @wallace, we decided there is no strong reason to
favor wrapping `TraceCursor` with `std::unique_ptr` over `std::shared_ptr`, thus this diff
replaces all usages of `std::unique_ptr<TraceCursor>` with `std::shared_ptr<TraceCursor>`.

This sets the stage for future diffs to introduce `SBTraceCursor`
bindings in a more clean fashion.

Test Plan:

Differential Revision: https://reviews.llvm.org/D130925
2022-08-01 13:53:53 -07:00
Jakob Johnson dde3cf2e83 [trace] Add instruction control flow kind to JSON trace dumper's output
D128477 adds a '-k' flag which displays each instruction's control flow in the `thread trace dump instructions` command's non-json  output (ie no '-j' or '-J' flag)
This diff adds the instruction control flow kind to the `thread trace dump instructions` command's JSON output (ie '-j' or '-J' flag)

Test Plan:
Confirm "controlFlowKind" is present in JSON when '-k' is provided
```
(lldb) thread trace dump instructions -J -k
[
  {                                                                                                                                                                                                                                                                                 [141/1952]
    "id": 7755,
    "loadAddress": "0x400868",
    "module": "test.out",
    "symbol": "main",
    "mnemonic": "jmp",
    "controlFlowKind": "jump",
    "source": "/home/jakobjohnson/jakob-dev/test.cpp",
    "line": 41,
    "column": 29
  },
  {
    "id": 7753,
    "loadAddress": "0x7ffff7b54dab",
    "module": "libstdc++.so.6",
    "symbol": "std::ostream::flush()",
    "mnemonic": "retq",
    "controlFlowKind": "return"
  },
  {
    "id": 7752,
    "loadAddress": "0x7ffff7b54daa",
    "module": "libstdc++.so.6",
    "symbol": "std::ostream::flush()",
    "mnemonic": "popq",
    "controlFlowKind": "other"
  },
  ...
]
```
Confirm "controlFlowKind" is not present when '-k' isn't provided
```
(lldb) thread trace dump instructions -J
[
  {
    "id": 7755,
    "loadAddress": "0x400868",
    "module": "test.out",
    "symbol": "main",
    "mnemonic": "jmp",
    "source": "/home/jakobjohnson/jakob-dev/test.cpp",
    "line": 41,
    "column": 29
  },
  {
    "id": 7753,
    "loadAddress": "0x7ffff7b54dab",
    "module": "libstdc++.so.6",
    "symbol": "std::ostream::flush()",
    "mnemonic": "retq"
  },
  {
    "id": 7752,
    "loadAddress": "0x7ffff7b54daa",
    "module": "libstdc++.so.6",
    "symbol": "std::ostream::flush()",
    "mnemonic": "popq"
  },
```

Differential Revision: https://reviews.llvm.org/D130607
2022-07-27 05:23:59 -07:00
Walter Erquinigo 4f676c2599 [trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.

Sample output:

```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
   26   };
   27
   28   int main() {
-> 29     std::vector<int> vvv;
   30     for (int i = 0; i < 100; i++)
   31       vvv.push_back(i);
   32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
    frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
   57     map<int, int> m;
   58     m[3] = 4;
   59
-> 60     map<string, string> m2;
   61     m2["5"] = "6";
   62
   63     std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
    0: [379567.000 ns] (event) HW clock tick [48599428476224707]
    1: [379569.000 ns] (event) CPU core changed [new CPU=2]
    2: [390487.000 ns] (event) HW clock tick [48599428476246495]
    3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
    4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
  libc.so.6`malloc
    5: [1662746.995 ns] 0x00007ffff7176660    endbr64
    6: [1662748.991 ns] 0x00007ffff7176664    movq   0x32387d(%rip), %rax      ;  + 408
    7: [1662750.986 ns] 0x00007ffff717666b    pushq  %r12
    8: [1662752.981 ns] 0x00007ffff717666d    pushq  %rbp
    9: [1662754.977 ns] 0x00007ffff717666e    pushq  %rbx
    10: [1662756.972 ns] 0x00007ffff717666f    movq   (%rax), %rax
    11: [1662758.967 ns] 0x00007ffff7176672    testq  %rax, %rax
    12: [1662760.963 ns] 0x00007ffff7176675    jne    0x9c7e0                   ; <+384>
    13: [1662762.958 ns] 0x00007ffff717667b    leaq   0x17(%rdi), %rax
    14: [1662764.953 ns] 0x00007ffff717667f    cmpq   $0x1f, %rax
    15: [1662766.949 ns] 0x00007ffff7176683    ja     0x9c730                   ; <+208>
    16: [1662768.944 ns] 0x00007ffff7176730    andq   $-0x10, %rax
    17: [1662770.939 ns] 0x00007ffff7176734    cmpq   $-0x41, %rax
    18: [1662772.935 ns] 0x00007ffff7176738    seta   %dl
    19: [1662774.930 ns] 0x00007ffff717673b    jmp    0x9c690                   ; <+48>
    20: [1662776.925 ns] 0x00007ffff7176690    cmpq   %rdi, %rax
    21: [1662778.921 ns] 0x00007ffff7176693    jb     0x9c7b0                   ; <+336>
    22: [1662780.916 ns] 0x00007ffff7176699    testb  %dl, %dl
    23: [1662782.911 ns] 0x00007ffff717669b    jne    0x9c7b0                   ; <+336>
    24: [1662784.906 ns] 0x00007ffff71766a1    movq   0x3236c0(%rip), %r12      ;  + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
  {
    "id": 0,
    "timestamp_ns": "379567.000000",
    "event": "HW clock tick",
    "hwClock": 48599428476224707
  },
  {
    "id": 1,
    "timestamp_ns": "379569.000000",
    "event": "CPU core changed",
    "cpuId": 2
  },
  {
    "id": 2,
    "timestamp_ns": "390487.000000",
    "event": "HW clock tick",
    "hwClock": 48599428476246495
  },
  {
    "id": 3,
    "timestamp_ns": "1602508.000000",
    "event": "HW clock tick",
    "hwClock": 48599428478664855
  },
  {
    "id": 4,
    "timestamp_ns": "1662745.000000",
    "event": "HW clock tick",
    "hwClock": 48599428478785046
  },
  {
    "id": 5,
    "timestamp_ns": "1662746.995324",
    "loadAddress": "0x7ffff7176660",
    "module": "libc.so.6",
    "symbol": "malloc",
    "mnemonic": "endbr64"
  },
  {
    "id": 6,
    "timestamp_ns": "1662748.990648",
    "loadAddress": "0x7ffff7176664",
    "module": "libc.so.6",
    "symbol": "malloc",
    "mnemonic": "movq"
  },
  {
    "id": 7,
    "timestamp_ns": "1662750.985972",
    "loadAddress": "0x7ffff717666b",
    "module": "libc.so.6",
    "symbol": "malloc",
    "mnemonic": "pushq"
  },
  {
    "id": 8,
    "timestamp_ns": "1662752.981296",
    "loadAddress": "0x7ffff717666d",
    "module": "libc.so.6",
    "symbol": "malloc",
    "mnemonic": "pushq"
  }
]
```

Differential Revision: https://reviews.llvm.org/D130054
2022-07-26 12:05:23 -07:00
Slava Gurevich 4871dfc64e [LLDB][NFC][Reliability] Fix uninitialized variables from Coverity scan. Part 2
Improve LLDB reliability by fixing the following "uninitialized variables" static code inspection warnings from
scan.coverity.com:

1476275, 1274012, 1455035, 1364789, 1454282
1467483, 1406152, 1406255, 1454837, 1454416
1467446, 1462022, 1461909, 1420566, 1327228
1367767, 1431254, 1467299, 1312678, 1431780
1454731, 1490403

Differential Revision: https://reviews.llvm.org/D130528
2022-07-25 20:52:45 -07:00
Slava Gurevich 9877159dd6 Revert "[LLDB][NFC][Reliability] Fix uninitialized variables from Coverity scan. Part 2"
This reverts commit b9aedd94e6.
2022-07-25 18:23:19 -07:00
Slava Gurevich b9aedd94e6 [LLDB][NFC][Reliability] Fix uninitialized variables from Coverity scan. Part 2
Improve LLDB reliability by fixing the following "uninitialized variables" static code inspection warnings from
scan.coverity.com:

1476275, 1274012, 1455035, 1364789, 1454282
1467483, 1406152, 1406255, 1454837, 1454416
1467446, 1462022, 1461909, 1420566, 1327228
1367767, 1431254, 1467299, 1312678, 1431780
1454731, 1490403

Differential Revision: https://reviews.llvm.org/D130528
2022-07-25 16:40:57 -07:00
ymeng 0466d1df23 [trace][intel pt] Support dumping the trace info in json
Thanks to ymeng@fb.com for coming up with this change.

`thread trace dump info` can dump some metrics that can be useful for
analyzing the performance and quality of a trace. This diff adds a --json
option for dumping this information in json format that can be easily
understood my machines.

Differential Revision: https://reviews.llvm.org/D129332
2022-07-13 12:26:11 -07:00
Walter Erquinigo 4a843d9282 [trace][intel pt] Create a CPU change event and expose it in the dumper
Thanks to fredzhou@fb.com for coming up with this feature.

When tracing in per-cpu mode, we have information of in which cpu we are execution each instruction, which comes from the context switch trace. This diff makes this information available as a `cpu changed event`, which an additional accessor in the cursor `GetCPU()`. As cpu changes are very infrequent, any consumer should listen to cpu change events instead of querying the actual cpu of a trace item. Once a cpu change event is seen, the consumer can invoke GetCPU() to get that information. Also, it's possible to invoke GetCPU() on an arbitrary instruction item, which will return the last cpu seen. However, this call is O(logn) and should be used sparingly.

Manually tested with a sample program that starts on cpu 52, then goes to 18, and then goes back to 52.

Differential Revision: https://reviews.llvm.org/D129340
2022-07-13 12:26:11 -07:00
Walter Erquinigo dbc0cb0198 [trace] Avoid a crash in the dumper when disassembling fails
In rare situations, disassemblying would fail that produce an invalid
InstructionSP object. We need to check that it's valid before using.

With this change, now the dumper doesn't crash with dumping instructions of
ioctl. In fact, it now dumps this output

 {
    "id": 6135,
    "loadAddress": "0x7f4bfe5c7515",
    "module": "libc.so.6",
    "symbol": "ioctl",
    "source": "glibc/2.34/src/glibc-2.34/sysdeps/unix/syscall-template.S",
    "line": 120,
    "column": 0
 }

Anyway, we need to investigate why the diassembler failed disassembling that
instruction. From over 2B instructions I was disassembling today, just this
one failed, so this could be a bug in LLVM's core disassembler.

Differential Revision: https://reviews.llvm.org/D129588
2022-07-12 16:23:03 -07:00
Walter Erquinigo ad7bcda940 [trace] Add a flag to the decoder to output the instruction type
To build complex binding upon instruction trace, additional metadata 'instruction type' is needed.

This diff has followings:
 - Add a flag -k  / --kind for instruction dump
 - Remove SetGranularity and SetIgnoreErros from Trace cursor

Sample output:

```
(lldb) thread trace dump instruction -k
thread #1: tid = 3198805
  libc.so.6`_IO_puts + 356
    2107: 0x00007ffff7163594 (    return)     retq
    2106: 0x00007ffff7163592 (     other)     popq   %r13
    2105: 0x00007ffff7163590 (     other)     popq   %r12
    2104: 0x00007ffff716358f (     other)     popq   %rbp
    2103: 0x00007ffff716358e (     other)     popq   %rbx
    2102: 0x00007ffff716358c (     other)     movl   %ebx, %eax
    2101: 0x00007ffff7163588 (     other)     addq   $0x8, %rsp
    2100: 0x00007ffff7163570 ( cond jump)     je     0x89588                   ; <+344>
    2099: 0x00007ffff716356e (     other)     decl   (%rdx)
    2098: 0x00007ffff7163565 ( cond jump)     je     0x8956e                   ; <+318>
    2097: 0x00007ffff716355e (     other)     cmpl   $0x0, 0x33c02b(%rip)      ; __libc_multiple_threads
    2096: 0x00007ffff7163556 (     other)     movq   $0x0, 0x8(%rdx)
    2095: 0x00007ffff7163554 ( cond jump)     jne    0x89588                   ; <+344>
    2094: 0x00007ffff7163550 (     other)     subl   $0x1, 0x4(%rdx)
    2093: 0x00007ffff7163549 (     other)     movq   0x88(%rbp), %rdx
    2092: 0x00007ffff7163547 ( cond jump)     jne    0x89588                   ; <+344>
    2091: 0x00007ffff7163540 (     other)     testl  $0x8000, (%rbp)           ; imm = 0x8000
    2090: 0x00007ffff716353c (     other)     cmovaq %rax, %rbx
    2089: 0x00007ffff7163535 (     other)     cmpq   $0x7fffffff, %rbx         ; imm = 0x7FFFFFFF
    2088: 0x00007ffff7163530 (     other)     movl   $0x7fffffff, %eax         ; imm = 0x7FFFFFFF
```

Reviewed By: wallace

Differential Revision: https://reviews.llvm.org/D128477
2022-07-12 16:23:03 -07:00
Walter Erquinigo a7d6c3effe [trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.

- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.

All test pass

Differential Revision: https://reviews.llvm.org/D128576
2022-06-29 09:19:51 -07:00