llvm-project

Commit Graph

Author	SHA1	Message	Date
Walter Erquinigo	e17cae076c	[trace][intel pt] Fix per-psb packet decoding The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being updated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need. A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening: 1. PSB A was processed all the way up to the end of the trace, which includes PSB B. 2. PSB B was then processed until the end of the trace. The instructions emitted by step 2 were also emitted as part of step 1, so our trace had duplicated chunks. This problem becomes worse when you have many PSBs. As part of making sure this diff is correct, I added some other features that are very useful. - Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted. - Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths. - Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information. - Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding. - Added many comments, asserts and improved error handling for clarity. - Improved DecodeSystemWideTraceForThread so that a TSC is always emitted before a cpu change event. This was a bug that was annoying me before. 
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs. - Added an "only-events" option to the trace dumper to inspect only events. I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task. A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened! ``` thread #1: tid = 3578223 0: (event) trace synchronization point [offset = 0x0xef0] a.out`main + 20 at main.cpp:29:20 1: 0x0000000000402479 leaq -0x1210(%rbp), %rax 2: (event) software disabled tracing 3: 0x0000000000402480 movq %rax, %rdi 4: (event) software disabled tracing 5: (event) software disabled tracing 6: 0x0000000000402483 callq 0x403bd4 ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7 7: (event) software disabled tracing a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7 8: 0x0000000000403bd4 pushq %rbp 9: (event) software disabled tracing 10: 0x0000000000403bd5 movq %rsp, %rbp 11: (event) software disabled tracing ``` This is another trace of a long program with a few PSBs. 
``` (lldb) thread trace dump instructions -E -f thread #1: tid = 3603082 0: (event) trace synchronization point [offset = 0x0x80] 47417: (event) software disabled tracing 129231: (event) trace synchronization point [offset = 0x0x800] 146747: (event) software disabled tracing 246076: (event) software disabled tracing 259068: (event) trace synchronization point [offset = 0x0xf78] 259276: (event) software disabled tracing 259278: (event) software disabled tracing no more data ``` Differential Revision: https://reviews.llvm.org/D131630	2022-08-12 15:13:48 -07:00
Walter Erquinigo	c4fb631cee	[NFC][lldb][trace] Fix formatting of tracing files Pavel Labath taught me that clang-format sorts headers automatically using llvm's rules, and it's better not to have spaces between them. So in this diff I'm removing those spaces and formatting the headers as well. I used `clang-format -i` to format these files.	2022-08-11 11:00:26 -07:00
Walter Erquinigo	6fb744be76	[trace][intel pt] Support a new kernel section in LLDB’s trace bundle schema Add a new "kernel" section with the following schema. ``` "kernel": { "loadAddress"?: decimal | hex string | string decimal # This is optional. If it's not specified, use default address 0xffffffff81000000. "file": string # path to the kernel image } ``` Here are more details of the diff: - If the "kernel" section exists, the current tracing mode is //KernelMode//. - If the tracing mode is //KernelMode//, the "processes" section must be empty and the "kernel" and "cpus" sections must be provided. This is tested with `TestTraceLoad`. - The "kernel" section is parsed and turned into a new process with a single module, which is the kernel image. The kernel process has N fake threads, one for each cpu. Reviewed By: wallace Differential Revision: https://reviews.llvm.org/D130805	2022-08-04 17:15:08 -07:00
Walter Erquinigo	d179ea12fd	[NFC][trace] format source files Cleanup formatting diff	2022-08-02 21:16:31 -07:00
Jakob Johnson	3bec33b16d	[trace] Replace TraceCursorUP with TraceCursorSP The use of `std::unique_ptr` with `TraceCursor` adds unnecessary complexity to adding `SBTraceCursor` bindings. Specifically, since `TraceCursor` is an abstract class there's no clean way to provide "deep clone" semantics for `TraceCursorUP` short of creating a pure virtual `clone()` method (afaict). After discussing with @wallace, we decided there is no strong reason to favor wrapping `TraceCursor` with `std::unique_ptr` over `std::shared_ptr`, thus this diff replaces all usages of `std::unique_ptr<TraceCursor>` with `std::shared_ptr<TraceCursor>`. This sets the stage for future diffs to introduce `SBTraceCursor` bindings in a cleaner fashion. Test Plan: Differential Revision: https://reviews.llvm.org/D130925	2022-08-01 13:53:53 -07:00
Walter Erquinigo	4f676c2599	[trace][intel pt] Introduce wall clock time for each trace item - Decouple TSCs from trace items - Turn TSCs into events just like CPUs. The new name is HW clock tick, which could be reused by other vendors. - Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item. - For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started. Sample output: ``` (lldb) r Process 750047 launched: '/home/wallace/a.out' (x86_64) Process 750047 stopped * thread #1, name = 'a.out', stop reason = breakpoint 1.1 frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20 26 }; 27 28 int main() { -> 29 std::vector<int> vvv; 30 for (int i = 0; i < 100; i++) 31 vvv.push_back(i); 32 (lldb) process trace start -s 64kb -t --per-cpu (lldb) b 60 Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe (lldb) c Process 750047 resuming Process 750047 stopped * thread #1, name = 'a.out', stop reason = breakpoint 2.1 frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23 57 map<int, int> m; 58 m[3] = 4; 59 -> 60 map<string, string> m2; 61 m2["5"] = "6"; 62 63 std::vector<std::string> vs = {"2", "3"}; (lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047 0: [379567.000 ns] (event) HW clock tick [48599428476224707] 1: [379569.000 ns] (event) CPU core changed [new CPU=2] 2: [390487.000 ns] (event) HW clock tick [48599428476246495] 3: [1602508.000 ns] (event) HW clock tick [48599428478664855] 4: 
[1662745.000 ns] (event) HW clock tick [48599428478785046] libc.so.6`malloc 5: [1662746.995 ns] 0x00007ffff7176660 endbr64 6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408 7: [1662750.986 ns] 0x00007ffff717666b pushq %r12 8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp 9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx 10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax 11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax 12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384> 13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax 14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax 15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208> 16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax 17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax 18: [1662772.935 ns] 0x00007ffff7176738 seta %dl 19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48> 20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax 21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336> 22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl 23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336> 24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24 (lldb) thread trace dump instructions -t -f -e -J -c 4 [ { "id": 0, "timestamp_ns": "379567.000000", "event": "HW clock tick", "hwClock": 48599428476224707 }, { "id": 1, "timestamp_ns": "379569.000000", "event": "CPU core changed", "cpuId": 2 }, { "id": 2, "timestamp_ns": "390487.000000", "event": "HW clock tick", "hwClock": 48599428476246495 }, { "id": 3, "timestamp_ns": "1602508.000000", "event": "HW clock tick", "hwClock": 48599428478664855 }, { "id": 4, "timestamp_ns": "1662745.000000", "event": "HW clock tick", "hwClock": 48599428478785046 }, { "id": 5, "timestamp_ns": "1662746.995324", "loadAddress": "0x7ffff7176660", "module": "libc.so.6", "symbol": "malloc", "mnemonic": "endbr64" }, { "id": 6, "timestamp_ns": "1662748.990648", "loadAddress": 
"0x7ffff7176664", "module": "libc.so.6", "symbol": "malloc", "mnemonic": "movq" }, { "id": 7, "timestamp_ns": "1662750.985972", "loadAddress": "0x7ffff717666b", "module": "libc.so.6", "symbol": "malloc", "mnemonic": "pushq" }, { "id": 8, "timestamp_ns": "1662752.981296", "loadAddress": "0x7ffff717666d", "module": "libc.so.6", "symbol": "malloc", "mnemonic": "pushq" } ] ``` Differential Revision: https://reviews.llvm.org/D130054	2022-07-26 12:05:23 -07:00
ymeng	0466d1df23	[trace][intel pt] Support dumping the trace info in json Thanks to ymeng@fb.com for coming up with this change. `thread trace dump info` can dump some metrics that can be useful for analyzing the performance and quality of a trace. This diff adds a --json option for dumping this information in json format that can be easily understood by machines. Differential Revision: https://reviews.llvm.org/D129332	2022-07-13 12:26:11 -07:00
Gaurav Gaur	d30fd5c3a1	[trace][intel pt] Add a cgroup filter It turns out that cgroup filtering is relatively trivial and works really nicely. This diff adds automatic cgroup filtering when in per-cpu mode, unless a new --disable-cgroup-filtering flag is passed in the start command. At least on Meta machines, all processes are spawned inside a cgroup by default, which comes in super handy, because per-cpu tracing is now much more precise. A manual test gave me this result - Without filtering: Total number of trace items: 36083 Total number of continuous executions found: 229 Number of continuous executions for this thread: 2 Total number of PSB blocks found: 98 Number of PSB blocks for this thread 2 Total number of unattributed PSB blocks found: 38 - With filtering: Total number of trace items: 87756 Total number of continuous executions found: 123 Number of continuous executions for this thread: 2 Total number of PSB blocks found: 10 Number of PSB blocks for this thread 3 Total number of unattributed PSB blocks found: 2 Filtering gives us great results. The number of instructions collected more than doubles (probably because we have less noise in the trace), and we have far fewer unattributed PSB blocks and unrelated PSBs in general. The ones that are unrelated probably belong to other processes in the same cgroup. Differential Revision: https://reviews.llvm.org/D129257	2022-07-13 12:26:11 -07:00
Walter Erquinigo	b532dd545f	[trace] Add an option to save a compact trace bundle A trace bundle contains many trace files, and, in the case of intel pt, the largest files are often the context switch traces because they are not compressed by default. As a way to improve this, I'm adding a --compact option to the `trace save` command that filters out unwanted processes from the context switch traces. Eventually we can do the same for intel pt traces as well. Differential Revision: https://reviews.llvm.org/D129239	2022-07-13 11:43:28 -07:00
Peicong Wu	9f9464e02a	[trace][intel pt] Measure the time it takes to decode a thread in per-cpu mode This metric was missing. We were only measuring in per-thread mode, and this completes the work. For a sample trace I have, the `dump info` command shows ``` Timing for this thread: Decoding instructions: 0.12s ``` I also improved the TaskTime function a bit so that callers don't need to specify the template argument. Differential Revision: https://reviews.llvm.org/D129249	2022-07-13 11:08:14 -07:00
rnofenko	db73a52d7b	[trace][intel pt] Add a nice parser for the trace size Thanks to rnofenko@fb.com for coming up with these changes. This diff adds support for passing units in the trace size inputs. For example, it's now possible to specify 64KB as the trace size, instead of the problematic 65536. This makes the user experience a bit friendlier. Differential Revision: https://reviews.llvm.org/D129613	2022-07-13 10:53:14 -07:00
Walter Erquinigo	a7d6c3effe	[trace] Make events first class items in the trace cursor and rework errors We want to include events with metadata, like context switches, and this requires the API to handle events with payloads (e.g. information about such context switches). Besides this, we want to support multiple similar events between two consecutive instructions, like multiple context switches. However, the current implementation is not good for this because we are defining events as bitmask enums associated with specific instructions. Thus, we need to decouple instructions from events and make events actual items in the trace, just like instructions and errors. - Add accessors in the TraceCursor to know if an item is an event or not. - Modify everything from the TraceDumper all the way to DecodedThread to support event items. Fortunately this simplified many things. - Renamed the paused event to disabled. - Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs. - As part of this refactor, I also fixed a long-standing issue, which is that some non-decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up properly. Then, errors within a trace are actual anomalies found in between instructions. All tests pass. Differential Revision: https://reviews.llvm.org/D128576	2022-06-29 09:19:51 -07:00
Walter Erquinigo	b8dcd0ba26	[NFC][lldb][trace] Rename trace session to trace bundle As previously discussed with @jj10306, we didn't really have a name for the post-mortem (or offline) trace session representation, which is in fact a folder with a bunch of files. We decided to call this folder "trace bundle", and the main JSON file in it "trace bundle description file". This naming is pretty decent, so I'm refactoring all the existing code to account for that. Differential Revision: https://reviews.llvm.org/D128484	2022-06-24 08:41:33 -07:00
Walter Erquinigo	ea37cd52d1	[trace][intelpt] Support system-wide tracing [22] - Some final touches Having a member variable TraceIntelPT * makes it look as if it was optional. I'm using instead a weak_ptr to indicate that it's not optional and the object is under the ownership of TraceIntelPT. Besides that, I've simplified the Perf aux and data buffers copying by using vector.insert. I'm also renaming Lookup2 to Lookup. The 2 in the name is confusing. Differential Revision: https://reviews.llvm.org/D127881	2022-06-16 11:42:22 -07:00
Walter Erquinigo	6a5355e8a1	[trace][intelpt] Support system-wide tracing [20] - Rename some fields in the schema As discussed offline with @jj10305, we are updating some naming used throughout the code, especially in the json schema - traceBuffer -> iptTrace - core -> cpu Differential Revision: https://reviews.llvm.org/D127817	2022-06-16 11:42:22 -07:00
Walter Erquinigo	561a61fb26	[trace][intelpt] Support system-wide tracing [18] - some more improvements This applies the changes requested for diff 12. - use DenseMap<ConstString, _> instead of std::unordered_map<ConstString, _>, which is more idiomatic and possibly performant. - deduplicate some code in Trace.cpp by using helper functions for fetching in maps - stop using size and offset when fetching binary data, because we in fact read the entire buffers all the time. If we ever need streaming, we can implement it then. Now, the size is used only to check that we are getting the correct amount of data. This is useful because in some cases determining the size doesn't involve fetching the actual data. - added back the x86_64 macro to the perf tests - added more documentation - simplified some file handling - fixed some comments Differential Revision: https://reviews.llvm.org/D127752	2022-06-16 11:42:21 -07:00
Walter Erquinigo	03cc58ff2a	[trace][intelpt] Support system-wide tracing [17] - Some improvements This improves several things and addresses comments up to the diff [11] in this stack. - Simplify many functions to receive fewer parameters that they can identify easily - Create Storage classes for Trace and TraceIntelPT that can make it easier to reason about what can change with live process refreshes and what cannot. - Don't cache the perf zero conversion numbers in lldb-server to make sure we get the most up-to-date numbers. - Move the thread identification from context switches to the bundle parser, to leave TraceIntelPT simpler. This also makes sure that the constructor of TraceIntelPT is invoked when the entire data has been checked to be correct. - Normalize all bundle paths before the Processes, Threads and Modules are created, so that they can assume that all paths are correct and absolute - Fix some issues in the tests. Now they all pass. - Return the specific instance when constructing PerThread and MultiCore processor tracers. - Properly implement IntelPTMultiCoreTrace::TraceStart. - Improve some comments. - Use the typedef ContextSwitchTrace more often for clarity. - Move CreateContextSwitchTracePerfEvent to Perf.h as a utility function. - Synchronize better the state of the context switch and the intel pt perf events. - Use a boolean instead of an enum for the PerfEvent state. Differential Revision: https://reviews.llvm.org/D127456	2022-06-16 11:23:02 -07:00
Walter Erquinigo	ff15efc1a7	[trace][intelpt] Support system-wide tracing [16] - Create threads automatically from context switch data in the post-mortem case For some context, the context switch data contains information about which threads were executed by each traced process, therefore it's not necessary to specify them in the trace file. So this diff adds support for that automatic feature. Eventually we could extend it to live processes as well. Differential Revision: https://reviews.llvm.org/D127001	2022-06-16 11:23:02 -07:00
Walter Erquinigo	1a3f996972	[trace][intelpt] Support system-wide tracing [13] - Add context switch decoding - Add the logic that parses all cpu context switch traces and produces blocks of continuous executions, which will be later used to assign intel pt subtraces to threads and to identify gaps. This logic can also identify if the context switch trace is malformed. - The continuous executions blocks are able to indicate when there were some contention issues when producing the context switch trace. See the inline comments for more information. - Update the 'dump info' command to show information and stats related to the multicore decoding flow, including timing about context switch decoding. - Add the logic to convert nanoseconds to TSCs. - Fix a bug when returning the context switches. Now the data returned makes sense and even empty traces can be returned from lldb-server. - Finish the necessary bits for loading and saving a multi-core trace bundle from disk. - Change some size_t to uint64_t for compatibility with 32 bit systems. Tested by saving a trace session of a program that sleeps 100 times, it was able to produce the following 'dump info' text: ``` (lldb) trace load /tmp/trace3/trace.json (lldb) thread trace dump info Trace technology: intel-pt thread #1: tid = 4192415 Total number of instructions: 1 Memory usage: Total approximate memory usage (excluding raw trace): 2.51 KiB Average memory usage per instruction (excluding raw trace): 2573.00 bytes Timing for this thread: Timing for global tasks: Context switch trace decoding: 0.00s Events: Number of instructions with events: 0 Number of individual events: 0 Multi-core decoding: Total number of continuous executions found: 2499 Number of continuous executions for this thread: 102 Errors: Number of TSC decoding errors: 0 ``` Differential Revision: https://reviews.llvm.org/D126267	2022-06-16 11:23:01 -07:00
Walter Erquinigo	fc5ef57c7d	[trace][intelpt] Support system-wide tracing [12] - Support multi-core trace load and save This diff is massive, but it's because it connects the client with lldb-server and also ensures that the postmortem case works. - Flatten the postmortem trace schema. The reason is that the schema has become quite complex due to the new multicore case, which defeats the original purpose of having a schema that could work for every trace plug-in. At this point, it's better that each trace plug-in defines its own full schema. This means that the only common field is "type". -- Because of this new approach, I merged the "common" trace load and saving functionalities into the IntelPT one. This simplified the code quite a bit. If we eventually implement another trace plug-in, we can see then what we could reuse. -- The new schema, which is flattened, has now better comments and is parsed better. A change I did was to disallow hex addresses, because they are a bit error-prone. I'm asking now to print the address in decimal. -- Renamed "intel" to "GenuineIntel" in the schema because that's what you see in /proc/cpuinfo. - Implemented reading the context switch trace data buffer. I had to do some refactors to do that cleanly. -- A major change that I did here was to simplify the perf_event circular buffer reading logic. It was too complex. Maybe the original Intel author had something different in mind. - Implemented all the necessary bits to read trace.json files with per-core data. - Implemented all the necessary bits to save to disk per-core trace session. - Added a test that ensures that parsing and saving to disk works. Differential Revision: https://reviews.llvm.org/D126015	2022-06-15 13:28:36 -07:00
Walter Erquinigo	a0a46473c3	[trace][intelpt] Support system-wide tracing [11] - Read warnings and perf conversion in the client - Add logging for when the live state of the process is refreshed - Move error handling of the live state refreshing to Trace from TraceIntelPT. This allows refreshing to fail either at the plug-in level or at the base class level. The error is cached and it can be retrieved every time RefreshLiveProcessState is invoked. - Allow DoRefreshLiveProcessState to handle plugin-specific parameters. - Add some encapsulation to prevent TraceIntelPT from accessing variables belonging to Trace. Test done via logging: ``` (lldb) b main Breakpoint 1: where = a.out`main + 20 at main.cpp:27:20, address = 0x00000000004023d9 (lldb) r Process 2359706 launched: '/home/wallace/a.out' (x86_64) Process 2359706 stopped * thread #1, name = 'a.out', stop reason = breakpoint 1.1 frame #0: 0x00000000004023d9 a.out`main at main.cpp:27:20 24 }; 25 26 int main() { -> 27 std::vector<int> vvv; 28 for (int i = 0; i < 100000; i++) 29 vvv.push_back(i); 30 (lldb) process trace start (lldb) log enable lldb target -F (lldb) n Process 2359706 stopped * thread #1, name = 'a.out', stop reason = step over frame #0: 0x00000000004023e8 a.out`main at main.cpp:28:12 25 26 int main() { 27 std::vector<int> vvv; -> 28 for (int i = 0; i < 100000; i++) 29 vvv.push_back(i); 30 31 std::deque<int> dq1 = {1, 2, 3}; (lldb) thread trace dump instructions -c 2 -t Trace.cpp:RefreshLiveProcessState Trace::RefreshLiveProcessState invoked TraceIntelPT.cpp:DoRefreshLiveProcessState TraceIntelPT found tsc conversion information thread #1: tid = 2359706 a.out`std::vector<int, std::allocator<int>>::vector() + 26 at stl_vector.h:395:19 54: [tsc=unavailable] 0x0000000000403a7c retq ``` See the logging lines at the end of the dump. They indicate that refreshing happened and that perf conversion information was found. Differential Revision: https://reviews.llvm.org/D125943	2022-06-15 12:08:00 -07:00
Walter Erquinigo	1f49714d3e	[trace][intelpt] Support system-wide tracing [4] - Support per core tracing on lldb-server This diff implements per-core tracing on lldb-server. It also includes tests that ensure that tracing can be initiated from the client and that the jLLDBGetState packet returns the list of trace buffers per core. This doesn't include any decoder changes. Finally, this makes some little changes here and there improving the existing code. A specific piece of code that can't reliably be tested is when tracing per core fails due to permissions. In this case we add a troubleshooting message and this is the manual test: ``` /proc/sys/kernel/perf_event_paranoid set to 1 (lldb) process trace start --per-core-tracing error: perf event syscall failed: Permission denied You might need that /proc/sys/kernel/perf_event_paranoid has a value of 0 or -1. ``` Differential Revision: https://reviews.llvm.org/D124858	2022-05-17 12:46:54 -07:00
Walter Erquinigo	26d83a431e	[NFC][lldb][trace] Use uint64_t when decoding and enconding json llvm's json parser supports uint64_t, so let's better use it for the packets being sent between lldb and lldb-server instead of using int64_t as an intermediate type, which might be error-prone.	2022-05-17 11:08:04 -07:00
Walter Erquinigo	285b39a31e	Revert "[NFC][lldb][trace] Use uint64_t when decoding and enconding json" This reverts commit `9d2dd6d762`. Reverting because this exposes an issue in the uint64_t json parser.	2022-05-09 22:47:05 -07:00
Walter Erquinigo	9d2dd6d762	[NFC][lldb][trace] Use uint64_t when decoding and enconding json llvm's json parser supports uint64_t, so let's better use it for the packets being sent between lldb and lldb-server instead of using int64_t as an intermediate type, which might be error-prone.	2022-05-09 21:55:43 -07:00
Walter Erquinigo	7b73de9ec2	[trace][intelpt] Support system-wide tracing [3] - Refactor IntelPTThreadTrace I'm refactoring IntelPTThreadTrace into IntelPTSingleBufferTrace so that it can trace both single threads and single cores. In this diff I'm basically renaming the class, moving it to its own file, and removing all the pieces that are not used along with some basic cleanup. Differential Revision: https://reviews.llvm.org/D124648	2022-05-09 16:05:26 -07:00
Walter Erquinigo	b8d1776fc5	[trace][intelpt] Support system-wide tracing [2] - Add a dummy --per-core-tracing option This updates the documentation of the gdb-remote protocol, as well as the help messages, to include the new --per-core-tracing option. Differential Revision: https://reviews.llvm.org/D124640	2022-05-09 16:05:26 -07:00
Walter Erquinigo	5de0a3e9da	[trace][intelpt] Support system-wide tracing [1] - Add a method for accessing the list of logical core ids In order to open perf events per core, we need to first get the list of core ids available in the system. So I'm adding a function that does that by parsing /proc/cpuinfo. That seems to be the simplest and most portable way to do that. Besides that, I made a few refactors and renames to reflect better that the cpu info that we use in lldb-server comes from procfs. Differential Revision: https://reviews.llvm.org/D124573	2022-05-02 08:48:49 -07:00
Walter Erquinigo	059f39d2f4	[trace][intel pt] Support events A trace might contain events traced during the target's execution. For example, a thread might be paused for some period of time due to context switches or breakpoints, which actually force a context switch. Not only that, a trace might be paused because the CPU decides to trace only a specific part of the target, like the address filtering provided by intel pt, which will cause pause events. Besides this case, other kinds of events might exist. This patch adds the method `TraceCursor::GetEvents()` that returns the list of events that happened right before the instruction being pointed at by the cursor. Some refactors were done to make this change simpler. Besides this new API, the instruction dumper now supports the -e flag which shows pause events, like in the following example, where pauses happened due to breakpoints. ``` thread #1: tid = 2717361 a.out`main + 20 at main.cpp:27:20 0: 0x00000000004023d9 leaq -0x1200(%rbp), %rax [paused] 1: 0x00000000004023e0 movq %rax, %rdi [paused] 2: 0x00000000004023e3 callq 0x403a62 ; std::vector<int, std::allocator<int> >::vector at stl_vector.h:391:7 a.out`std::vector<int, std::allocator<int> >::vector() at stl_vector.h:391:7 3: 0x0000000000403a62 pushq %rbp 4: 0x0000000000403a63 movq %rsp, %rbp ``` The `dump info` command has also been updated and now it shows the number of instructions that have associated events. Differential Revision: https://reviews.llvm.org/D123982	2022-04-25 19:01:23 -07:00
Walter Erquinigo	44103c96fa	[trace][intelpt] Remove code smell when printing the raw trace size Something ugly I did was to report the trace buffer size to the DecodedThread, which is later used as part of the `dump info` command. Instead of doing that, we can just directly ask the trace for the raw buffer and print its size. I thought about not asking for the entire trace but instead just for its size, but in this case, as our traces are not extremely big, I prefer to ask for the entire trace, ensuring it could be fetched, and then print its size. Differential Revision: https://reviews.llvm.org/D123358	2022-04-12 13:08:03 -07:00
Walter Erquinigo	bdf3e7e5b8	[trace][intelpt] Add task timer classes I'm adding two new classes that can be used to measure the duration of long tasks at the process and thread level, e.g. decoding, fetching data from lldb-server, etc. In this first patch, I'm using it to measure the time it takes to decode each thread, which is printed out with the `dump info` command. In a later patch I'll start adding process-level tasks and I might move these classes to the upper Trace level, instead of having them in the intel-pt plugin. I might need to do that anyway in the future when we have to measure HTR. For now, I want to keep the impact of this change minimal. With it, I was able to generate the following info of a very big trace: ``` (lldb) thread trace dump info Trace technology: intel-pt thread #1: tid = 616081 Total number of instructions: 9729366 Memory usage: Raw trace size: 1024 KiB Total approximate memory usage (excluding raw trace): 123517.34 KiB Average memory usage per instruction (excluding raw trace): 13.00 bytes Timing: Decoding instructions: 1.62s Errors: Number of TSC decoding errors: 0 ``` As seen above, it took 1.62 seconds to decode 9.7M instructions. This is great news, as we don't need to do any optimization work in this area. Differential Revision: https://reviews.llvm.org/D123357	2022-04-12 13:08:03 -07:00
Walter Erquinigo	e0cfe20ad2	[trace][intel pt] Create a common accessor for live and postmortem data Some parts of the code have to distinguish between live and postmortem threads to figure out how to get some data, e.g. thread trace buffers. This makes the code less generic and more error prone. An example of that is that we have two different decoders: LiveThreadDecoder and PostMortemThreadDecoder. They exist because getting the trace buffer is different for each case. The problem doesn't stop there. Soon we'll have even more kinds of data, like the context switch trace, whose fetching will be different for live and postmortem processes. As a way to fix this, I'm creating a common API for accessing thread data, which is able to figure out how to handle the postmortem and live cases on behalf of the caller. As a result of that, I was able to eliminate the two decoders and unify them into a simpler one. Not only that, our TraceSave functionality only worked for live threads, but now it can also work for postmortem processes, which might not be useful now, but it might be in the future. This common API is OnThreadBinaryDataRead. More information in the inline documentation. Differential Revision: https://reviews.llvm.org/D123281	2022-04-07 15:58:44 -07:00
Alisamar Husain	d849959071	[lldb][intelpt] Remove `IntelPTInstruction` and move methods to `DecodedThread` This is to reduce the size of the trace further and has appreciable results. Differential Revision: https://reviews.llvm.org/D122991	2022-04-05 22:01:36 +05:30
Walter Erquinigo	1e5083a563	[trace][intel pt] Handle better tsc in the decoder A problem that I introduced in the decoder is that I was considering TSC decoding errors as actual instruction errors, which means that the trace has a gap. This is wrong because a TSC decoding error doesn't mean that there's a gap in the trace. Instead, now I'm just counting how many of these errors happened and I'm using the `dump info` command to check for this number. Besides that, I refactored the decoder a little bit to make it simpler, more readable, and to handle TSCs in a cleaner way. Differential Revision: https://reviews.llvm.org/D122867	2022-04-02 11:06:26 -07:00
Alisamar Husain	ca922a3559	[intelpt] Refactor timestamps out of `IntelPTInstruction` Storing timestamps (TSCs) in a more efficient map at the decoded thread level to speed up TSC lookup, as well as reduce the amount of memory used by each decoded instruction. Also introduced TSC range which keeps the current timestamp valid for all subsequent instructions until the next timestamp is emitted. Differential Revision: https://reviews.llvm.org/D122603	2022-04-01 21:51:42 +05:30
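The TSC range idea in ca922a3559 can be illustrated with an ordered map keyed by the index of the first instruction a timestamp applies to. This is a minimal sketch under that assumption, not the data structure used in the patch; `TscMapSketch`, `RecordTsc`, and `GetTsc` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical sketch; the names and layout are assumptions, not LLDB code.
class TscMapSketch {
public:
  // Records that `tsc` becomes the current timestamp starting at the
  // instruction with index `insn_index`.
  void RecordTsc(size_t insn_index, uint64_t tsc) { m_tscs[insn_index] = tsc; }

  // Looks up the timestamp that covers `insn_index`: the most recent TSC
  // recorded at or before that index. Returns false if no TSC covers it.
  bool GetTsc(size_t insn_index, uint64_t &tsc) const {
    // upper_bound finds the first entry strictly after insn_index, so the
    // entry just before it (if any) starts the range containing insn_index.
    auto it = m_tscs.upper_bound(insn_index);
    if (it == m_tscs.begin())
      return false;
    tsc = std::prev(it)->second;
    return true;
  }

private:
  // One entry per emitted TSC instead of one TSC field per instruction.
  std::map<size_t, uint64_t> m_tscs;
};
```

With this layout the memory cost grows with the number of timestamps rather than the number of instructions, and each lookup is logarithmic in the number of recorded TSCs.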
Alisamar Husain	bcf1978a87	[intelpt] Refactoring instruction decoding for flexibility Now the decoded thread has Append methods that provide more flexibility in terms of the underlying data structure that represents the instructions. In this case, we are able to represent the sporadic errors as map and thus reduce the size of each instruction. Differential Revision: https://reviews.llvm.org/D122293	2022-03-26 11:34:47 -07:00
Alisamar Husain	37a466dd72	[trace][intelpt] Added total memory usage by decoded trace This fails currently but the basics are there Differential Revision: https://reviews.llvm.org/D122093	2022-03-21 12:36:08 +05:30
Alisamar Husain	8271220a99	[trace][intelpt] Instruction count in trace info Added a line to `thread trace dump info` results which shows total number of instructions executed until now. Differential Revision: https://reviews.llvm.org/D122076	2022-03-20 11:28:16 +05:30
Pavel Labath	a394231819	[lldb] Remove ConstString from SymbolVendor, Trace, TraceExporter, UnwindAssembly, MemoryHistory and InstrumentationRuntime plugin names	2021-10-29 12:08:57 +02:00
Pavel Labath	a3939e159f	[lldb] Return StringRef from PluginInterface::GetPluginName There is no reason why this function should be returning a ConstString. While modifying these files, I also fixed several instances where GetPluginName and GetPluginNameStatic were returning different strings. I am not changing the return type of GetPluginNameStatic in this patch, as that would necessitate additional changes, and this patch is big enough as it is. Differential Revision: https://reviews.llvm.org/D111877	2021-10-18 10:14:42 +02:00
Pavel Labath	b03126768a	[lldb] Remove PluginInterface::GetPluginVersion In all these years, we haven't found a use for this function (it has zero callers). Let's just remove the boilerplate. Differential Revision: https://reviews.llvm.org/D109600	2021-09-13 10:29:00 +02:00
Walter Erquinigo	602497d672	[trace] [intel pt] Create a "process trace save" command Added a new command "process trace save -d <directory>". - It saves a JSON file as <directory>/trace.json, with the main properties of the trace session. - It saves the binary Intel-pt trace as <directory>/thread_id.trace, one file per thread. - It saves modules to the directory <directory>/modules. - It only works for live processes and it only supports Intel-pt right now. Example: ``` b main run process trace start n process trace save -d /tmp/mytrace ``` A file named trace.json and xxx.trace should be generated in /tmp/mytrace. To load the trace that was just saved: ``` trace load /tmp/mytrace thread trace dump instructions ``` You should see the instructions of the trace get printed. To run a test: ``` cd ~/llvm-sand/build/Release/fbcode-x86_64/toolchain ninja lldb-dotest ./bin/lldb-dotest -p TestTraceSave ``` Reviewed By: wallace Differential Revision: https://reviews.llvm.org/D107669	2021-08-27 09:34:01 -07:00
Walter Erquinigo	29af527c86	[intel pt] fix builds https://reviews.llvm.org/D105649 broke intel pt builds. Fortunately the fix is super easy.	2021-07-21 14:10:09 -07:00
Walter Erquinigo	345ace026b	[trace] [intel pt] Create a "thread trace dump stats" command When the user types the command 'thread trace dump info' and there's a running Trace session in LLDB, the raw trace size in bytes should be printed; the command 'thread trace dump info all' should print the info for all the threads. Original Author: hanbingwang Reviewed By: clayborg, wallace Differential Revision: https://reviews.llvm.org/D105717	2021-07-21 09:50:15 -07:00
Walter Erquinigo	04195843ef	[intel pt] Add TSC timestamps Differential Revision: https://reviews.llvm.org/D106328	2021-07-20 16:29:17 -07:00
Walter Erquinigo	b0aa70761b	[trace][intel pt] Implement the Intel PT cursor D104422 added the interface for TraceCursor, which is the main way to traverse instructions in a trace. This diff implements the corresponding cursor class for Intel PT and deletes the now obsolete code. Besides that, the logic for the "thread trace dump instructions" was adapted to use this cursor (pretty much I ended up moving code from Trace.cpp to TraceCursor.cpp). The command by default traverses the instructions backwards, and if the user passes --forwards, then it traverses them forwards. More information about that is in the Options.td file. Regarding the Intel PT cursor: all Intel PT cursors for the same thread share the same DecodedThread instance. I'm not yet implementing lazy decoding because we don't need it. That'll be for later. For the time being, the entire thread trace is decoded when the first cursor for that thread is requested. Differential Revision: https://reviews.llvm.org/D105531	2021-07-16 16:47:43 -07:00
Walter Erquinigo	f0d0612476	[NFC][trace] remove dead function The Trace::GetCursorPosition function was never really implemented well and it's being replaced by a more correct TraceCursor object.	2021-06-23 23:18:53 -07:00
Walter Erquinigo	2aa1dd1c66	[trace] Add a TraceCursor class As a follow up of D103588, I'm reinitiating the discussion with a new proposal for traversing instructions in a trace which uses the feedback gotten in that diff. See the embedded documentation in TraceCursor for more information. The idea is to offer an OOP way to traverse instructions exposing a minimal interface that makes no assumptions on: - the number of instructions in the trace (i.e. having indices for instructions might be impractical for gigantic intel-pt traces, as it would require decoding the entire trace). This renders the use of indices to point to instructions impractical. Traces are big and expensive, and the consumer should try to do linear lookups (forwards and/or backwards) and avoid random accesses (the API could be extended though, but for now I want to discard that functionality and leave the API extensible if needed). - the way the instructions are represented internally by each Trace plug-in. They could be mmap'ed from a file, exist in a plain vector, or be generated on the fly as the user requests the data. - the actual data structure used internally for each plug-in. Ideas like having a struct TraceInstruction have been discarded because that would make the plug-in follow a certain data type, which might be costly. Instead, the user can ask the cursor for each independent property of the instruction it's pointing at. The way to get a cursor is to ask Trace.h for the end or begin cursor of a thread's trace. There are some benefits of this approach: - there's little cost to create a cursor, and this allows for lazily decoding a trace as the user requests data. - each trace plug-in could decide how to cache the instructions it generates. For example, if a trace is small, it might decide to keep everything in memory, or if the trace is massive, it might decide to keep around the last thousands of instructions to speed up local searches. 
- a cursor can outlive a stop point, which makes trace comparison for live processes feasible. An application of this is to compare profiling data of two runs of the same function, which should be doable with intel pt. Differential Revision: https://reviews.llvm.org/D104422	2021-06-23 22:28:01 -07:00
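The cursor design described in 2aa1dd1c66 can be sketched as a small abstract interface plus one possible backing store. This is an illustration only, not the actual TraceCursor API: `CursorSketch`, `VectorCursorSketch`, and `CollectBackwards` are hypothetical names, and the vector-backed storage is just one of the representations the message mentions.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical interface: linear movement plus per-property accessors,
// with no instruction indices and no shared instruction struct exposed.
class CursorSketch {
public:
  virtual ~CursorSketch() = default;
  virtual bool Next() = 0; // Moves forward; returns false at the end.
  virtual bool Prev() = 0; // Moves backward; returns false at the start.
  virtual uint64_t GetLoadAddress() const = 0;
};

// One possible plug-in: instructions kept in a plain vector that several
// cursors can share. The vector is assumed to be non-empty.
class VectorCursorSketch : public CursorSketch {
public:
  VectorCursorSketch(std::shared_ptr<const std::vector<uint64_t>> addresses,
                     size_t start)
      : m_addresses(std::move(addresses)), m_pos(start) {}

  bool Next() override {
    if (m_pos + 1 >= m_addresses->size())
      return false;
    ++m_pos;
    return true;
  }

  bool Prev() override {
    if (m_pos == 0)
      return false;
    --m_pos;
    return true;
  }

  uint64_t GetLoadAddress() const override { return (*m_addresses)[m_pos]; }

private:
  std::shared_ptr<const std::vector<uint64_t>> m_addresses;
  size_t m_pos;
};

// A consumer that only relies on the interface: walks from the cursor's
// current position back to the start of the trace.
inline std::vector<uint64_t> CollectBackwards(CursorSketch &cursor) {
  std::vector<uint64_t> addresses;
  do {
    addresses.push_back(cursor.GetLoadAddress());
  } while (cursor.Prev());
  return addresses;
}
```

Because the consumer never sees how instructions are stored, a different plug-in could decode lazily inside `Next()` and `Prev()` without changing `CollectBackwards`.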
Walter Erquinigo	bf9f21a28b	[trace][intel-pt] Create basic SB API This adds a basic SB API for creating and stopping traces. Note: This doesn't add any APIs for inspecting individual instructions. That'd be a more complicated change and it might be better to enhance the dump functionality to output the data in binary format. I'll leave that for a later diff. This also enhances the existing tests so that they test the same flow using both the command interface and the SB API. I also did some cleanup of legacy code. Differential Revision: https://reviews.llvm.org/D103500	2021-06-17 15:14:47 -07:00
Walter Erquinigo	ade59d5309	[trace] Dedup different source lines when dumping instructions + refactor When dumping the traced instructions in a for loop, like this one 4: for (int a = 0; a < n; a++) 5: do something; there might be multiple LineEntry objects for line 4, but with different address ranges. This was causing the dump command to dump something like this: ``` a.out`main + 11 at main.cpp:4 [1] 0x0000000000400518 movl $0x0, -0x8(%rbp) [2] 0x000000000040051f jmp 0x400529 ; <+28> at main.cpp:4 a.out`main + 28 at main.cpp:4 [3] 0x0000000000400529 cmpl $0x3, -0x8(%rbp) [4] 0x000000000040052d jle 0x400521 ; <+20> at main.cpp:5 ``` which is confusing, as main.cpp:4 appears twice consecutively. This diff fixes that issue by making the line entry comparison strictly about the line, column and file name. Before it was also comparing the address ranges, which we don't need because our output is strictly about what the user sees in the source. Besides, I've noticed that the logic that traverses instructions and calculates symbols and disassemblies had too much coupling, and made my changes harder to implement, so I decided to decouple it. Now there are two methods for iterating over the instructions of a trace. The existing one does it on raw load addresses, but the new one provides a SymbolContext and an InstructionSP, and does the calculations efficiently (not as efficient as possible for now though), so the caller doesn't need to care about these details. I think I'll be using that iterator to reconstruct the call stacks. I was able to fix a test with this change. Differential Revision: https://reviews.llvm.org/D100740	2021-05-04 19:40:52 -07:00
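The comparison change described in ade59d5309 can be sketched as follows. This is a simplified illustration, not LLDB's LineEntry: `LineEntrySketch` and `IsSameSourceLine` are hypothetical names, and the address range is included only to show that it is deliberately ignored.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for a line entry.
struct LineEntrySketch {
  std::string file;
  uint32_t line;
  uint16_t column;
  uint64_t range_start; // Address range; not part of the comparison.
  uint64_t range_size;
};

// Two entries are treated as the same source location when file, line, and
// column match, even if they cover different address ranges.
inline bool IsSameSourceLine(const LineEntrySketch &a,
                             const LineEntrySketch &b) {
  return a.line == b.line && a.column == b.column && a.file == b.file;
}
```

A dumper built on this check would print the `main.cpp:4` header once for the two consecutive ranges in the example above, and print a new header only when the source location actually changes.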

57 Commits