It turns out that cgroup filtering is relatively trivial and works
really nicely. This diff adds automatic cgroup filtering when in
per-cpu mode, unless a new --disable-cgroup-filtering flag is passed to
the start command. At least on Meta machines, all processes are spawned
inside a cgroup by default, which comes in super handy, because per-cpu
tracing is now much more precise.
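For context, this is the mechanism perf exposes for cgroup filtering; a minimal sketch (the helper name is hypothetical and error handling is mostly omitted) of opening a per-cpu event restricted to one cgroup:
```
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

// With PERF_FLAG_PID_CGROUP, the "pid" argument of perf_event_open is
// interpreted as a file descriptor of a cgroup directory, and the event
// only measures tasks of that cgroup while they run on the given cpu.
int OpenPerCpuEventForCgroup(perf_event_attr &attr, const char *cgroup_path,
                             int cpu) {
  int cgroup_fd = open(cgroup_path, O_RDONLY);
  if (cgroup_fd < 0)
    return -1;
  return syscall(SYS_perf_event_open, &attr, cgroup_fd, cpu,
                 /*group_fd=*/-1, PERF_FLAG_PID_CGROUP);
}
```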
A manual test gave me these results:
- Without filtering:
Total number of trace items: 36083
Total number of continuous executions found: 229
Number of continuous executions for this thread: 2
Total number of PSB blocks found: 98
Number of PSB blocks for this thread: 2
Total number of unattributed PSB blocks found: 38
- With filtering:
Total number of trace items: 87756
Total number of continuous executions found: 123
Number of continuous executions for this thread: 2
Total number of PSB blocks found: 10
Number of PSB blocks for this thread: 3
Total number of unattributed PSB blocks found: 2
Filtering gives us great results. The number of trace items collected
more than doubled (probably because we have less noise in the trace), and
we have far fewer unattributed PSB blocks and unrelated PSBs in
general. The ones that remain probably belong to other processes
in the same cgroup.
Differential Revision: https://reviews.llvm.org/D129257
llvm's JSON parser supports 64-bit integers, but other tools, like the
ones written in JS, don't support numbers that big, so we need to
represent these possibly big numbers as strings. This diff does that for
addresses and tsc zero: the former is printed in hex form and
the latter in decimal string form. The schema was updated to mention
this.
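For illustration, a minimal standalone sketch of the formatting (these helper names are hypothetical, not the actual serialization code). The motivation is that JS numbers are IEEE-754 doubles, so integers above 2^53 - 1 silently lose precision, while strings don't:
```
#include <cstdint>
#include <sstream>
#include <string>

// Addresses travel as hex strings, e.g. "0x400000".
std::string AddressToJsonString(uint64_t addr) {
  std::ostringstream os;
  os << "0x" << std::hex << addr;
  return os.str();
}

// tsc zero travels as a decimal string, e.g. "18446744073709551615".
std::string TscZeroToJsonString(uint64_t tsc_zero) {
  return std::to_string(tsc_zero);
}
```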
Besides that, I fixed some remaining issues and now all tests pass. Before, I wasn't running all the tests because for some reason my computer had reverted perf_event_paranoid to 1.
Differential Revision: https://reviews.llvm.org/D127819
As discussed offline with @jj10305, we are updating some of the naming used throughout the code, especially in the JSON schema:
- traceBuffer -> iptTrace
- core -> cpu
Differential Revision: https://reviews.llvm.org/D127817
This is the final functional patch to support intel pt decoding per cpu.
It works by doing the following:
- First, all context switches are split by tid and sorted in order. This produces a list of continuous executions per thread per core.
- Then, all intel pt subtraces are split by PSB boundaries and assigned to individual thread continuous executions on the same core by doing simple TSC-based comparisons (see the sketch after this list).
- With this, we have, per thread, a sorted list of continuous executions each one with a list of intel pt subtraces. Up to this point, this is really fast because no instructions were actually decoded.
- Then, each thread can be decoded by traversing its continuous executions and intel pt subtraces. An advantage of having these continuous executions is that we can identify when a continuous execution doesn't have intel pt data, and thus has a gap in it. We can later do more sophisticated comparisons to identify gaps within a continuous execution.
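To make the TSC-based assignment concrete, here's a minimal sketch under assumed types (ContinuousExecution, PSBBlock, and the sorting invariants are illustrative, not the actual implementation):
```
#include <algorithm>
#include <cstdint>
#include <vector>

struct ContinuousExecution {
  uint64_t tsc_start, tsc_end;    // from the context switch in/out events
  std::vector<size_t> psb_blocks; // indices of assigned intel pt subtraces
};

struct PSBBlock {
  uint64_t tsc; // TSC of the PSB packet that starts the subtrace
  size_t index;
};

// Executions are sorted by tsc_start; executions and blocks belong to the
// same core. A block is attributed to the execution whose TSC range
// contains the block's starting TSC, if any.
void AssignBlocks(std::vector<ContinuousExecution> &executions,
                  const std::vector<PSBBlock> &blocks,
                  std::vector<size_t> &unattributed) {
  for (const PSBBlock &block : blocks) {
    // Find the first execution that starts strictly after the block's TSC.
    auto it = std::upper_bound(executions.begin(), executions.end(), block.tsc,
                               [](uint64_t tsc, const ContinuousExecution &e) {
                                 return tsc < e.tsc_start;
                               });
    if (it != executions.begin() && block.tsc < std::prev(it)->tsc_end)
      std::prev(it)->psb_blocks.push_back(block.index);
    else
      unattributed.push_back(block.index); // couldn't find an owner
  }
}
```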
I'm adding a test as well.
Differential Revision: https://reviews.llvm.org/D126394
- Add the logic that parses all cpu context switch traces and produces blocks of continuous executions, which will be later used to assign intel pt subtraces to threads and to identify gaps. This logic can also identify if the context switch trace is malformed.
- The continuous executions blocks are able to indicate when there were some contention issues when producing the context switch trace. See the inline comments for more information.
- Update the 'dump info' command to show information and stats related to the multicore decoding flow, including timing about context switch decoding.
- Add the logic to convert nanoseconds to TSCs (see the sketch after this list).
- Fix a bug when returning the context switches. Now the data returned makes sense, and even empty traces can be returned from lldb-server.
- Finish the necessary bits for loading and saving a multi-core trace bundle from disk.
- Change some size_t to uint64_t for compatibility with 32 bit systems.
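As mentioned in the list, a minimal sketch of the nanoseconds-to-TSC direction; it just inverts the conversion documented for the kernel's perf_event_mmap_page (time_zero, time_mult, time_shift), ignoring the overflow handling a real implementation needs:
```
#include <cstdint>

struct TscConversion {
  uint32_t time_mult;
  uint16_t time_shift;
  uint64_t time_zero;
};

// The documented forward direction is roughly
//   ns = time_zero + (tsc * time_mult) >> time_shift,
// so the inverse is
//   tsc = ((ns - time_zero) << time_shift) / time_mult.
uint64_t NanosToTsc(const TscConversion &c, uint64_t ns) {
  return ((ns - c.time_zero) << c.time_shift) / c.time_mult;
}
```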
Tested by saving a trace session of a program that sleeps 100 times; it produced the following 'dump info' output:
```
(lldb) trace load /tmp/trace3/trace.json
(lldb) thread trace dump info
Trace technology: intel-pt
thread #1: tid = 4192415
Total number of instructions: 1
Memory usage:
Total approximate memory usage (excluding raw trace): 2.51 KiB
Average memory usage per instruction (excluding raw trace): 2573.00 bytes
Timing for this thread:
Timing for global tasks:
Context switch trace decoding: 0.00s
Events:
Number of instructions with events: 0
Number of individual events: 0
Multi-core decoding:
Total number of continuous executions found: 2499
Number of continuous executions for this thread: 102
Errors:
Number of TSC decoding errors: 0
```
Differential Revision: https://reviews.llvm.org/D126267
This diff is massive, but it's because it connects the client with lldb-server
and also ensures that the postmortem case works.
- Flatten the postmortem trace schema. The reason is that the schema has become quite complex due to the new multicore case, which defeats the original purpose of having a schema that could work for every trace plug-in. At this point, it's better that each trace plug-in defines its own full schema. This means that the only common field is "type".
-- Because of this new approach, I merged the "common" trace load and saving functionalities into the IntelPT one. This simplified the code quite a bit. If we eventually implement another trace plug-in, we can see then what we could reuse.
-- The new schema, which is flattened, now has better comments and is parsed better. A change I made was to disallow hex addresses, because they are a bit error-prone; I'm now asking for addresses to be printed in decimal.
-- Renamed "intel" to "GenuineIntel" in the schema because that's what you see in /proc/cpuinfo.
- Implemented reading the context switch trace data buffer. I had to do
some refactors to do that cleanly.
-- A major change that I did here was to simplify the perf_event circular buffer reading logic; it was too complex. Maybe the original Intel author had something different in mind (see the sketch after this list).
- Implemented all the necessary bits to read trace.json files with per-core data.
- Implemented all the necessary bits to save to disk per-core trace session.
- Added a test that ensures that parsing and saving to disk works.
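Regarding the circular buffer simplification mentioned above, the gist is that a wrapped perf buffer can be copied out with two memcpy calls. A simplified sketch that assumes the buffer has already wrapped and omits the memory barriers and data_tail bookkeeping a real reader needs:
```
#include <cstdint>
#include <cstring>
#include <vector>
#include <linux/perf_event.h>

std::vector<uint8_t> ReadWholeCircularBuffer(const perf_event_mmap_page &page,
                                             const uint8_t *data,
                                             size_t data_size) {
  // data_head grows monotonically; wrap it into the buffer ourselves.
  size_t head = page.data_head % data_size;
  std::vector<uint8_t> out(data_size);
  // The oldest bytes sit right after head: copy [head, end) then [0, head).
  std::memcpy(out.data(), data + head, data_size - head);
  std::memcpy(out.data() + (data_size - head), data, head);
  return out;
}
```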
Differential Revision: https://reviews.llvm.org/D126015
- Add a warnings field in the jLLDBTraceGetState response, so that warnings can be delivered to the client for troubleshooting. This removes the need to silently log lldb-server's llvm::Errors without exposing them to the user.
- Simplify the tscPerfZeroConversion struct and schema. It used to extend a base abstract class, but I doubt that we'll ever add other conversion mechanisms, because all modern kernels support perf zero. It is also the mechanism that is supposed to work with the timestamps produced by the context switch trace, so expecting it is imperative.
- Force tsc collection for cpu tracing.
- Add a test checking that tscPerfZeroConversion is returned by the GetState request
- Add a pre-check for cpu tracing that makes sure that perf zero values are available.
Differential Revision: https://reviews.llvm.org/D125932
- Add collection of context switches per cpu grouped with the per-cpu intel pt traces.
- Move the state handling from the intel pt trace class to the PerfEvent one.
- Add support for stopping and enabling perf event groups (see the sketch after this list).
- Return context switch entries as part of the jLLDBTraceGetState response.
- Move the triggers for when the process stops or resumes. The will-resume notification is now in a better location, which ensures that we'll capture the instructions that are about to be executed.
- Remove IntelPTSingleBufferTraceUP. The unique pointer was useless.
- Add unit tests
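For reference, stopping and enabling a perf event group boils down to a single ioctl on the group leader; a minimal sketch (the helper names are mine):
```
#include <sys/ioctl.h>
#include <linux/perf_event.h>

// PERF_IOC_FLAG_GROUP makes the ioctl apply to every event in the group,
// not just the one behind the given file descriptor.
int DisableGroup(int group_leader_fd) {
  return ioctl(group_leader_fd, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);
}

int EnableGroup(int group_leader_fd) {
  return ioctl(group_leader_fd, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
}
```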
Differential Revision: https://reviews.llvm.org/D125897
This diff implements per-core tracing on lldb-server. It also includes tests that ensure that tracing can be initiated from the client and that the jLLDBTraceGetState packet returns the list of trace buffers per core.
This doesn't include any decoder changes.
Finally, this makes some small changes here and there that improve the existing code.
A specific piece of code that can't reliably be tested is when tracing
per core fails due to permissions. In this case we add a
troubleshooting message, and this is the manual test:
```
/proc/sys/kernel/perf_event_paranoid set to 1
(lldb) process trace start --per-core-tracing
error: perf event syscall failed: Permission denied
You might need that /proc/sys/kernel/perf_event_paranoid has a value of 0 or -1.
```
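For the curious, the kind of pre-check behind that message can be as simple as this sketch (the function name is hypothetical): per-core tracing opens cpu-wide events, which non-root users can only do when perf_event_paranoid is 0 or -1.
```
#include <fstream>

bool PerCpuTracingAllowed() {
  std::ifstream file("/proc/sys/kernel/perf_event_paranoid");
  int level = 2; // most restrictive default, in case the file can't be read
  file >> level;
  return level <= 0; // 0 or -1 allow cpu-wide events for non-root users
}
```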
Differential Revision: https://reviews.llvm.org/D124858
llvm's json parser supports uint64_t, so let's use it for the
packets being sent between lldb and lldb-server, instead of using int64_t
as an intermediate type, which might be error-prone.
I'm refactoring IntelPTThreadTrace into IntelPTSingleBufferTrace so that it can
handle both single threads and single cores. In this diff I'm basically renaming the
class, moving it to its own file, and removing all the pieces that are not used,
along with some basic cleanup.
Differential Revision: https://reviews.llvm.org/D124648
This updates the documentation of the gdb-remote protocol, as well as the help messages, to include the new --per-core-tracing option.
Differential Revision: https://reviews.llvm.org/D124640
In order to open perf events per core, we need to first get the list of
core ids available in the system. So I'm adding a function that does
that by parsing /proc/cpuinfo. That seems to be the simplest and most
portable way to do that.
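A minimal standalone sketch of that parsing (not the lldb-server code; the function name is illustrative), relying on the one "processor : N" line that /proc/cpuinfo prints per logical cpu:
```
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

std::vector<uint32_t> GetAvailableLogicalCoreIDs() {
  std::ifstream cpuinfo("/proc/cpuinfo");
  std::vector<uint32_t> core_ids;
  std::string line;
  while (std::getline(cpuinfo, line)) {
    // Each logical cpu contributes one "processor : <id>" line.
    if (line.rfind("processor", 0) == 0) {
      size_t colon = line.find(':');
      if (colon != std::string::npos)
        core_ids.push_back(std::stoul(line.substr(colon + 1)));
    }
  }
  return core_ids;
}
```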
Besides that, I made a few refactors and renames to better reflect that
the cpu info we use in lldb-server comes from procfs.
Differential Revision: https://reviews.llvm.org/D124573
Update the response schema of the TraceGetState packet and add an
Intel PT specific response structure that contains the TSC conversion,
if it exists. The IntelPTCollector loads the TSC conversion and caches
it to prevent unnecessary calls to perf_event_open. Move the TSC conversion
calculation from Perf.h to TraceIntelPTGDBRemotePackets.h to remove
dependency on Linux specific headers.
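For reference, the TSC conversion in question follows the formula documented for the kernel's perf_event_mmap_page; a minimal sketch (struct and function names are mine):
```
#include <cstdint>

struct TscConversion {
  uint32_t time_mult;
  uint16_t time_shift;
  uint64_t time_zero;
};

// Split into quotient and remainder to avoid overflowing the 64-bit
// multiplication, as the kernel documentation suggests.
uint64_t TscToNanos(const TscConversion &c, uint64_t tsc) {
  uint64_t quot = tsc >> c.time_shift;
  uint64_t rem = tsc & ((uint64_t(1) << c.time_shift) - 1);
  return c.time_zero + quot * c.time_mult +
         ((rem * c.time_mult) >> c.time_shift);
}
```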
Differential Revision: https://reviews.llvm.org/D122246
This implements the interactive trace start and stop methods.
This diff ended up being much larger than I anticipated because, by doing it, I found that I had initially implemented many things in a non-optimal way. In any case, the code is much better now.
There's a lot of boilerplate code due to the gdb-remote protocol, but the main changes are:
- New tracing packets: jLLDBTraceStop, jLLDBTraceStart, jLLDBTraceGetBinaryData. The gdb-remote packet definitions are quite comprehensive.
- Implementation of the "process trace start|stop" and "thread trace start|stop" commands.
- Implementation of an API in Trace.h to interact with live traces.
- Created an IntelPTDecoder for live threads that uses the debugger's stop id as a checkpoint for its internal cache.
- Added functionality to stop the process in case "process tracing" is enabled and a new thread can't be traced.
- Added tests
I have some ideas to unify the code paths for post mortem and live threads, but I'll do that in another diff.
Differential Revision: https://reviews.llvm.org/D91679