llvm-project

Author	SHA1	Message	Date
Roman Lebedev	e030f808ec	[Exegesis] Native clusterization: sub-partition by sched class id Currently native clusterization simply groups all benchmarks by the opcode of the key instruction, but that is suboptimal in certain cases, e.g. where we can already tell that particular instructions resolve into different sched classes.	2021-09-07 17:54:37 +03:00
Roman Lebedev	b3b9b297a0	[NFC][exegesis] Add test for the following patch	2021-09-07 17:54:36 +03:00
Simon Pilgrim	056b409ceb	[llvm-exegesis][x86] Limit llvm-exegesis analysis tests to x86_64 triple hosts Attempting to fix an issue with test failures on arm m1 apple macintoshes reported on D109353	2021-09-07 14:35:52 +01:00
Simon Pilgrim	6a9e2764f6	[llvm-exegesis] Analysis tests should run even without libpfm (PR51687) Move inverse_throughput, latency and uops to sub-directories (like we already do for lbr), which require libpfm, so we can relax the lit limits for analysis tests in the x86 root directory. Differential Revision: https://reviews.llvm.org/D109353	2021-09-07 13:58:05 +01:00
Roman Lebedev	d094f3c3c5	[llvm-exegesis] SnippetFile: do create source manager in MCContext This way, once there's an error in the snippet file (like in the test), llvm-exegesis won't crash with an assertion failure, but print a nice diagnostic about the problem.	2021-04-04 15:58:39 +03:00
Roman Lebedev	64a52e1e32	[llvm-exegesis] Don't erroneously refuse to measure POPCNT instruction	2021-04-04 14:38:26 +03:00
David Zarzycki	0fda5e8441	[llvm-exegesis testing] Workaround unreliable test Picking an instruction at random is not perfectly reliable.	2021-03-16 08:00:14 -04:00
Qiu Chaofan	9d2f06445f	[llvm-exegesis] Ignore instructions using custom inserter Some instructions defined in table-gen files set the usesCustomInserter bit, which means they have to be lowered by target code and aren't actually valid instructions at the MC level. So we should treat them like pseudo instructions. Reviewed By: gchatelet Differential Revision: https://reviews.llvm.org/D94898	2021-02-19 17:04:27 +08:00
Qiu Chaofan	d7d4dca15f	[llvm-exegesis] [PowerPC] Add basic LIT test Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D94897	2021-02-19 17:01:05 +08:00
David Zarzycki	1bc8daba4f	Fix x86 exegesis tests after `c042aff886` In `c042aff886`, unused FileCheck prefixes became an error, which exposed some testing bugs in four exegesis tests. I've tried my best to either fix the testing bugs, or expand the testing to cover more scenarios. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D95287	2021-01-24 08:51:06 -05:00
Clement Courbet	af658d920e	[llvm-exegesis][X86] Save and restore eflags. This is needed to benchmark instructions that touch EFLAGS (e.g. STD: set direction flag). Differential Revision: https://reviews.llvm.org/D90742	2020-11-04 10:44:15 +01:00
Clement Courbet	8383fddc4f	Re-land "[llvm-exegesis] Save target state before running the benchmark." The X86 exegesis target is never run on non-X86 hosts, so disable X86 intrinsic code on non-X86 targets. This reverts commit `8cfc872129`.	2020-11-04 09:46:55 +01:00
Clement Courbet	8cfc872129	Revert "Re-land "[llvm-exegesis] Save target state before running the benchmark." Still issues on some architectures. This reverts commit `fd13d7ce09`.	2020-11-04 08:48:44 +01:00
Clement Courbet	fd13d7ce09	Re-land "[llvm-exegesis] Save target state before running the benchmark." Use `__builtin_ia32_fxsave64` under `__GNUC__`; `_fxsave64` does not exist in old versions of gcc (pre-9.1). This reverts commit `e128f9cafc`.	2020-11-04 08:34:33 +01:00
Clement Courbet	e128f9cafc	Revert "[llvm-exegesis] Save target state before running the benchmark." _fxsave64 is not available on some buildbots. This reverts commit `274de447fe`.	2020-11-02 15:11:45 +01:00
Clement Courbet	274de447fe	[llvm-exegesis] Save target state before running the benchmark. Some benchmarked instructions might set target state. Preserve this state. See PR26418. Differential Revision: https://reviews.llvm.org/D90592	2020-11-02 15:02:54 +01:00
Clement Courbet	24bf8faabd	[llvm-exegesis] Do not try to assign random registers twice. Doing a random assignment assigns both tested (forward) and back-to-back (backward) instructions. When neither the tested instruction nor the back-to-back instruction has implicit aliasing, we're currently trying to do a random register assignment twice. Fix this (see PR26418). Differential Revision: https://reviews.llvm.org/D90380	2020-10-29 13:27:35 +01:00
Vy Nguyen	cb3fd715f3	Reland rG4fcd1a8e6528:[llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking. This is mostly for the benefit of the LBR latency mode. Right now, it performs no checking. If this is run on non-supported hardware, it will produce all zeroes for latency. Differential Revision: https://reviews.llvm.org/D85254 New change: Updated lit.local.cfg to pass the right argument to llvm-exegesis to actually request the LBR mode. Differential Revision: https://reviews.llvm.org/D88670	2020-10-01 12:21:16 -04:00
Michael Liao	2c9dc7bbbf	Revert "[llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking." This reverts commit `4fcd1a8e65` as `llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s` failed on hosts without LBR support if the build has LIBPFM enabled. On that host, `perf_event_open` fails with `EOPNOTSUPP` on LBR config. That change's basic assumption, "If this is run on a non-supported hardware, it will produce all zeroes for latency", does not hold, as the `perf_event_open` system call will fail if the underlying hardware really doesn't support LBR.	2020-09-30 23:15:35 -04:00
Vy Nguyen	4fcd1a8e65	[llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking. This is mostly for the benefit of the LBR latency mode. Right now, it performs no checking. If this is run on non-supported hardware, it will produce all zeroes for latency. Differential Revision: https://reviews.llvm.org/D85254	2020-09-30 12:25:59 -04:00
Vy Nguyen	ee7caa7593	Reland [llvm-exegesis] Add benchmark latency option on X86 that uses LBR for more precise measurements. Starting with Skylake, the LBR contains the precise number of cycles between the two consecutive branches. Making use of this will hopefully make the measurements more precise than the existing methods of using RDTSC. Differential Revision: https://reviews.llvm.org/D77422 New change: check for existence of field `cycles` in perf_branch_entry before enabling this mode. This should prevent compilation errors when building for older kernels whose headers don't support it.	2020-07-27 12:38:05 -04:00
Clement Courbet	6bddd099ac	Revert "[llvm-exegesis] Add benchmark latency option on X86 that uses LBR for more precise measurements." From @erichkeane: ``` This patch doesn't seem to build for me: /iusers/ekeane1/workspaces/llvm-project/llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp: In function ‘llvm::Error llvm::exegesis::parseDataBuffer(const char, size_t, const void, const void, llvm::SmallVector<long int, 4>)’: /iusers/ekeane1/workspaces/llvm-project/llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:99:37: error: ‘struct perf_branch_entry’ has no member named ‘cycles’ CycleArray->push_back(Entry.cycles); I'm on RHEL7, so I have kernel 3.10, so it doesn't have 'cycles'. According to this: https://elixir.bootlin.com/linux/v4.3/source/include/uapi/linux/perf_event.h#L963 kernel 4.3 is the first time that 'cycles' appeared in this structure. ```	2020-07-17 16:55:17 +02:00
Vy Nguyen	1360e140cc	[llvm-exegesis] Add benchmark latency option on X86 that uses LBR for more precise measurements. Starting with Skylake, the LBR contains the precise number of cycles between the two consecutive branches. Making use of this will hopefully make the measurements more precise than the existing methods of using RDTSC. Differential Revision: https://reviews.llvm.org/D77422	2020-07-16 12:12:46 -04:00
Roman Lebedev	6030fe01f4	[llvm-exegesis] Exploring X86::OperandType::OPERAND_COND_CODE Summary: Currently, we only have nice exploration for LEA instruction, while for the rest, we rely on `randomizeUnsetVariables()` to sometimes generate something interesting. While that works, it isn't very reliable in coverage :) Here, i'm making an assumption that while we may want to explore multi-instruction configs, we are most interested in the characteristics of the main instruction we were asked about. Which we can do, by taking the existing `randomizeMCOperand()`, and turning it on its head - instead of relying on it to randomly fill one of the interesting values, let's pregenerate all the possible interesting values for the variable, and then generate as many `InstructionTemplate` combinations of these possible values for variables as needed/possible. Of course, that requires invasive changes to no longer pass just the naked `Instruction`, but sometimes partially filled `InstructionTemplate`. As it can be seen from the test, this allows us to explore `X86::OperandType::OPERAND_COND_CODE` for instructions that take such an operand. I'm hoping this will greatly simplify exploration. Reviewers: courbet, gchatelet Reviewed By: gchatelet Subscribers: orodley, mgorny, sdardis, tschuett, jrtc27, atanasyan, mstojanovic, andreadb, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74156	2020-02-12 21:33:52 +03:00
Clement Courbet	87632b9e06	[llvm-exegesis] Fix support for LEA64_32r. Summary: Add unit test to show the issue: We must select an aliasing output register, not the exact register. Reviewers: gchatelet Subscribers: tschuett, mstojanovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73095	2020-01-21 13:58:23 +01:00
Miloš Stojanović	804dd67227	[llvm-exegesis][mips] Expand loadImmediate() Add support for loading 32-bit immediates and enable the use of GPR64 registers. Differential Revision: https://reviews.llvm.org/D71873	2020-01-13 12:32:13 +01:00
Miloš Stojanović	862a602416	[llvm-exegesis][mips] Add lit test Adding a basic lit test for MIPS. Differential Revision: https://reviews.llvm.org/D71605	2019-12-18 10:21:06 +01:00
Wang, Pengfei	cf81714a7e	[X86] Model MXCSR for AVX instructions other than AVX512 Summary: Model MXCSR for AVX instructions other than AVX512 Reviewers: craig.topper, RKSimon Subscribers: hiraditya, llvm-commits, LuoYuanke, LiuChen3 Tags: #llvm Differential Revision: https://reviews.llvm.org/D70875	2019-12-03 08:53:47 +08:00
Clement Courbet	3540b80fe4	[llvm-exegesis] Fix `44b9942898`. Summary: Add missing stack release instructions in loadImplicitRegAndFinalize. Reviewers: pengfei, gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70903	2019-12-02 16:13:27 +01:00
Wang, Pengfei	76b70f6f75	[X86] Add initialization of FPCW in llvm-exegesis Summary: This is a follow-up to D70874. It adds the initialization of FPCW in llvm-exegesis. Reviewers: craig.topper, RKSimon, courbet, gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70891	2019-12-02 20:18:35 +08:00
Wang, Pengfei	44b9942898	[X86] Add initialization of MXCSR in llvm-exegesis Summary: This patch is used to initialize the newly added register MXCSR. Reviewers: craig.topper, RKSimon Subscribers: tschuett, courbet, llvm-commits, LiuChen3 Tags: #llvm Differential Revision: https://reviews.llvm.org/D70874	2019-12-02 18:19:32 +08:00
Clement Courbet	c8eb0547ef	[llvm-exegesis] Show noise cluster in analysis output. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68780 llvm-svn: 374533	2019-10-11 11:33:18 +00:00
Clement Courbet	c3a7fb7599	[llvm-exegesis] Explore LEA addressing modes. Summary: This will help for PR32326. This shows the well-known issue with `RBP` and `R13` as base registers. Reviewers: gchatelet Subscribers: tschuett, llvm-commits, RKSimon, andreadb Tags: #llvm Differential Revision: https://reviews.llvm.org/D68646 llvm-svn: 374146	2019-10-09 08:49:13 +00:00
Clement Courbet	2cd0f28959	[llvm-exegesis] Add options to SnippetGenerator. Summary: This adds a `-max-configs-per-opcode` option to limit the number of configs per opcode. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68642 llvm-svn: 374054	2019-10-08 14:30:24 +00:00
Clement Courbet	4919534ae4	[llvm-exegesis] Finish plumbing the `Config` field. Summary: Right now there are no snippet generators that emit the `Config` field, but I plan to add it to investigate LEA operands for PR32326. What was broken was: - `Config` was not propagated up to the BenchmarkResult::Key. - Clustering should really consider different configs as measuring different things, so we should stabilize on (Opcode, Config) instead of just Opcode. Reviewers: gchatelet Subscribers: tschuett, llvm-commits, lebedev.ri Tags: #llvm Differential Revision: https://reviews.llvm.org/D68629 llvm-svn: 374031	2019-10-08 09:06:48 +00:00
Clement Courbet	f1ac8151f9	[llvm-exegesis] Add stabilization test with config In preparation for D68629. llvm-svn: 374020	2019-10-08 07:08:48 +00:00
Clement Courbet	9431b72ce9	[llvm-exegesis] Add loop mode for repeating the snippet. Summary: Before this change the Executable function was made by duplicating the snippet. This change adds a --repetition-mode={loop|duplicate} flag that allows choosing between this behaviour and wrapping the snippet instructions in a loop. The new mode can help measurements when the snippet fits in the DSB by short-circuiting decoding. The loop adds a dec + jmp to the measurements, but since these are not part of the critical path, they execute in parallel with the measured code and do not impact measurements in practice. Overview of the change: - New SnippetRepetitor abstraction that handles repeating the snippet. The assembler delegates repeating the instructions to this class. - ExegesisTarget learns how to decrement loop counter and jump. - Some refactoring of the assembler into FunctionFiller/BasicBlockFiller. Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68125 llvm-svn: 373083	2019-09-27 12:56:24 +00:00
Roman Lebedev	fbb823891d	[llvm-exegesis] Fix serialization/deserialization of special NoRegister register (PR41448) Summary: A lot of instructions have this special register. It seems this never really worked, but i finally noticed it only because it happened to break for `CMOV16rm` instruction. We serialized that register as "" (empty string), which is naturally 'ignored' during deserialization, so we re-create a `MCInst` with too few operands. And when we then happened to try to resolve variant sched class for this mis-serialized instruction, and the variant predicate tried to read an operand that was out of bounds since we got fewer operands, we crashed. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=41448 \| PR41448 ]]. Reviewers: craig.topper, courbet Reviewed By: courbet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60517 llvm-svn: 358153	2019-04-11 07:20:50 +00:00
Roman Lebedev	a82235843b	[llvm-exegesis][X86] Randomize CMOVcc/SETcc OPERAND_COND_CODE CondCodes Reviewers: courbet, gchatelet Reviewed By: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60066 llvm-svn: 357898	2019-04-08 10:11:00 +00:00
Roman Lebedev	404bdb1c9e	[llvm-exegesis][X86] Handle CMOVcc/SETcc OPERAND_COND_CODE OperandType Summary: D60041 / D60138 refactoring changed how CMOV/SETcc opcodes are handled. condcode is now an immediate, with its own operand type. This at least avoids crashing on the opcode. However, this still won't generate all the snippets with all the condcode enumerators. D60066 does that. Reviewers: courbet, gchatelet Reviewed By: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60057 llvm-svn: 357841	2019-04-06 14:16:26 +00:00
Roman Lebedev	c2423fe689	[llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880) Summary: This is an alternative to D59539. Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`. Let's suppose we are using `-analysis-clustering-epsilon=0.5`. By default now we will start processing the `0.5` point, find that `1.0` is its neighbor, add them to a new cluster. Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster. Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster. So all these points ended up in the same cluster. This may or may not be a correct implementation of dbscan clustering algorithm. But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data. Let's suppose all those opcodes are currently in the same sched cluster. If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter the LLVM values this cluster will never match the LLVM values, and thus this cluster will always be displayed as inconsistent. The solution is obviously to split off some of these opcodes into a different sched cluster. But how do i do that? Out of 4 opcodes displayed in the inconsistency report, which ones are the "bad ones"? Which ones are the most different from the checked-in data? I'd need to go in to the `.yaml` and look it up manually. The trivial solution is to, when creating clusters, not use the full dbscan algorithm, but instead "pick some unclustered point, pick all unclustered points that are its neighbors, put them all into a new cluster, repeat". And as it happens, we can arrive at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step. But that won't work well once we teach analyze mode to operate in non-1D mode (i.e. on more than a single measurement type at a time), because the clustering would depend on the order of the measurements. 
Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster. And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster, and if they are not, the cluster (==opcode) is unstable. This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model.. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 \| PR40880 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59820 llvm-svn: 357152	2019-03-28 08:55:01 +00:00
Clement Courbet	52da938cd0	[llvm-exegesis] Allow the target to disable the selection of some registers. Summary: This prevents "Cannot encode high byte register in REX-prefixed instruction" from happening on instructions that require REX encoding when AH & co get selected. On the down side, these 4 registers can no longer be selected automatically, but this avoids having to expose all the X86 encoding complexity. Reviewers: gchatelet Subscribers: tschuett, jdoerfert, llvm-commits, bdb Tags: #llvm Differential Revision: https://reviews.llvm.org/D59821 llvm-svn: 357003	2019-03-26 15:44:57 +00:00
Clement Courbet	0ddf81c43d	[llvm-exegesis] Teach llvm-exegesis to handle instructions with multiple tied variables. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58285 llvm-svn: 354862	2019-02-26 10:54:45 +00:00
Roman Lebedev	542e5d7bb5	[llvm-exegesis] Split Epsilon param into two (PR40787) Summary: This eps param is used for two distinct things: * initial point clusterization * checking clusters against the llvm values What if one wants to only look at highly different clusters, without changing the clustering itself? In particular, this helps to weed out noisy measurements (since the clusterization epsilon is still small, so there is a better chance that noisy measurements from the same opcode will go into different clusters) By splitting it into two params it is now possible. This is nearly-free performance-wise: Old: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% ) 12 context-switches # 31.735 M/sec ( +- 27.38% ) 0 cpu-migrations # 0.000 K/sec 4745 page-faults # 12183.732 M/sec ( +- 0.54% ) 1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%) 185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%) 392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%) 1839236666 instructions # 1.18 insn per cycle # 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%) 407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%) 10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%) 0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis 
-benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs): 6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% ) 262 context-switches # 38.546 M/sec ( +- 23.06% ) 0 cpu-migrations # 0.065 M/sec ( +- 76.03% ) 13287 page-faults # 1953.206 M/sec ( +- 0.32% ) 27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%) 1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%) 16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%) 17611143370 instructions # 0.65 insn per cycle # 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%) 3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%) 116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%) 6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%) ``` New: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... 
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs): 400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% ) 12 context-switches # 29.429 M/sec ( +- 25.95% ) 0 cpu-migrations # 0.100 M/sec ( +-100.00% ) 4714 page-faults # 11796.496 M/sec ( +- 0.55% ) 1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%) 199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%) 402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%) 1847783963 instructions # 1.15 insn per cycle # 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%) 407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%) 10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%) 0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% ) lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... 
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs): 6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% ) 217 context-switches # 31.236 M/sec ( +- 36.16% ) 1 cpu-migrations # 0.096 M/sec ( +- 50.00% ) 13258 page-faults # 1908.389 M/sec ( +- 0.34% ) 27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%) 1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%) 16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%) 17755545931 instructions # 0.64 insn per cycle # 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%) 3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%) 117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%) 6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% ) ``` I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps. Within noise i'd say. Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 \| PR40787 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58476 llvm-svn: 354767	2019-02-25 09:36:12 +00:00
Roman Lebedev	69716394f3	[llvm-exegesis] Opcode stabilization / reclusterization (PR40715) Summary: Given an instruction `Opcode`, we can make benchmarks (measurements) of the instruction characteristics/performance. Then, to facilitate further analysis we group the benchmarks with similar characteristics into clusters. Now, this is all not entirely deterministic. Some instructions have variable characteristics, depending on their arguments. And thus, if we do several benchmarks of the same instruction `Opcode`, we may end up with different performance characteristics measurements. And when we then do clustering, these several benchmarks of the same instruction `Opcode` may end up being clustered into different clusters. This is not great for further analysis. We shall find every `Opcode` with benchmarks not in just one cluster, and move all the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`. I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit, and introducing `-analysis-display-unstable-clusters` switch to toggle between displaying stable-only clusters and unstable-only clusters. The reclusterization is deterministically stable, produces identical reports between runs. (Or at least that is what i'm seeing, maybe it isn't) Timings/comparisons: old (current trunk/head) {F8303582} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... 
no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% ) 172 context-switches # 25.965 M/sec ( +- 29.89% ) 0 cpu-migrations # 0.042 M/sec ( +- 56.54% ) 31073 page-faults # 4690.754 M/sec ( +- 0.08% ) 26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%) 2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%) 13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%) 19770706799 instructions # 0.74 insn per cycle # 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%) 4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%) 121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%) 6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% ) ``` patch, with reclustering but without filtering (i.e. outputting all the stable and unstable clusters) {F8303586} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' ... 
no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs): 6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% ) 213 context-switches # 32.952 M/sec ( +- 23.81% ) 1 cpu-migrations # 0.130 M/sec ( +- 43.84% ) 31287 page-faults # 4832.057 M/sec ( +- 0.08% ) 25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%) 1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%) 13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%) 19752995402 instructions # 0.76 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%) 4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%) 121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%) 6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% ) ``` Funnily, this measurement shows that said reclustering actually improved performance. patch, with reclustering, only the stable clusters {F8303594} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' ... 
no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs): 6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% ) 133 context-switches # 20.792 M/sec ( +- 23.39% ) 0 cpu-migrations # 0.063 M/sec ( +- 61.24% ) 31318 page-faults # 4903.256 M/sec ( +- 0.08% ) 25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%) 1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%) 13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%) 19767554347 instructions # 0.77 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%) 4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%) 118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%) 6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% ) ``` Performance improved even further?! Makes sense i guess, less clusters to print. patch, with reclustering, only the unstable clusters {F8303601} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' ... 
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):

        6124.96 msec task-clock                #    1.000 CPUs utilized            ( +-  0.20% )
            194      context-switches          #   31.709 M/sec                    ( +- 20.46% )
              0      cpu-migrations            #    0.039 M/sec                    ( +- 49.77% )
          31413      page-faults               # 5129.261 M/sec                    ( +-  0.06% )
    24536794267      cycles                    # 4006425.858 GHz                   ( +-  0.19% )  (83.31%)
     1676085087      stalled-cycles-frontend   #    6.83% frontend cycles idle     ( +-  0.46% )  (83.32%)
    13035595603      stalled-cycles-backend    #   53.13% backend cycles idle      ( +-  0.16% )  (33.36%)
    18260877653      instructions              #    0.74  insn per cycle
                                               #    0.71  stalled cycles per insn  ( +-  0.05% )  (50.03%)
     4112411983      branches                  # 671484364.603 M/sec               ( +-  0.03% )  (66.68%)
      114066929      branch-misses             #    2.77% of all branches          ( +-  0.11% )  (83.32%)

         6.1278 +- 0.0121 seconds time elapsed  ( +-  0.20% )
```
This tells us that actually writing the `-analysis-inconsistencies-output-file=` output takes only ~0.4 sec for 43970 benchmark points (3 whole sweeps). (Also, wow, this is fast; it originally took several minutes.)

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 \| PR40715 ]].

Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58355
llvm-svn: 354441	2019-02-20 09:14:04 +00:00
Roman Lebedev	21193f4b7e	[llvm-exegesis] Don't default to running&dumping all analyses to '-' Summary: Until I looked in the source, I didn't even understand that I could disable 'cluster' output. I have always silenced it via ` &> /dev/null`. (And hoped it wasn't contributing much of the run time.) While I expect that it has its use-cases, I have never once needed it so far. If I forget to silence it, the console is completely flooded with that output. How about not expecting users to opt out of analyses, but to explicitly specify the analyses that should be performed? Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57648 llvm-svn: 353021	2019-02-04 09:12:08 +00:00
Clement Courbet	362653f7af	[llvm-exegesis] Add throughput mode. Summary: This just uses the latency benchmark runner on the parallel uops snippet generator. Fixes PR37698. Reviewers: gchatelet Subscribers: tschuett, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D57000 llvm-svn: 352632	2019-01-30 16:02:20 +00:00
Clement Courbet	7b475f3b41	[llvm-exegesis] Also check latency mode in local lit. Summary: This should avoid failing on old CPUs that do not have a cycle counter. Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D55416 llvm-svn: 348740	2018-12-10 07:29:47 +00:00
Clement Courbet	c544838f87	[llvm-exegesis] Correctly handle all X86 memory encoding formats. Summary: Add unit tests to check the support for each supported format to avoid regressions such as the one in PR36906. Reviewers: gchatelet Subscribers: tschuett, lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D54144 llvm-svn: 346330	2018-11-07 16:14:55 +00:00
John Brawn	c616a7236c	[llvm-exegesis] Fix function return generation so it doesn't return register 0 When fillMachineFunction generates a return on targets without a return opcode (such as AArch64) it should pass an empty set of registers as the return registers, not 0 which means register number zero. Differential Revision: https://reviews.llvm.org/D53074 llvm-svn: 344139	2018-10-10 13:03:23 +00:00


61 Commits