llvm-project

Commit Graph

Author	SHA1	Message	Date
Michal Gorny	f488cbdcd8	[llvm-exegesis/lib] Fix missing linkage to MCParser Otherwise, shared-lib build fails with: lib64/libLLVMExegesis.a(SnippetFile.cpp.o): In function `llvm::exegesis::readSnippets(llvm::exegesis::LLVMState const&, llvm::StringRef)': SnippetFile.cpp:(.text._ZN4llvm8exegesis12readSnippetsERKNS0_9LLVMStateENS_9StringRefE+0x31f): undefined reference to `llvm::createMCAsmParser(llvm::SourceMgr&, llvm::MCContext&, llvm::MCStreamer&, llvm::MCAsmInfo const&, unsigned int)' SnippetFile.cpp:(.text._ZN4llvm8exegesis12readSnippetsERKNS0_9LLVMStateENS_9StringRefE+0x41c): undefined reference to `llvm::MCAsmParser::setTargetParser(llvm::MCTargetAsmParser&)' collect2: error: ld returned 1 exit status llvm-svn: 373332	2019-10-01 13:02:48 +00:00
Yuanfang Chen	cc382cf727	[NewPM] Port MachineModuleInfo to the new pass manager. Existing clients are converted to use MachineModuleInfoWrapperPass. The new interface is for defining a new pass manager API in CodeGen. Reviewers: fedor.sergeev, philip.pfaffe, chandlerc, arsenm Reviewed By: arsenm, fedor.sergeev Differential Revision: https://reviews.llvm.org/D64183 llvm-svn: 373240	2019-09-30 17:54:50 +00:00
Clement Courbet	03a3d29541	[llvm-exegesis][NFC] Move BenchmarkFailure to its own file. Summary: Also rename it to exegesis::Failure, as it's used everywhere. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68217 llvm-svn: 373209	2019-09-30 13:53:50 +00:00
Clement Courbet	3e13816be2	[llvm-exegesis][NFC] Refactor snippet file reading out of tool main. Summary: Add unit tests. Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68212 llvm-svn: 373202	2019-09-30 12:50:25 +00:00
Simon Pilgrim	1a55431a03	Fix MSVC "not all control paths return a value" warning. NFCI. llvm-svn: 373100	2019-09-27 16:56:07 +00:00
Clement Courbet	9431b72ce9	[llvm-exegesis] Add loop mode for repeating the snippet. Summary: Before this change, the Executable function was made by duplicating the snippet. This change adds a --repetition-mode={loop|duplicate} flag that allows choosing between this behaviour and wrapping the snippet instructions in a loop. The new mode can help measurements when the snippet fits in the DSB by short-circuiting decoding. The loop adds a dec + jmp to the measurements, but since these are not part of the critical path, they execute in parallel with the measured code and do not impact measurements in practice. Overview of the change: - New SnippetRepetitor abstraction that handles repeating the snippet. The assembler delegates repeating the instructions to this class. - ExegesisTarget learns how to decrement the loop counter and jump. - Some refactoring of the assembler into FunctionFiller/BasicBlockFiller. Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68125 llvm-svn: 373083	2019-09-27 12:56:24 +00:00
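A minimal C++ sketch of the two repetition strategies described above. It assumes instructions can be modeled as plain AT&T assembly strings. The function names, the `%r8` counter register, the `.Lloop` label and the `jnz` back-edge are illustrative choices, not the actual `SnippetRepetitor` API. The commit only says that the loop adds a decrement and a jump.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative model: each instruction is an assembly string, not an MCInst.
using Snippet = std::vector<std::string>;

// "duplicate" mode: copy the whole snippet back-to-back until at least
// MinInstructions instructions have been emitted.
Snippet repeatByDuplication(const Snippet &S, std::size_t MinInstructions) {
  assert(!S.empty() && "cannot repeat an empty snippet");
  Snippet Out;
  while (Out.size() < MinInstructions)
    Out.insert(Out.end(), S.begin(), S.end());
  return Out;
}

// "loop" mode: emit the snippet once as a loop body. The counter setup,
// decrement and backward branch use hypothetical choices (register %r8,
// label .Lloop); they are not on the snippet's dependency chain.
Snippet repeatByLoop(const Snippet &S, std::size_t Iterations) {
  assert(!S.empty() && "cannot repeat an empty snippet");
  Snippet Out;
  Out.push_back("movq $" + std::to_string(Iterations) + ", %r8");
  Out.push_back(".Lloop:");
  Out.insert(Out.end(), S.begin(), S.end());
  Out.push_back("decq %r8");
  Out.push_back("jnz .Lloop");
  return Out;
}
```

For a two-instruction snippet, `repeatByDuplication(S, 5)` yields six instructions (three copies). `repeatByLoop` always emits the snippet once plus four lines of loop scaffolding, so code size stays constant regardless of the iteration count. This is what lets a small snippet stay resident in the DSB.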
Clement Courbet	8ef97e1aad	[llvm-exegesis] Refactor how forbidden registers are computed. Summary: Right now latency generation can incorrectly select the scratch register as a dependency-carrying register. - Move the logic for preventing register selection from Uops implementation to common SnippetGenerator class. - Aliasing detection now takes a set of forbidden registers just like random register assignment does. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68084 llvm-svn: 373048	2019-09-27 08:04:10 +00:00
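The idea of a shared forbidden-register set can be sketched as follows. The sketch assumes registers are small integers tracked in a `std::bitset`. `RegSet`, `kNumRegs`, `pickRegister` and `pickAliasingRegister` are hypothetical names. The point is only that random assignment and aliasing detection both filter through the same forbidden set, so a scratch register can never become the dependency-carrying register.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

constexpr std::size_t kNumRegs = 64; // hypothetical register-file size
using RegSet = std::bitset<kNumRegs>;

// Random register assignment: pick uniformly among candidates that are
// not forbidden. Returns nullopt if nothing is left.
std::optional<unsigned> pickRegister(const RegSet &Candidates,
                                     const RegSet &Forbidden,
                                     std::mt19937 &Rng) {
  const RegSet Allowed = Candidates & ~Forbidden;
  if (Allowed.none())
    return std::nullopt;
  std::vector<unsigned> Regs;
  for (unsigned R = 0; R < kNumRegs; ++R)
    if (Allowed.test(R))
      Regs.push_back(R);
  std::uniform_int_distribution<std::size_t> Dist(0, Regs.size() - 1);
  return Regs[Dist(Rng)];
}

// Aliasing detection: a dependency-carrying register must be both defined
// and used, and must not be forbidden (e.g. the scratch register).
std::optional<unsigned> pickAliasingRegister(const RegSet &Defs,
                                             const RegSet &Uses,
                                             const RegSet &Forbidden) {
  const RegSet Aliasing = Defs & Uses & ~Forbidden;
  for (unsigned R = 0; R < kNumRegs; ++R)
    if (Aliasing.test(R))
      return R;
  return std::nullopt;
}
```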
Clement Courbet	06f9ce84fe	[llvm-exegesis][NFC] Remove dead code. Summary: `hasAliasingImplicitRegistersThrough()` is no longer used. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68078 llvm-svn: 372968	2019-09-26 11:32:44 +00:00
Jonas Devlieghere	0eaee545ee	[llvm] Migrate llvm::make_unique to std::make_unique Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013	2019-08-15 15:54:37 +00:00
Fangrui Song	d9b948b6eb	Rename F_{None,Text,Append} to OF_{None,Text,Append}. NFC F_{None,Text,Append} are kept for compatibility since r334221. llvm-svn: 367800	2019-08-05 05:43:48 +00:00
Clement Courbet	b9274f2694	[llvm-exegesis] Move native target initialization code to a separate file. Summary: This helps building internal tools on top of the library. Reviewers: gchatelet Subscribers: tschuett, llvm-commits, bdb, ondrasej Tags: #llvm Differential Revision: https://reviews.llvm.org/D62239 llvm-svn: 361385	2019-05-22 13:50:16 +00:00
Roman Lebedev	9bac7d8165	[llvm-exegesis] BenchmarkRunner::runConfiguration(): write small snippet to memory. It was previously writing this temporary snippet to a file, reading it back, and leaving the temporary file in place. This is both inefficient and results in a huge pileup of garbage in /tmp. One would have thought it would have been caught during D60317. llvm-svn: 360138	2019-05-07 12:28:08 +00:00
Roman Lebedev	724a68f372	[llvm-exegesis] InstructionBenchmark::writeYamlTo(): don't forget to flush() This APPEARS to fix a very infuriating issue of YAML files being corrupted, partially written, or truncated. Or at least I'm not seeing the issue on a new benchmark sweep. The issue is somewhat rare, happening maybe once in 1000 benchmarks, which means there can be hundreds of broken benchmarks for a full x86 sweep in a single mode. llvm-svn: 360124	2019-05-07 09:21:13 +00:00
Ali Tamur	7822b46188	Revert "Use llvm::lower_bound. NFC" This reverts commit rL358161. This patch broke the test: llvm/test/tools/llvm-exegesis/X86/uops-CMOV16rm-noreg.s llvm-svn: 358199	2019-04-11 17:35:20 +00:00
Fangrui Song	71cce580b9	Use llvm::lower_bound. NFC llvm-svn: 358161	2019-04-11 10:25:41 +00:00
Roman Lebedev	fbb823891d	[llvm-exegesis] Fix serialization/deserialization of special NoRegister register (PR41448) Summary: A lot of instructions have this special register. It seems this never really worked, but I finally noticed it only because it happened to break for the `CMOV16rm` instruction. We serialized that register as "" (empty string), which is naturally 'ignored' during deserialization, so we re-created an `MCInst` with too few operands. When we then tried to resolve the variant sched class for this mis-serialized instruction, the variant predicate tried to read an operand that was out of bounds (since we had fewer operands), and we crashed. Fixes PR41448 (https://bugs.llvm.org/show_bug.cgi?id=41448). Reviewers: craig.topper, courbet Reviewed By: courbet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60517 llvm-svn: 358153	2019-04-11 07:20:50 +00:00
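The failure mode can be reproduced with a toy serializer. The toy assumes operands are written as space-separated register names and read back with a whitespace-skipping tokenizer. The register table and the `%noreg` sentinel spelling are assumptions for illustration. The real tool serializes through its YAML context.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical register table: index 0 plays the role of NoRegister,
// whose printable name is the empty string.
const std::vector<std::string> kRegNames = {"", "RAX", "RBX", "RCX"};
// Assumed spelling of an explicit placeholder for NoRegister.
const std::string kNoRegSentinel = "%noreg";

// Join operand names with single spaces. Without the sentinel,
// NoRegister contributes an empty token.
std::string serialize(const std::vector<unsigned> &Ops, bool UseSentinel) {
  std::string Out;
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    if (I != 0)
      Out += ' ';
    Out += (Ops[I] == 0 && UseSentinel) ? kNoRegSentinel : kRegNames[Ops[I]];
  }
  return Out;
}

// Split on whitespace. operator>> skips runs of whitespace, so an empty
// name silently disappears and the operand count shrinks.
std::vector<unsigned> deserialize(const std::string &In) {
  std::vector<unsigned> Ops;
  std::istringstream SS(In);
  std::string Tok;
  while (SS >> Tok) {
    if (Tok == kNoRegSentinel) {
      Ops.push_back(0);
      continue;
    }
    for (unsigned R = 1; R < kRegNames.size(); ++R)
      if (kRegNames[R] == Tok) {
        Ops.push_back(R);
        break;
      }
  }
  return Ops;
}
```

Round-tripping `{RAX, NoRegister, RBX}` without the sentinel produces `"RAX  RBX"`, which reads back as only two operands. With the sentinel, the three operands survive, so later code that indexes operands by position stays in bounds.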
Roman Lebedev	1992e8f38e	[llvm-exegesis] Pacify bots - don't std::move() - prevents copy elision llvm-svn: 358079	2019-04-10 12:47:47 +00:00
Roman Lebedev	41bdeb7b12	[llvm-exegesis] YamlContext: fix some missing spaces/quotes/newlines in error strings llvm-svn: 358077	2019-04-10 12:20:14 +00:00
Roman Lebedev	628f1ae504	[llvm-exegesis] Fix error propagation from yaml writing (from serialization) Investigating https://bugs.llvm.org/show_bug.cgi?id=41448 llvm-svn: 358076	2019-04-10 12:19:57 +00:00
Roman Lebedev	a82235843b	[llvm-exegesis][X86] Randomize CMOVcc/SETcc OPERAND_COND_CODE CondCodes Reviewers: courbet, gchatelet Reviewed By: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60066 llvm-svn: 357898	2019-04-08 10:11:00 +00:00
Roman Lebedev	404bdb1c9e	[llvm-exegesis][X86] Handle CMOVcc/SETcc OPERAND_COND_CODE OperandType Summary: The D60041 / D60138 refactoring changed how CMOV/SETcc opcodes are handled. The condcode is now an immediate with its own operand type. This at least avoids crashing on the opcode. However, this still won't generate snippets for all the condcode enumerators; D60066 does that. Reviewers: courbet, gchatelet Reviewed By: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60057 llvm-svn: 357841	2019-04-06 14:16:26 +00:00
Craig Topper	80aa2290fb	[X86] Merge the different Jcc instructions for each condition code into single instructions that store the condition code as an operand. Summary: This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon Reviewed By: RKSimon Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60228 llvm-svn: 357802	2019-04-05 19:28:09 +00:00
Craig Topper	7323c2bf85	[X86] Merge the different SETcc instructions for each condition code into single instructions that store the condition code as an operand. Summary: This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri Reviewed By: andreadb Subscribers: hiraditya, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60138 llvm-svn: 357801	2019-04-05 19:27:49 +00:00
Craig Topper	e0bfeb5f24	[X86] Merge the different CMOV instructions for each condition code into single instructions that store the condition code as an immediate. Summary: Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models. This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between CMOV instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. This does complicate the scheduler models a little since we can't assign the A and BE instructions to a separate class now. I plan to make similar changes for SETcc and Jcc. Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet Reviewed By: RKSimon Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60041 llvm-svn: 357800	2019-04-05 19:27:41 +00:00
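The x86 opcode bytes show why storing the condition code as an operand removes the per-condition translation switches. Once the enum is ordered to match the hardware encoding, as the CMOV commit describes, each of CMOVcc, SETcc and short Jcc becomes a base opcode plus the condition code. This sketch produces only the opcode bytes (ModRM and displacement are omitted), and the enumerator names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Condition codes listed in x86 hardware encoding order (0x0 .. 0xF).
enum CondCode : uint8_t {
  COND_O, COND_NO, COND_B, COND_AE, COND_E, COND_NE, COND_BE, COND_A,
  COND_S, COND_NS, COND_P, COND_NP, COND_L, COND_GE, COND_LE, COND_G
};

// One instruction per family: the encoder adds the condition-code operand
// to the family's base opcode instead of looking up a per-condition opcode.
std::vector<uint8_t> encodeCMOVcc(CondCode CC) {
  return {0x0F, static_cast<uint8_t>(0x40 + CC)}; // CMOVcc r, r/m: 0F 40+cc
}

std::vector<uint8_t> encodeSETcc(CondCode CC) {
  return {0x0F, static_cast<uint8_t>(0x90 + CC)}; // SETcc r/m8: 0F 90+cc
}

std::vector<uint8_t> encodeJccShort(CondCode CC) {
  return {static_cast<uint8_t>(0x70 + CC)}; // Jcc rel8: 70+cc
}
```

For example, `encodeCMOVcc(COND_E)` gives `0F 44` (CMOVE), `encodeSETcc(COND_NE)` gives `0F 95` (SETNE), and `encodeJccShort(COND_L)` gives `7C` (JL rel8).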
Guillaume Chatelet	848df5b509	Add an option to not dump the generated object on disk Reviewers: courbet Subscribers: llvm-commits, bdb Tags: #llvm Differential Revision: https://reviews.llvm.org/D60317 llvm-svn: 357769	2019-04-05 15:18:59 +00:00
Roman Lebedev	4d81e87765	[NFC][llvm-exegesis] Also promote getSchedClassPoint() into ResolvedSchedClass. Summary: It doesn't need anything from Analysis::SchedClassCluster class, and takes ResolvedSchedClass as param, so this seems rather fitting. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59994 llvm-svn: 357263	2019-03-29 14:58:01 +00:00
Roman Lebedev	1d1330c546	[NFC][llvm-exegesis] Refactor ResolvedSchedClass & friends Summary: `ResolvedSchedClass` will need to be used outside of `Analysis` (before `InstructionBenchmarkClustering` even), therefore promote it into a non-private top-level class, and while there also move all of the functions that are only called by `ResolvedSchedClass` into that same new file. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: mgorny, tschuett, mgrang, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59993 llvm-svn: 357259	2019-03-29 14:24:27 +00:00
Roman Lebedev	b8fb15d412	[NFC][llvm-exegesis] Refactor Analysis::SchedClassCluster::measurementsMatch() Summary: The diff looks scary but it really isn't: 1. I moved the check for the number of measurements into `SchedClassClusterCentroid::validate()` 2. While there, I added a check that we can only have a single inverse throughput measurement. I missed that when adding it initially. 3. `Analysis::SchedClassCluster::measurementsMatch()` is called with the current LLVM values from the schedule class and the values from the centroid. 3.1. We can already get the values from the centroid from `SchedClassClusterCentroid::getAsPoint()`. This isn't 100% NFC, because previously for inverse throughput we used `min()`. I have asked whether I have done that correctly in https://reviews.llvm.org/D57647?id=184939#inline-510384 but did not hear back. I think `avg()` should be used there too, so this is a fix. 3.2. Finally, refactor the computation of the LLVM-specified values into `Analysis::SchedClassCluster::getSchedClassPoint()`. I will need that function for PR41275 (https://bugs.llvm.org/show_bug.cgi?id=41275). Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59951 llvm-svn: 357245	2019-03-29 11:36:08 +00:00
Roman Lebedev	c2423fe689	[llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880) Summary: This is an alternative to D59539. Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`. Let's suppose we are using `-analysis-clustering-epsilon=0.5`. By default, we will now start by processing the `0.5` point, find that `1.0` is its neighbor, and add them to a new cluster. Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster. Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster. So all these points end up in the same cluster. This may or may not be a correct implementation of the dbscan clustering algorithm, but it is rather horribly broken for the purpose of comparing the clusters with the LLVM sched data. Let's suppose all those opcodes are currently in the same sched cluster. If I specify `-analysis-inconsistency-epsilon=0.5`, then no matter the LLVM values, this cluster will never match them, and thus it will always be displayed as inconsistent. The solution is obviously to split some of these opcodes off into a different sched cluster. But how do I do that? Out of the 4 opcodes displayed in the inconsistency report, which ones are the "bad ones"? Which ones are the most different from the checked-in data? I'd need to go into the `.yaml` and look it up manually. The trivial solution is, when creating clusters, not to use the full dbscan algorithm, but instead to "pick some unclustered point, pick all unclustered points that are its neighbors, put them all into a new cluster, repeat". As it happens, we can arrive at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step. But that won't work well once we teach analysis mode to operate in non-1D mode (i.e. on more than a single measurement type at a time), because the clustering would depend on the order of the measurements. 
Instead, let's just create a single cluster per opcode and put all the points of that opcode into said cluster, while simultaneously checking that every point in that cluster is a neighbor of every other point in the cluster; if they are not, the cluster (== opcode) is unstable. This is yet another step to bring me closer to being able to continue the cleanup of the bdver2 sched model. Fixes PR40880 (https://bugs.llvm.org/show_bug.cgi?id=40880). Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59820 llvm-svn: 357152	2019-03-28 08:55:01 +00:00
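A simplified 1-D sketch lets both behaviours from the naive-clustering commit be checked on the commit's own example. `chainCluster` keeps absorbing neighbors of neighbors. It is a DBSCAN-like expansion without a minimum-points rule, not the tool's exact implementation. `naiveClusterStability` puts each opcode in its own cluster and marks that cluster unstable unless all its points are pairwise neighbors. `Point` and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Point {
  std::string Opcode;
  double Value; // a single (1-D) measurement
};

bool isNeighbor(double A, double B, double Eps) {
  return std::fabs(A - B) <= Eps;
}

// DBSCAN-like expansion (no minimum-points rule): neighbors of neighbors
// are pulled into the same cluster. Returns a cluster id per point.
std::vector<int> chainCluster(const std::vector<Point> &Pts, double Eps) {
  std::vector<int> Id(Pts.size(), -1);
  int Next = 0;
  for (std::size_t I = 0; I < Pts.size(); ++I) {
    if (Id[I] != -1)
      continue;
    Id[I] = Next;
    std::vector<std::size_t> Work = {I};
    while (!Work.empty()) {
      std::size_t P = Work.back();
      Work.pop_back();
      for (std::size_t Q = 0; Q < Pts.size(); ++Q)
        if (Id[Q] == -1 && isNeighbor(Pts[P].Value, Pts[Q].Value, Eps)) {
          Id[Q] = Next;
          Work.push_back(Q);
        }
    }
    ++Next;
  }
  return Id;
}

// Naive mode: one cluster per opcode, stable only if every pair of its
// points are neighbors.
std::map<std::string, bool>
naiveClusterStability(const std::vector<Point> &Pts, double Eps) {
  std::map<std::string, std::vector<double>> ByOpcode;
  for (const Point &P : Pts)
    ByOpcode[P.Opcode].push_back(P.Value);
  std::map<std::string, bool> Stable;
  for (const auto &KV : ByOpcode) {
    const std::vector<double> &Vals = KV.second;
    bool AllNeighbors = true;
    for (std::size_t I = 0; I < Vals.size(); ++I)
      for (std::size_t J = I + 1; J < Vals.size(); ++J)
        if (!isNeighbor(Vals[I], Vals[J], Eps))
          AllNeighbors = false;
    Stable[KV.first] = AllNeighbors;
  }
  return Stable;
}
```

With `Eps = 0.5`, `chainCluster` assigns `0.5`, `1.0`, `1.5` and `2.0` the same id, even though `0.5` and `2.0` are 1.5 apart. The per-opcode check avoids this chaining. Because it compares every pair, it also does not depend on the order in which points are visited.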
Clement Courbet	52da938cd0	[llvm-exegesis] Allow the target to disable the selection of some registers. Summary: This prevents "Cannot encode high byte register in REX-prefixed instruction" from happening on instructions that require REX encoding when AH & co get selected. On the down side, these 4 registers can no longer be selected automatically, but this avoids having to expose all the X86 encoding complexity. Reviewers: gchatelet Subscribers: tschuett, jdoerfert, llvm-commits, bdb Tags: #llvm Differential Revision: https://reviews.llvm.org/D59821 llvm-svn: 357003	2019-03-26 15:44:57 +00:00
Clement Courbet	0ddf81c43d	[llvm-exegesis] Teach llvm-exegesis to handle instructions with multiple tied variables. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58285 llvm-svn: 354862	2019-02-26 10:54:45 +00:00
Roman Lebedev	542e5d7bb5	[llvm-exegesis] Split Epsilon param into two (PR40787) Summary: This eps param is used for two distinct things: * initial point clusterization * checking clusters against the LLVM values. What if one wants to only look at highly different clusters, without changing the clustering itself? In particular, this helps to weed out noisy measurements (since the clusterization epsilon stays small, there is a better chance that noisy measurements from the same opcode will go into different clusters). Splitting it into two params makes this possible. This is nearly free performance-wise: Old: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% ) 12 context-switches # 31.735 M/sec ( +- 27.38% ) 0 cpu-migrations # 0.000 K/sec 4745 page-faults # 12183.732 M/sec ( +- 0.54% ) 1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%) 185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%) 392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%) 1839236666 instructions # 1.18 insn per cycle # 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%) 407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%) 10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%) 0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis 
-benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs): 6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% ) 262 context-switches # 38.546 M/sec ( +- 23.06% ) 0 cpu-migrations # 0.065 M/sec ( +- 76.03% ) 13287 page-faults # 1953.206 M/sec ( +- 0.32% ) 27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%) 1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%) 16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%) 17611143370 instructions # 0.65 insn per cycle # 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%) 3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%) 116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%) 6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%) ``` New: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... 
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs): 400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% ) 12 context-switches # 29.429 M/sec ( +- 25.95% ) 0 cpu-migrations # 0.100 M/sec ( +-100.00% ) 4714 page-faults # 11796.496 M/sec ( +- 0.55% ) 1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%) 199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%) 402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%) 1847783963 instructions # 1.15 insn per cycle # 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%) 407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%) 10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%) 0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% ) lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... 
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs): 6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% ) 217 context-switches # 31.236 M/sec ( +- 36.16% ) 1 cpu-migrations # 0.096 M/sec ( +- 50.00% ) 13258 page-faults # 1908.389 M/sec ( +- 0.34% ) 27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%) 1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%) 16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%) 17755545931 instructions # 0.64 insn per cycle # 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%) 3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%) 117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%) 6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% ) ``` I.e. it's a +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps. Within noise, I'd say. Should help with PR40787 (https://bugs.llvm.org/show_bug.cgi?id=40787). Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58476 llvm-svn: 354767	2019-02-25 09:36:12 +00:00
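The following sketch shows how the two epsilons could divide the work. It assumes a 1-D measurement and a cluster centroid computed as the average, the choice argued for in the `measurementsMatch()` refactoring. `EpsilonParams` and the helper functions are hypothetical stand-ins for `-analysis-clustering-epsilon` and `-analysis-inconsistency-epsilon`, not the tool's code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-ins for the two command-line epsilons.
struct EpsilonParams {
  double Clustering;    // decides which points are neighbors
  double Inconsistency; // decides whether a cluster disagrees with LLVM
};

// Centroid of a 1-D cluster, taken as the average of its measurements.
double centroid(const std::vector<double> &Values) {
  assert(!Values.empty() && "empty cluster has no centroid");
  double Sum = 0;
  for (double V : Values)
    Sum += V;
  return Sum / Values.size();
}

// Clustering only ever consults the clustering epsilon.
bool areNeighbors(double A, double B, const EpsilonParams &E) {
  return std::fabs(A - B) <= E.Clustering;
}

// Consistency checking only ever consults the inconsistency epsilon:
// the cluster matches if its centroid is close enough to the LLVM value.
bool matchesSchedModel(const std::vector<double> &ClusterValues,
                       double LLVMValue, const EpsilonParams &E) {
  return std::fabs(centroid(ClusterValues) - LLVMValue) <= E.Inconsistency;
}
```

Take a clustering epsilon of 0.1 and an inconsistency epsilon of 1.0. A cluster with centroid 2.1 is then still consistent with an LLVM value of 3.0. Tightening only the inconsistency epsilon to 0.5 flags that cluster without changing how the points were clustered.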
Fangrui Song	990061b6d6	Fix file header issues in fuzzers. NFC llvm-svn: 354551	2019-02-21 07:57:14 +00:00
Hans Wennborg	14e15ec18d	Fix the build with gcc/libstdc++ 4.8.2 after r354441 llvm-svn: 354469	2019-02-20 14:50:08 +00:00
Roman Lebedev	69716394f3	[llvm-exegesis] Opcode stabilization / reclusterization (PR40715) Summary: Given an instruction `Opcode`, we can make benchmarks (measurements) of the instruction characteristics/performance. Then, to facilitate further analysis, we group the benchmarks with similar characteristics into clusters. Now, this is all not entirely deterministic. Some instructions have variable characteristics, depending on their arguments. And thus, if we do several benchmarks of the same instruction `Opcode`, we may end up with different performance characteristics measurements. And when we then do clustering, these several benchmarks of the same instruction `Opcode` may end up being clustered into different clusters. This is not great for further analysis. We shall find every `Opcode` whose benchmarks are not all in one cluster, and move all the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`. I have solved this by making `ClusterId` a bit field, adding an `IsUnstable` bit, and introducing an `-analysis-display-unstable-clusters` switch to toggle between displaying stable-only clusters and unstable-only clusters. The reclusterization is deterministically stable and produces identical reports between runs. (Or at least that is what I'm seeing; maybe it isn't.) Timings/comparisons: old (current trunk/head) {F8303582} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... 
no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% ) 172 context-switches # 25.965 M/sec ( +- 29.89% ) 0 cpu-migrations # 0.042 M/sec ( +- 56.54% ) 31073 page-faults # 4690.754 M/sec ( +- 0.08% ) 26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%) 2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%) 13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%) 19770706799 instructions # 0.74 insn per cycle # 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%) 4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%) 121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%) 6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% ) ``` patch, with reclustering but without filtering (i.e. outputting all the stable and unstable clusters) {F8303586} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' ... 
no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs): 6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% ) 213 context-switches # 32.952 M/sec ( +- 23.81% ) 1 cpu-migrations # 0.130 M/sec ( +- 43.84% ) 31287 page-faults # 4832.057 M/sec ( +- 0.08% ) 25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%) 1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%) 13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%) 19752995402 instructions # 0.76 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%) 4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%) 121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%) 6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% ) ``` Funnily, this measurement shows that said reclustering actually improved performance. patch, with reclustering, only the stable clusters {F8303594} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' ... 
no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs): 6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% ) 133 context-switches # 20.792 M/sec ( +- 23.39% ) 0 cpu-migrations # 0.063 M/sec ( +- 61.24% ) 31318 page-faults # 4903.256 M/sec ( +- 0.08% ) 25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%) 1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%) 13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%) 19767554347 instructions # 0.77 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%) 4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%) 118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%) 6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% ) ``` Performance improved even further?! Makes sense, I guess: fewer clusters to print. patch, with reclustering, only the unstable clusters {F8303601} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' ... 
no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs): 6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% ) 194 context-switches # 31.709 M/sec ( +- 20.46% ) 0 cpu-migrations # 0.039 M/sec ( +- 49.77% ) 31413 page-faults # 5129.261 M/sec ( +- 0.06% ) 24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%) 1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%) 13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%) 18260877653 instructions # 0.74 insn per cycle # 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%) 4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%) 114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%) 6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% ) ``` This tells us that the actual `-analysis-inconsistencies-output-file=` output only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps). (Also, wow, this is fast; it used to take several minutes originally.) Fixes PR40715 (https://bugs.llvm.org/show_bug.cgi?id=40715). Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon Tags: #llvm Differential Revision: https://reviews.llvm.org/D58355 llvm-svn: 354441	2019-02-20 09:14:04 +00:00
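The reclusterization step can be sketched as below, assuming each benchmark already has an opcode and an initial cluster id. The `ClusterId` layout follows the commit's description (a bit field with an `IsUnstable` bit). The field widths, `makeCluster` and `stabilize` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Cluster id with a flag bit; the 31/1 split is an assumed layout.
struct ClusterId {
  uint32_t Id : 31;
  uint32_t IsUnstable : 1;
};

ClusterId makeCluster(uint32_t Id, bool Unstable = false) {
  ClusterId C;
  C.Id = Id;
  C.IsUnstable = Unstable ? 1 : 0;
  return C;
}

// Move every opcode whose benchmarks landed in more than one cluster into
// a fresh unstable cluster of its own. NextId is the first unused id.
void stabilize(const std::vector<std::string> &Opcodes,
               std::vector<ClusterId> &Ids, uint32_t NextId) {
  assert(Opcodes.size() == Ids.size());
  std::map<std::string, std::set<uint32_t>> ClustersPerOpcode;
  for (std::size_t I = 0; I < Opcodes.size(); ++I)
    ClustersPerOpcode[Opcodes[I]].insert(static_cast<uint32_t>(Ids[I].Id));
  // std::map iterates in key order, so new ids are assigned deterministically.
  std::map<std::string, uint32_t> NewId;
  for (const auto &KV : ClustersPerOpcode)
    if (KV.second.size() > 1)
      NewId[KV.first] = NextId++;
  for (std::size_t I = 0; I < Opcodes.size(); ++I) {
    auto It = NewId.find(Opcodes[I]);
    if (It != NewId.end())
      Ids[I] = makeCluster(It->second, /*Unstable=*/true);
  }
}
```

A display filter in the spirit of `-analysis-display-unstable-clusters` would then show only the clusters whose `IsUnstable` bit matches the requested mode.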
Roman Lebedev	bd84b139b0	[llvm-exegesis] Cut run time of analysis mode by another -35% (sic) (YamlContext::getRegNo()) Summary: Together with the previous patch, it's an -90% improvement, or roughly -96% improvement if you look starting with rL347204 ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-bew.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-bew.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-bew.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-bew.html' (9 runs): 1483.18 msec task-clock # 0.999 CPUs utilized ( +- 0.10% ) 68 context-switches # 46.085 M/sec ( +- 22.62% ) 0 cpu-migrations # 0.000 K/sec 11641 page-faults # 7850.880 M/sec ( +- 0.62% ) 5943246799 cycles # 4008184.428 GHz ( +- 0.10% ) (83.28%) 442869514 stalled-cycles-frontend # 7.45% frontend cycles idle ( +- 0.41% ) (83.29%) 1443375663 stalled-cycles-backend # 24.29% backend cycles idle ( +- 0.47% ) (33.43%) 7714006752 instructions # 1.30 insn per cycle # 0.19 stalled cycles per insn ( +- 0.07% ) (50.17%) 1977242936 branches # 1333472193.855 M/sec ( +- 0.07% ) (66.79%) 32624220 branch-misses # 1.65% of all branches ( +- 0.18% ) (83.34%) 1.48438 +- 0.00143 seconds time elapsed ( +- 0.10% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" 
-analysis-inconsistencies-output-file=/tmp/clusters-newer.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-newer.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-newer.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-newer.html' (9 runs): 963.28 msec task-clock # 0.999 CPUs utilized ( +- 0.37% ) 12 context-switches # 12.695 M/sec ( +- 52.79% ) 0 cpu-migrations # 0.000 K/sec 11599 page-faults # 12046.971 M/sec ( +- 0.59% ) 3860122322 cycles # 4009359.596 GHz ( +- 0.37% ) (83.19%) 380300669 stalled-cycles-frontend # 9.85% frontend cycles idle ( +- 0.34% ) (83.30%) 1071910340 stalled-cycles-backend # 27.77% backend cycles idle ( +- 1.30% ) (33.51%) 4773418224 instructions # 1.24 insn per cycle # 0.22 stalled cycles per insn ( +- 0.15% ) (50.17%) 1106990316 branches # 1149787979.919 M/sec ( +- 0.11% ) (66.80%) 23632231 branch-misses # 2.13% of all branches ( +- 0.18% ) (83.33%) 0.96389 +- 0.00356 seconds time elapsed ( +- 0.37% ) ``` ``` $ sha512sum /tmp/clusters-* db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-bew.html db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-newer.html db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-old.html ``` Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, llvm-commits Tags: #llvm Differential 
Revision: https://reviews.llvm.org/D57658 llvm-svn: 353025	2019-02-04 09:12:25 +00:00
Roman Lebedev	5b94fe9623	[llvm-exegesis] Cut run time of analysis mode by -84% (sic) (YamlContext::getInstrOpcode()) Summary: ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs): 9465.46 msec task-clock # 1.000 CPUs utilized ( +- 0.05% ) 60 context-switches # 6.363 M/sec ( +- 79.45% ) 0 cpu-migrations # 0.000 K/sec 11364 page-faults # 1200.697 M/sec ( +- 0.60% ) 37935623543 cycles # 4008083.912 GHz ( +- 0.05% ) (83.32%) 2371625356 stalled-cycles-frontend # 6.25% frontend cycles idle ( +- 0.37% ) (83.32%) 8476077875 stalled-cycles-backend # 22.34% backend cycles idle ( +- 0.18% ) (33.36%) 41822439158 instructions # 1.10 insn per cycle # 0.20 stalled cycles per insn ( +- 0.02% ) (50.03%) 11607658944 branches # 1226405861.486 M/sec ( +- 0.01% ) (66.69%) 210864633 branch-misses # 1.82% of all branches ( +- 0.06% ) (83.34%) 9.46636 +- 0.00441 seconds time elapsed ( +- 0.05% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-bew.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 
benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-bew.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-bew.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-bew.html' (9 runs): 1480.66 msec task-clock # 1.000 CPUs utilized ( +- 0.19% ) 13 context-switches # 8.483 M/sec ( +- 83.10% ) 0 cpu-migrations # 0.075 M/sec ( +-100.00% ) 11596 page-faults # 7834.247 M/sec ( +- 0.59% ) 5933732194 cycles # 4008977.535 GHz ( +- 0.19% ) (83.22%) 438111928 stalled-cycles-frontend # 7.38% frontend cycles idle ( +- 0.37% ) (83.25%) 1454969705 stalled-cycles-backend # 24.52% backend cycles idle ( +- 0.94% ) (33.53%) 7724218604 instructions # 1.30 insn per cycle # 0.19 stalled cycles per insn ( +- 0.07% ) (50.14%) 1979796413 branches # 1337599858.945 M/sec ( +- 0.06% ) (66.74%) 32641638 branch-misses # 1.65% of all branches ( +- 0.18% ) (83.31%) 1.48128 +- 0.00284 seconds time elapsed ( +- 0.19% ) $ sha512sum /tmp/clusters-* db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-bew.html db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-old.html ``` Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, llvm-commits, RKSimon Tags: #llvm Differential Revision: https://reviews.llvm.org/D57657 llvm-svn: 353024	2019-02-04 09:12:21 +00:00
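The two YamlContext commits above report large speedups but do not describe the change itself. A common way to make repeated name-to-number lookups cheap is to build a hash map once instead of scanning every name on each call. The C++ sketch below illustrates only that general technique. `NameTable`, `lookup()` and `lookupLinear()` are hypothetical names, not llvm-exegesis APIs, and the sketch is not claimed to be what these patches did.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical table mapping names (e.g. opcode or register names) to the
// index at which they appear.
class NameTable {
public:
  explicit NameTable(std::vector<std::string> Names) : Names_(std::move(Names)) {
    // Build the reverse index once, so each lookup() is an average O(1)
    // hash lookup instead of an O(N) scan.
    Index_.reserve(Names_.size());
    for (std::size_t I = 0; I < Names_.size(); ++I) {
      const bool Inserted =
          Index_.emplace(Names_[I], static_cast<unsigned>(I)).second;
      assert(Inserted && "names are expected to be unique");
      (void)Inserted; // avoid an unused-variable warning under NDEBUG
    }
  }

  // Baseline: compare against every name on each call.
  std::optional<unsigned> lookupLinear(const std::string &Name) const {
    for (std::size_t I = 0; I < Names_.size(); ++I)
      if (Names_[I] == Name)
        return static_cast<unsigned>(I);
    return std::nullopt;
  }

  // Fast path: consult the prebuilt hash map.
  std::optional<unsigned> lookup(const std::string &Name) const {
    const auto It = Index_.find(Name);
    if (It == Index_.end())
      return std::nullopt;
    return It->second;
  }

private:
  std::vector<std::string> Names_;
  std::unordered_map<std::string, unsigned> Index_;
};
```

When a benchmark file holds tens of thousands of points and each point names its opcode and registers, the one-time map construction is amortized across all of those lookups.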
Roman Lebedev	1a0d595f15	[llvm-exegesis] Throughput support in analysis mode Summary: D57000 / [[ https://bugs.llvm.org/show_bug.cgi?id=37698 | PR37698 ]] added support for measuring the inverse throughput, but support for it in the analysis mode was not added. This attempts to fix that. (Analysis done on bdver2 / piledriver.) First, a small-scale experiment: ``` $ ./bin/llvm-exegesis -num-repetitions=10000 -mode=inverse_throughput -opcode-name=BSF64rr Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-d0acdd.o --- mode: inverse_throughput key: instructions: - 'BSF64rr RAX RDX' config: '' register_initial_values: - 'RDX=0x0' cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: inverse_throughput, value: 3.0278, per_snippet_value: 3.0278 } error: '' info: instruction has no tied variables picking Uses different from defs assembled_snippet: 48BA0000000000000000480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2C3 ... ``` If we plug `bsfq %r12, %r10` into llvm-mca: https://godbolt.org/z/ZtOyhJ ``` Dispatch Width: 4 uOps Per Cycle: 3.00 IPC: 0.50 Block RThroughput: 2.0 ``` So an RThroughput mismatch exists. Now, let's upscale and analyse: {F8207148} `$ ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html`: {F8207172} {F8207197} And if we now look at https://www.agner.org/optimize/instruction_tables.pdf, `Reciprocal throughput` for `BSF r,r` is listed as `3`. Yay? Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57647 llvm-svn: 353023	2019-02-04 09:12:17 +00:00
Roman Lebedev	dc78bc277d	[llvm-exegesis] deserializeMCInst(): bump SmallVector small size up to 16 Summary: ... from 8. The `VALIGNDZ128rmbik XMM0 XMM0 K1 XMM3 RDI i_0x1 i_0x0 i_0x1` instruction already has 9 components. It does not matter much in terms of performance, but avoiding allocation seems to come with low cost here. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57654 llvm-svn: 353022	2019-02-04 09:12:13 +00:00
Clement Courbet	362653f7af	[llvm-exegesis] Add throughput mode. Summary: This just uses the latency benchmark runner on the parallel uops snippet generator. Fixes PR37698. Reviewers: gchatelet Subscribers: tschuett, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D57000 llvm-svn: 352632	2019-01-30 16:02:20 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Clement Courbet	176388c973	Revert rL350035 "[llvm-exegesis] Clustering: don't enqueue a point multiple times" Let's discuss this on the review thread before submitting. llvm-svn: 350207	2019-01-02 09:21:00 +00:00
Fangrui Song	cd93d7ef43	[llvm-exegesis] Clustering: don't enqueue a point multiple times Summary: SetVector uses both DenseSet and vector, which is time/memory inefficient. The points are represented as natural numbers so we can replace the DenseSet part by indexing into a vector<char> instead. Don't cargo cult the pseudocode on the Wikipedia DBSCAN page. This is a standard BFS-style algorithm (similar loops have been used several times in other LLVM components): every point is processed at most once, thus the queue has at most NumPoints elements. We represent it with a vector and allocate it outside of the loop to avoid allocation in the loop body. We check `Processed[P]` to avoid enqueueing a point more than once, which also nicely saves us a `ClusterIdForPoint_[Q].isUndef()` check. Many people hate the one-shot abstraction but some favor it; as a compromise, we use a lambda to abstract away the neighbor-adding process. Delete the comment `assert(Neighbors.capacity() == (Points_.size() - 1));` as it is wrong. llvm-svn: 350035	2018-12-23 20:48:52 +00:00
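Below is a minimal sketch of the BFS-style expansion described in the commit above. Several simplifications are hypothetical and do not come from the real `InstructionBenchmarkClustering` code: 1-D `double` points, a distance threshold `Eps`, a `MinPts` count that includes the point itself, and the name `dbscan1D`. A `std::vector<char>` marks points that were already enqueued, so the queue never holds a point twice and can be a plain vector walked by an index. Note that this commit was reverted by rL350207.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns a cluster id per point: -1 = noise, 0, 1, ... = cluster index.
// Two points are neighbors when their distance is <= Eps.
std::vector<int> dbscan1D(const std::vector<double> &Points, double Eps,
                          std::size_t MinPts) {
  assert(MinPts >= 1 && "MinPts counts the point itself");
  const std::size_t N = Points.size();
  std::vector<int> ClusterId(N, -1);
  std::vector<char> Processed(N, 0); // replaces the set half of a SetVector
  std::vector<std::size_t> Queue;    // each point is enqueued at most once
  Queue.reserve(N);
  std::vector<std::size_t> Neighbors; // reused by every range query

  auto rangeQuery = [&](std::size_t Q) {
    Neighbors.clear();
    for (std::size_t P = 0; P < N; ++P)
      if (P != Q && std::fabs(Points[P] - Points[Q]) <= Eps)
        Neighbors.push_back(P);
  };

  int NextCluster = 0;
  for (std::size_t P = 0; P < N; ++P) {
    if (Processed[P])
      continue;
    rangeQuery(P);
    if (Neighbors.size() + 1 < MinPts)
      continue; // not a core point; a later cluster may still absorb it
    const int Id = NextCluster++;
    Queue.clear();
    Processed[P] = 1;
    Queue.push_back(P);
    // BFS: new points are appended at the back, Head walks front to back.
    for (std::size_t Head = 0; Head < Queue.size(); ++Head) {
      const std::size_t Q = Queue[Head];
      ClusterId[Q] = Id;
      rangeQuery(Q);
      if (Neighbors.size() + 1 < MinPts)
        continue; // border point: joins the cluster but does not expand it
      for (std::size_t R : Neighbors)
        if (!Processed[R]) {
          Processed[R] = 1;
          Queue.push_back(R);
        }
    }
  }
  return ClusterId;
}
```

Because `Processed` is shared across clusters, a border point reachable from two clusters joins whichever reaches it first, as in textbook DBSCAN.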
Simon Pilgrim	96408bb04a	Revert rL349136: [llvm-exegesis] Optimize ToProcess in dbScan Summary: Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`. We also check `Added[P]` to avoid enqueueing a point more than once, which also saves us a `ClusterIdForPoint_[Q].isUndef()` check. Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54442 ........ Patch wasn't approved and breaks buildbots llvm-svn: 349139	2018-12-14 09:25:08 +00:00
Fangrui Song	92537ccc7e	[llvm-exegesis] Optimize ToProcess in dbScan Summary: Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`. We also check `Added[P]` to avoid enqueueing a point more than once, which also saves us a `ClusterIdForPoint_[Q].isUndef()` check. Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54442 llvm-svn: 349136	2018-12-14 08:27:35 +00:00
Jinsong Ji	56c74cff70	[llvm-exegesis][NFC] Some code style cleanup Apply review comments of https://reviews.llvm.org/D54185 to other targets as well, specifically: 1. make anonymous namespaces as small as possible, avoid using static inside anonymous namespaces 2. Add missing headers to some files 3. GetLoadImmediateOpcodem-> getLoadImmediateOpcode 4. Fix typo Differential Revision: https://reviews.llvm.org/D54343 llvm-svn: 347309	2018-11-20 14:41:59 +00:00
Clement Courbet	bbab546a71	[llvm-exegesis][NFC] More tests for ExegesisTarget::fillMemoryOperands(). Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54304 llvm-svn: 347209	2018-11-19 14:31:43 +00:00
Roman Lebedev	71fdb57640	[llvm-exegesis] (+final perf overview) InstructionBenchmarkClustering::rangeQuery(): reserve for the upper bound of Neighbors Summary: As was pointed out in D54388+D54390, the maximal size of `Neighbors` is known: it will contain at most Points_.size() minus one (the center of the cluster). That is only the upper bound, meaning that in most cases the actual count will be much smaller, but since D54390 made the allocation persistent, we no longer have to worry about overly-optimistically `reserve()`ing. Old: (D54393) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 6553.167456 task-clock (msec) # 1.000 CPUs utilized ( +- 0.21% ) ... 6.5547 +- 0.0134 seconds time elapsed ( +- 0.20% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 6315.057872 task-clock (msec) # 0.999 CPUs utilized ( +- 0.24% ) ... 6.3187 +- 0.0160 seconds time elapsed ( +- 0.25% ) ``` And that is another -~4%. Since this is the last (as of this moment) patch in this patch series, it is a good time to summarize: Old: (svn trunk, as stated in D54381) ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m24.884s user 0m24.099s sys 0m0.785s ``` So these patches, on a given benchmark, have decreased llvm-exegesis analysis time by 74.62%. There is surely more room for further improvements. D54514 may improve things by -11.5% more (relative to this patch). Parallelization may improve things further significantly, too. 
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54415 llvm-svn: 347204	2018-11-19 13:28:41 +00:00
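The reasoning in the commit above can be sketched as follows: reserve once for the upper bound, because the buffer is reused across queries. `NeighborBuffer` is a hypothetical wrapper over `std::vector`; the real code keeps an `llvm::SmallVector`. A range query over `NumPoints` points finds at most `NumPoints - 1` neighbors, so after a single `reserve()` no later `push_back()` has to reallocate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reusable neighbor list for range queries over NumPoints points.
class NeighborBuffer {
public:
  explicit NeighborBuffer(std::size_t NumPoints) {
    // A query can return every point except its center, so NumPoints - 1 is
    // an upper bound. Reserving it once is cheap because the buffer is reused.
    if (NumPoints > 0)
      Storage_.reserve(NumPoints - 1);
  }

  // Start a new query; clear() drops the elements but keeps the capacity.
  void reset() { Storage_.clear(); }

  void add(std::size_t PointIndex) {
    // Debug check: while size() < capacity(), push_back() cannot reallocate.
    assert(Storage_.size() < Storage_.capacity() && "upper bound exceeded");
    Storage_.push_back(PointIndex);
  }

  const std::vector<std::size_t> &get() const { return Storage_; }
  std::size_t capacity() const { return Storage_.capacity(); }

private:
  std::vector<std::size_t> Storage_;
};
```

The trade-off is memory: the buffer stays at its worst-case size for its whole lifetime, which is acceptable here because only one such buffer exists.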
Roman Lebedev	8e315b66c2	[llvm-exegesis] Move InstructionBenchmarkClustering::isNeighbour() into header Summary: Old: (D54390) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7432.421721 task-clock (msec) # 1.000 CPUs utilized ( +- 0.15% ) ... 7.4336 +- 0.0115 seconds time elapsed ( +- 0.15% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 6569.936144 task-clock (msec) # 1.000 CPUs utilized ( +- 0.22% ) ... 6.5711 +- 0.0143 seconds time elapsed ( +- 0.22% ) ``` And another -12%. You'd think it would be `inline`d anyway, but no! :) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54393 llvm-svn: 347203	2018-11-19 13:28:36 +00:00
Roman Lebedev	666d855fbb	[llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): write into llvm::SmallVectorImpl& output parameter Summary: I do believe this is the correct fix. We call `rangeQuery()` very often. And many times its output vector is large (tens of thousands of entries), so small-size-opt won't help. Old: (D54389) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7934.528363 task-clock (msec) # 1.000 CPUs utilized ( +- 0.19% ) ... 7.9354 +- 0.0148 seconds time elapsed ( +- 0.19% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7383.793440 task-clock (msec) # 1.000 CPUs utilized ( +- 0.47% ) ... 7.3868 +- 0.0340 seconds time elapsed ( +- 0.46% ) ``` And another -7%. And that isn't even the good bit yet. Old: * calls to allocation functions: 2081419 * temporary allocations: 219658 (10.55%) * bytes allocated in total (ignoring deallocations): 4.31 GB New: * calls to allocation functions: 1880295 (-10%) * temporary allocations: 18758 (1%) (-91% sic) * bytes allocated in total (ignoring deallocations): 545.15 MB (-88% sic) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54390 llvm-svn: 347202	2018-11-19 13:28:31 +00:00
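The output-parameter pattern from the commit above can be sketched like this. The 1-D integer points, the `Eps` threshold and the exact signature are hypothetical, and `std::vector<std::size_t>&` stands in for `llvm::SmallVectorImpl<size_t>&` so the sketch has no LLVM dependency. A function that returns a fresh vector pays for at least one heap allocation per non-empty result. Writing into a caller-owned buffer lets the caller keep one allocation alive across many calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical 1-D range query: fills Neighbors with the indices of all
// points (other than Center) within Eps of Points[Center].
void rangeQuery(const std::vector<long> &Points, std::size_t Center, long Eps,
                std::vector<std::size_t> &Neighbors) {
  assert(Center < Points.size() && "center index out of range");
  Neighbors.clear(); // drops old results but keeps the capacity
  for (std::size_t P = 0; P < Points.size(); ++P)
    if (P != Center && std::labs(Points[P] - Points[Center]) <= Eps)
      Neighbors.push_back(P);
}
```

A caller such as a clustering loop would declare one `std::vector<std::size_t> Neighbors;` outside the loop and pass it to every query. After the first few calls the buffer has grown large enough and stops allocating.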
Roman Lebedev	5c5b1ea725	[llvm-exegesis] InstructionBenchmarkClustering::dbScan(): replace std::vector<> with std::deque<> in llvm::SetVector<> Summary: Old: (D54388) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 8606.323981 task-clock (msec) # 1.000 CPUs utilized ( +- 0.11% ) ... 8.60773 +- 0.00978 seconds time elapsed ( +- 0.11% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7971.403653 task-clock (msec) # 1.000 CPUs utilized ( +- 0.14% ) ... 7.9728 +- 0.0113 seconds time elapsed ( +- 0.14% ) ``` Another -~7%. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, RKSimon Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54389 llvm-svn: 347201	2018-11-19 13:28:26 +00:00
Roman Lebedev	8aecb0c489	[llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): use llvm::SmallVector<size_t, 0> for storage. Summary: Old: (D54383) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 9098.781978 task-clock (msec) # 1.000 CPUs utilized ( +- 0.16% ) ... 9.1015 +- 0.0148 seconds time elapsed ( +- 0.16% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 8553.352480 task-clock (msec) # 1.000 CPUs utilized ( +- 0.12% ) ... 8.5539 +- 0.0105 seconds time elapsed ( +- 0.12% ) ``` So another -6%. That is because the `SmallVector` doubles its size when reallocating, which is great here: we can't `reserve()`, since we can't know how many `Neighbors` we will have. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54388 llvm-svn: 347200	2018-11-19 13:28:22 +00:00
Roman Lebedev	b311c1d6b8	[llvm-exegesis] Analysis: writeMeasurementValue(): don't alloc string for double each time. Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54382) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 9024.354355 task-clock (msec) # 1.000 CPUs utilized ( +- 0.18% ) ... 9.0262 +- 0.0161 seconds time elapsed ( +- 0.18% ) ``` New time: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 8996.541057 task-clock (msec) # 0.999 CPUs utilized ( +- 0.19% ) ... 9.0045 +- 0.0172 seconds time elapsed ( +- 0.19% ) ``` -~0.3%, not that much. But this isn't the important part. Old: * calls to allocation functions: 2109712 * temporary allocations: 33112 * bytes allocated in total (ignoring deallocations): 4.43 GB New: * calls to allocation functions: 2095345 (-0.68%) * temporary allocations: 18745 (-43.39% !!!) * bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54383 llvm-svn: 347199	2018-11-19 13:28:17 +00:00
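The commit above only says that a string is no longer allocated for each `double`. One way to avoid a heap-allocated temporary per value is to format into a fixed-size stack buffer and write those bytes straight to the stream, as in this sketch. The function body, the `%.7g` precision and the `std::ostream` sink are assumptions. They stand in for whatever the tool actually uses and are not the llvm-exegesis implementation.

```cpp
#include <cassert>
#include <cstdio>
#include <ostream>
#include <sstream> // only needed for the usage example below

// Hypothetical writer: formats Value on the stack instead of building a
// temporary std::string (e.g. via std::to_string) on every call.
void writeMeasurementValue(std::ostream &OS, double Value) {
  char Buf[32]; // "%.7g" of any double (e.g. "-1.234568e-308") fits easily
  const int Len = std::snprintf(Buf, sizeof(Buf), "%.7g", Value);
  assert(Len < static_cast<int>(sizeof(Buf)) && "buffer too small");
  if (Len > 0)
    OS.write(Buf, Len);
}
```

For example, writing `3.0278` into a `std::ostringstream` this way produces `"3.0278"`, because `%g` drops the trailing zeros.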
Roman Lebedev	f8b28e9bf4	[llvm-exegesis] Analysis::writeSnippet(): be smarter about memory allocations. Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54381) ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m10.487s user 0m9.745s sys 0m0.740s ``` New time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m9.599s user 0m8.824s sys 0m0.772s ``` Not that much, around -9%. But that is not the good part yet, again. Old: * calls to allocation functions: 3347676 * temporary allocations: 277818 * bytes allocated in total (ignoring deallocations): 10.52 GB New: * calls to allocation functions: 2109712 (-36%) * temporary allocations: 33112 (-88%) * bytes allocated in total (ignoring deallocations): 4.43 GB (-58% sic) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54382 llvm-svn: 347198	2018-11-19 13:28:14 +00:00
Roman Lebedev	0b4b512826	[llvm-exegesis] InstructionBenchmarkClustering::dbScan(): use llvm::SetVector<> instead of ILLEGAL std::unordered_set<> Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m24.884s user 0m24.099s sys 0m0.785s ``` New time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m10.469s user 0m9.797s sys 0m0.672s ``` So -60%. And that isn't the good bit yet. Old: * calls to allocation functions: 106560180 (yes, 107 million allocations.) * bytes allocated in total (ignoring deallocations): 12.17 GB New: * calls to allocation functions: 3347676 (-96.86%) (just 3 mil) * bytes allocated in total (ignoring deallocations): 10.52 GB (~2GB less) --- Three points i want to raise: * `std::unordered_set<>` should not have been used there in the first place. It is banned by the https://llvm.org/docs/ProgrammersManual.html#other-set-like-container-options * There are no tests, so i'm not fully sure this is correct. Since it was an unordered set, i guess there are zero restrictions on the order, and anything will be ok? * I tried other containers suggested in https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc, this `llvm::SetVector<>` seems to be best here. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet Subscribers: kristina, bobsayshilol, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54381 llvm-svn: 347197	2018-11-19 13:28:09 +00:00
Clement Courbet	eee2e06e2a	[llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target. Summary: This simplifies the code and moves everything to tablegen for consistency. This also prepares the ground for adding issue counters. Reviewers: gchatelet, john.brawn, jsji Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54297 llvm-svn: 346489	2018-11-09 13:15:32 +00:00
Jinsong Ji	5fd3e75478	[PowerPC][llvm-exegesis] Add a PowerPC target This is a patch to add a PowerPC target to llvm-exegesis. The target does just enough to be able to run llvm-exegesis in latency mode for at least some opcodes. Differential Revision: https://reviews.llvm.org/D54185 llvm-svn: 346411	2018-11-08 16:51:42 +00:00
Clement Courbet	54c2fa1202	[llvm-exegesis][NFC] Add missing header guard + cosmetics. Reviewers: gchatelet Reviewed By: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54252 llvm-svn: 346400	2018-11-08 12:37:56 +00:00
Clement Courbet	0d79aaf1a7	Revert "[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes." This reverts accidental commit rL346394. llvm-svn: 346398	2018-11-08 12:09:45 +00:00
Clement Courbet	c0950ae990	[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes. llvm-svn: 346394	2018-11-08 11:45:14 +00:00
Clement Courbet	5b0d783078	[llvm-exegesis] Remove superfluous move. /Users/buildslave/as-bldslv9_new/lld-x86_64-darwin13/llvm.src/tools/llvm-exegesis/lib/X86/Target.cpp:155:12: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move] return std::move(Error); ^ /Users/buildslave/as-bldslv9_new/lld-x86_64-darwin13/llvm.src/tools/llvm-exegesis/lib/X86/Target.cpp:155:12: note: remove std::move call here return std::move(Error); ^~~~~~~~~~ ~ llvm-svn: 346333	2018-11-07 16:52:50 +00:00
Clement Courbet	c544838f87	[llvm-exegesis] Correctly handle all X86 memory encoding formats. Summary: Add unit tests to check the support for each supported format to avoid regressions such as the one in PR36906. Reviewers: gchatelet Subscribers: tschuett, lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D54144 llvm-svn: 346330	2018-11-07 16:14:55 +00:00
Clement Courbet	7066769223	[llvm-exegesis] Increasing wrapping limit. Summary: Fixes PR39097. Reviewers: gchatelet Subscribers: llvm-commits, tschuett Differential Revision: https://reviews.llvm.org/D54151 llvm-svn: 346328	2018-11-07 15:46:45 +00:00
Clement Courbet	003e08ff28	[llvm-exegesis] Ignore X86 pseudo instructions. Summary: They do not lower to actual MCInsts and have no scheduling info. Reviewers: gchatelet Subscribers: llvm-commits, tschuett Differential Revision: https://reviews.llvm.org/D54147 llvm-svn: 346227	2018-11-06 14:11:58 +00:00
Matthias Braun	3d849f67cb	MachineModuleInfo: Store more specific reference to LLVMTargetMachine; NFC MachineModuleInfo can only be used in code using lib/CodeGen, hence we can keep a more specific reference to LLVMTargetMachine rather than just TargetMachine around. llvm-svn: 346182	2018-11-05 23:49:13 +00:00
Clement Courbet	4d837fce88	[llvm-exegesis] Fix SNB counter definition and handling. Summary: SNB is the only one that has P23 as a single proc res. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53766 llvm-svn: 345480	2018-10-28 19:09:14 +00:00
Simon Pilgrim	2a9c728088	Fix MSVC llvm-exegesis build. NFCI. MSVC is a bit funny about is_pod..... llvm-svn: 345252	2018-10-25 10:45:38 +00:00
Clement Courbet	b4b6ec01c6	[llvm-exegesis] Add missing initializer. This is a better fix than rL345245. llvm-svn: 345246	2018-10-25 08:11:35 +00:00
Clement Courbet	fa99b36e4d	[llvm-exegesis] Fix VC build of r345243. "const members cannot be default initialized unless their type has a user defined default constructor" Make members non-const. llvm-svn: 345245	2018-10-25 08:08:58 +00:00
Clement Courbet	8902c885d6	[llvm-exegesis] Fix warning in r345243. warning C4099: 'llvm::exegesis::PfmCountersInfo': type name first seen using 'class' now seen using 'struct' llvm-svn: 345244	2018-10-25 08:06:35 +00:00
Clement Courbet	41c8af3924	[MCSched] Bind PFM Counters to the CPUs instead of the SchedModel. Summary: The pfm counters are now in the ExegesisTarget rather than the MCSchedModel (PR39165). This also compresses the pfm counter tables (PR37068). Reviewers: RKSimon, gchatelet Subscribers: mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D52932 llvm-svn: 345243	2018-10-25 07:44:01 +00:00
Guillaume Chatelet	da11b85606	[llvm-exegesis] Implements a cache of Instruction objects. llvm-svn: 345130	2018-10-24 11:55:06 +00:00
Fangrui Song	a342834b24	[llvm-exegesis] Fix name lookup ambiguity in MSVC after 344922 llvm-svn: 344927	2018-10-22 17:52:31 +00:00
Fangrui Song	32401afd8c	[llvm-exegesis] Move namespace exegesis inside llvm:: Summary: This allows simplifying references of llvm::foo with foo when the needs come in the future. Reviewers: courbet, gchatelet Reviewed By: gchatelet Subscribers: javed.absar, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53455 llvm-svn: 344922	2018-10-22 17:10:47 +00:00
Guillaume Chatelet	18ef4a4a0d	[llvm-exegesis] Crash when assembling invalid Operand llvm-svn: 344907	2018-10-22 15:06:10 +00:00
Guillaume Chatelet	02f70a3fde	[llvm-exegesis] Mark x86 segment register instructions as unsupported. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53499 llvm-svn: 344906	2018-10-22 14:55:43 +00:00
Guillaume Chatelet	3c639f33b4	[llvm-exegesis] Reject x86 instructions that use non uniform memory accesses Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53438 llvm-svn: 344905	2018-10-22 14:46:08 +00:00
Clement Courbet	8d0dd0ba0e	[llvm-exegesis] Mark second-form X87 instructions as unsupported. Summary: We only support the first form because we rely on information that is only available there. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53430 llvm-svn: 344782	2018-10-19 12:24:49 +00:00
Clement Courbet	22bad0497e	[llvm-exegesis] Re-enable liveliness tracker. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53429 llvm-svn: 344780	2018-10-19 12:08:05 +00:00
Clement Courbet	c51f45239d	[llvm-exegesis] X87 RFP setup code. Summary: This was lost during refactoring in rL342644. Fix and simplify value size handling: always go through an 80-bit value, because the value can be 1 byte. Add unit tests. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53423 llvm-svn: 344779	2018-10-19 09:56:54 +00:00
Fangrui Song	2e83b2e9ee	Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC llvm-svn: 344774	2018-10-19 06:12:02 +00:00
Krasimir Georgiev	11bc3a18e2	[llvm-exegesis] Mark destructor virtual after r344695 This was causing a -Wnon-virtual-dtor warning. llvm-svn: 344721	2018-10-18 02:06:16 +00:00
Clement Courbet	f973c2df9d	[llvm-exegesis] Allow measuring several instructions in a single run. Summary: We try to recover gracefully on instructions that would crash the program. This includes some refactoring of runMeasurement() implementations. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53371 llvm-svn: 344695	2018-10-17 15:04:15 +00:00
Guillaume Chatelet	6f4bc17309	Fix uninitialized variable llvm-svn: 344692	2018-10-17 12:27:46 +00:00
Guillaume Chatelet	952b121a9c	BuildBot fix, compiler complains about array decay to pointer llvm-svn: 344690	2018-10-17 12:09:21 +00:00
Guillaume Chatelet	fcbb6f3c2b	[llvm-exegesis] Computing Latency configuration upfront so we can generate many CodeTemplates at once. Summary: LatencyGenerator now computes all possible modes of serial execution for an Instruction upfront and generates CodeTemplates for the ones that give the best results (e.g. no need to generate a two-instruction snippet when repeating a single one would do). The next step is to generate even more configurations for some cases (e.g. for XOR we should generate "XOR EAX, EAX, EAX" and "XOR EAX, EAX, EBX"). Reviewers: courbet Reviewed By: courbet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53320 llvm-svn: 344689	2018-10-17 11:37:28 +00:00
Guillaume Chatelet	a3849490b1	[llvm-exegesis] Fix missing std::move. llvm-svn: 344496	2018-10-15 09:21:21 +00:00
Guillaume Chatelet	296a862cbe	[llvm-exegesis][NFC] Return many CodeTemplates instead of one. Summary: This is part one of the change where I simply changed the signature of the functions. More work needs to be done to actually produce more than one CodeTemplate per instruction. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53209 llvm-svn: 344493	2018-10-15 09:09:19 +00:00
Guillaume Chatelet	946fb0517a	[llvm-exegesis][NFC] Simplify code at the cost of small code duplication Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53198 llvm-svn: 344351	2018-10-12 15:12:22 +00:00
Guillaume Chatelet	f33b60258e	[llvm-exegesis] Fix always true assert llvm-svn: 344151	2018-10-10 16:16:43 +00:00
Guillaume Chatelet	9b59238822	[llvm-exegesis][NFC] Pass Instruction instead of bare Opcode llvm-svn: 344145	2018-10-10 14:57:32 +00:00
Guillaume Chatelet	ee9c2a17b8	[llvm-exegesis][NFC] Code simplification Summary: Simplify code by having LLVMState hold the RegisterAliasingTrackerCache. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53078 llvm-svn: 344143	2018-10-10 14:22:48 +00:00
John Brawn	c616a7236c	[llvm-exegesis] Fix function return generation so it doesn't return register 0 When fillMachineFunction generates a return on targets without a return opcode (such as AArch64) it should pass an empty set of registers as the return registers, not 0 which means register number zero. Differential Revision: https://reviews.llvm.org/D53074 llvm-svn: 344139	2018-10-10 13:03:23 +00:00
Guillaume Chatelet	d227754973	[llvm-exegesis] Fix broken build. llvm-svn: 344131	2018-10-10 10:09:42 +00:00
Guillaume Chatelet	ffc3ffac7d	[llvm-exegesis][NFC] Simplify code now that Instruction has more semantics Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53065 llvm-svn: 344130	2018-10-10 09:45:17 +00:00
Guillaume Chatelet	0c17cbf790	[llvm-exegesis] Remove unused variable, add more semantics to Instruction. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53062 llvm-svn: 344127	2018-10-10 09:12:36 +00:00
Guillaume Chatelet	22cccffa06	Fix function case. llvm-svn: 344051	2018-10-09 14:51:33 +00:00
Guillaume Chatelet	547d2dd1dd	[llvm-exegesis] Fix invalid return type and add a Dump function. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53020 llvm-svn: 344050	2018-10-09 14:51:29 +00:00
Guillaume Chatelet	fe5b0b488e	[llvm-exegesis] Fix wrong index type. llvm-svn: 344032	2018-10-09 10:06:19 +00:00
Guillaume Chatelet	cf6d5fab32	[llvm-exegesis] Fix unused lambda capture. llvm-svn: 344029	2018-10-09 09:33:29 +00:00
Guillaume Chatelet	09c2839c02	[llvm-exegesis][NFC] Use accessors for Operand. Summary: This moves checking logic into the accessors and makes the structure smaller. It will also help when/if Operands are generated from the TD files. Subscribers: tschuett, courbet, llvm-commits Differential Revision: https://reviews.llvm.org/D52982 llvm-svn: 344028	2018-10-09 08:59:10 +00:00
Guillaume Chatelet	9157bc914f	[llvm-exegesis][NFC] Improve parsing of the YAML files Summary: sscanf turns out to be slow for reading floating points. Reviewers: courbet Subscribers: tschuett, llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D52866 llvm-svn: 343771	2018-10-04 12:33:46 +00:00
Simon Pilgrim	92d02027c2	[llvm-exegesis] Prevent yaml parser from calling sscanf for obvious non-matches (PR39102). deserializeMCOperand - ensure that we at least match the first character of the sscanf pattern before calling it. This reduces llvm-exegesis uops analysis of the instructions supported by btver2 from 5m13s to 2m1s on debug builds. llvm-svn: 343690	2018-10-03 14:51:09 +00:00
Clement Courbet	5a768ddd44	[llvm-exegesis][NFC] Revert rL343682 "Fix unused variable warning". That was not the proper fix: the variable is used in debug mode. llvm-svn: 343685	2018-10-03 12:48:50 +00:00
Clement Courbet	8a5a6be47a	[llvm-exegesis] Fix rL343680 in release mode. llvm-svn: 343684	2018-10-03 12:35:35 +00:00
Clement Courbet	af50a5b85f	[llvm-exegesis][NFC] Fix unused variable warning. llvm-svn: 343682	2018-10-03 12:27:43 +00:00
Clement Courbet	d5a39553ff	[llvm-exegesis] Resolve variant classes in analysis. Summary: See PR38884. Reviewers: gchatelet Subscribers: tschuett, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D52825 llvm-svn: 343680	2018-10-03 11:50:25 +00:00
Guillaume Chatelet	415b2fbef5	[llvm-exegesis][NFC] Move random functions from CodeTemplate to SnippetGenerator. Summary: Just moving methods around. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52720 llvm-svn: 343461	2018-10-01 12:19:10 +00:00
Guillaume Chatelet	c6268f3ba2	[llvm-exegesis][NFC] Make randomizeUnsetVariables a free function. Summary: This is preliminary to moving random functions to SnippetGenerator. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52718 llvm-svn: 343456	2018-10-01 11:46:06 +00:00
Clement Courbet	30183093ab	[llvm-exegesis] Fix PR39096. Summary: The key is now the resource name, not the resource id. Reviewers: gchatelet Subscribers: tschuett, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D52607 llvm-svn: 343208	2018-09-27 13:26:37 +00:00
Guillaume Chatelet	70ac019efa	[llvm-exegesis][NFC] moving code around. Summary: Renaming InstructionBuilder into InstructionTemplate and moving code generation tools from MCInstrDescView to CodeTemplate. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52592 llvm-svn: 343188	2018-09-27 09:23:04 +00:00
Fangrui Song	0cac726a00	llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...) Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163	2018-09-27 02:13:45 +00:00
Clement Courbet	28d4f85824	[llvm-exegesis] Get rid of debug_string. Summary: This is a backwards-compatible change (existing files will work as expected). See PR39082. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52546 llvm-svn: 343108	2018-09-26 13:35:10 +00:00
Guillaume Chatelet	7f8d310b76	[llvm-exegesis][NFC] Move CodeTemplate to its own file. Summary: This is in preparation for exploring value ranges. Reviewers: courbet Reviewed By: courbet Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52542 llvm-svn: 343098	2018-09-26 11:57:24 +00:00
Clement Courbet	596c56ff9c	[llvm-exegesis] Add support for measuring NumMicroOps. Summary: Example output for vzeroall: --- mode: uops key: instructions: - 'VZEROALL' config: '' register_initial_values: cpu_name: haswell llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { debug_string: HWPort0, value: 0.0006, per_snippet_value: 0.0006, key: '3' } - { debug_string: HWPort1, value: 0.0011, per_snippet_value: 0.0011, key: '4' } - { debug_string: HWPort2, value: 0.0004, per_snippet_value: 0.0004, key: '5' } - { debug_string: HWPort3, value: 0.0018, per_snippet_value: 0.0018, key: '6' } - { debug_string: HWPort4, value: 0.0002, per_snippet_value: 0.0002, key: '7' } - { debug_string: HWPort5, value: 1.0019, per_snippet_value: 1.0019, key: '8' } - { debug_string: HWPort6, value: 1.0033, per_snippet_value: 1.0033, key: '9' } - { debug_string: HWPort7, value: 0.0001, per_snippet_value: 0.0001, key: '10' } - { debug_string: NumMicroOps, value: 20.0069, per_snippet_value: 20.0069, key: NumMicroOps } error: '' info: '' assembled_snippet: C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C3 ... Reviewers: gchatelet Subscribers: tschuett, RKSimon, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D52539 llvm-svn: 343094	2018-09-26 11:22:56 +00:00
Clement Courbet	684a5f6753	[llvm-exegesis] Output the unscaled value as well as the scaled one. Summary: See PR38936 for context. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52500 llvm-svn: 343081	2018-09-26 08:37:21 +00:00
Guillaume Chatelet	345fae5d56	[llvm-exegesis] Serializes registers initial values. Summary: Adds the registers initial values to the YAML output of llvm-exegesis. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52460 llvm-svn: 342982	2018-09-25 15:15:54 +00:00
Guillaume Chatelet	6078f82241	[llvm-exegesis] Fix missing document separator in YAML output. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52496 llvm-svn: 342981	2018-09-25 14:48:24 +00:00
Clement Courbet	86baebc5fd	[llvm-exegesis] Add lit tests (v2). Summary: This revisits rL342953 by adding detection of host support. Reviewers: gchatelet, lebedev.ri, alexshap Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52464 llvm-svn: 342975	2018-09-25 13:59:35 +00:00
Guillaume Chatelet	55ad087a4c	[llvm-exegesis][NFC] Rewrite of the YAML serialization. Summary: This is a NFC in preparation of exporting the initial registers as part of the YAML dump Reviewers: courbet Reviewed By: courbet Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52427 llvm-svn: 342967	2018-09-25 12:18:08 +00:00
Clement Courbet	78b2e73d15	[llvm-exegesis] Allow benchmarking arbitrary code snippets. Summary: This is a step towards fixing PR38048. Note that right now the measurements are given per instruction. We'll need to give measurements per code snippet and update the analysis (PR38731). Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52041 llvm-svn: 342947	2018-09-25 07:31:44 +00:00
Clement Courbet	1e8fdbe3c3	[llvm-exegesis] Fix PR39021. Summary: The `set` statement was incorrectly reading the value of the local variable and setting the value of the parent variable. Reviewers: tycho, gchatelet, john.brawn Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52343 llvm-svn: 342865	2018-09-24 08:39:48 +00:00
Guillaume Chatelet	c96a97bac7	[llvm-exegesis] Improve Register Setup (roll forward of D51856). Summary: Added function to set a register to a particular value + tests. Add EFLAGS test, use new setRegTo instead of setRegToConstant. Reviewers: courbet, javed.absar Subscribers: llvm-commits, tschuett, mgorny Differential Revision: https://reviews.llvm.org/D52297 llvm-svn: 342644	2018-09-20 12:22:18 +00:00
Simon Pilgrim	f652ef3d52	Revert rL342465: Added function to set a register to a particular value + tests. rL342465 is breaking the MSVC buildbots. llvm-svn: 342490	2018-09-18 15:38:16 +00:00
Simon Pilgrim	0242689725	Revert rL342466: [llvm-exegesis] Improve Register Setup. rL342465 is breaking the MSVC buildbots, but I need to revert this dependent revision as well. Summary: Added function to set a register to a particular value + tests. Add EFLAGS test, use new setRegTo instead of setRegToConstant. Reviewers: courbet, javed.absar Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D51856 llvm-svn: 342489	2018-09-18 15:35:49 +00:00
Guillaume Chatelet	937f3fedec	[llvm-exegesis] Improve Register Setup. Summary: Added function to set a register to a particular value + tests. Add EFLAGS test, use new setRegTo instead of setRegToConstant. Reviewers: courbet, javed.absar Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D51856 llvm-svn: 342466	2018-09-18 11:26:48 +00:00
Guillaume Chatelet	8721ad98d1	Added function to set a register to a particular value + tests. llvm-svn: 342465	2018-09-18 11:26:35 +00:00
Guillaume Chatelet	5ad2909e52	Improve Register Setup llvm-svn: 342464	2018-09-18 11:26:27 +00:00
Simon Pilgrim	a2fd56c3e4	Fix "not all control paths return a value" MSVC warning. NFCI. llvm-svn: 342394	2018-09-17 13:56:42 +00:00
Guillaume Chatelet	cd488efe7e	[llvm-exegesis] Add predefined floating point values so we can test impact of special values on latency. Summary: This will be useful to generate many configurations and test instruction regimes (NaN, Inf, subnormal, normal). Reviewers: courbet Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D51858 llvm-svn: 342369	2018-09-17 11:09:32 +00:00
Nico Weber	b09a8c9bd9	Revert r342148 (and follow-on fix attempts r342154, r342180, r342182, r342193) Many bots building with make have been broken for several days, e.g. http://lab.llvm.org:8011/builders/lld-x86_64-darwin13 llvm-svn: 342336	2018-09-15 19:04:27 +00:00
Richard Diamond	f29b36c76d	[cmake] Fix missing DEPENDS. Not sure how I didn't catch this. llvm-svn: 342154	2018-09-13 17:10:44 +00:00
Richard Diamond	f3063baa6e	Renovate CMake files in the `llvm-(cfi-verify\|exegesis\|mca)` tools. llvm-svn: 342148	2018-09-13 16:15:03 +00:00
Clement Courbet	d939f6d013	[llvm-exegesis][NFC] Split BenchmarkRunner class Summary: The snippet-generation part goes to the SnippetGenerator class. This will allow benchmarking arbitrary code (see PR38437). Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D51979 llvm-svn: 342117	2018-09-13 07:40:53 +00:00
Clement Courbet	903667e956	[llvm-exegesis][NFC]Remove dead function parameter llvm-svn: 342035	2018-09-12 09:26:32 +00:00
Simon Pilgrim	fc2931375d	[llvm-exegesis] Ignore double spaced separators in asm strings Some asm has double spaces between operands, the deserializer was keeping these empty split pieces, causing assertions later on: 'ADC16mi RDI i_0x1x i_0x0x i_0x1x' llvm-svn: 341799	2018-09-10 10:45:04 +00:00
Guillaume Chatelet	e60866a4e0	[llvm-exegesis] Renaming classes and functions. Summary: Functional No Op. Reviewers: gchatelet Subscribers: tschuett, courbet, llvm-commits Differential Revision: https://reviews.llvm.org/D50231 llvm-svn: 338836	2018-08-03 09:29:38 +00:00
Guillaume Chatelet	171f3f46c8	[llvm-exegesis] Rename InstructionInstance into InstructionBuilder. Summary: Non functional change. Subscribers: tschuett, courbet, llvm-commits Differential Revision: https://reviews.llvm.org/D50176 llvm-svn: 338701	2018-08-02 11:12:02 +00:00
Guillaume Chatelet	fb94354d2d	[llvm-exegesis] Provide a way to handle memory instructions. Summary: And implement memory instructions on X86. This fixes PR36906. Reviewers: gchatelet Reviewed By: gchatelet Subscribers: lebedev.ri, filcab, mgorny, tschuett, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D48935 llvm-svn: 338567	2018-08-01 14:41:45 +00:00
Clement Courbet	f9a0bb330d	[llvm-exegesis] Add uop computation for more X87 instruction classes. Summary: This allows measuring comparisons (UCOM_FpIr32,UCOM_Fpr32,...), conditional moves (CMOVBE_Fp32,...) Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D48713 llvm-svn: 336352	2018-07-05 13:54:51 +00:00
Clement Courbet	2c278cdd98	[llvm-exegesis][NFC]clang-format llvm-svn: 336343	2018-07-05 12:26:12 +00:00
Clement Courbet	e945fad250	[llvm-exegesis] Remove dead comment. llvm-svn: 336266	2018-07-04 12:31:00 +00:00
John Brawn	c4ed60042f	[llvm-exegesis] Add an AArch64 target The target does just enough to be able to run llvm-exegesis in latency mode for at least some opcodes. Differential Revision: https://reviews.llvm.org/D48780 llvm-svn: 336187	2018-07-03 10:10:29 +00:00
Clement Courbet	e785169fce	[llvm-exegesis] ExegisX86Target::setRegToConstant() should depend on the subtarget features. Summary: This fixes PR38008. Reviewers: gchatelet, RKSimon Subscribers: tschuett, craig.topper, llvm-commits Differential Revision: https://reviews.llvm.org/D48820 llvm-svn: 336171	2018-07-03 06:17:05 +00:00
John Brawn	346856dc6c	[llvm-exegesis] Change how the native architecture is determined Currently the llvm-exegesis native architecture is determined by comparing the llvm native architecture with X86, so to add a new target would mean adding a new check. Change this to building up a list of the targets llvm-exegesis supports then using that, as this means that when adding a new target you just add the target to the list of supported targets. Differential Revision: https://reviews.llvm.org/D48778 llvm-svn: 336105	2018-07-02 13:53:46 +00:00
John Brawn	8fc5ec78d5	[llvm-exegesis] Delegate the decision of cycle counter name to the target Currently the cycle counter is taken from the subtarget schedule model, which isn't any use if the subtarget doesn't have one. Delegate the decision to the target benchmark runner, as it may know better what to do in that case, with the default being the current behaviour. Differential Revision: https://reviews.llvm.org/D48779 llvm-svn: 336099	2018-07-02 13:14:49 +00:00
Clement Courbet	a53349251c	[llvm-exegesis][NFC] Cleanup useless braces. llvm-svn: 336076	2018-07-02 06:39:55 +00:00
Clement Courbet	717c9768d3	[llvm-exegesis] Add partial X87 support. Summary: This enables the X86-specific X86FloatingPointStackifierPass, and allows llvm-exegesis to generate and measure X87 latency/uops for some FP ops. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D48592 llvm-svn: 335815	2018-06-28 07:41:16 +00:00
Clement Courbet	650db339a5	[llvm-exegesis][NFC] Fix windows warning in rL335465. llvm-svn: 335591	2018-06-26 10:52:41 +00:00
Clement Courbet	4860b98443	[llvm-exegesis] Get the BenchmarkRunner from the ExegesisTarget. Summary: This allows targets to override code generation for some instructions. As an example of override, this also moves ad-hoc instruction filtering for X86 into the X86 ExegesisTarget. Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D48587 llvm-svn: 335582	2018-06-26 08:49:30 +00:00

328 Commits