llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	d391e4fe84	[X86] Update RET/LRET instruction to use the same naming convention as IRET (PR36876). NFC Be more consistent in the naming convention for the various RET instructions to specify in terms of bitwidth. Helps prevent future scheduler model mismatches like those that were only addressed in D44687. Differential Revision: https://reviews.llvm.org/D113302	2021-11-07 15:06:54 +00:00
Roman Lebedev	78eaff2ef8	[llvm-exegesis] Loop unrolling for loop snippet repetitor mode I really needed this, like, factually, yesterday, when verifying dependency breaking idioms for AMD Zen 3 scheduler model. Consider the following example: ``` $ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o --- mode: inverse_throughput key: instructions: - 'VPXORYrr YMM0 YMM0 YMM0' config: '' register_initial_values: [] cpu_name: znver3 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 1000000 measurements: - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 } error: '' info: '' assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3 ... ``` What does it tell us? So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle? That doesn't seem right. That's even less than there are pipes supporting this type of op. Now, second example: ``` $ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o --- mode: inverse_throughput key: instructions: - 'VPXORYrr YMM0 YMM0 YMM0' config: '' register_initial_values: [] cpu_name: znver3 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 1000000 measurements: - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 } error: '' info: '' assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3 ... ``` Now that's just worse. Due to the looping, the throughput completely plummeted, and now we can only do a single instruction/cycle!? That's not great. And final example: ``` $ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000 Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o --- mode: inverse_throughput key: instructions: - 'VPXORYrr YMM0 YMM0 YMM0' config: '' register_initial_values: [] cpu_name: znver3 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 1000000 measurements: - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 } error: '' info: '' assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3 ... ``` So if we merge the previous two approaches, do duplicate this single-instruction snippet 1000x (loop-body-size/instruction count in snippet), and run a loop with 1000 iterations over that duplicated/unrolled snippet, the measured throughput goes through the roof, up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle! Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D102522	2021-05-25 12:08:27 +03:00
Logan Smith	31eb83496f	[llvm][NFC] Add missing 'override's in unittests/	2020-07-17 17:35:59 -07:00
Miloš Stojanović	24b7b99b7d	[llvm-exegesis][NFC] Disassociate snippet generators from benchmark runners The addition of `inverse_throughput` mode highlighted the disjointedness of snippet generators and benchmark runners because it used the `UopsSnippetGenerator` with the `LatencyBenchmarkRunner`. To keep the code consistent tie the snippet generators to parallelization/serialization rather than their benchmark runners. Renaming `LatencySnippetGenerator` -> `SerialSnippetGenerator`. Renaming `UopsSnippetGenerator` -> `ParallelSnippetGenerator`. Differential Revision: https://reviews.llvm.org/D72928	2020-01-20 16:19:13 +01:00
Clement Courbet	8109901bf6	[llvm-exegesis][NFC] Refactor X86 tests fixtures into a base class. Reviewers: gchatelet, a.sidorin Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68262 llvm-svn: 373313	2019-10-01 09:20:36 +00:00
Clement Courbet	24078fe157	[llvm-exegesis] Fix r373083: Module -> Mod. SnippetRepetitorTest.cpp:66:27: error: declaration of ‘std::unique_ptr<llvm::Module> llvm::exegesis::{anonymous}::X86SnippetRepetitorTest::Module’ [-fpermissive] std::unique_ptr<Module> Module; llvm-svn: 373087	2019-09-27 13:21:37 +00:00
Clement Courbet	9431b72ce9	[llvm-exegesis] Add loop mode for repeating the snippet. Summary: Before this change the Executable function was made by duplicating the snippet. This change adds a --repetion-mode={loop\|duplicate} flag that allows choosing between this behaviour and wrapping the snippet instructions in a loop. The new mode can help measurements when the snippet fits in the DSB by short-cirtcuiting decoding. The loop adds a dec + jmp to the measurements, but since these are not part of the critical path, they execute in parallel with the measured code and do not impact measurements in practice. Overview of the change: - New SnippetRepetitor abstraction that handles repeating the snippet. The assembler delegates repeating the instructions to this class. - ExegesisTarget learns how to decrement loop counter and jump. - Some refactoring of the assembler into FunctionFiller/BasicBlockFiller. Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68125 llvm-svn: 373083	2019-09-27 12:56:24 +00:00

7 Commits