Commit Graph

2 Commits

Author SHA1 Message Date
Roman Lebedev 78eaff2ef8
[llvm-exegesis] Loop unrolling for loop snippet repetitor mode
I really needed this, like, factually, yesterday,
when verifying dependency breaking idioms for AMD Zen 3 scheduler model.

Consider the following example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 }
error:           ''
info:            ''
assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3
...

```
What does it tell us?
So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle?
That doesn't seem right. That's even less than there are pipes supporting this type of op.

Now, second example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```
Now that's just worse. Due to the looping, the throughput completely plummeted,
and now we can only do a single instruction/cycle!?

That's not great.
And final example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```

So if we merge the previous two approaches, do duplicate this single-instruction snippet 1000x
(loop-body-size/instruction count in snippet), and run a loop with 1000 iterations
over that duplicated/unrolled snippet, the measured throughput goes through the roof,
up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle!

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D102522
2021-05-25 12:08:27 +03:00
Clement Courbet 9431b72ce9 [llvm-exegesis] Add loop mode for repeating the snippet.
Summary:
Before this change the Executable function was made by duplicating the
snippet. This change adds a --repetion-mode={loop|duplicate} flag that
allows choosing between this behaviour and wrapping the snippet instructions
in a loop.

The new mode can help measurements when the snippet fits in the DSB by
short-cirtcuiting decoding. The loop adds a dec + jmp to the measurements, but
since these are not part of the critical path, they execute in parallel
with the measured code and do not impact measurements in practice.

Overview of the change:
 - New SnippetRepetitor abstraction that handles repeating the snippet.
   The assembler delegates repeating the instructions to this class.
 - ExegesisTarget learns how to decrement loop counter and jump.
 - Some refactoring of the assembler into FunctionFiller/BasicBlockFiller.

Reviewers: gchatelet

Subscribers: mgorny, tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68125

llvm-svn: 373083
2019-09-27 12:56:24 +00:00