Commit Graph

30 Commits

Author SHA1 Message Date
Roman Lebedev 628f1ae504 [llvm-exegesis] Fix error propagation from yaml writing (from serialization)
Investigating https://bugs.llvm.org/show_bug.cgi?id=41448

llvm-svn: 358076
2019-04-10 12:19:57 +00:00
Roman Lebedev 69716394f3 [llvm-exegesis] Opcode stabilization / reclusterization (PR40715)
Summary:
Given an instruction `Opcode`, we can make benchmarks (measurements) of the
instruction characteristics/performance. Then, to facilitate further analysis
we group the benchmarks with *similar* characteristics into clusters.
Now, this is all not entirely deterministic. Some instructions have variable
characteristics, depending on their arguments. And thus, if we do several
benchmarks of the same instruction `Opcode`, we may end up with *different*
performance characteristics measurements. And when we then do clustering,
these several benchmarks of the same instruction `Opcode` may end up being
clustered into *different* clusters. This is not great for further analysis.

We shall find every `Opcode` with benchmarks not in just one cluster, and move
*all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`.

I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit,
and introducing `-analysis-display-unstable-clusters` switch to toggle between
displaying stable-only clusters and unstable-only clusters.

The reclusterization is deterministically stable, produces identical reports
between runs. (Or at least that is what i'm seeing, maybe it isn't)

Timings/comparisons:
old (current trunk/head) {F8303582}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

           6624.73 msec task-clock                #    0.999 CPUs utilized            ( +-  0.53% )
               172      context-switches          #   25.965 M/sec                    ( +- 29.89% )
                 0      cpu-migrations            #    0.042 M/sec                    ( +- 56.54% )
             31073      page-faults               # 4690.754 M/sec                    ( +-  0.08% )
       26538711696      cycles                    # 4006230.292 GHz                   ( +-  0.53% )  (83.31%)
        2017496807      stalled-cycles-frontend   #    7.60% frontend cycles idle     ( +-  0.93% )  (83.32%)
       13403650062      stalled-cycles-backend    #   50.51% backend cycles idle      ( +-  0.33% )  (33.37%)
       19770706799      instructions              #    0.74  insn per cycle
                                                  #    0.68  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4419821812      branches                  # 667207369.714 M/sec               ( +-  0.03% )  (66.69%)
         121741669      branch-misses             #    2.75% of all branches          ( +-  0.28% )  (83.34%)

            6.6283 +- 0.0358 seconds time elapsed  ( +-  0.54% )
```

patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):

           6475.29 msec task-clock                #    0.999 CPUs utilized            ( +-  0.31% )
               213      context-switches          #   32.952 M/sec                    ( +- 23.81% )
                 1      cpu-migrations            #    0.130 M/sec                    ( +- 43.84% )
             31287      page-faults               # 4832.057 M/sec                    ( +-  0.08% )
       25939086577      cycles                    # 4006160.279 GHz                   ( +-  0.31% )  (83.31%)
        1958812858      stalled-cycles-frontend   #    7.55% frontend cycles idle     ( +-  0.68% )  (83.32%)
       13218961512      stalled-cycles-backend    #   50.96% backend cycles idle      ( +-  0.29% )  (33.37%)
       19752995402      instructions              #    0.76  insn per cycle
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4417079244      branches                  # 682195472.305 M/sec               ( +-  0.03% )  (66.70%)
         121510065      branch-misses             #    2.75% of all branches          ( +-  0.19% )  (83.34%)

            6.4832 +- 0.0229 seconds time elapsed  ( +-  0.35% )
```
Funnily, *this* measurement shows that said reclustering actually improved performance.

patch, with reclustering, only the stable clusters {F8303594}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):

           6387.71 msec task-clock                #    0.999 CPUs utilized            ( +-  0.13% )
               133      context-switches          #   20.792 M/sec                    ( +- 23.39% )
                 0      cpu-migrations            #    0.063 M/sec                    ( +- 61.24% )
             31318      page-faults               # 4903.256 M/sec                    ( +-  0.08% )
       25591984967      cycles                    # 4006786.266 GHz                   ( +-  0.13% )  (83.31%)
        1881234904      stalled-cycles-frontend   #    7.35% frontend cycles idle     ( +-  0.25% )  (83.33%)
       13209749965      stalled-cycles-backend    #   51.62% backend cycles idle      ( +-  0.16% )  (33.36%)
       19767554347      instructions              #    0.77  insn per cycle
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.03%)
        4417480305      branches                  # 691618858.046 M/sec               ( +-  0.03% )  (66.68%)
         118676358      branch-misses             #    2.69% of all branches          ( +-  0.07% )  (83.33%)

            6.3954 +- 0.0118 seconds time elapsed  ( +-  0.18% )
```
Performance improved even further?! Makes sense i guess, less clusters to print.

patch, with reclustering, only the unstable clusters {F8303601}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):

           6124.96 msec task-clock                #    1.000 CPUs utilized            ( +-  0.20% )
               194      context-switches          #   31.709 M/sec                    ( +- 20.46% )
                 0      cpu-migrations            #    0.039 M/sec                    ( +- 49.77% )
             31413      page-faults               # 5129.261 M/sec                    ( +-  0.06% )
       24536794267      cycles                    # 4006425.858 GHz                   ( +-  0.19% )  (83.31%)
        1676085087      stalled-cycles-frontend   #    6.83% frontend cycles idle     ( +-  0.46% )  (83.32%)
       13035595603      stalled-cycles-backend    #   53.13% backend cycles idle      ( +-  0.16% )  (33.36%)
       18260877653      instructions              #    0.74  insn per cycle
                                                  #    0.71  stalled cycles per insn  ( +-  0.05% )  (50.03%)
        4112411983      branches                  # 671484364.603 M/sec               ( +-  0.03% )  (66.68%)
         114066929      branch-misses             #    2.77% of all branches          ( +-  0.11% )  (83.32%)

            6.1278 +- 0.0121 seconds time elapsed  ( +-  0.20% )
```
This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps)
(Also, wow this is fast, it used to take several minutes originally)

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58355

llvm-svn: 354441
2019-02-20 09:14:04 +00:00
Clement Courbet 362653f7af [llvm-exegesis] Add throughput mode.
Summary:
This just uses the latency benchmark runner on the parallel uops snippet
generator.

Fixes PR37698.

Reviewers: gchatelet

Subscribers: tschuett, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D57000

llvm-svn: 352632
2019-01-30 16:02:20 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Clement Courbet 0d79aaf1a7 Revert "[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes."
This reverts accidental commit rL346394.

llvm-svn: 346398
2018-11-08 12:09:45 +00:00
Clement Courbet c0950ae990 [llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes.
llvm-svn: 346394
2018-11-08 11:45:14 +00:00
Fangrui Song 32401afd8c [llvm-exegesis] Move namespace exegesis inside llvm::
Summary:
This allows simplifying references of llvm::foo with foo when the needs
come in the future.

Reviewers: courbet, gchatelet

Reviewed By: gchatelet

Subscribers: javed.absar, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53455

llvm-svn: 344922
2018-10-22 17:10:47 +00:00
Clement Courbet 28d4f85824 [llvm-exegesis] Get rid of debug_string.
Summary:
THis is a backwards-compatible change (existing files will work as
expected).

See PR39082.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D52546

llvm-svn: 343108
2018-09-26 13:35:10 +00:00
Clement Courbet 596c56ff9c [llvm-exegesis] Add support for measuring NumMicroOps.
Summary:
Example output for vzeroall:

---
mode:            uops
key:
  instructions:
    - 'VZEROALL'
  config:          ''
  register_initial_values:
cpu_name:        haswell
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { debug_string: HWPort0, value: 0.0006, per_snippet_value: 0.0006,
      key: '3' }
  - { debug_string: HWPort1, value: 0.0011, per_snippet_value: 0.0011,
      key: '4' }
  - { debug_string: HWPort2, value: 0.0004, per_snippet_value: 0.0004,
      key: '5' }
  - { debug_string: HWPort3, value: 0.0018, per_snippet_value: 0.0018,
      key: '6' }
  - { debug_string: HWPort4, value: 0.0002, per_snippet_value: 0.0002,
      key: '7' }
  - { debug_string: HWPort5, value: 1.0019, per_snippet_value: 1.0019,
      key: '8' }
  - { debug_string: HWPort6, value: 1.0033, per_snippet_value: 1.0033,
      key: '9' }
  - { debug_string: HWPort7, value: 0.0001, per_snippet_value: 0.0001,
      key: '10' }
  - { debug_string: NumMicroOps, value: 20.0069, per_snippet_value: 20.0069,
      key: NumMicroOps }
error:           ''
info:            ''
assembled_snippet: C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C3
...

Reviewers: gchatelet

Subscribers: tschuett, RKSimon, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D52539

llvm-svn: 343094
2018-09-26 11:22:56 +00:00
Clement Courbet 684a5f6753 [llvm-exegesis] Output the unscaled value as well as the scaled one.
Summary: See PR38936 for context.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D52500

llvm-svn: 343081
2018-09-26 08:37:21 +00:00
Guillaume Chatelet 345fae5d56 [llvm-exegesis] Serializes registers initial values.
Summary: Adds the registers initial values to the YAML output of llvm-exegesis.

Reviewers: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D52460

llvm-svn: 342982
2018-09-25 15:15:54 +00:00
Guillaume Chatelet 55ad087a4c [llvm-exegesis][NFC] Rewrite of the YAML serialization.
Summary: This is a NFC in preparation of exporting the initial registers as part of the YAML dump

Reviewers: courbet

Reviewed By: courbet

Subscribers: mgorny, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D52427

llvm-svn: 342967
2018-09-25 12:18:08 +00:00
Clement Courbet f64007fe82 [llvm-exegesis][NFC] Add more comments.
llvm-svn: 334811
2018-06-15 09:27:12 +00:00
Clement Courbet 4273e1e828 [llvm-exegesis] Print the whole snippet in analysis.
Summary:
On hover, the whole asm snippet is displayed, including operands.

This requires the actual assembly output instead of just the MCInsts:
This is because some pseudo-instructions get lowered to actual target
instructions during codegen (e.g. ABS_Fp32 -> SSE or X87).

Reviewers: gchatelet

Subscribers: mgorny, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D48164

llvm-svn: 334805
2018-06-15 07:30:45 +00:00
Clement Courbet 49fad1cbf2 [llvm-exegesis] Use BenchmarkResult::Instructions instead of OpcodeName
Summary:
Get rid of OpcodeName.

To remove the opcode name from an old file:
```
cat old_file | sed '/opcode_name.*/d'
```

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D48121

llvm-svn: 334691
2018-06-14 06:57:52 +00:00
Guillaume Chatelet 8c91d4cb04 [llvm-exegesis] Improve error reporting.
Summary: BenchmarkResult IO functions now return an Error or Expected so caller can deal take proper action.

Reviewers: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D47868

llvm-svn: 334167
2018-06-07 07:51:16 +00:00
Clement Courbet 62b34fa89a [llvm-exegesis] move Mode from Key to BenchmarResult.
Moves the Mode field out of the Key. The existing yaml benchmark results can be fixed with the following script:

```
readonly FILE=$1
readonly MODE=latency # Change to uops to fix a uops benchmark.
cat $FILE | \
  sed "/^\ \+mode:\ \+$MODE$/d" | \
  sed "/^cpu_name.*$/i mode:            $MODE"
```

Differential Revision: https://reviews.llvm.org/D47813

Authored by: Guillaume Chatelet

llvm-svn: 334079
2018-06-06 09:42:36 +00:00
Clement Courbet 53d35d2dc4 [llvm-exegesis] Add instructions to BenchmarkResult Key.
We want llvm-exegesis to explore instructions (effect of initial register values, effect of operand selection). To enable this a BenchmarkResult muststore all the relevant data in its key. This patch starts adding such data. Here we simply allow to store the generated instructions, following patches will add operands and initial values for registers.

https://reviews.llvm.org/D47764

Authored by: Guilluame Chatelet

llvm-svn: 334008
2018-06-05 10:56:19 +00:00
Clement Courbet 2cb97b95a2 [llvm-exegesis][NFC] Use an enum instead of a string for benchmark mode.
Summary: YAML encoding is backwards-compatible.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D47705

llvm-svn: 333886
2018-06-04 11:43:40 +00:00
Clement Courbet 7228721b30 [llvm-exegesis] Analysis: Show inconsistencies between checked-in and measured data.
Summary:
We now highlight any sched classes whose measurements do not match the
LLVM SchedModel. "bad" clusters are marked in red.

Screenshot in phabricator diff.

Reviewers: gchatelet

Subscribers: tschuett, mgrang, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D47639

llvm-svn: 333884
2018-06-04 11:11:55 +00:00
Clement Courbet ae8ae5dc78 [llvm-exegesis] Analysis: Show value extents.
Summary: Screenshot attached in phabricator.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D47318

llvm-svn: 333181
2018-05-24 12:41:02 +00:00
Clement Courbet 0e69e2d747 reland r332579: [llvm-exegesis] Update to cover latency through another opcode.
Restructuring the code to measure latency and uops.
The end goal is to have this program spawn another process to deal with SIGILL and other malformed programs. It is not yet the case in this redesign, it is still the main program that runs the code (and may crash).
It now uses BitVector instead of Graph for performance reasons.

https://reviews.llvm.org/D46821

(with fixed ARM tests)

Authored by Guillaume Chatelet

llvm-svn: 332592
2018-05-17 10:52:18 +00:00
Clement Courbet 295a554ce4 Revert r332579 "[llvm-exegesis] Update to cover latency through another opcode."
The revision failed to update the ARM tests.

llvm-svn: 332580
2018-05-17 08:12:29 +00:00
Clement Courbet ee110fb735 [llvm-exegesis] Update to cover latency through another opcode.
Restructuring the code to measure latency and uops.
    The end goal is to have this program spawn another process to deal with SIGILL and other malformed programs. It is not yet the case in this redesign, it is still the main program that runs the code (and may crash).
    It now uses BitVector instead of Graph for performance reasons.

    https://reviews.llvm.org/D46821

    Authored by Guillaume Chatelet

llvm-svn: 332579
2018-05-17 07:38:21 +00:00
Clement Courbet a66bfaa4c0 [llvm-exegesis] Split AsmTemplate.Name into components.
Summary:
AsmTemplate becomes IntructionBenchmarkKey, which has three components.
This allows retreiving the opcode for analysis.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D46873

llvm-svn: 332348
2018-05-15 13:07:05 +00:00
Clement Courbet 7b7c27afca [llvm-exegesis] Allow lists of BenchmarkResults to be parsed as std::vector<BenchmarkResult>.
llvm-svn: 332221
2018-05-14 09:01:22 +00:00
Clement Courbet 8fb6e40de8 [llvm-exegesis] Fix compilation on lld-x86_64-darwin13
YAMLTraits does not know how to serialize `size_t` portably. Use `int`
instead.

llvm-svn: 329176
2018-04-04 12:01:46 +00:00
Clement Courbet ac74acdefe Re-land r329156 "Add llvm-exegesis tool."
Fixed to depend on and initialize the native target instead of X86.

llvm-svn: 329169
2018-04-04 11:37:06 +00:00
Clement Courbet 7949b3b1dc Revert r329156 "Add llvm-exegesis tool."
Breaks a bunch of bots.

llvm-svn: 329157
2018-04-04 08:22:54 +00:00
Clement Courbet 7287b2c1ec Add llvm-exegesis tool.
Summary:
[llvm-exegesis][RFC] Automatic Measurement of Instruction Latency/Uops

This is the code corresponding to the RFC "llvm-exegesis Automatic Measurement of Instruction Latency/Uops".

The RFC is available on the LLVM mailing lists as well as the following document
for easier reading:
https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?usp=sharing

Subscribers: mgorny, gchatelet, orwant, llvm-commits

Differential Revision: https://reviews.llvm.org/D44519

llvm-svn: 329156
2018-04-04 08:13:32 +00:00