Commit Graph

56 Commits

Author SHA1 Message Date
Andrew Savonichev bba25a9cd8 [MCA] Support carry-over instructions for in-order processors
Instructions that have more uops than the processor's IssueWidth are
issued in multiple cycles.

The patch fixes PR49712.

Differential Revision: https://reviews.llvm.org/D99339
2021-03-26 00:06:19 +03:00
Andrew Savonichev 292da93d59 [MCA] Disable RCU for InOrderIssueStage
This is a follow-up for:
D98604 [MCA] Ensure that writes occur in-order

When instructions are aligned by the order of writes, they retire
in-order naturally. There is no need for an RCU, so it is disabled.

Differential Revision: https://reviews.llvm.org/D98628
2021-03-24 13:54:04 +03:00
Andrew Savonichev e6ce0db378 [MCA] Ensure that writes occur in-order
Delay the issue of a new instruction if that leads to out-of-order
commits of writes.

This patch fixes the problem described in:
https://bugs.llvm.org/show_bug.cgi?id=41796#c3

Differential Revision: https://reviews.llvm.org/D98604
2021-03-18 17:10:20 +03:00
Andrew Savonichev d791695cb5 [MCA] Add support for in-order CPUs
This patch adds a pipeline to support in-order CPUs such as ARM
Cortex-A55.

In-order pipeline implements a simplified version of Dispatch,
Scheduler and Execute stages as a single stage. Entry and Retire
stages are common for both in-order and out-of-order pipelines.

Differential Revision: https://reviews.llvm.org/D94928
2021-03-04 14:08:19 +03:00
David Green 6c89f6fae4 [AArch64] Attempt to fix Mac tests with a more specific triple. NFC 2021-01-04 11:29:18 +00:00
Usman Nadeem 685c8b537a [AARCH64] Improve accumulator forwarding for Cortex-A57 model
The old CPU model only had MLA->MLA forwarding. I added some missing
MUL->MLA read advances and a missing absolute diff accumulator read
advance according to the Cortex A57 Software Optimization Guide.

The patch improves performance in EEMBC rgbyiqv2 by about 6%-7% and
spec2006/milc by 8% (repeated runs on multiple devices), causes no
significant regressions (none in SPEC).

Differential Revision: https://reviews.llvm.org/D92296
2021-01-04 10:58:43 +00:00
Sjoerd Meijer 630d37dc1b [AArch64] Enable Cortex-A55 schedmodel
The model was committed in 4b8ade837e
but not yet enabled to allow for a few fix ups. This adds a few
of these fixes, and also a LLVM MCA test to check most instructions.
While I do have plans to look into some more tuning, it's time to
enable this as it better than using the A53 schedule.

Differential Revision: https://reviews.llvm.org/D88017
2020-11-30 19:28:34 +00:00
Caroline Concatto 71038788ce Revert "[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register."
This reverts commit 8b281bfaf3.
2020-11-02 08:15:50 +00:00
Caroline Concatto 8b281bfaf3 [AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register.
Only the aliases 'xzr' and 'sp' exist for the physical register x31.
The reason for wanting to remove the alias 'x31' is because it allows users
to write invalid asm that is not accepted by the GNU assembler.

Is there any objection to removing this alias? Or do we want to keep
this for compatibility with existing code that uses w31/x31?

Differential Revision: https://reviews.llvm.org/D90153
2020-11-02 07:57:05 +00:00
Evgeny Leviant 2e61cd1295 [MachineScheduler] Fix operand scheduling for pre/post-increment loads
Differential revision: https://reviews.llvm.org/D87557
2020-09-12 16:53:12 +03:00
Andrea Di Biagio 5578ec32f9 [MCA] Fixed a bug where loads and stores were sometimes incorrectly marked as depedent. Fixes PR45793.
This fixes a regression introduced by a very old commit 280ac1fd1d (was
llvm-svn 361950).

Commit 280ac1fd1d redesigned the logic in the LSUnit with the goal of
speeding up isReady() queries, and stabilising the LSUnit API (while also making
the load store unit more customisable).

The concept of MemoryGroup (effectively an alias set) was added by that commit
to better describe and track dependencies between memory operations.  However,
that concept was not just used for alias dependencies, but it was also used for
describing memory "order" dependencies (enforced by the memory consistency
model).

Instructions of a same memory group were considered "equivalent" as in:
independent operations that can potentially execute in parallel.  The problem
was that the cost of a dependency (in terms of number of cycles) should have
been different for "order" dependency. Instructions in an order dependency
simply have to have to wait until their predecessors are "issued" to an
underlying pipeline (rather than having to wait until predecessors have beeng
fully executed). For simple "order" dependencies, this was effectively
introducing an artificial delay on the "issue" of independent loads and stores.

This patch fixes the issue and adds a new test named 'independent-load-stores.s'
to a bunch of x86 targets. That test contains the reproducible posted by Fabian
Ritter on PR45793.

I had to rerun the update-mca-tests script on several files. To avoid expected
regressions on some Exynos tests, I have added a -noalias=false flag (to match
the old strict behavior on latencies).

Some tests for processor Barcelona are improved/fixed by this change and they
now show better results.  In a few tests we were incorrectly counting the time
spent by instructions in a scheduler queue.  In one case in particular we now
correctly see a store executed out of order.  That test was affected by the same
underlying issue reported as PR45793.

Reviewers: mattd

Differential Revision: https://reviews.llvm.org/D79351
2020-05-05 10:25:36 +01:00
Evandro Menezes ff0f407e90 [MCA] Fix test cases (NFC)
Fix the test cases for Exynos M5 that break under Darwin.
2019-11-22 16:19:58 -06:00
Evandro Menezes 48b7fe02a1 [AArch64] Add the pipeline model for Exynos M5
Add the scheduling and cost models for Exynos M5.
2019-11-22 15:09:17 -06:00
Eric Christopher 8259182e51 Revert "[AArch64] Add the pipeline model for Exynos M5"
as it's causing test failures in llvm-mca.

This reverts commit 9bdfee2a3b.
2019-11-20 16:04:52 -08:00
Evandro Menezes 9bdfee2a3b [AArch64] Add the pipeline model for Exynos M5
Add the scheduling and cost models for Exynos M5.
2019-11-20 16:56:07 -06:00
Evandro Menezes 80c03fb5c2 [mca] Fix test case (NFC)
Fix test case for Darwin builds.
2019-10-31 16:44:52 -05:00
Evandro Menezes f9af4ccb8a [AArch64] Update for Exynos
Fix the costs of `add` and `orr` with an immediate operand.
2019-10-31 15:25:22 -05:00
Evandro Menezes 215da6606c [clang][llvm] Obsolete Exynos M1 and M2 2019-10-30 15:02:59 -05:00
Andrea Di Biagio f6a60f1f80 [llvm-mca][scheduler-stats] Print issued micro opcodes per cycle. NFCI
It makes more sense to print out the number of micro opcodes that are issued
every cycle rather than the number of instructions issued per cycle.
This behavior is also consistent with the dispatch-stats: numbers from the two
views can now be easily compared.

llvm-svn: 357919
2019-04-08 16:05:54 +00:00
Evandro Menezes 946fe976fd [llvm-mca] Update tests for Exynos (NFC)
Update test cases for Exynos M4.

llvm-svn: 350961
2019-01-11 19:36:27 +00:00
Evandro Menezes 9b7b5b1dcc [llvm-mca] Update the Exynos test cases (NFC)
Add more entropy to the test cases.

llvm-svn: 350662
2019-01-08 22:29:56 +00:00
Evandro Menezes 7927a45cdb [llvm-mca] Rename directory for the Cortex tests (NFC)
llvm-svn: 349688
2018-12-19 22:24:42 +00:00
Evandro Menezes 7f37ec7cd3 [llvm-mca] Update Exynos test cases (NFC)
llvm-svn: 349687
2018-12-19 22:24:39 +00:00
Evandro Menezes 5d409b2278 [AArch64] Improve the Exynos M3 pipeline model
llvm-svn: 349652
2018-12-19 17:37:51 +00:00
Evandro Menezes 1cfab9747d [llvm-mca] Split test (NFC)
Split the Exynos test of the register offset addressing mode into separate
loads and stores tests.

llvm-svn: 349651
2018-12-19 17:37:14 +00:00
Evandro Menezes 031abc2bd7 [llvm-mca] Improve test (NFC)
Add more instruction variations for Exynos.

llvm-svn: 349567
2018-12-18 23:19:52 +00:00
Evandro Menezes 4bfd4ce1bc [llvm-mca] Update the Exynos test cases (NFC)
Add more entropy to the test cases.

llvm-svn: 349537
2018-12-18 20:46:03 +00:00
Evandro Menezes 53f0d41dc4 [AArch64] Refactor the Exynos scheduling predicates
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, for the Exynos processors.

Differential revision: https://reviews.llvm.org/D55345

llvm-svn: 348774
2018-12-10 17:17:26 +00:00
Evandro Menezes 7ea7de55ea [llvm-mca] Add new tests for Exynos (NFC)
llvm-svn: 348766
2018-12-10 16:22:29 +00:00
Hans Wennborg c56cc3a889 Fix test/tools/llvm-mca/AArch64/Exynos/direct-branch.s on Mac
It was failing as below. Adding a triple seems to help.

--
: 'RUN: at line 2';   /work/llvm.combined/build.release/bin/llvm-mca -march=aarch64 -mcpu=exynos-m1 -resource-pressure=false < /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s | /work/llvm.combined/build.release/bin/FileCheck /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s -check-prefixes=ALL,M1
: 'RUN: at line 3';   /work/llvm.combined/build.release/bin/llvm-mca -march=aarch64 -mcpu=exynos-m3 -resource-pressure=false < /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s | /work/llvm.combined/build.release/bin/FileCheck /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s -check-prefixes=ALL,M3
--
Exit Code: 1

Command Output (stderr):
--
/work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s:36:12: error: M1-NEXT: expected string not found in input
           ^
<stdin>:21:2: note: scanning from here
 1 0 0.25 b Ltmp0
 ^

--

llvm-svn: 348577
2018-12-07 09:58:33 +00:00
Evandro Menezes 51df880e70 [llvm-mca] Improve test (NFC)
Add more instructions to the test for Cortex.

llvm-svn: 348565
2018-12-07 03:23:36 +00:00
Evandro Menezes 83beb91450 [llvm-mca] Improve test (NFC)
Add a label to make explicit that the branch is short for Exynos.

llvm-svn: 348564
2018-12-07 03:23:14 +00:00
Evandro Menezes 5d42bc7ce8 [llvm-mca] Simplify test (NFC)
llvm-svn: 348395
2018-12-05 18:34:51 +00:00
Evandro Menezes 86953e4350 [llvm-mca] Sort test run lines (NFC)
llvm-svn: 348393
2018-12-05 18:30:06 +00:00
Evandro Menezes 56368c6fa5 [AArch64] Refactor the scheduling predicates (2/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::hasShiftedReg()`.

Differential revision: https://reviews.llvm.org/D54820

llvm-svn: 347598
2018-11-26 21:47:41 +00:00
Evandro Menezes b02ac8bd21 [AArch64] Refactor the scheduling predicates (1/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::isScaledAddr()`

Differential revision: https://reviews.llvm.org/D54777

llvm-svn: 347597
2018-11-26 21:47:28 +00:00
Evandro Menezes 079bf4b7b4 [TableGen] Emit more variant transitions
`llvm-mca` relies on the predicates to be based on `MCSchedPredicate` in order
to resolve the scheduling for variant instructions.  Otherwise, it aborts
the building of the instruction model early.

However, the scheduling model emitter in `TableGen` gives up too soon, unless
all processors use only such predicates.

In order to allow more processors to be used with `llvm-mca`, this patch
emits scheduling transitions if any processor uses these predicates.  The
transition emitted for the processors using legacy predicates is the one
specified with `NoSchedPred`, which is based on `MCSchedPredicate`.

Preferably, `llvm-mca` should instead assume a reasonable default when a
variant transition is not based on `MCSchedPredicate` for a given processor.
This issue should be revisited in the future.

Differential revision: https://reviews.llvm.org/D54648

llvm-svn: 347504
2018-11-23 21:17:33 +00:00
Evandro Menezes d0792170a3 [llvm-mca] Add test case (NFC)
Add test case that will serve as the base for D54820.

llvm-svn: 347440
2018-11-22 00:38:36 +00:00
Evandro Menezes b9f9042648 [llvm-mca] Add test case (NFC)
Fix previous commit r347434.

llvm-svn: 347437
2018-11-21 23:36:40 +00:00
Evandro Menezes 34b32a3019 [llvm-mca] Add test case (NFC)
Add test case that will serve as the base for D54777.

llvm-svn: 347434
2018-11-21 22:57:46 +00:00
Andrea Di Biagio a2eee47450 [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView.
This patch adds two new fields to the perf report generated by the SummaryView.
Fields are now logically organized into two small groups; only the second group
contains throughput indicators.

Example:
```
Iterations:        100
Instructions:      300
Total Cycles:      414
Total uOps:        700

Dispatch Width:    4
uOps Per Cycle:    1.69
IPC:               0.72
Block RThroughput: 4.0
```

This patch also updates the docs for llvm-mca.
Due to the nature of this change, several tests in the tools/llvm-mca directory
were affected, and had to be updated using script `update_mca_test_checks.py`.

llvm-svn: 340946
2018-08-29 17:56:39 +00:00
Andrea Di Biagio a03f2a77f8 [llvm-mca] Fix PR38575: Avoid an invalid implicit truncation of a processor resource mask (an uint64_t value) to unsigned.
This patch fixes a regression introduced at revision 338702.

A processor resource mask was incorrectly implicitly truncated to an unsigned
quantity. Later on, the truncated mask was used to initialize an element of a
vector of processor resource descriptors.
On targets with more than 32 processor resources, some elements of the vector
are left uninitialized. As a consequence, this bug might have eventually caused
a crash due to null dereference in the Scheduler.

This patch fixes PR38575, and adds a test for it.

llvm-svn: 339768
2018-08-15 12:53:38 +00:00
Andrea Di Biagio d2e2c053cf [llvm-mca] Use a different character to flag instructions with side-effects in the Instruction Info View. NFC
This makes easier to identify changes in the instruction info flags.  It also
helps spotting potential regressions similar to the one recently introduced at
r336728.

Using the same character to mark MayLoad/MayStore/HasSideEffects is problematic
for llvm-lit. When pattern matching substrings, llvm-lit consumes tabs and
spaces. A change in position of the flag marker may not trigger a test failure.

This patch only changes the character used for flag `hasSideEffects`. The reason
why I didn't touch other flags is because I want to avoid spamming the mailing
because of the massive diff due to the numerous tests affected by this change.

In future, each instruction flag should be associated with a different character
in the Instruction Info View.

llvm-svn: 336797
2018-07-11 12:44:44 +00:00
Sanjay Patel 59313be8d3 [CodeGen] assume max/default throughput for unspecified instructions
This is a fix for the problem arising in D47374 (PR37678):
https://bugs.llvm.org/show_bug.cgi?id=37678

We may not have throughput info because it's not specified in the model 
or it's not available with variant scheduling, so assume that those
instructions can execute/complete at max-issue-width.

Differential Revision: https://reviews.llvm.org/D47723

llvm-svn: 334055
2018-06-05 23:34:45 +00:00
Roman Lebedev 7b53d1454f [llvm-mca] Make sure not to end the test files with an empty line.
Summary:
It's super irritating.

[properly configured] git client then complains about that double-newline,
and you have to use `--force` to ignore the warning, since even if you
fix it manually, it will be reintroduced the very next runtime :/

Reviewers: RKSimon, andreadb, courbet, craig.topper, javed.absar, gbedwell

Reviewed By: gbedwell

Subscribers: javed.absar, tschuett, gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D47697

llvm-svn: 333887
2018-06-04 11:48:46 +00:00
Greg Bedwell e790f6fb06 [UpdateTestChecks] Improved update_mca_test_checks block analysis
Previously update_mca_test_checks worked entirely at "block" level where
a block is some sequence of lines delimited by at least one empty line.
This generally worked well, but could sometimes lead to excessive
repetition of check lines for various prefixes if some block was almost
identical between prefixes, but not quite (for example, due to a
different dispatch width in the otherwise identical summary views).

This new analyis attempts to split blocks further in the case where the
following conditions are met:
  a) There is some prefix common to every RUN line (typically 'ALL').
  b) The first line of the block is common to the output with every prefix.
  c) The block has the same number of lines for the output with every prefix.

Also, regenerated all llvm-mca test files with the following command:
update_mca_test_checks.py "../test/tools/llvm-mca/*/*.s" "../test/tools/llvm-mca/*/*/*.s"

The new analysis showed a "multiple lines not disambiguated by prefixes" warning
for test "AArch64/Exynos/scheduler-queue-usage.s" so I've also added some
explicit prefixes to each of the RUN lines in that test.

Differential Revision: https://reviews.llvm.org/D47321

llvm-svn: 333204
2018-05-24 16:36:44 +00:00
Andrea Di Biagio cb1ed400a4 [llvm-mca] Removed an empty line generated by the timeline view. NFC.
Also, regenerate all tests.

llvm-svn: 332853
2018-05-21 17:11:56 +00:00
Andrea Di Biagio 45ccdd1785 [llvm-mca] Regenerate tests after r332381 and r332361. NFC
llvm-svn: 332447
2018-05-16 10:12:06 +00:00
Andrea Di Biagio e047d3529b [llvm-mca] Correctly handle zero-latency stores that consume pipeline resources.
This fixes PR37293.

We can have scheduling classes with no write latency entries, that still consume
processor resources. We don't want to treat those instructions as zero-latency
instructions; they still have to be issued to the underlying pipelines, so they
still consume resource cycles.

This is likely to be a regression which I have accidentally introduced at
revision 330807. Now, if an instruction has a non-empty set of write processor
resources, we conservatively treat it as a normal (i.e. non zero-latency)
instruction.

llvm-svn: 331193
2018-04-30 15:55:04 +00:00
Greg Bedwell 90d141a295 [UpdateTestChecks] Add update_mca_test_checks.py script
This script can be used to regenerate tests in the
test/tools/llvm-mca directory (PR36904).

Regenerated a number of tests using the pattern: test/tools/llvm-mca/*/*/*.s

Differential Revision: https://reviews.llvm.org/D45369

llvm-svn: 330246
2018-04-18 10:27:45 +00:00