Instructions that have more uops than the processor's IssueWidth are
issued in multiple cycles.
The patch fixes PR49712.
Differential Revision: https://reviews.llvm.org/D99339
This is a follow-up for:
D98604 [MCA] Ensure that writes occur in-order
When instructions are aligned by the order of writes, they retire
in-order naturally. There is no need for an RCU, so it is disabled.
Differential Revision: https://reviews.llvm.org/D98628
This patch adds a pipeline to support in-order CPUs such as ARM
Cortex-A55.
In-order pipeline implements a simplified version of Dispatch,
Scheduler and Execute stages as a single stage. Entry and Retire
stages are common for both in-order and out-of-order pipelines.
Differential Revision: https://reviews.llvm.org/D94928
The old CPU model only had MLA->MLA forwarding. I added some missing
MUL->MLA read advances and a missing absolute diff accumulator read
advance according to the Cortex A57 Software Optimization Guide.
The patch improves performance in EEMBC rgbyiqv2 by about 6%-7% and
spec2006/milc by 8% (repeated runs on multiple devices), causes no
significant regressions (none in SPEC).
Differential Revision: https://reviews.llvm.org/D92296
The model was committed in 4b8ade837e
but not yet enabled to allow for a few fix ups. This adds a few
of these fixes, and also a LLVM MCA test to check most instructions.
While I do have plans to look into some more tuning, it's time to
enable this as it better than using the A53 schedule.
Differential Revision: https://reviews.llvm.org/D88017
Only the aliases 'xzr' and 'sp' exist for the physical register x31.
The reason for wanting to remove the alias 'x31' is because it allows users
to write invalid asm that is not accepted by the GNU assembler.
Is there any objection to removing this alias? Or do we want to keep
this for compatibility with existing code that uses w31/x31?
Differential Revision: https://reviews.llvm.org/D90153
This fixes a regression introduced by a very old commit 280ac1fd1d (was
llvm-svn 361950).
Commit 280ac1fd1d redesigned the logic in the LSUnit with the goal of
speeding up isReady() queries, and stabilising the LSUnit API (while also making
the load store unit more customisable).
The concept of MemoryGroup (effectively an alias set) was added by that commit
to better describe and track dependencies between memory operations. However,
that concept was not just used for alias dependencies, but it was also used for
describing memory "order" dependencies (enforced by the memory consistency
model).
Instructions of a same memory group were considered "equivalent" as in:
independent operations that can potentially execute in parallel. The problem
was that the cost of a dependency (in terms of number of cycles) should have
been different for "order" dependency. Instructions in an order dependency
simply have to have to wait until their predecessors are "issued" to an
underlying pipeline (rather than having to wait until predecessors have beeng
fully executed). For simple "order" dependencies, this was effectively
introducing an artificial delay on the "issue" of independent loads and stores.
This patch fixes the issue and adds a new test named 'independent-load-stores.s'
to a bunch of x86 targets. That test contains the reproducible posted by Fabian
Ritter on PR45793.
I had to rerun the update-mca-tests script on several files. To avoid expected
regressions on some Exynos tests, I have added a -noalias=false flag (to match
the old strict behavior on latencies).
Some tests for processor Barcelona are improved/fixed by this change and they
now show better results. In a few tests we were incorrectly counting the time
spent by instructions in a scheduler queue. In one case in particular we now
correctly see a store executed out of order. That test was affected by the same
underlying issue reported as PR45793.
Reviewers: mattd
Differential Revision: https://reviews.llvm.org/D79351
It makes more sense to print out the number of micro opcodes that are issued
every cycle rather than the number of instructions issued per cycle.
This behavior is also consistent with the dispatch-stats: numbers from the two
views can now be easily compared.
llvm-svn: 357919
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, for the Exynos processors.
Differential revision: https://reviews.llvm.org/D55345
llvm-svn: 348774
It was failing as below. Adding a triple seems to help.
--
: 'RUN: at line 2'; /work/llvm.combined/build.release/bin/llvm-mca -march=aarch64 -mcpu=exynos-m1 -resource-pressure=false < /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s | /work/llvm.combined/build.release/bin/FileCheck /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s -check-prefixes=ALL,M1
: 'RUN: at line 3'; /work/llvm.combined/build.release/bin/llvm-mca -march=aarch64 -mcpu=exynos-m3 -resource-pressure=false < /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s | /work/llvm.combined/build.release/bin/FileCheck /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s -check-prefixes=ALL,M3
--
Exit Code: 1
Command Output (stderr):
--
/work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s:36:12: error: M1-NEXT: expected string not found in input
^
<stdin>:21:2: note: scanning from here
1 0 0.25 b Ltmp0
^
--
llvm-svn: 348577
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::hasShiftedReg()`.
Differential revision: https://reviews.llvm.org/D54820
llvm-svn: 347598
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::isScaledAddr()`
Differential revision: https://reviews.llvm.org/D54777
llvm-svn: 347597
`llvm-mca` relies on the predicates to be based on `MCSchedPredicate` in order
to resolve the scheduling for variant instructions. Otherwise, it aborts
the building of the instruction model early.
However, the scheduling model emitter in `TableGen` gives up too soon, unless
all processors use only such predicates.
In order to allow more processors to be used with `llvm-mca`, this patch
emits scheduling transitions if any processor uses these predicates. The
transition emitted for the processors using legacy predicates is the one
specified with `NoSchedPred`, which is based on `MCSchedPredicate`.
Preferably, `llvm-mca` should instead assume a reasonable default when a
variant transition is not based on `MCSchedPredicate` for a given processor.
This issue should be revisited in the future.
Differential revision: https://reviews.llvm.org/D54648
llvm-svn: 347504
This patch adds two new fields to the perf report generated by the SummaryView.
Fields are now logically organized into two small groups; only the second group
contains throughput indicators.
Example:
```
Iterations: 100
Instructions: 300
Total Cycles: 414
Total uOps: 700
Dispatch Width: 4
uOps Per Cycle: 1.69
IPC: 0.72
Block RThroughput: 4.0
```
This patch also updates the docs for llvm-mca.
Due to the nature of this change, several tests in the tools/llvm-mca directory
were affected, and had to be updated using script `update_mca_test_checks.py`.
llvm-svn: 340946
This patch fixes a regression introduced at revision 338702.
A processor resource mask was incorrectly implicitly truncated to an unsigned
quantity. Later on, the truncated mask was used to initialize an element of a
vector of processor resource descriptors.
On targets with more than 32 processor resources, some elements of the vector
are left uninitialized. As a consequence, this bug might have eventually caused
a crash due to null dereference in the Scheduler.
This patch fixes PR38575, and adds a test for it.
llvm-svn: 339768
This makes easier to identify changes in the instruction info flags. It also
helps spotting potential regressions similar to the one recently introduced at
r336728.
Using the same character to mark MayLoad/MayStore/HasSideEffects is problematic
for llvm-lit. When pattern matching substrings, llvm-lit consumes tabs and
spaces. A change in position of the flag marker may not trigger a test failure.
This patch only changes the character used for flag `hasSideEffects`. The reason
why I didn't touch other flags is because I want to avoid spamming the mailing
because of the massive diff due to the numerous tests affected by this change.
In future, each instruction flag should be associated with a different character
in the Instruction Info View.
llvm-svn: 336797
This is a fix for the problem arising in D47374 (PR37678):
https://bugs.llvm.org/show_bug.cgi?id=37678
We may not have throughput info because it's not specified in the model
or it's not available with variant scheduling, so assume that those
instructions can execute/complete at max-issue-width.
Differential Revision: https://reviews.llvm.org/D47723
llvm-svn: 334055
Summary:
It's super irritating.
[properly configured] git client then complains about that double-newline,
and you have to use `--force` to ignore the warning, since even if you
fix it manually, it will be reintroduced the very next runtime :/
Reviewers: RKSimon, andreadb, courbet, craig.topper, javed.absar, gbedwell
Reviewed By: gbedwell
Subscribers: javed.absar, tschuett, gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D47697
llvm-svn: 333887
Previously update_mca_test_checks worked entirely at "block" level where
a block is some sequence of lines delimited by at least one empty line.
This generally worked well, but could sometimes lead to excessive
repetition of check lines for various prefixes if some block was almost
identical between prefixes, but not quite (for example, due to a
different dispatch width in the otherwise identical summary views).
This new analyis attempts to split blocks further in the case where the
following conditions are met:
a) There is some prefix common to every RUN line (typically 'ALL').
b) The first line of the block is common to the output with every prefix.
c) The block has the same number of lines for the output with every prefix.
Also, regenerated all llvm-mca test files with the following command:
update_mca_test_checks.py "../test/tools/llvm-mca/*/*.s" "../test/tools/llvm-mca/*/*/*.s"
The new analysis showed a "multiple lines not disambiguated by prefixes" warning
for test "AArch64/Exynos/scheduler-queue-usage.s" so I've also added some
explicit prefixes to each of the RUN lines in that test.
Differential Revision: https://reviews.llvm.org/D47321
llvm-svn: 333204
This fixes PR37293.
We can have scheduling classes with no write latency entries, that still consume
processor resources. We don't want to treat those instructions as zero-latency
instructions; they still have to be issued to the underlying pipelines, so they
still consume resource cycles.
This is likely to be a regression which I have accidentally introduced at
revision 330807. Now, if an instruction has a non-empty set of write processor
resources, we conservatively treat it as a normal (i.e. non zero-latency)
instruction.
llvm-svn: 331193
This script can be used to regenerate tests in the
test/tools/llvm-mca directory (PR36904).
Regenerated a number of tests using the pattern: test/tools/llvm-mca/*/*/*.s
Differential Revision: https://reviews.llvm.org/D45369
llvm-svn: 330246