Commit Graph

79 Commits

Author SHA1 Message Date
Matt Devereau 30b045aba6 [AArch64][SVE] Extend LD1RQ ISel patterns to cover missing addressing modes
Add some missing patterns for ld1rq's scalar + scalar addressing mode.
Also, adds the scalar + imm and scalar + scalar addressing modes for
the patterns added in https://reviews.llvm.org/D130010

Differential Revision: https://reviews.llvm.org/D130993
2022-08-25 13:07:37 +00:00
zhongyunde 3c8f327ce9 [AArch64] Fix sched model for tsv110
Update three changes:
1.Split the Load/Store resources into two, Ld0St and Ld1,
  since only one of them is capable of stores.
2.Integer ADD and SUB instructions have different latencies
  and processor resource usage (pipeline) when they have a shift of
  zero vs. non-zero, refer to D8043
3.The throughout of scalar DIV instruction.

Reviewed By: dmgreen, bryanpkc

Differential Revision: https://reviews.llvm.org/D132529
2022-08-25 19:20:07 +08:00
zhongyunde 319fd6a69c [NFC][AArch64] precommit sched model for tsv110
Part of the schedule model is not accurate, so need a initial test record the changes.
This assemble list is refer to the basic part of D128631

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D132103
2022-08-18 17:33:45 +08:00
Yuta Mukai 3f561996bf [AArch64] Fix and add A64FX scheduling resource/latency info
1. Missing instruction information (FTSSEL, FMSB, PFIRST and RDFFR)
   is added and CompleteModel is set to one.

2. Information for pseudo SVE instructions is added. Those
   instructions are present at the time of scheduling.

3. Resource and latency information for SVE instructions is modified
   to be more accurate.
   For example, the description for CMPEQ, which consumes one cycle
   each of unit FLA and PPR, is as follows.
```
Previous:
  def A64FXGI01 : ProcResGroup<[A64FXIPFLA, A64FXIPPR]>;
  def A64FXWrite_4Cyc_GI01 : SchedWriteRes<[A64FXGI01]> {...
Modified:
  def A64FXGI0 : ProcResGroup<[A64FXIPFLA]>;
  def A64FXGI1 : ProcResGroup<[A64FXIPPR]>;
  def A64FXWrite_CMP : SchedWriteRes<[A64FXGI0, A64FXGI1]> {...
```

Reference: A64FX Microarchitecture Manual (Table 16-3)
https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.7.pdf

Reviewed By: dmgreen, kawashima-fj

Differential Revision: https://reviews.llvm.org/D131165
2022-08-09 10:53:40 +09:00
David Green 408378a0b3 [AArch64] Tone down the number of repeated fmov N2 scheduling tests. NFC 2022-08-05 08:11:57 +01:00
Cullen Rhodes 767b26a4e2 [MCA] Support multiple comma-separated -mattr features
Reviewed By: myhsu

Differential Revision: https://reviews.llvm.org/D129479
2022-07-12 08:20:11 +00:00
Cullen Rhodes d1c51d45f0 [AArch64] Use Neoverse N2 sched model as default for:
- Cortex-A710
  - Cortex-X2
  - Neoverse-V1
  - Neoverse-512tvb

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D129203
2022-07-08 13:34:13 +00:00
Cullen Rhodes 03af9ba680 [AArch64] Initial sched model for Neoverse N2
The optimization guide can be found here:
https://developer.arm.com/documentation/PJDOC-466751330-18256/latest/

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D128631
2022-07-08 09:39:13 +00:00
David Green 61b616755a Partially revert "[SchedModels][CortexA55] Add ASIMD integer instructions"
The Cortex-A55 scheduling model is used for -mcpu=generic, meaning it
can have a wider effect than just the A55. The changes to the A55
scheduling model seems to have caused performance regressions on
Cortex-A510 device which have latencies closer to the original and
different forwarding paths.

This partially reverts the changes from D117003, at least until we can
do something to improve Cortex-A510. According to my results, this
improves the A510 results without altering the A55 very much.
2022-02-28 10:58:52 +00:00
Pavel Kosov 37fa99eda0 [SchedModels][CortexA55] Add ASIMD integer instructions
Depends on D114642

Original review https://reviews.llvm.org/D112201

OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D117003
2022-02-17 13:41:57 +03:00
Patrick Holland 85e6e748d4 [MCA] Switching from conservatively guessing which instructions are
memory-barrier instructions to providing targets and developers a convenient
way to explicitly declare which instructions are memory-barriers.

Differential Revision: https://reviews.llvm.org/D116779
2022-01-11 13:50:14 -08:00
Pavel Kosov 34a91d7748 [SchedModels][CortexA55] Fix scheduling of FP loads
Patch fixes scheduling of FP load instructions with pre/post increment adding WriteAdr for address operand.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D116361

OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg
2022-01-10 10:14:45 +03:00
David Green e9adcbde31 [AArch64] Model Cortex-A55 Q register NEON instructions
Cortex-A55 has 2 64bit NEON vector units, meaning a 128bit instruction
requires taking both units (and can only be issued as the first
instruction in a dual issue pair). This patch models that by splitting
the WriteV SchedWrite into two - the WriteVd that reads/writes only
64bit operands, and the WriteVq that read/writes 128bit registers. The
A55 schedule then uses this distinction to model the WriteVq as taking
both resource units, and starting a Schedule Group and WriteVd as taking
one as before.

I believe this is more correct, even if it does not lead to much better
performance.

Differential Revision: https://reviews.llvm.org/D108766
2021-09-29 16:55:31 +01:00
David Green 6ffc6951a3 [AArch64] Remove unpredictable from narrowing instructions.
Like other similar instructions the xtn2 family do not have side
effects, and explicitly marking them as such can help improve scheduling
freedom.
2021-08-26 09:43:44 +01:00
David Green 9474b03d41 [AArch64] Add a Cortex-A55 NEON scheduler test case. 2021-08-26 09:43:44 +01:00
David Green 50f4ae58eb [AArch64] Correct store ReadAdrBase operand
It appears that the Read operand for stores was being placed on the
first operand (the stored value) not the address base. This adds a
ReadST for the stored value operand, allowing the ReadAdrBase to
correctly act upon the address.

Differential Revision: https://reviews.llvm.org/D108287
2021-08-23 21:07:55 +01:00
David Green 955c9437fd [AArch64] Add Scheduling tests for Load/Store ReadAdv operands. 2021-08-23 21:07:55 +01:00
Andrew Savonichev bcc83a2e83 [MCA] Use LSU for the in-order pipeline
Load/Store unit is used to enforce order of loads and stores if they
alias (controlled by --noalias=false option).

Fixes PR50483 - [MCA] In-order pipeline doesn't track memory
load/store dependencies.

Differential Revision: https://reviews.llvm.org/D103955
2021-07-29 14:40:23 +03:00
Nicholas Guy 9769535efd [AArch64] Update Cortex-A55 SchedModel to improve LDP scheduling
Specifying the latencies of specific LDP variants appears to improve
performance almost universally.

Differential Revision: https://reviews.llvm.org/D105882
2021-07-16 12:00:57 +01:00
David Green f73334c46d [AArch64] Set the latency of Cortex-A55 stores to 1
This sets the latency of stores to 1 in the Cortex-A55 scheduling model,
to better match the values given in the software optimization guide.

The latency of a store in normal llvm scheduling does not appear to have
a lot of uses. If the store has no outputs then the latency is somewhat
meaningless (and pre/post increment update operands use the WriteAdr
write for those operands instead). The one place it does alter things is
the latency between a store and the end of the scheduling region, which
can in turn have an effect on the critical path length. As a result a
latency of 1 is more correct and offers ever-so-slightly better
scheduling of instructions near the end of the block.

They are marked as RetireOOO to keep the llvm-mca from introducing
stalls where non would exist.

Differential Revision: https://reviews.llvm.org/D105541
2021-07-12 13:39:35 +01:00
Andrea Di Biagio 5f500d73cd [MCA] Add a test for PR50483. NFC 2021-05-26 15:52:11 +01:00
Andrea Di Biagio 63cc9fd579 [MCA][InOrderIssueStage] Fix LastWriteBackCycle computation.
Conservatively use the instruction latency to compute the last write-back cycle.
Before this patch, the last write cycle computation was incorrect for store
instructions that didn't declare any register writes.
2021-05-26 14:17:43 +01:00
Andrew Savonichev f08a2fc09e [MCA] Add tests for IPC on Cortex-A55
The tests compare IPC statistics that MCA provides with IPC values
measured on Cortex-A55 hardware. For hardware tests, each snippet is
run in a loop unrolled by 1000, and IPC is measured by linux-perf.

Several tests do not match the hardware: the skewed ALU is not
supported, LDR seem to be missing a forwarding path.

Differential Revision: https://reviews.llvm.org/D98174
2021-04-08 19:37:07 +03:00
Andrew Savonichev bba25a9cd8 [MCA] Support carry-over instructions for in-order processors
Instructions that have more uops than the processor's IssueWidth are
issued in multiple cycles.

The patch fixes PR49712.

Differential Revision: https://reviews.llvm.org/D99339
2021-03-26 00:06:19 +03:00
Andrew Savonichev 292da93d59 [MCA] Disable RCU for InOrderIssueStage
This is a follow-up for:
D98604 [MCA] Ensure that writes occur in-order

When instructions are aligned by the order of writes, they retire
in-order naturally. There is no need for an RCU, so it is disabled.

Differential Revision: https://reviews.llvm.org/D98628
2021-03-24 13:54:04 +03:00
Andrew Savonichev e6ce0db378 [MCA] Ensure that writes occur in-order
Delay the issue of a new instruction if that leads to out-of-order
commits of writes.

This patch fixes the problem described in:
https://bugs.llvm.org/show_bug.cgi?id=41796#c3

Differential Revision: https://reviews.llvm.org/D98604
2021-03-18 17:10:20 +03:00
Andrew Savonichev d791695cb5 [MCA] Add support for in-order CPUs
This patch adds a pipeline to support in-order CPUs such as ARM
Cortex-A55.

In-order pipeline implements a simplified version of Dispatch,
Scheduler and Execute stages as a single stage. Entry and Retire
stages are common for both in-order and out-of-order pipelines.

Differential Revision: https://reviews.llvm.org/D94928
2021-03-04 14:08:19 +03:00
David Green 6c89f6fae4 [AArch64] Attempt to fix Mac tests with a more specific triple. NFC 2021-01-04 11:29:18 +00:00
Usman Nadeem 685c8b537a [AARCH64] Improve accumulator forwarding for Cortex-A57 model
The old CPU model only had MLA->MLA forwarding. I added some missing
MUL->MLA read advances and a missing absolute diff accumulator read
advance according to the Cortex A57 Software Optimization Guide.

The patch improves performance in EEMBC rgbyiqv2 by about 6%-7% and
spec2006/milc by 8% (repeated runs on multiple devices), causes no
significant regressions (none in SPEC).

Differential Revision: https://reviews.llvm.org/D92296
2021-01-04 10:58:43 +00:00
Sjoerd Meijer 630d37dc1b [AArch64] Enable Cortex-A55 schedmodel
The model was committed in 4b8ade837e
but not yet enabled to allow for a few fix ups. This adds a few
of these fixes, and also a LLVM MCA test to check most instructions.
While I do have plans to look into some more tuning, it's time to
enable this as it better than using the A53 schedule.

Differential Revision: https://reviews.llvm.org/D88017
2020-11-30 19:28:34 +00:00
Caroline Concatto 71038788ce Revert "[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register."
This reverts commit 8b281bfaf3.
2020-11-02 08:15:50 +00:00
Caroline Concatto 8b281bfaf3 [AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register.
Only the aliases 'xzr' and 'sp' exist for the physical register x31.
The reason for wanting to remove the alias 'x31' is because it allows users
to write invalid asm that is not accepted by the GNU assembler.

Is there any objection to removing this alias? Or do we want to keep
this for compatibility with existing code that uses w31/x31?

Differential Revision: https://reviews.llvm.org/D90153
2020-11-02 07:57:05 +00:00
Evgeny Leviant 2e61cd1295 [MachineScheduler] Fix operand scheduling for pre/post-increment loads
Differential revision: https://reviews.llvm.org/D87557
2020-09-12 16:53:12 +03:00
Andrea Di Biagio 5578ec32f9 [MCA] Fixed a bug where loads and stores were sometimes incorrectly marked as depedent. Fixes PR45793.
This fixes a regression introduced by a very old commit 280ac1fd1d (was
llvm-svn 361950).

Commit 280ac1fd1d redesigned the logic in the LSUnit with the goal of
speeding up isReady() queries, and stabilising the LSUnit API (while also making
the load store unit more customisable).

The concept of MemoryGroup (effectively an alias set) was added by that commit
to better describe and track dependencies between memory operations.  However,
that concept was not just used for alias dependencies, but it was also used for
describing memory "order" dependencies (enforced by the memory consistency
model).

Instructions of a same memory group were considered "equivalent" as in:
independent operations that can potentially execute in parallel.  The problem
was that the cost of a dependency (in terms of number of cycles) should have
been different for "order" dependency. Instructions in an order dependency
simply have to have to wait until their predecessors are "issued" to an
underlying pipeline (rather than having to wait until predecessors have beeng
fully executed). For simple "order" dependencies, this was effectively
introducing an artificial delay on the "issue" of independent loads and stores.

This patch fixes the issue and adds a new test named 'independent-load-stores.s'
to a bunch of x86 targets. That test contains the reproducible posted by Fabian
Ritter on PR45793.

I had to rerun the update-mca-tests script on several files. To avoid expected
regressions on some Exynos tests, I have added a -noalias=false flag (to match
the old strict behavior on latencies).

Some tests for processor Barcelona are improved/fixed by this change and they
now show better results.  In a few tests we were incorrectly counting the time
spent by instructions in a scheduler queue.  In one case in particular we now
correctly see a store executed out of order.  That test was affected by the same
underlying issue reported as PR45793.

Reviewers: mattd

Differential Revision: https://reviews.llvm.org/D79351
2020-05-05 10:25:36 +01:00
Evandro Menezes ff0f407e90 [MCA] Fix test cases (NFC)
Fix the test cases for Exynos M5 that break under Darwin.
2019-11-22 16:19:58 -06:00
Evandro Menezes 48b7fe02a1 [AArch64] Add the pipeline model for Exynos M5
Add the scheduling and cost models for Exynos M5.
2019-11-22 15:09:17 -06:00
Eric Christopher 8259182e51 Revert "[AArch64] Add the pipeline model for Exynos M5"
as it's causing test failures in llvm-mca.

This reverts commit 9bdfee2a3b.
2019-11-20 16:04:52 -08:00
Evandro Menezes 9bdfee2a3b [AArch64] Add the pipeline model for Exynos M5
Add the scheduling and cost models for Exynos M5.
2019-11-20 16:56:07 -06:00
Evandro Menezes 80c03fb5c2 [mca] Fix test case (NFC)
Fix test case for Darwin builds.
2019-10-31 16:44:52 -05:00
Evandro Menezes f9af4ccb8a [AArch64] Update for Exynos
Fix the costs of `add` and `orr` with an immediate operand.
2019-10-31 15:25:22 -05:00
Evandro Menezes 215da6606c [clang][llvm] Obsolete Exynos M1 and M2 2019-10-30 15:02:59 -05:00
Andrea Di Biagio f6a60f1f80 [llvm-mca][scheduler-stats] Print issued micro opcodes per cycle. NFCI
It makes more sense to print out the number of micro opcodes that are issued
every cycle rather than the number of instructions issued per cycle.
This behavior is also consistent with the dispatch-stats: numbers from the two
views can now be easily compared.

llvm-svn: 357919
2019-04-08 16:05:54 +00:00
Evandro Menezes 946fe976fd [llvm-mca] Update tests for Exynos (NFC)
Update test cases for Exynos M4.

llvm-svn: 350961
2019-01-11 19:36:27 +00:00
Evandro Menezes 9b7b5b1dcc [llvm-mca] Update the Exynos test cases (NFC)
Add more entropy to the test cases.

llvm-svn: 350662
2019-01-08 22:29:56 +00:00
Evandro Menezes 7927a45cdb [llvm-mca] Rename directory for the Cortex tests (NFC)
llvm-svn: 349688
2018-12-19 22:24:42 +00:00
Evandro Menezes 7f37ec7cd3 [llvm-mca] Update Exynos test cases (NFC)
llvm-svn: 349687
2018-12-19 22:24:39 +00:00
Evandro Menezes 5d409b2278 [AArch64] Improve the Exynos M3 pipeline model
llvm-svn: 349652
2018-12-19 17:37:51 +00:00
Evandro Menezes 1cfab9747d [llvm-mca] Split test (NFC)
Split the Exynos test of the register offset addressing mode into separate
loads and stores tests.

llvm-svn: 349651
2018-12-19 17:37:14 +00:00
Evandro Menezes 031abc2bd7 [llvm-mca] Improve test (NFC)
Add more instruction variations for Exynos.

llvm-svn: 349567
2018-12-18 23:19:52 +00:00
Evandro Menezes 4bfd4ce1bc [llvm-mca] Update the Exynos test cases (NFC)
Add more entropy to the test cases.

llvm-svn: 349537
2018-12-18 20:46:03 +00:00