llvm-project


Author	SHA1	Message	Date
Matt Devereau	30b045aba6	[AArch64][SVE] Extend LD1RQ ISel patterns to cover missing addressing modes Add some missing patterns for ld1rq's scalar + scalar addressing mode. Also, adds the scalar + imm and scalar + scalar addressing modes for the patterns added in https://reviews.llvm.org/D130010 Differential Revision: https://reviews.llvm.org/D130993	2022-08-25 13:07:37 +00:00
zhongyunde	3c8f327ce9	[AArch64] Fix sched model for tsv110 Three changes: 1. Split the Load/Store resources into two, Ld0St and Ld1, since only one of them is capable of stores. 2. Integer ADD and SUB instructions have different latencies and processor resource usage (pipeline) when they have a shift of zero vs. non-zero; refer to D8043. 3. The throughput of the scalar DIV instruction. Reviewed By: dmgreen, bryanpkc Differential Revision: https://reviews.llvm.org/D132529	2022-08-25 19:20:07 +08:00
zhongyunde	319fd6a69c	[NFC][AArch64] precommit sched model for tsv110 Part of the schedule model is not accurate, so an initial test is needed to record the changes. This assembly list is based on the basic part of D128631. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D132103	2022-08-18 17:33:45 +08:00
Yuta Mukai	3f561996bf	[AArch64] Fix and add A64FX scheduling resource/latency info 1. Missing instruction information (FTSSEL, FMSB, PFIRST and RDFFR) is added and CompleteModel is set to one. 2. Information for pseudo SVE instructions is added. Those instructions are present at the time of scheduling. 3. Resource and latency information for SVE instructions is modified to be more accurate. For example, the description for CMPEQ, which consumes one cycle each of unit FLA and PPR, is as follows. ``` Previous: def A64FXGI01 : ProcResGroup<[A64FXIPFLA, A64FXIPPR]>; def A64FXWrite_4Cyc_GI01 : SchedWriteRes<[A64FXGI01]> {... Modified: def A64FXGI0 : ProcResGroup<[A64FXIPFLA]>; def A64FXGI1 : ProcResGroup<[A64FXIPPR]>; def A64FXWrite_CMP : SchedWriteRes<[A64FXGI0, A64FXGI1]> {... ``` Reference: A64FX Microarchitecture Manual (Table 16-3) https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.7.pdf Reviewed By: dmgreen, kawashima-fj Differential Revision: https://reviews.llvm.org/D131165	2022-08-09 10:53:40 +09:00
David Green	408378a0b3	[AArch64] Tone down the number of repeated fmov N2 scheduling tests. NFC	2022-08-05 08:11:57 +01:00
Cullen Rhodes	767b26a4e2	[MCA] Support multiple comma-separated -mattr features Reviewed By: myhsu Differential Revision: https://reviews.llvm.org/D129479	2022-07-12 08:20:11 +00:00
Cullen Rhodes	d1c51d45f0	[AArch64] Use Neoverse N2 sched model as default for: - Cortex-A710 - Cortex-X2 - Neoverse-V1 - Neoverse-512tvb Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D129203	2022-07-08 13:34:13 +00:00
Cullen Rhodes	03af9ba680	[AArch64] Initial sched model for Neoverse N2 The optimization guide can be found here: https://developer.arm.com/documentation/PJDOC-466751330-18256/latest/ Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D128631	2022-07-08 09:39:13 +00:00
David Green	61b616755a	Partially revert "[SchedModels][CortexA55] Add ASIMD integer instructions" The Cortex-A55 scheduling model is used for -mcpu=generic, meaning it can have a wider effect than just the A55. The changes to the A55 scheduling model seem to have caused performance regressions on Cortex-A510 devices, which have latencies closer to the original and different forwarding paths. This partially reverts the changes from D117003, at least until we can do something to improve Cortex-A510. According to my results, this improves the A510 results without altering the A55 very much.	2022-02-28 10:58:52 +00:00
Pavel Kosov	37fa99eda0	[SchedModels][CortexA55] Add ASIMD integer instructions Depends on D114642 Original review https://reviews.llvm.org/D112201 OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D117003	2022-02-17 13:41:57 +03:00
Patrick Holland	85e6e748d4	[MCA] Switching from conservatively guessing which instructions are memory-barrier instructions to providing targets and developers a convenient way to explicitly declare which instructions are memory-barriers. Differential Revision: https://reviews.llvm.org/D116779	2022-01-11 13:50:14 -08:00
Pavel Kosov	34a91d7748	[SchedModels][CortexA55] Fix scheduling of FP loads Patch fixes scheduling of FP load instructions with pre/post increment adding WriteAdr for address operand. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D116361 OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg	2022-01-10 10:14:45 +03:00
David Green	e9adcbde31	[AArch64] Model Cortex-A55 Q register NEON instructions Cortex-A55 has 2 64bit NEON vector units, meaning a 128bit instruction requires taking both units (and can only be issued as the first instruction in a dual issue pair). This patch models that by splitting the WriteV SchedWrite into two - the WriteVd that reads/writes only 64bit operands, and the WriteVq that reads/writes 128bit registers. The A55 schedule then uses this distinction to model the WriteVq as taking both resource units and starting a Schedule Group, and the WriteVd as taking one as before. I believe this is more correct, even if it does not lead to much better performance. Differential Revision: https://reviews.llvm.org/D108766	2021-09-29 16:55:31 +01:00
David Green	6ffc6951a3	[AArch64] Remove unpredictable from narrowing instructions. Like other similar instructions the xtn2 family do not have side effects, and explicitly marking them as such can help improve scheduling freedom.	2021-08-26 09:43:44 +01:00
David Green	9474b03d41	[AArch64] Add a Cortex-A55 NEON scheduler test case.	2021-08-26 09:43:44 +01:00
David Green	50f4ae58eb	[AArch64] Correct store ReadAdrBase operand It appears that the Read operand for stores was being placed on the first operand (the stored value) not the address base. This adds a ReadST for the stored value operand, allowing the ReadAdrBase to correctly act upon the address. Differential Revision: https://reviews.llvm.org/D108287	2021-08-23 21:07:55 +01:00
David Green	955c9437fd	[AArch64] Add Scheduling tests for Load/Store ReadAdv operands.	2021-08-23 21:07:55 +01:00
Andrew Savonichev	bcc83a2e83	[MCA] Use LSU for the in-order pipeline Load/Store unit is used to enforce order of loads and stores if they alias (controlled by --noalias=false option). Fixes PR50483 - [MCA] In-order pipeline doesn't track memory load/store dependencies. Differential Revision: https://reviews.llvm.org/D103955	2021-07-29 14:40:23 +03:00
Nicholas Guy	9769535efd	[AArch64] Update Cortex-A55 SchedModel to improve LDP scheduling Specifying the latencies of specific LDP variants appears to improve performance almost universally. Differential Revision: https://reviews.llvm.org/D105882	2021-07-16 12:00:57 +01:00
David Green	f73334c46d	[AArch64] Set the latency of Cortex-A55 stores to 1 This sets the latency of stores to 1 in the Cortex-A55 scheduling model, to better match the values given in the software optimization guide. The latency of a store in normal llvm scheduling does not appear to have a lot of uses. If the store has no outputs then the latency is somewhat meaningless (and pre/post increment update operands use the WriteAdr write for those operands instead). The one place it does alter things is the latency between a store and the end of the scheduling region, which can in turn have an effect on the critical path length. As a result a latency of 1 is more correct and offers ever-so-slightly better scheduling of instructions near the end of the block. They are marked as RetireOOO to keep llvm-mca from introducing stalls where none would exist. Differential Revision: https://reviews.llvm.org/D105541	2021-07-12 13:39:35 +01:00
Andrea Di Biagio	5f500d73cd	[MCA] Add a test for PR50483. NFC	2021-05-26 15:52:11 +01:00
Andrea Di Biagio	63cc9fd579	[MCA][InOrderIssueStage] Fix LastWriteBackCycle computation. Conservatively use the instruction latency to compute the last write-back cycle. Before this patch, the last write cycle computation was incorrect for store instructions that didn't declare any register writes.	2021-05-26 14:17:43 +01:00
Andrew Savonichev	f08a2fc09e	[MCA] Add tests for IPC on Cortex-A55 The tests compare IPC statistics that MCA provides with IPC values measured on Cortex-A55 hardware. For hardware tests, each snippet is run in a loop unrolled by 1000, and IPC is measured by linux-perf. Several tests do not match the hardware: the skewed ALU is not supported, and LDR seems to be missing a forwarding path. Differential Revision: https://reviews.llvm.org/D98174	2021-04-08 19:37:07 +03:00
Andrew Savonichev	bba25a9cd8	[MCA] Support carry-over instructions for in-order processors Instructions that have more uops than the processor's IssueWidth are issued in multiple cycles. The patch fixes PR49712. Differential Revision: https://reviews.llvm.org/D99339	2021-03-26 00:06:19 +03:00
Andrew Savonichev	292da93d59	[MCA] Disable RCU for InOrderIssueStage This is a follow-up for: D98604 [MCA] Ensure that writes occur in-order When instructions are aligned by the order of writes, they retire in-order naturally. There is no need for an RCU, so it is disabled. Differential Revision: https://reviews.llvm.org/D98628	2021-03-24 13:54:04 +03:00
Andrew Savonichev	e6ce0db378	[MCA] Ensure that writes occur in-order Delay the issue of a new instruction if that leads to out-of-order commits of writes. This patch fixes the problem described in: https://bugs.llvm.org/show_bug.cgi?id=41796#c3 Differential Revision: https://reviews.llvm.org/D98604	2021-03-18 17:10:20 +03:00
Andrew Savonichev	d791695cb5	[MCA] Add support for in-order CPUs This patch adds a pipeline to support in-order CPUs such as ARM Cortex-A55. In-order pipeline implements a simplified version of Dispatch, Scheduler and Execute stages as a single stage. Entry and Retire stages are common for both in-order and out-of-order pipelines. Differential Revision: https://reviews.llvm.org/D94928	2021-03-04 14:08:19 +03:00
David Green	6c89f6fae4	[AArch64] Attempt to fix Mac tests with a more specific triple. NFC	2021-01-04 11:29:18 +00:00
Usman Nadeem	685c8b537a	[AARCH64] Improve accumulator forwarding for Cortex-A57 model The old CPU model only had MLA->MLA forwarding. I added some missing MUL->MLA read advances and a missing absolute diff accumulator read advance according to the Cortex A57 Software Optimization Guide. The patch improves performance in EEMBC rgbyiqv2 by about 6%-7% and spec2006/milc by 8% (repeated runs on multiple devices), causes no significant regressions (none in SPEC). Differential Revision: https://reviews.llvm.org/D92296	2021-01-04 10:58:43 +00:00
Sjoerd Meijer	630d37dc1b	[AArch64] Enable Cortex-A55 schedmodel The model was committed in `4b8ade837e` but not yet enabled to allow for a few fix-ups. This adds a few of these fixes, and also an LLVM MCA test to check most instructions. While I do have plans to look into some more tuning, it's time to enable this as it is better than using the A53 schedule. Differential Revision: https://reviews.llvm.org/D88017	2020-11-30 19:28:34 +00:00
Caroline Concatto	71038788ce	Revert "[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register." This reverts commit `8b281bfaf3`.	2020-11-02 08:15:50 +00:00
Caroline Concatto	8b281bfaf3	[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register. Only the aliases 'xzr' and 'sp' exist for the physical register x31. The reason for wanting to remove the alias 'x31' is because it allows users to write invalid asm that is not accepted by the GNU assembler. Is there any objection to removing this alias? Or do we want to keep this for compatibility with existing code that uses w31/x31? Differential Revision: https://reviews.llvm.org/D90153	2020-11-02 07:57:05 +00:00
Evgeny Leviant	2e61cd1295	[MachineScheduler] Fix operand scheduling for pre/post-increment loads Differential revision: https://reviews.llvm.org/D87557	2020-09-12 16:53:12 +03:00
Andrea Di Biagio	5578ec32f9	[MCA] Fixed a bug where loads and stores were sometimes incorrectly marked as dependent. Fixes PR45793. This fixes a regression introduced by a very old commit `280ac1fd1d` (was llvm-svn 361950). Commit `280ac1fd1d` redesigned the logic in the LSUnit with the goal of speeding up isReady() queries, and stabilising the LSUnit API (while also making the load store unit more customisable). The concept of MemoryGroup (effectively an alias set) was added by that commit to better describe and track dependencies between memory operations. However, that concept was not just used for alias dependencies, but it was also used for describing memory "order" dependencies (enforced by the memory consistency model). Instructions of the same memory group were considered "equivalent" as in: independent operations that can potentially execute in parallel. The problem was that the cost of a dependency (in terms of number of cycles) should have been different for "order" dependencies. Instructions in an order dependency simply have to wait until their predecessors are "issued" to an underlying pipeline (rather than having to wait until predecessors have been fully executed). For simple "order" dependencies, this was effectively introducing an artificial delay on the "issue" of independent loads and stores. This patch fixes the issue and adds a new test named 'independent-load-stores.s' to a bunch of x86 targets. That test contains the reproducer posted by Fabian Ritter on PR45793. I had to rerun the update-mca-tests script on several files. To avoid expected regressions on some Exynos tests, I have added a -noalias=false flag (to match the old strict behavior on latencies). Some tests for processor Barcelona are improved/fixed by this change and they now show better results. In a few tests we were incorrectly counting the time spent by instructions in a scheduler queue. In one case in particular we now correctly see a store executed out of order. That test was affected by the same underlying issue reported as PR45793. Reviewers: mattd Differential Revision: https://reviews.llvm.org/D79351	2020-05-05 10:25:36 +01:00
Evandro Menezes	ff0f407e90	[MCA] Fix test cases (NFC) Fix the test cases for Exynos M5 that break under Darwin.	2019-11-22 16:19:58 -06:00
Evandro Menezes	48b7fe02a1	[AArch64] Add the pipeline model for Exynos M5 Add the scheduling and cost models for Exynos M5.	2019-11-22 15:09:17 -06:00
Eric Christopher	8259182e51	Revert "[AArch64] Add the pipeline model for Exynos M5" as it's causing test failures in llvm-mca. This reverts commit `9bdfee2a3b`.	2019-11-20 16:04:52 -08:00
Evandro Menezes	9bdfee2a3b	[AArch64] Add the pipeline model for Exynos M5 Add the scheduling and cost models for Exynos M5.	2019-11-20 16:56:07 -06:00
Evandro Menezes	80c03fb5c2	[mca] Fix test case (NFC) Fix test case for Darwin builds.	2019-10-31 16:44:52 -05:00
Evandro Menezes	f9af4ccb8a	[AArch64] Update for Exynos Fix the costs of `add` and `orr` with an immediate operand.	2019-10-31 15:25:22 -05:00
Evandro Menezes	215da6606c	[clang][llvm] Obsolete Exynos M1 and M2	2019-10-30 15:02:59 -05:00
Andrea Di Biagio	f6a60f1f80	[llvm-mca][scheduler-stats] Print issued micro opcodes per cycle. NFCI It makes more sense to print out the number of micro opcodes that are issued every cycle rather than the number of instructions issued per cycle. This behavior is also consistent with the dispatch-stats: numbers from the two views can now be easily compared. llvm-svn: 357919	2019-04-08 16:05:54 +00:00
Evandro Menezes	946fe976fd	[llvm-mca] Update tests for Exynos (NFC) Update test cases for Exynos M4. llvm-svn: 350961	2019-01-11 19:36:27 +00:00
Evandro Menezes	9b7b5b1dcc	[llvm-mca] Update the Exynos test cases (NFC) Add more entropy to the test cases. llvm-svn: 350662	2019-01-08 22:29:56 +00:00
Evandro Menezes	7927a45cdb	[llvm-mca] Rename directory for the Cortex tests (NFC) llvm-svn: 349688	2018-12-19 22:24:42 +00:00
Evandro Menezes	7f37ec7cd3	[llvm-mca] Update Exynos test cases (NFC) llvm-svn: 349687	2018-12-19 22:24:39 +00:00
Evandro Menezes	5d409b2278	[AArch64] Improve the Exynos M3 pipeline model llvm-svn: 349652	2018-12-19 17:37:51 +00:00
Evandro Menezes	1cfab9747d	[llvm-mca] Split test (NFC) Split the Exynos test of the register offset addressing mode into separate loads and stores tests. llvm-svn: 349651	2018-12-19 17:37:14 +00:00
Evandro Menezes	031abc2bd7	[llvm-mca] Improve test (NFC) Add more instruction variations for Exynos. llvm-svn: 349567	2018-12-18 23:19:52 +00:00
Evandro Menezes	4bfd4ce1bc	[llvm-mca] Update the Exynos test cases (NFC) Add more entropy to the test cases. llvm-svn: 349537	2018-12-18 20:46:03 +00:00
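
The carry-over rule from D99339 (instructions that have more micro-ops than the processor's IssueWidth are issued over multiple cycles) can be illustrated with a small model. The Python sketch below is hypothetical and is not llvm-mca's InOrderIssueStage code. The function name `issue_schedule` is invented for illustration, and the model assumes that instructions issue strictly in program order with no resource or data-dependency stalls, and that an instruction which does not fit in the slots left in a partly used cycle starts in the next cycle.

```python
from typing import List, Tuple


def issue_schedule(uops_per_inst: List[int], issue_width: int) -> List[Tuple[int, int]]:
    """Return (first_cycle, last_cycle) of issue for each instruction.

    Hypothetical toy model, not llvm-mca's InOrderIssueStage: instructions
    issue strictly in program order, at most `issue_width` micro-ops per
    cycle, with no resource or data-dependency stalls.
    """
    if issue_width <= 0:
        raise ValueError("issue_width must be positive")
    cycle = 0  # current issue cycle
    used = 0   # micro-ops already issued in `cycle`
    schedule: List[Tuple[int, int]] = []
    for uops in uops_per_inst:
        if uops < 1:
            raise ValueError("each instruction needs at least one micro-op")
        # Assumption: an instruction that does not fit in the slots left in a
        # partly used cycle starts at the beginning of the next cycle.
        if used > 0 and used + uops > issue_width:
            cycle += 1
            used = 0
        first = cycle
        remaining = uops
        while remaining > 0:
            take = min(remaining, issue_width - used)
            used += take
            remaining -= take
            if remaining > 0:
                # Carry-over: the excess micro-ops issue in the next cycle.
                cycle += 1
                used = 0
        schedule.append((first, cycle))
        if used == issue_width:
            # The cycle is full; the next instruction starts in a new one.
            cycle += 1
            used = 0
    return schedule
```

For example, with an issue width of 2, `issue_schedule([1, 1, 3, 1], 2)` returns `[(0, 0), (0, 0), (1, 2), (2, 2)]`: the 3-micro-op instruction fills cycle 1 and carries its last micro-op into cycle 2, where it shares the cycle with the following single-micro-op instruction.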


79 Commits
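
The D79351 message distinguishes two kinds of memory dependency. An "order" dependency only requires a load or store to wait until its predecessor has been issued, while an alias dependency requires waiting until the predecessor has fully executed. The Python sketch below illustrates that distinction in a toy setting; it is not the LSUnit implementation. The names `MemOp` and `earliest_issue`, and the assumptions of in-order issue at one operation per cycle with at most one dependency per operation, are hypothetical. Under these assumptions an order-only dependency adds no delay beyond normal in-order issue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemOp:
    """A hypothetical load or store in a toy model (not llvm-mca's LSUnit)."""
    name: str
    latency: int               # cycles from issue until the op has fully executed
    dep: Optional[int] = None  # index of an earlier op this one depends on
    alias: bool = False        # True: alias dependency; False: order-only dependency


def earliest_issue(ops: List[MemOp]) -> List[int]:
    """Return the issue cycle of each op.

    Toy assumptions: ops issue in program order, at most one per cycle, and
    each op has at most one memory dependency.
    """
    issue: List[int] = []
    for i, op in enumerate(ops):
        cycle = issue[-1] + 1 if issue else 0  # one issue slot per cycle
        if op.dep is not None:
            if not 0 <= op.dep < i:
                raise ValueError(f"{op.name}: dep must refer to an earlier op")
            pred_issue = issue[op.dep]
            if op.alias:
                # Alias dependency: wait until the predecessor has fully executed.
                ready = pred_issue + ops[op.dep].latency
            else:
                # Order dependency: only wait until the predecessor has issued.
                ready = pred_issue + 1
            cycle = max(cycle, ready)
        issue.append(cycle)
    return issue
```

With a 4-cycle load followed by a dependent store, `earliest_issue` gives the store issue cycle 1 when the dependency is only an ordering constraint, but cycle 4 when it is an alias dependency. Treating every order dependency like the alias case is the kind of artificial delay the commit describes.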