llvm-project

Commit Graph

Author	SHA1	Message	Date
Roman Lebedev	0f7a595095	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PXOR tests	2021-05-14 20:22:58 +03:00
Roman Lebedev	4af4afe014	[X86] AMD Zen 3: same-reg AVX YMM VANDNPD is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:24 +03:00
Roman Lebedev	17f99a8a41	[X86] AMD Zen 3: same-reg AVX XMM VANDNPD is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:24 +03:00
Roman Lebedev	38ceb46fb0	[X86] AMD Zen 3: same-reg SSE XMM ANDNPD is a 1-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:24 +03:00
Roman Lebedev	3221e06e9b	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VANDNPD tests	2021-05-14 14:06:24 +03:00
Roman Lebedev	0b7e52e725	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VANDNPD tests	2021-05-14 14:06:24 +03:00
Roman Lebedev	055fa84cd8	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM ANDNPD tests	2021-05-14 14:06:24 +03:00
Roman Lebedev	d8a595b81c	[X86] AMD Zen 3: same-reg AVX YMM VANDNPS is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:24 +03:00
Roman Lebedev	fd4cbc822b	[X86] AMD Zen 3: same-reg AVX XMM VANDNPS is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:23 +03:00
Roman Lebedev	f38dcbecb6	[X86] AMD Zen 3: same-reg SSE XMM ANDNPS is a 1-cycle(!) dep-breaking zero-idiom Same as SSE XMM XORPS/XORPD, it is not zero-cycle, even though it breaks the deps. As confirmed by the exegesis measurements, and ref docs.	2021-05-14 14:06:23 +03:00
Roman Lebedev	c79c7bb980	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VANDNPS tests	2021-05-14 14:06:23 +03:00
Roman Lebedev	a57006d627	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VANDNPS tests	2021-05-14 14:06:23 +03:00
Roman Lebedev	a657808948	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM ANDNPS tests	2021-05-14 14:06:23 +03:00
Roman Lebedev	43a7f130a7	[X86] AMD Zen 3: same-reg AVX YMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 11:56:07 +03:00
Roman Lebedev	336b9dbe88	[X86] AMD Zen 3: same-reg AVX XMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 11:56:07 +03:00
Roman Lebedev	9c596bc541	[X86] AMD Zen 3: same-reg SSE XMM XORPD is a 1-cycle(!) dep-breaking zero-idiom Same as with its float friend, unlike their AVX versions. As confirmed by exegesis, and ref docs.	2021-05-14 11:56:07 +03:00
Roman Lebedev	3567c7eda1	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPD tests	2021-05-14 11:56:07 +03:00
Roman Lebedev	57eee56d0a	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPD tests	2021-05-14 11:56:06 +03:00
Roman Lebedev	fdc65e46b6	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPD tests	2021-05-14 11:56:06 +03:00
Roman Lebedev	59554c01ab	[X86] AMD Zen 3: same-reg AVX YMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis, and ref docs.	2021-05-14 11:56:06 +03:00
Roman Lebedev	2a7c52ff7f	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPS tests	2021-05-14 11:56:06 +03:00
Roman Lebedev	26c1bffe67	[X86] AMD Zen 3: same-reg AVX XMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom Unlike its legacy SSE XMM XORPS version, which measures as being 1-cycle, this one is certainly a zero-cycle instruction, in addition to both of them being dependency breaking. As confirmed by exegesis measurements, and ref docs.	2021-05-14 11:56:06 +03:00
Roman Lebedev	a9fb321a67	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPS tests	2021-05-14 11:56:06 +03:00
Roman Lebedev	aa0dcb3ba4	[X86] AMD Zen 3: same-reg SSE XMM XORPS is a 1-cycle(!) dep-breaking zero-idiom While both the SOG and Agner insist that it is zero-cycle, I cannot confirm that claim. While it clearly breaks the dependency, I cannot come up with a snippet, or measurement approach, to end up with IPC bigger than 4, which, to me, means that it actually consumes the execution resource of an FP unit for a cycle.	2021-05-14 00:03:36 +03:00
Roman Lebedev	6c4596793d	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPS test	2021-05-14 00:03:36 +03:00
Roman Lebedev	6a64c462eb	[X86] AMD Zen 3: same-reg AVX YMM VPCMP is dep breaking one-idiom As measured by exegesis, and confirmed by ref docs. Still not zero-cycle :)	2021-05-10 23:49:27 +03:00
Roman Lebedev	5864e7b86b	[NFC][X86][MCA] AMD Zen 3: add tests for same-reg AVX YMM VPCMP	2021-05-10 23:49:27 +03:00
Roman Lebedev	2953245337	[X86] AMD Zen 3: same-reg AVX XMM VPCMP is dep breaking one-idiom As measured by exegesis, and confirmed by ref docs. Again, it's not zero-cycle.	2021-05-10 23:49:26 +03:00
Roman Lebedev	f59db6c4f8	[NFC][X86][MCA] AMD Zen 3: add tests for same-reg AVX XMM VPCMP	2021-05-10 23:49:26 +03:00
Roman Lebedev	0f3bcb97ef	[X86] AMD Zen 3: same-reg SSE XMM PCMP is dep breaking one-idiom As measured by exegesis, and confirmed by ref docs. Much like with MMX PCMP, it does actually have to execute, though.	2021-05-10 23:49:26 +03:00
Roman Lebedev	0e538f937a	[NFC][X86][MCA] AMD Zen 3: add tests for same-reg XMM SSE PCMP	2021-05-10 23:49:26 +03:00
Roman Lebedev	b24edfff4f	[X86] AMD Zen 3: same-reg PCMPEQ is an MMX all-ones dep breaking idiom They are, however, not zero-cycle, and do actually execute. As measured by exegesis, and confirmed by ref docs.	2021-05-10 23:49:26 +03:00
Roman Lebedev	ba225ce961	[NFC][X86][MCA] AMD Zen 3: add tests for same-reg MMX PCMPEQ	2021-05-10 23:49:25 +03:00
Roman Lebedev	08cf2776ac	[X86] AMD Zen 3: sub-32-bit CMP also break dependencies They measure as having the same effect as 32-bit CMP.	2021-05-10 20:57:38 +03:00
Roman Lebedev	ecff974b66	[NFC][X86][MCA] AMD Zen 3: add tests for sub-32-bit CMP dep breaking	2021-05-10 20:57:37 +03:00
Roman Lebedev	be23d5e814	[X86] AMD Zen 3: same-reg CMP is a zero-cycle dependency-breaking instruction As measured by exegesis, and confirmed by ref docs.	2021-05-10 00:03:20 +03:00
Roman Lebedev	9a31efa2f5	[NFC][X86][MCA] AMD Zen 3: add tests for CMP dependency breaking	2021-05-10 00:03:20 +03:00
Roman Lebedev	11b0568dce	[X86] AMD Zen 3: same-reg SBB is a dependency-breaking instruction As confirmed by exegesis measurements, and ref docs. It does actually execute. While there, bump latency for MULX32rr, that seems to match measurements.	2021-05-10 00:03:20 +03:00
Roman Lebedev	8d0e2d2b0f	[NFC][X86][MCA] AMD Zen 3: add tests for SBB dependency breaking	2021-05-10 00:03:20 +03:00
Roman Lebedev	eed8552787	[X86] AMD Zen 3: same-register XOR/SUB are GPR dependency breaking zero-idioms As measured by exegesis and confirmed in reference docs.	2021-05-10 00:03:20 +03:00
Roman Lebedev	ab794852ed	[NFC][X86][MCA] AMD Zen3: add GPR zero-idiom dependency breaking tests	2021-05-10 00:03:20 +03:00
Roman Lebedev	a21df76db6	[X86] AMD Zen 3: XCHG is a zero-cycle instruction As measured by exegesis and confirmed by reference docs.	2021-05-09 20:37:57 +03:00
Roman Lebedev	2819009b5a	[X86] AMD Zen 3: _REV variants of zero-cycle moves are also zero-cycle (PR50261) Sometimes the disassembler picks _REV variants of instructions over the plain ones, which in this case exposed an issue that the _REV variants weren't being modelled as optimizable moves.	2021-05-07 18:27:40 +03:00
Roman Lebedev	a8e30e63ac	[NFC][X86][MCA] AMD Zen3: add test for zero-cycle X87 move	2021-05-07 18:27:40 +03:00
Roman Lebedev	34de155f7e	[NFC][X86][MCA] AMD Zen3: decrease iteration count in reg-move-elimination tests Drop it just enough so it still produces the right IPC.	2021-05-07 17:06:45 +03:00
Roman Lebedev	758c173309	[X86] AMD Zen 3: throughput for renameable XMM/YMM moves is 6 They are resolved at the register rename stage without using any execution units.	2021-05-07 17:06:45 +03:00
Roman Lebedev	715c0d0bd4	[X86] AMD Zen 3: AVX YMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers.	2021-05-07 17:06:45 +03:00
Roman Lebedev	ee020b930d	[X86] AMD Zen 3: AVX XMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers.	2021-05-07 17:06:44 +03:00
Roman Lebedev	9db4203883	[X86] AMD Zen 3: SSE XMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers. Refs: AMD SOG 19h, 2.9.4 "Zero Cycle Move": "The processor is able to execute certain register to register mov operations with zero cycle delay." Agner, 22.13 "Instructions with no latency": "Register-to-register move instructions are resolved at the register rename stage without using any execution units. These instructions have zero latency. It is possible to do six such register renamings per clock cycle, and it is even possible to rename the same register multiple times in one clock cycle."	2021-05-07 17:06:44 +03:00
Roman Lebedev	0d961fbd52	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX YMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	bcbfc22ff9	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX XMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	cbabe4f4d6	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable SSE XMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	d8c6202576	[X86] AMD Zen 3: throughput for renameable GPR moves is 6 They are resolved at the register rename stage without using any execution units.	2021-05-07 17:06:43 +03:00
Roman Lebedev	e6d688ec96	[NFC][X86][MCA] Increase iteration count in reg move elimination tests So the IPC actually stabilizes at 6.	2021-05-07 17:06:43 +03:00
Roman Lebedev	bda9ca3e44	[NFC][X86][MCA] AMD Zen 3: add tests with non-eliminable MMX moves In Zen3, MMX moves are not eliminated; I've verified this with llvm-exegesis.	2021-05-07 13:56:07 +03:00
Roman Lebedev	7059b28d5d	[X86] AMD Zen 3: 32/64-bit GPR register moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers. Refs: AMD SOG 19h, 2.9.4 "Zero Cycle Move": "The processor is able to execute certain register to register mov operations with zero cycle delay." Agner, 22.13 "Instructions with no latency": "Register-to-register move instructions are resolved at the register rename stage without using any execution units. These instructions have zero latency. It is possible to do six such register renamings per clock cycle, and it is even possible to rename the same register multiple times in one clock cycle."	2021-05-07 13:56:07 +03:00
Roman Lebedev	227678089c	[NFC][X86][MCA] AMD Zen 3: add tests with eliminable GPR moves	2021-05-07 13:56:07 +03:00
Roman Lebedev	2b93c9c16c	[X86] AMD Zen 3 Scheduler Model Introduce a basic schedule model for AMD Zen 3 CPUs, a.k.a. `znver3`. This is fully built from scratch, from llvm-mca measurements and documented reference materials. Nothing was copied from `znver2`/`znver1`. I believe this is in a reasonable state of completion for inclusion, probably better than D52779 `bdver2` was :) Namely: * uops are pretty spot-on (at least what llvm-mca can measure) {F16422596} * latency is also pretty spot-on (at least what llvm-mca can measure) {F16422601} * throughput is within reason {F16422607} I haven't run many benchmarks with this, however RawSpeed benchmarks say this is beneficial: {F16603978} {F16604029} I'll call out the obvious problems there: * I didn't really bother with X87 instructions * I didn't really bother with obviously-microcoded/system instructions * There is a large discrepancy in throughput for `mr` and `rm` instructions. I'm not really sure if it's a modelling defect that needs to be fixed, or a defect of the measurements. * Pipe distributions are probably bad :) I can't do much here until AMD allows that to be fixed by documenting the appropriate counters and updating libpfm That being said, as @RKSimon notes in D94395#2647381: "I'll mention again that all the znver* models appear to be very inaccurate wrt SIMD/FPU instructions <...>", so how much worse could this possibly be?! Things that aren't there: * Various tunings: zero idioms, etc. Those are follow-ups. Differential Revision: https://reviews.llvm.org/D94395	2021-05-01 22:08:13 +03:00
Andrea Di Biagio	8bd4f3d547	[MCA] Fix CarryOver check in the DispatchStage (PR50174). Early exit from method DispatchStage::isAvailable() if the dispatch group is already full. Not all instructions declare at least one uOP. Fixes PR50174.	2021-04-30 14:26:46 +01:00
Sebastian Neubauer	4897effb14	[AMDGPU] Add TransVALU to gfx10 Instructions on the transcendental unit are executed in parallel to the normal VALU, so add this as an extra resource. This doesn't seem to have any effect, but it should be more correct. Differential Revision: https://reviews.llvm.org/D100123	2021-04-20 15:34:43 +02:00
David Penry	78a871abf7	[ARM] Use ProcResGroup in Cortex-M7 scheduling model Used to model structural hazards on FP issue, where some instructions take up 2 issue slots and others one as well as similar structural hazards on load issue, where some instructions take up two load lanes and others one. Differential Revision: https://reviews.llvm.org/D98977	2021-04-19 21:23:05 +01:00
Andrew Savonichev	f08a2fc09e	[MCA] Add tests for IPC on Cortex-A55 The tests compare IPC statistics that MCA provides with IPC values measured on Cortex-A55 hardware. For hardware tests, each snippet is run in a loop unrolled by 1000, and IPC is measured by linux-perf. Several tests do not match the hardware: the skewed ALU is not supported, and LDR seems to be missing a forwarding path. Differential Revision: https://reviews.llvm.org/D98174	2021-04-08 19:37:07 +03:00
Andrew Savonichev	bba25a9cd8	[MCA] Support carry-over instructions for in-order processors Instructions that have more uops than the processor's IssueWidth are issued in multiple cycles. The patch fixes PR49712. Differential Revision: https://reviews.llvm.org/D99339	2021-03-26 00:06:19 +03:00
Andrew Savonichev	292da93d59	[MCA] Disable RCU for InOrderIssueStage This is a follow-up for D98604 ("[MCA] Ensure that writes occur in-order"). When instructions are aligned by the order of writes, they retire in-order naturally. There is no need for an RCU, so it is disabled. Differential Revision: https://reviews.llvm.org/D98628	2021-03-24 13:54:04 +03:00
Jay Foad	fc7e3e7dd9	[AMDGPU] Set SchedRW on real instructions Copy SchedRW from pseudos to real instructions so that llvm-mca has access to it. This is NFC for normal compiler codegen, which schedules pseudos, not real instructions. Add an llvm-mca test for some high latency double-precision instructions as a smoke test. Differential Revision: https://reviews.llvm.org/D99187	2021-03-23 15:38:11 +00:00
Andrea Di Biagio	f5bdc88e4d	[MCA] Improved handling of negative read-advance cycles. Before this patch, register writes were always invalidated by the RegisterFile at instruction commit stage. So, the RegisterFile was often losing the knowledge about the `execute cycle` of writes already committed. While this was not problematic for non-delayed reads, this was sometimes leading to inaccurate read latency computations in the presence of negative read-advance cycles. This patch fixes the issue by changing how the RegisterFile component internally keeps track of the `execute cycle` information of each write. On every instruction executed, the RegisterFile gets notified by the RetireStage, so that it can internally record the execute cycle of each executed write. The `execute cycle` information is stored within WriteRef itself, and it is not invalidated when the write is committed.	2021-03-23 14:47:23 +00:00
Andrew Savonichev	e6ce0db378	[MCA] Ensure that writes occur in-order Delay the issue of a new instruction if that leads to out-of-order commits of writes. This patch fixes the problem described in: https://bugs.llvm.org/show_bug.cgi?id=41796#c3 Differential Revision: https://reviews.llvm.org/D98604	2021-03-18 17:10:20 +03:00
Jay Foad	7340fd6886	[MCA] Support in-order CPUs with MicroOpBufferSize=1 Differential Revision: https://reviews.llvm.org/D98356	2021-03-11 10:12:54 +00:00
Andrew Savonichev	d791695cb5	[MCA] Add support for in-order CPUs This patch adds a pipeline to support in-order CPUs such as ARM Cortex-A55. In-order pipeline implements a simplified version of Dispatch, Scheduler and Execute stages as a single stage. Entry and Retire stages are common for both in-order and out-of-order pipelines. Differential Revision: https://reviews.llvm.org/D94928	2021-03-04 14:08:19 +03:00
Abhina Sreeskantharajan	42a21778f6	[test] Use host platform specific error message substitution in lit tests On z/OS, the error message `EDC5129I No such file or directory.` is not matched correctly in lit tests. This patch uses a lit config substitution to check for platform specific error messages. Reviewed By: muiez, jhenderson Differential Revision: https://reviews.llvm.org/D95246	2021-01-29 07:16:30 -05:00
Abhina Sreeskantharajan	978444d531	Revert "[SystemZ][z/OS] Fix No such file or directory expression error" This reverts commit `06f8a49693`.	2021-01-25 08:29:38 -05:00
Wolfgang Pieb	7143b63017	[llvm-mca] Adding local lit config file for X86 targets	2021-01-22 09:52:57 -08:00
Wolfgang Pieb	020c00b5d3	[llvm-mca] Test case was missing a triple.	2021-01-21 16:19:32 -08:00
Wolfgang Pieb	d38be2ba0e	[llvm-mca] Initial implementation of serialization using JSON. The views implemented at this time are Summary, Timeline, ResourcePressure and InstructionInfo. Use --json on the command line to obtain JSON output.	2021-01-21 15:15:54 -08:00
Abhina Sreeskantharajan	689aaba7ac	[SystemZ][z/OS] Fix No such file or directory expression error matching in lit tests On z/OS, the error message `EDC5129I No such file or directory.` is not matched correctly in lit tests. This patch updates the CHECK expression to match successfully. Reviewed By: muiez Differential Revision: https://reviews.llvm.org/D94239	2021-01-18 07:14:37 -05:00
David Green	6c89f6fae4	[AArch64] Attempt to fix Mac tests with a more specific triple. NFC	2021-01-04 11:29:18 +00:00
Usman Nadeem	685c8b537a	[AARCH64] Improve accumulator forwarding for Cortex-A57 model The old CPU model only had MLA->MLA forwarding. I added some missing MUL->MLA read advances and a missing absolute diff accumulator read advance according to the Cortex A57 Software Optimization Guide. The patch improves performance in EEMBC rgbyiqv2 by about 6%-7% and spec2006/milc by 8% (repeated runs on multiple devices), causes no significant regressions (none in SPEC). Differential Revision: https://reviews.llvm.org/D92296	2021-01-04 10:58:43 +00:00
Craig Topper	0cbceed27c	[TableGen][ARM][X86] Detect combining IntrReadMem and IntrWriteMem. These properties aren't additive. They are closer to ReadOnly and WriteOnly. The default is ReadWrite. ReadMem cancels the write property and WriteMem cancels the read property. Combining them leaves neither. This patch checks that when we process WriteMem, the Mod flag is still set. And for ReadMem we check that the Ref flag is still set. I've updated 2 target intrinsics that were combining these properties. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D93571	2020-12-19 14:56:17 -08:00
Craig Topper	f47b07315a	[X86] Teach assembler to accept vmsave/vmload/vmrun/invlpga/skinit with or without the fixed register operands These instructions read their inputs from fixed registers rather than using a modrm byte. We shouldn't require the user to list them when parsing assembly. This matches the GNU assembler. This patch adds InstAliases so we can accept either form. It also changes the printing code to use the form without registers. This will change the behavior of llvm-objdump, but should be consistent with binutils objdump. This also matches what we already do in LLVM for clzero and monitorx, which also use fixed registers. I need to add and improve tests before this can be committed. The disassembler tests exist, but weren't checking the fixed register so they pass before and after this change. Fixes https://github.com/ClangBuiltLinux/linux/issues/1216 Differential Revision: https://reviews.llvm.org/D93524	2020-12-19 11:01:55 -08:00
Sjoerd Meijer	630d37dc1b	[AArch64] Enable Cortex-A55 schedmodel The model was committed in `4b8ade837e` but not yet enabled to allow for a few fix-ups. This adds a few of these fixes, and also an LLVM MCA test to check most instructions. While I do have plans to look into some more tuning, it's time to enable this as it is better than using the A53 schedule. Differential Revision: https://reviews.llvm.org/D88017	2020-11-30 19:28:34 +00:00
Evgeny Leviant	9c3b68dc6f	[llvm-mca] Fix processing thumb instruction set Differential revision: https://reviews.llvm.org/D91704	2020-11-24 18:27:59 +03:00
Evgeny Leviant	50bd686695	Add support for branch forms of ALU instructions to Cortex-A57 model Patch fixes scheduling of ALU instructions which modify pc register. Patch also fixes computation of mutually exclusive predicates for sequences of variants to be properly expanded Differential revision: https://reviews.llvm.org/D91266	2020-11-24 11:43:51 +03:00
David Penry	48b43c9d4f	[ARM] Cortex-M7 schedule This patch adds the SchedMachineModel for Cortex-M7. It also adds test cases for the scheduling information. Details of the pipeline and descriptions are in comments in file ARMScheduleM7.td included in this patch. Differential Revision: https://reviews.llvm.org/D91355	2020-11-16 10:16:07 +00:00
Evgeny Leviant	885d3f4129	[llvm-mca] Add branch forms of ALU instructions to Cortex-A57 test	2020-11-09 16:53:50 +03:00
Evgeny Leviant	cc96a82291	[TableGen][SchedModels] Fix read/write variant substitution Patch fixes case when sched class has write and read variants belonging to different processor models. Differential revision: https://reviews.llvm.org/D89777	2020-11-02 17:39:04 +03:00
Caroline Concatto	71038788ce	Revert "[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register." This reverts commit `8b281bfaf3`.	2020-11-02 08:15:50 +00:00
Caroline Concatto	8b281bfaf3	[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register. Only the aliases 'xzr' and 'sp' exist for the physical register x31. The reason for wanting to remove the alias 'x31' is because it allows users to write invalid asm that is not accepted by the GNU assembler. Is there any objection to removing this alias? Or do we want to keep this for compatibility with existing code that uses w31/x31? Differential Revision: https://reviews.llvm.org/D90153	2020-11-02 07:57:05 +00:00
Andrea Di Biagio	0e20666db3	[MCA][LSUnit] Correctly update the internal group flags on store barrier execution. Fixes PR48024. This is likely to be a regression introduced by my last refactoring of the LSUnit (commit `5578ec32f9`). Before this patch, the "CurrentStoreBarrierGroupID" index was not correctly reset on store barrier executions. This was leading to unexpected crashes like the one reported as PR48024.	2020-10-31 11:57:27 +00:00
Evgeny Leviant	e74f66125e	[ARM][SchedModels] Convert IsLdstsoScaledNotOptimalPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90150	2020-10-26 20:22:41 +03:00
Evgeny Leviant	a877bda397	Fix issue in cortex-a57 sched model Differential revision: https://reviews.llvm.org/D90152	2020-10-26 20:16:40 +03:00
Evgeny Leviant	1876d06ea3	[llvm-mca] Add few memory instructions to cortex-a57 test	2020-10-26 14:18:15 +03:00
Evgeny Leviant	99b2756517	[ARM][SchedModels] Get rid of IsLdrAm2ScaledPred Differential revision: https://reviews.llvm.org/D90024	2020-10-26 12:01:39 +03:00
Evgeny Leviant	a4fc18e641	[ARM][SchedModels] Convert IsLdstsoMinusRegPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90029	2020-10-26 11:54:08 +03:00
Evgeny Leviant	d613e39d52	[ARM][SchedModels] Convert IsLdrAm3NegRegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90045	2020-10-26 11:43:02 +03:00
Evgeny Leviant	b651ecfb72	[llvm-mca] Extend cortex-a57 memory instructions test Patch adds a few load/store instructions which have custom sched classes in the cortex-a57 model.	2020-10-23 17:02:20 +03:00
Evgeny Leviant	ffc0f577da	[llvm-mca] Add test for cortex-a57 NEON instructions	2020-10-23 10:55:54 +03:00
Evgeny Leviant	7a78073be7	[ARM][SchedModels] Let ldm* instruction scheduling use MCSchedPredicate Differential revision: https://reviews.llvm.org/D89957	2020-10-23 10:33:20 +03:00
Evgeny Leviant	ed6a91f456	[ARM][SchedModels] Convert IsLdstsoScaledPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89939	2020-10-22 18:03:01 +03:00
Evgeny Leviant	088f3c83cc	[llvm-mca] Add few ldm* instructions to cortex-a57 test case	2020-10-22 16:21:40 +03:00
Evgeny Leviant	efcb3952e0	[llvm-mca] Improve test case	2020-10-22 12:08:08 +03:00
Evgeny Leviant	bf9edcb6fd	[ARM][SchedModels] Convert IsLdrAm3RegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89876	2020-10-21 20:49:10 +03:00
Evgeny Leviant	9f5ece63ce	[llvm-mca] Add test for cortex-a57 memory instructions	2020-10-21 15:09:26 +03:00
Evgeny Leviant	991e86156c	[ARM][SchedModels] Convert IsCPSRDefinedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89460	2020-10-20 11:14:21 +03:00
Evgeny Leviant	8a7ca143f8	[ARM][SchedModels] Convert IsPredicatedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89553	2020-10-19 11:37:54 +03:00
Evgeny Leviant	6e56046f65	[TableGen][SchedModels] Fix aliasing of SchedWriteVariant Differential revision: https://reviews.llvm.org/D89114	2020-10-13 13:05:24 +03:00
Evgeny Leviant	7102793065	Add test for cortex-a57/ARM sched model. NFC	2020-10-12 12:49:56 +03:00
Craig Topper	7f3da48885	[X86] Remove X86ISD::MWAITX_DAG. Just match the intrinsic to the custom inserter pseudo instruction during isel.	2020-10-03 18:44:53 -07:00
Meera Nakrani	48c9e8244b	[ARM] Removed hasSideEffects from signed/unsigned saturates Removed hasSideEffects from SSAT and USAT so that they are no longer marked as unpredictable. Differential Revision: https://reviews.llvm.org/D88545	2020-10-01 14:55:01 +00:00
Evgeny Leviant	2e61cd1295	[MachineScheduler] Fix operand scheduling for pre/post-increment loads Differential revision: https://reviews.llvm.org/D87557	2020-09-12 16:53:12 +03:00
Craig Topper	f7c87b7e37	[X86] Copy the tuning features and scheduler model from pentium4/x86-64 to generic This is preparation for making clang default to -mtune=generic when no -march is specified. This will allow the default tuning to be "generic" even though our default march is "pentium4" or "x86-64". To avoid llc lit test regressions, if no mcpu is specified, I've defaulted tune to use i586 to match the old tuning settings of no CPU. Some tests explicitly used -mcpu=generic which I've removed so they instead get this default of architecture features from generic and tune from i586. I updated one llvm-mca test to check a different CPU since generic has a scheduler model now Differential Revision: https://reviews.llvm.org/D86312	2020-08-24 14:47:10 -07:00
Craig Topper	31c40f2d6b	[X86] Add mayLoad/mayStore flags to some X87 instructions that don't have isel patterns to infer them from. Should remove part of the differences in D81833 due to some of these getting isel patterns.	2020-06-23 23:40:30 -07:00
David Green	d604cc6e9a	[ARM] Mark more integer instructions as not having side effects. LDRD and STRD along with UBFX and SBFX are selected from DAGToDAG transforms, so do not have tblgen patterns. They don't get marked as not having side effects, so cannot be scheduled as efficiently as you would like. This specifically marks them as not having side effects. Differential Revision: https://reviews.llvm.org/D82358	2020-06-23 22:45:51 +01:00
David Green	887c0b5665	[ARM] Cortex-M4 integer instructions scheduler info test. NFC Most useful at the moment for showing where unpredictable instructions are.	2020-06-23 22:26:23 +01:00
Wang, Pengfei	6565b58584	[X86][llvm-mc] Make the suffix matcher more accurate. Some instructions, like VPMULDQ, are not variants of VPMULD but new instructions. So we should make sure the suffix matcher only works for memory variants that have the same size as the suffix. Currently we only check SSE/AVX* instructions, because many legacy instructions didn't declare the alias instructions of their variants. Differential Revision: https://reviews.llvm.org/D80608	2020-05-27 14:45:17 +08:00
Andrea Di Biagio	47b95d7cf4	[MCA][InstrBuilder] Correctly mark reserved resources in initializeUsedResources. This fixes a bug reported by Alex Renda on LLVMDev where mca did not correctly mark a resource group as "reserved". (See http://lists.llvm.org/pipermail/llvm-dev/2020-May/141485.html). The issue was caused by a wrong check in function `initializeUsedResources`. As a consequence of this, a resource group was left unreserved, and its field `NumUnits` incorrectly reported an unrealistic number of consumed resource units. This patch fixes the issue with the handling of reserved resources in the InstrBuilder class, and adds a simple test for it. Ideally, as suggested by Andy Trick, most of these problems will disappear if in the future we introduce an (optional) DelayCycles vector for SchedWriteRes.	2020-05-10 19:25:54 +01:00
Craig Topper	465f5648ee	[X86] Remove the mayLoad and mayStore flags from vzeroupper/vzeroall. But leave the hasUnmodelledSideEffects flag.	2020-05-08 12:47:20 -07:00
Andrea Di Biagio	5bb5fa3c0a	Forgot to add a -mtriple to a test. NFC This should unbreak the clang-ppc64be-linux buildbot.	2020-05-05 10:48:00 +01:00
Andrea Di Biagio	5578ec32f9	[MCA] Fixed a bug where loads and stores were sometimes incorrectly marked as dependent. Fixes PR45793. This fixes a regression introduced by a very old commit `280ac1fd1d` (was llvm-svn 361950). Commit `280ac1fd1d` redesigned the logic in the LSUnit with the goal of speeding up isReady() queries, and stabilising the LSUnit API (while also making the load store unit more customisable). The concept of MemoryGroup (effectively an alias set) was added by that commit to better describe and track dependencies between memory operations. However, that concept was not just used for alias dependencies, but it was also used for describing memory "order" dependencies (enforced by the memory consistency model). Instructions of the same memory group were considered "equivalent" as in: independent operations that can potentially execute in parallel. The problem was that the cost of a dependency (in terms of number of cycles) should have been different for "order" dependency. Instructions in an order dependency simply have to wait until their predecessors are "issued" to an underlying pipeline (rather than having to wait until predecessors have been fully executed). For simple "order" dependencies, this was effectively introducing an artificial delay on the "issue" of independent loads and stores. This patch fixes the issue and adds a new test named 'independent-load-stores.s' to a bunch of x86 targets. That test contains the reproducer posted by Fabian Ritter on PR45793. I had to rerun the update-mca-tests script on several files. To avoid expected regressions on some Exynos tests, I have added a -noalias=false flag (to match the old strict behavior on latencies). Some tests for processor Barcelona are improved/fixed by this change and they now show better results. In a few tests we were incorrectly counting the time spent by instructions in a scheduler queue. In one case in particular we now correctly see a store executed out of order. That test was affected by the same underlying issue reported as PR45793. Reviewers: mattd Differential Revision: https://reviews.llvm.org/D79351	2020-05-05 10:25:36 +01:00
Georgii Rymar	b6d77e792c	[tools][tests] - Use --check-prefixes instead of multiple --check-prefix. NFCI. There is no need to use `--check-prefix` multiple times. It helps to improve readability/test maintainability. This patch does it for all tools at once. Differential revision: https://reviews.llvm.org/D78217	2020-04-17 12:35:25 +03:00
Craig Topper	02f03a6fd4	[X86] Match vpmullq latency to uops.info. Correct port usage for 512-bit memory form uops.info says these should be 15 cycle instructions. Uops.info also shows the 512-bit form uses port 0 and 5 for both register and memory. We had memory using 0 and 1. Differential Revision: https://reviews.llvm.org/D75549	2020-03-03 12:16:03 -08:00
Craig Topper	20c5968e09	[X86] Increase latency of port5 masked compares and kshift/kadd/kunpck instructions in SKX scheduler model Uops.info shows these as 4 cycle latency.	2020-02-16 16:59:37 -08:00
Craig Topper	c636f694c0	[X86] Add more avx512 instructions to llvm-mca resource tests	2020-02-16 16:59:36 -08:00
Craig Topper	d7de7ac370	[X86] Raise the latency for VectorImul from 4 to 5 in Skylake scheduler models Based on uops.info these should have 5 cycle latency as they did on Haswell/Broadwell. I have no additional internal information from Intel. This was also shown as a discrepancy in the spreadsheet that was sent with an early llvm-dev post about llvm-exegesis. It also matches Agner Fog. Differential Revision: https://reviews.llvm.org/D74357	2020-02-11 11:24:25 -08:00
Craig Topper	c6bdd8e731	[X86] Improve the gather scheduler models for SkylakeClient and SkylakeServer The load ports need a cycle for each potentially loaded element just like Haswell and Skylake. Unlike Haswell and Broadwell, the number of uops does not scale with the number of elements. Instead the load uops run for multiple cycles. I've taken the latency number from the uops.info. The port binding for the non-load uops is taken from the original IACA data I have. Differential Revision: https://reviews.llvm.org/D74000	2020-02-05 13:26:47 -08:00
Simon Pilgrim	8616bd417f	[X86] Fix missing load latencies (PR36894) We weren't accounting for load latencies in the SSE42/AES/CLMUL schedule classes	2020-02-05 11:53:16 +00:00
Simon Pilgrim	f25a2a3de5	[X86] Fix missing load latencies (PR36894) We weren't accounting for load latencies in the SSE42/AES/CLMUL schedule classes	2020-02-04 18:18:29 +00:00
Craig Topper	c7768ce522	[X86] Update the haswell and broadwell scheduler information for gather instructions Broadwell was missing half the gather instructions. Both models had some mixups in the resource costs and number of uops. I've updated here based on what I think the original IACA source says with some cross checking against the microcode. I'm not sure about latency as the IACA source I have doesn't have that information. So I'm using the latency from uops.info. I plan to update Skylake models as well, but I'll do that in a separate patch. Differential Revision: https://reviews.llvm.org/D73844	2020-02-03 17:57:48 -08:00
Clement Courbet	c5344d857f	[X86][Sched] A bunch of fixes to the Zen2 sched model latencies. Summary: As determined with `llvm-exegesis`. Some of these look like typos/misunderstandings of the sched model td spec: - latency defaults to `1` when not set => Maybe we can avoid having a default ? - problems with regexps not being anchored by default (XCHG matching CMPXCHG) Note that this is not complete, it fixes only the most obvious mistakes, and only for latency (not uops). Reviewers: RKSimon, GGanesh Subscribers: hiraditya, jfb, mstojanovic, hfinkel, craig.topper, andreadb, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73172	2020-01-30 10:20:31 +01:00
Roman Lebedev	76fcf900d5	[X86][BdVer2] Polish LEA instruction scheduling info Based on exhaustive llvm-exegesis measurements. There may still be some imperfections for LEA16r/LEA32r. Much like was observed in D68646, i'm also measuring some outliers with some specific registers.	2020-01-26 22:17:27 +03:00
Roman Lebedev	31019dfdf5	[NFC][MCA] Re-autogenerate all check lines in all X86 MCA tests Some whitespace issues have crept in, and some znver2 check lines were missing.	2020-01-26 22:17:26 +03:00
Clement Courbet	2accdb6ae1	[llvm-mca][NFC] Regenerate tests @HEAD. For Zen2.	2020-01-22 14:50:52 +01:00
Diogo Sampaio	d94d079a6a	[ARM][Thumb2] Fix ADD/SUB invalid writes to SP Summary: This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80". The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instruction was unpredictable. To enforce that the instruction t2(ADD\|SUB)ri does not write to SP we now enforce the destination register to be rGPR (which excludes PC and SP). Unlike the ARM specification, which defines one instruction that can read from SP and one that can't, here we inserted one that can't write to SP and another that can only write to SP, so as to reuse most of the hard-coded size optimizations. When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant ) It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt). Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma, andreadb Reviewed By: efriedma Subscribers: gbedwell, john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70680	2020-01-14 11:47:19 +00:00
Ganesh Gopalasubramanian	3408940f73	[X86] AMD Znver2 (Rome) Scheduler enablement The patch gives out the details of the znver2 scheduler model. There are few improvements with respect to execution units, latencies and throughput when compared with znver1. The tests that were present for znver1 for llvm-mca tool were replicated. The latencies, execution units, timeline and throughput information are updated for znver2. Reviewers: craig.topper, Simon Pilgrim Differential Revision: https://reviews.llvm.org/D66088	2020-01-10 00:44:59 +05:30
Evandro Menezes	ff0f407e90	[MCA] Fix test cases (NFC) Fix the test cases for Exynos M5 that break under Darwin.	2019-11-22 16:19:58 -06:00
Evandro Menezes	48b7fe02a1	[AArch64] Add the pipeline model for Exynos M5 Add the scheduling and cost models for Exynos M5.	2019-11-22 15:09:17 -06:00
Eric Christopher	8259182e51	Revert "[AArch64] Add the pipeline model for Exynos M5" as it's causing test failures in llvm-mca. This reverts commit `9bdfee2a3b`.	2019-11-20 16:04:52 -08:00
Evandro Menezes	9bdfee2a3b	[AArch64] Add the pipeline model for Exynos M5 Add the scheduling and cost models for Exynos M5.	2019-11-20 16:56:07 -06:00
Simon Pilgrim	1786047b91	[X86] Fix SLM v2i64 ADD/Sub/CMPEQ instruction schedules Noticed while fixing the reduction costs for D59710 - the SLM model doesn't account for the poor throughput of v2i64 ops. Numbers taken from Intel AOM (+ checked against Agner)	2019-11-06 19:08:15 +00:00
Simon Pilgrim	ad70d5f39a	[X86] Fix SLM v2f64 ADD/MUL + FP BLEND/HADD instruction schedules Noticed while fixing the reduction costs for D59710 - the SLM model doesn't account for the poor throughput of v2f64/v2i64 ops.	2019-11-06 19:08:15 +00:00
Evandro Menezes	80c03fb5c2	[mca] Fix test case (NFC) Fix test case for Darwin builds.	2019-10-31 16:44:52 -05:00
Evandro Menezes	f9af4ccb8a	[AArch64] Update for Exynos Fix the costs of `add` and `orr` with an immediate operand.	2019-10-31 15:25:22 -05:00
Evandro Menezes	215da6606c	[clang][llvm] Obsolete Exynos M1 and M2	2019-10-30 15:02:59 -05:00
Andrea Di Biagio	b744abb4f6	[X86][BtVer2] Improved latency and throughput of float/vector loads and stores. This patch introduces the following changes to the btver2 scheduling model: - The number of micro opcodes for YMM loads and stores is now 2 (it was incorrectly set to 1 for both aligned and misaligned loads/stores). - Increased the number of AGU resource cycles for YMM loads and stores to 2cy (instead of 1cy). - Removed JFPU01 and JFPX from the list of resources consumed by pure float/vector loads (no MMX). I verified with llvm-exegesis that pure XMM/YMM loads are no-pipe. Those are dispatched to the FPU but not really issued on JFPU01. Differential Revision: https://reviews.llvm.org/D68871 llvm-svn: 374765	2019-10-14 11:12:18 +00:00
Roman Lebedev	a5e65c1cf7	[MCA] Show aggregate over Average Wait times for the whole snippet (PR43219) Summary: As discussed in https://bugs.llvm.org/show_bug.cgi?id=43219, i believe it may be somewhat useful to show //some// aggregates over all the sea of statistics provided. Example: ``` Average Wait times (based on the timeline view): [0]: Executions [1]: Average time spent waiting in a scheduler's queue [2]: Average time spent waiting in a scheduler's queue while ready [3]: Average time elapsed from WB until retire stage [0] [1] [2] [3] 0. 3 1.0 1.0 4.7 vmulps %xmm0, %xmm1, %xmm2 1. 3 2.7 0.0 2.3 vhaddps %xmm2, %xmm2, %xmm3 2. 3 6.0 0.0 0.0 vhaddps %xmm3, %xmm3, %xmm4 3 3.2 0.3 2.3 <total> ``` I.e. we average the averages. Reviewers: andreadb, mattd, RKSimon Reviewed By: andreadb Subscribers: gbedwell, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68714 llvm-svn: 374361	2019-10-10 14:46:21 +00:00
Andrea Di Biagio	8d6651f7b1	[MCA][LSUnit] Track loads and stores until retirement. Before this patch, loads and stores were only tracked by their corresponding queues in the LSUnit from dispatch until execute stage. In practice we should be more conservative and assume that memory opcodes leave their queues at retirement stage. Basically, loads should leave the load queue only when they have completed and delivered their data. We conservatively assume that a load is completed when it is retired. Stores should be tracked by the store queue from dispatch until retirement. In practice, stores can only leave the store queue if their data can be written to the data cache. This is mostly a mechanical change. With this patch, the retire stage notifies the LSUnit when a memory instruction is retired. That would trigger the release of LDQ/STQ entries. The only visible change is in memory tests for the bdver2 model. That is because bdver2 is the only model that defines the load/store queue size. This patch partially addresses PR39830. Differential Revision: https://reviews.llvm.org/D68266 llvm-svn: 374034	2019-10-08 10:46:01 +00:00
David Green	9292983154	[llvm-mca] Add a -mattr flag This adds a -mattr flag to llvm-mca, for cases where the -mcpu option does not contain all optional features. Differential Revision: https://reviews.llvm.org/D68190 llvm-svn: 373358	2019-10-01 17:41:38 +00:00
Andrea Di Biagio	e0900f285b	[MCA] Improved cost computation for loop carried dependencies in the bottleneck analysis. This patch introduces a cut-off threshold for dependency edge frequencies with the goal of simplifying the critical sequence computation. This patch also removes the cost normalization for loop carried dependencies. We didn't really need to artificially amplify the cost of loop-carried dependencies since it is already computed as the integral over time of the delay (in cycles). In the absence of backend stalls there is no need for computing a critical sequence. With this patch we early exit from the critical sequence computation if no bottleneck was reported during the simulation. llvm-svn: 372337	2019-09-19 16:05:11 +00:00
Andrea Di Biagio	528f68144b	[X86][BtVer2] Fix latency and throughput of conditional SIMD store instructions. On BtVer2 conditional SIMD stores are heavily microcoded. The latency is directly proportional to the number of packed elements extracted from the input vector. Also, according to micro-benchmarks, most of the computation seems to be done in the integer unit. Only a minority of the uOPs is executed by the FPU. The observed behaviour on the FPU looks similar to this: - The input MASK value is moved to the Integer Unit -- [ a VMOVMSK-like uOP-executed on JFPU0]. - In parallel, each element of the input XMM/YMM is extracted and then sent to the IntegerUnit through JFPU1. As expected, a (conditional) store is executed for every extracted element. Interestingly, a (speculative) load is executed for every extracted element too. It is as-if a "LOAD - BIT_EXTRACT- CMOV" sequence of uOPs is repeated by the integer unit for every conditionally stored element. VMASKMOVDQU is a special case: the number of speculative loads is always 2 (presumably, one load per quadword). That means, extra shifts and masking are performed on (one of) the loaded quadwords before each conditional store (that also explains the big number of non-FP uOPs retired). This patch replaces the existing writes for conditional SIMD stores (i.e. WriteFMaskedStore, and WriteFMaskedStoreY) with the following new writes: WriteFMaskedStore32 [ XMM Packed Single ] WriteFMaskedStore32Y [ YMM Packed Single ] WriteFMaskedStore64 [ XMM Packed Double ] WriteFMaskedStore64Y [ YMM Packed Double ] Added a wrapper class named X86SchedWriteMaskMove in X86Schedule.td to describe both RM and MR variants for conditional SIMD moves in a single tablegen definition. Instances of that class are then passed in input to multiclass avx_movmask_rm when constructing MASKMOVPS/PD definitions. Since this patch introduces new writes, I had to update all the X86 scheduling models.
Differential Revision: https://reviews.llvm.org/D66801 llvm-svn: 370649	2019-09-02 12:32:28 +00:00
Andrea Di Biagio	8e9af64da6	[X86][BtVer2] Add a read-advance to every implicit register use of CMPXCHG8B/16B. This is a follow up of r369642. This patch assigns a ReadAfterLd to every implicit register use of instruction CMPXCHG8B and instruction CMPXCHG16B. Perf micro-benchmarks show that implicit registers are read after 3cy from the start of execution. llvm-svn: 369750	2019-08-23 12:19:45 +00:00
Andrea Di Biagio	1630f64e2f	[X86][BtVer2] Fix latency of ALU RMW instructions. Excluding ADC/SBB and the bit-test instructions (BTR/BTS/BTC), the observed latency of all other RMW integer arithmetic/logic instructions is 6cy and not 5cy. Example (ADD): ``` addb $0, (%rsp) # Latency: 6cy addb $7, (%rsp) # Latency: 6cy addb %sil, (%rsp) # Latency: 6cy addw $0, (%rsp) # Latency: 6cy addw $511, (%rsp) # Latency: 6cy addw %si, (%rsp) # Latency: 6cy addl $0, (%rsp) # Latency: 6cy addl $511, (%rsp) # Latency: 6cy addl %esi, (%rsp) # Latency: 6cy addq $0, (%rsp) # Latency: 6cy addq $511, (%rsp) # Latency: 6cy addq %rsi, (%rsp) # Latency: 6cy ``` The same latency profile applies to SUB/AND/OR/XOR/INC/DEC. The observed latency of ADC/SBB is 7-8cy. So we need a different write to model those. Latency of BTS/BTR/BTC is not fixed by this patch (they are much slower than what the model for btver2 currently reports). Differential Revision: https://reviews.llvm.org/D66636 llvm-svn: 369748	2019-08-23 11:34:10 +00:00
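
The `<total>` row quoted in commit `a5e65c1cf7` above ("we average the averages") can be checked by hand. The following is a minimal Python sketch of that arithmetic, not the llvm-mca implementation: the function name `total_row` and the tuple layout are hypothetical, and the input values are copied from the example in that commit message.

```python
# Per-instruction rows from the example in commit a5e65c1cf7:
# (executions, [1] avg wait in queue, [2] avg wait while ready, [3] avg WB-to-retire)
rows = [
    (3, 1.0, 1.0, 4.7),  # vmulps  %xmm0, %xmm1, %xmm2
    (3, 2.7, 0.0, 2.3),  # vhaddps %xmm2, %xmm2, %xmm3
    (3, 6.0, 0.0, 0.0),  # vhaddps %xmm3, %xmm3, %xmm4
]


def total_row(rows):
    """Average each per-instruction average (unweighted), rounded to one decimal.

    Assumption: every instruction has the same execution count, as in the
    example, so the count is copied through unchanged.
    """
    n = len(rows)
    executions = rows[0][0]
    means = [round(sum(r[col] for r in rows) / n, 1) for col in (1, 2, 3)]
    return (executions, *means)


print(total_row(rows))
```

With the three rows from the example, the result agrees with the `3 3.2 0.3 2.3` aggregate shown in the commit message.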


627 Commits