llvm-project

Commit Graph

Author	SHA1	Message	Date
Roman Lebedev	0e20d1f0ef	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBS{B,W} tests	2021-05-14 20:23:01 +03:00
Roman Lebedev	14e48cf8ee	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBS{B,W} tests	2021-05-14 20:23:01 +03:00
Roman Lebedev	4673af527e	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBS{B,W} tests	2021-05-14 20:23:01 +03:00
Roman Lebedev	93f2642871	[X86] AMD Zen 3: same-reg AVX YMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:23:01 +03:00
Roman Lebedev	7a45b96e04	[X86] AMD Zen 3: same-reg AVX XMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:23:01 +03:00
Roman Lebedev	1ea8be214f	[X86] AMD Zen 3: same-reg SSE XMM PSUB{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:23:00 +03:00
Roman Lebedev	bbd2117c34	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUB{B,W,D,Q} tests	2021-05-14 20:23:00 +03:00
Roman Lebedev	d08909d1cb	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUB{B,W,D,Q} tests	2021-05-14 20:23:00 +03:00
Roman Lebedev	a6f5351443	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUB{B,W,D,Q} tests	2021-05-14 20:23:00 +03:00
Roman Lebedev	ce22f53916	[X86] AMD Zen 3: same-reg AVX YMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 20:23:00 +03:00
Roman Lebedev	44c2b4fe91	[X86] AMD Zen 3: same-reg AVX XMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 20:23:00 +03:00
Roman Lebedev	a72cacb53f	[X86] AMD Zen 3: same-reg SSE XMM PANDN is a 1-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:22:59 +03:00
Roman Lebedev	9acc589e5a	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPANDN tests	2021-05-14 20:22:59 +03:00
Roman Lebedev	a3617138c2	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPANDN tests	2021-05-14 20:22:59 +03:00
Roman Lebedev	3f235a0b84	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PANDN tests	2021-05-14 20:22:59 +03:00
Roman Lebedev	1d73c2b8cf	[X86] AMD Zen 3: same-reg AVX YMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 20:22:59 +03:00
Roman Lebedev	31669b5073	[X86] AMD Zen 3: same-reg AVX XMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 20:22:58 +03:00
Roman Lebedev	498bf365f4	[X86] AMD Zen 3: same-reg SSE XMM PXOR is a 1-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:22:58 +03:00
Roman Lebedev	3009f8a383	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPXOR tests	2021-05-14 20:22:58 +03:00
Roman Lebedev	d58d020b6c	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPXOR tests	2021-05-14 20:22:58 +03:00
Roman Lebedev	0f7a595095	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PXOR tests	2021-05-14 20:22:58 +03:00
Roman Lebedev	4af4afe014	[X86] AMD Zen 3: same-reg AVX YMM VANDNPD is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:24 +03:00
Roman Lebedev	17f99a8a41	[X86] AMD Zen 3: same-reg AVX XMM VANDNPD is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:24 +03:00
Roman Lebedev	38ceb46fb0	[X86] AMD Zen 3: same-reg SSE XMM ANDNPD is a 1-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:24 +03:00
Roman Lebedev	3221e06e9b	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VANDNPD tests	2021-05-14 14:06:24 +03:00
Roman Lebedev	0b7e52e725	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VANDNPD tests	2021-05-14 14:06:24 +03:00
Roman Lebedev	055fa84cd8	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM ANDNPD tests	2021-05-14 14:06:24 +03:00
Roman Lebedev	d8a595b81c	[X86] AMD Zen 3: same-reg AVX YMM VANDNPS is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:24 +03:00
Roman Lebedev	fd4cbc822b	[X86] AMD Zen 3: same-reg AVX XMM VANDNPS is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:23 +03:00
Roman Lebedev	f38dcbecb6	[X86] AMD Zen 3: same-reg SSE XMM ANDNPS is a 1-cycle(!) dep-breaking zero-idiom Same as SSE XMM XORPS/XORPD, it is not zero-cycle, even though it breaks the deps. As confirmed by the exegesis measurements, and ref docs.	2021-05-14 14:06:23 +03:00
Roman Lebedev	c79c7bb980	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VANDNPS tests	2021-05-14 14:06:23 +03:00
Roman Lebedev	a57006d627	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VANDNPS tests	2021-05-14 14:06:23 +03:00
Roman Lebedev	a657808948	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM ANDNPS tests	2021-05-14 14:06:23 +03:00
Roman Lebedev	43a7f130a7	[X86] AMD Zen 3: same-reg AVX YMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 11:56:07 +03:00
Roman Lebedev	336b9dbe88	[X86] AMD Zen 3: same-reg AVX XMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 11:56:07 +03:00
Roman Lebedev	9c596bc541	[X86] AMD Zen 3: same-reg SSE XMM XORPD is a 1-cycle(!) dep-breaking zero-idiom Same as with it's float friend, unlike their AVX versions. As confirmed by exegesis, and ref docs.	2021-05-14 11:56:07 +03:00
Roman Lebedev	3567c7eda1	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPD tests	2021-05-14 11:56:07 +03:00
Roman Lebedev	57eee56d0a	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPD tests	2021-05-14 11:56:06 +03:00
Roman Lebedev	fdc65e46b6	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPD tests	2021-05-14 11:56:06 +03:00
Roman Lebedev	59554c01ab	[X86] AMD Zen 3: same-reg AVX YMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis, and ref docs.	2021-05-14 11:56:06 +03:00
Roman Lebedev	2a7c52ff7f	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPS tests	2021-05-14 11:56:06 +03:00
Roman Lebedev	26c1bffe67	[X86] AMD Zen 3: same-reg AVX XMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom Unlike it's legacy SSE XMM XORPS version, which measures as being 1-cycle, this one is certainly a zero-cycle instruction, in addition to both of them being dependency breaking. As confirmed by exegesis measurements, and ref docs.	2021-05-14 11:56:06 +03:00
Roman Lebedev	a9fb321a67	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPS tests	2021-05-14 11:56:06 +03:00
Roman Lebedev	aa0dcb3ba4	[X86] AMD Zen 3: same-reg SSE XMM XORPS is a 1-cycle(!) dep-breaking one-idiom While both the SOG and Agner insist that it is zero-cycle, i can not confirm that claim. While it clearly breaks the dependency, i can not come up with a snippet, or measurement approach, to end up with IPC bigger than 4, which, to me, means that it actually consumes execution resource of an FP unit for a cycle.	2021-05-14 00:03:36 +03:00
Roman Lebedev	6c4596793d	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPS test	2021-05-14 00:03:36 +03:00
Roman Lebedev	6a64c462eb	[X86] AMD Zen 3: same-reg AVX YMM VPCMP is dep breaking one-idiom As measured by exegesis, and confirmed by ref docs. Still not zero-cycle :)	2021-05-10 23:49:27 +03:00
Roman Lebedev	5864e7b86b	[NFC][X86][MCA] AMD Zen 3: add tests for same-re AVX YMM VPCMP	2021-05-10 23:49:27 +03:00
Roman Lebedev	2953245337	[X86] AMD Zen 3: same-reg AVX XMM VPCMP is dep breaking one-idiom As measured by exegesis, and confirmed by ref docs. Again, it's not zero-cycle.	2021-05-10 23:49:26 +03:00
Roman Lebedev	f59db6c4f8	[NFC][X86][MCA] AMD Zen 3: add tests for same-re AVX XMM VPCMP	2021-05-10 23:49:26 +03:00
Roman Lebedev	0f3bcb97ef	[X86] AMD Zen 3: same-reg SSE XMM PCMP is dep breaking one-idiom As measured by exegesis, and confirmed by ref docs. Much like with MMX PCMP, it does actually have to execute, though.	2021-05-10 23:49:26 +03:00
Roman Lebedev	0e538f937a	[NFC][X86][MCA] AMD Zen 3: add tests for same-reg XMM SSE PCMP	2021-05-10 23:49:26 +03:00
Roman Lebedev	b24edfff4f	[X86] AMD Zen 3: same-reg PCMPEQ is an MMX all-ones dep breaking idiom They are, however, not zero-cycle, and do actually execute. As measured by exegesis, and confirmed by ref docs.	2021-05-10 23:49:26 +03:00
Roman Lebedev	ba225ce961	[NFC][X86][MCA] AMD Zen 3: add tests for same-reg MMX PCMPEQ	2021-05-10 23:49:25 +03:00
Roman Lebedev	08cf2776ac	[X86] AMD Zen 3: sub-32-bit CMP also break dependencies They measure as having the same effect as 32-bit CMP.	2021-05-10 20:57:38 +03:00
Roman Lebedev	ecff974b66	[NFC][X86][MCA] AMD Zen 3: add tests for sub-32-bit CMP dep breaking	2021-05-10 20:57:37 +03:00
Roman Lebedev	be23d5e814	[X86] AMD Zen 3: same-reg CMP is a zero-cycle dependency-breaking instruction As measured by exegesis, and confirmed by ref docs.	2021-05-10 00:03:20 +03:00
Roman Lebedev	9a31efa2f5	[NFC][X86][MCA] AMD Zen 3: add tests for CMP dependency breaking	2021-05-10 00:03:20 +03:00
Roman Lebedev	11b0568dce	[X86] AMD Zen 3: same-reg SBB is a dependency-breaking instruction As confirmed by exegesis measurements, and ref docs. It does actually execute. While there, bump latency for MULX32rr, that seems to match measurements.	2021-05-10 00:03:20 +03:00
Roman Lebedev	8d0e2d2b0f	[NFC][X86][MCA] AMD Zen 3: add tests for SBB dependency breaking	2021-05-10 00:03:20 +03:00
Roman Lebedev	eed8552787	[X86] AMD Zen 3: same-register XOR/SUB are GPR dependency breaking zero-idioms As measured by exegesis and confirmed in reference docs.	2021-05-10 00:03:20 +03:00
Roman Lebedev	ab794852ed	[NFC][X86][MCA] AMD Zen3: add GPR zero-idiom dependency breaking tests	2021-05-10 00:03:20 +03:00
Roman Lebedev	a21df76db6	[X86] AMD Zen 3: XCHG is a zero-cycle instruction As measured by exegesis and confirmed by reference docs.	2021-05-09 20:37:57 +03:00
Roman Lebedev	2819009b5a	[X86] AMD Zen 3: _REV variants of zero-cycles moves are also zero-cycles (PR50261) Sometimes disassembler picks _REV variants of instructions over the plain ones, which in this case exposed an issue that the _REV variants aren't being modelled as optimizable moves.	2021-05-07 18:27:40 +03:00
Roman Lebedev	a8e30e63ac	[NFC][X86][MCA] AMD Zen3: add test for zero-cycle X87 move	2021-05-07 18:27:40 +03:00
Roman Lebedev	34de155f7e	[NFC][X86][MCA] AMD Zen3 Decrease iteration count in reg-move-elimination tests Drop it just enough so it still produces the right IPC.	2021-05-07 17:06:45 +03:00
Roman Lebedev	758c173309	[X86] AMD Zen 3: throughput for renameable XMM/YMM moves is 6 They are resolved at the register rename stage without using any execution units.	2021-05-07 17:06:45 +03:00
Roman Lebedev	715c0d0bd4	[X86] AMD Zen 3: AVX YMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers.	2021-05-07 17:06:45 +03:00
Roman Lebedev	ee020b930d	[X86] AMD Zen 3: AVX XMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers.	2021-05-07 17:06:44 +03:00
Roman Lebedev	9db4203883	[X86] AMD Zen 3: SSE XMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers. Refs: AMD SOG 19h, 2.9.4 Zero Cycle Move The processor is able to execute certain register to register mov operations with zero cycle delay. Agner, 22.13 Instructions with no latency Register-to-register move instructions are resolved at the register rename stage without using any execution units. These instructions have zero latency. It is possible to do six such register renamings per clock cycle, and it is even possible to rename the same register multiple times in one clock cycle.	2021-05-07 17:06:44 +03:00
Roman Lebedev	0d961fbd52	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX YMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	bcbfc22ff9	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX XMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	cbabe4f4d6	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable SSE XMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	d8c6202576	[X86] AMD Zen 3: throughput for renameable GPR moves is 6 They are resolved at the register rename stage without using any execution units.	2021-05-07 17:06:43 +03:00
Roman Lebedev	e6d688ec96	[NFC][X86][MCA] Increase iteration count in reg move elimination tests So the IPC actually stabilizes at 6.	2021-05-07 17:06:43 +03:00
Roman Lebedev	bda9ca3e44	[NFC][X86][MCA] AMD Zen 3: add tests with non-eliminatible MMX moves In Zen3, MMX moves are not eliminated, i've verified this with llvm-exegesis.	2021-05-07 13:56:07 +03:00
Roman Lebedev	7059b28d5d	[X86] AMD Zen 3: 32/64 -bit GPR register moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers. Refs: AMD SOG 19h, 2.9.4 Zero Cycle Move The processor is able to execute certain register to register mov operations with zero cycle delay. Agner, 22.13 Instructions with no latency Register-to-register move instructions are resolved at the register rename stage without using any execution units. These instructions have zero latency. It is possible to do six such register renamings per clock cycle, and it is even possible to rename the same register multiple times in one clock cycle.	2021-05-07 13:56:07 +03:00
Roman Lebedev	227678089c	[NFC][X86][MCA] AMD Zen 3: add tests with eliminatible GPR moves	2021-05-07 13:56:07 +03:00
Roman Lebedev	2b93c9c16c	[X86] AMD Zen 3 Scheduler Model Introduce basic schedule model for AMD Zen 3 CPU's, a.k.a `znver3`. This is fully built from scratch, from llvm-mca measurements and documented reference materials. Nothing was copied from `znver2`/`znver1`. I believe this is in a reasonable state of completion for inclusion, probably better than D52779 `bdver2` was :) Namely: * uops are pretty spot-on (at least what llvm-mca can measure) {F16422596} * latency is also pretty spot-on (at least what llvm-mca can measure) {F16422601} * throughput is within reason {F16422607} I haven't run much benchmarks with this, however RawSpeed benchmarks says this is beneficial: {F16603978} {F16604029} I'll call out the obvious problems there: * i didn't really bother with X87 instructions * i didn't really bother with obviously-microcoded/system instructions * There are large discrepancy in throughput for `mr` and `rm` instructions. I'm not really sure if it's a modelling defect that needs to be fixed, or it's a defect of measurments. * Pipe distributions are probably bad :) I can't do much here until AMD allows that to be fixed by documenting the appropriate counters and updating libpfm That being said, as @RKSimon notes: >>! In D94395#2647381, @RKSimon wrote: > I'll mention again that all the znver* models appear to be very inaccurate wrt SIMD/FPU instructions <...> so how much worse this could possibly be?! Things that aren't there: * Various tunings: zero idioms, etc. That is follow-ups. Differential Revision: https://reviews.llvm.org/D94395	2021-05-01 22:08:13 +03:00
Andrea Di Biagio	8bd4f3d547	[MCA] Fix CarryOver check in the DispatchStage (PR50174). Early exit from method DispatchStage::isAvailable() if the dispatch group is already full. Not all instructions declare at least one uOP. Fixes PR50174.	2021-04-30 14:26:46 +01:00
Sebastian Neubauer	4897effb14	[AMDGPU] Add TransVALU to gfx10 Instructions on the transcendental unit are executed in parallel to the normal VALU, so add this as an extra resource. This doesn't seem to have any effect, but it should be more correct. Differential Revision: https://reviews.llvm.org/D100123	2021-04-20 15:34:43 +02:00
David Penry	78a871abf7	[ARM] Use ProcResGroup in Cortex-M7 scheduling model Used to model structural hazards on FP issue, where some instructions take up 2 issue slots and others one as well as similar structural hazards on load issue, where some instructions take up two load lanes and others one. Differential Revision: https://reviews.llvm.org/D98977	2021-04-19 21:23:05 +01:00
Andrew Savonichev	f08a2fc09e	[MCA] Add tests for IPC on Cortex-A55 The tests compare IPC statistics that MCA provides with IPC values measured on Cortex-A55 hardware. For hardware tests, each snippet is run in a loop unrolled by 1000, and IPC is measured by linux-perf. Several tests do not match the hardware: the skewed ALU is not supported, LDR seem to be missing a forwarding path. Differential Revision: https://reviews.llvm.org/D98174	2021-04-08 19:37:07 +03:00
Andrew Savonichev	bba25a9cd8	[MCA] Support carry-over instructions for in-order processors Instructions that have more uops than the processor's IssueWidth are issued in multiple cycles. The patch fixes PR49712. Differential Revision: https://reviews.llvm.org/D99339	2021-03-26 00:06:19 +03:00
Andrew Savonichev	292da93d59	[MCA] Disable RCU for InOrderIssueStage This is a follow-up for: D98604 [MCA] Ensure that writes occur in-order When instructions are aligned by the order of writes, they retire in-order naturally. There is no need for an RCU, so it is disabled. Differential Revision: https://reviews.llvm.org/D98628	2021-03-24 13:54:04 +03:00
Jay Foad	fc7e3e7dd9	[AMDGPU] Set SchedRW on real instructions Coyp SchedRW from pseudos to real instructions so that llvm-mca has access to it. This is NFC for normal compiler codegen, which schedules pseudos not real instructions. Add an llvm-mca test for some high latency double-precision instructions as a smoke test. Differential Revision: https://reviews.llvm.org/D99187	2021-03-23 15:38:11 +00:00
Andrea Di Biagio	f5bdc88e4d	[MCA] Improved handling of negative read-advance cycles. Before this patch, register writes were always invalidated by the RegisterFile at instruction commit stage. So, the RegisterFile was often losing the knowledge about the `execute cycle` of writes already committed. While this was not problematic for non-delayed reads, this was sometimes leading to inaccurate read latency computations in the presence of negative read-advance cycles. This patch fixes the issue by changing how the RegisterFile component internally keeps track of the `execute cycle` information of each write. On every instruction executed, the RegisterFile gets notified by the RetireStage, so that it can internally record the execute cycle of each executed write. The `execute cycle` information is stored within WriteRef itself, and it is not invalidated when the write is committed.	2021-03-23 14:47:23 +00:00
Andrew Savonichev	e6ce0db378	[MCA] Ensure that writes occur in-order Delay the issue of a new instruction if that leads to out-of-order commits of writes. This patch fixes the problem described in: https://bugs.llvm.org/show_bug.cgi?id=41796#c3 Differential Revision: https://reviews.llvm.org/D98604	2021-03-18 17:10:20 +03:00
Jay Foad	7340fd6886	[MCA] Support in-order CPUs with MicroOpBufferSize=1 Differential Revision: https://reviews.llvm.org/D98356	2021-03-11 10:12:54 +00:00
Andrew Savonichev	d791695cb5	[MCA] Add support for in-order CPUs This patch adds a pipeline to support in-order CPUs such as ARM Cortex-A55. In-order pipeline implements a simplified version of Dispatch, Scheduler and Execute stages as a single stage. Entry and Retire stages are common for both in-order and out-of-order pipelines. Differential Revision: https://reviews.llvm.org/D94928	2021-03-04 14:08:19 +03:00
Abhina Sreeskantharajan	42a21778f6	[test] Use host platform specific error message substitution in lit tests On z/OS, the following error message is not matched correctly in lit tests. ``` EDC5129I No such file or directory. ``` This patch uses a lit config substitution to check for platform specific error messages. Reviewed By: muiez, jhenderson Differential Revision: https://reviews.llvm.org/D95246	2021-01-29 07:16:30 -05:00
Abhina Sreeskantharajan	978444d531	Revert "[SystemZ][z/OS] Fix No such file or directory expression error" This reverts commit `06f8a49693`.	2021-01-25 08:29:38 -05:00
Wolfgang Pieb	7143b63017	[llvm-mca] Adding local lit config file for X86 targets	2021-01-22 09:52:57 -08:00
Wolfgang Pieb	020c00b5d3	[llvm-mca] Test case was missing a triple.	2021-01-21 16:19:32 -08:00
Wolfgang Pieb	d38be2ba0e	[llvm-mca] Initial implementation of serialization using JSON. The views implemented at this time are Summary, Timeline, ResourcePressure and InstructionInfo. Use --json on the command line to obtain JSON output.	2021-01-21 15:15:54 -08:00
Abhina Sreeskantharajan	689aaba7ac	[SystemZ][z/OS] Fix No such file or directory expression error matching in lit tests On z/OS, the following error message is not matched correctly in lit tests. This patch updates the CHECK expression to match successfully. ``` EDC5129I No such file or directory. ``` Reviewed By: muiez Differential Revision: https://reviews.llvm.org/D94239	2021-01-18 07:14:37 -05:00
David Green	6c89f6fae4	[AArch64] Attempt to fix Mac tests with a more specific triple. NFC	2021-01-04 11:29:18 +00:00
Usman Nadeem	685c8b537a	[AARCH64] Improve accumulator forwarding for Cortex-A57 model The old CPU model only had MLA->MLA forwarding. I added some missing MUL->MLA read advances and a missing absolute diff accumulator read advance according to the Cortex A57 Software Optimization Guide. The patch improves performance in EEMBC rgbyiqv2 by about 6%-7% and spec2006/milc by 8% (repeated runs on multiple devices), causes no significant regressions (none in SPEC). Differential Revision: https://reviews.llvm.org/D92296	2021-01-04 10:58:43 +00:00
Craig Topper	0cbceed27c	[TableGen][ARM][X86] Detect combining IntrReadMem and IntrWriteMem. These properties aren't additive. They are closer to ReadOnly and WriteOnly. The default is ReadWrite. ReadMem cancels the write property and WriteMem cancels the read property. Combining them leaves neither. This patch checks that when we process WriteMem, the Mod flag is still set. And for ReadMem we check that the Ref flag set still set. I've updated 2 target intrinsics that were combining these properties. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D93571	2020-12-19 14:56:17 -08:00
Craig Topper	f47b07315a	[X86] Teach assembler to accept vmsave/vmload/vmrun/invlpga/skinit with or without the fixed register operands These instructions read their inputs from fixed registers rather than using a modrm byte. We shouldn't require the user to list them when parsing assembly. This matches the GNU assembler. This patch adds InstAliases so we can accept either form. It also changes the printing code to use the form without registers. This will change the behavior of llvm-objdump, but should be consistent with binutils objdump. This also matches what we already do in LLVM for clzero and monitorx which also used fixed registers. I need to add and improve tests before this can be commited. The disassembler tests exist, but weren't checking the fixed register so they pass before and after this change. Fixes https://github.com/ClangBuiltLinux/linux/issues/1216 Differential Revision: https://reviews.llvm.org/D93524	2020-12-19 11:01:55 -08:00
Sjoerd Meijer	630d37dc1b	[AArch64] Enable Cortex-A55 schedmodel The model was committed in `4b8ade837e` but not yet enabled to allow for a few fix ups. This adds a few of these fixes, and also a LLVM MCA test to check most instructions. While I do have plans to look into some more tuning, it's time to enable this as it better than using the A53 schedule. Differential Revision: https://reviews.llvm.org/D88017	2020-11-30 19:28:34 +00:00

1 2 3 4 5 ...

597 Commits