Change --max-timeline-cycles=0 to mean no limit on the number of cycles.
Use this in AMDGPU tests to show all instructions in the timeline view
instead of having the view arbitrarily truncated.
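For illustration, a tiny sketch of the intended semantics (a hypothetical
helper, not the actual TimelineView code):

  #include <algorithm>

  // Hypothetical helper: a cycle limit of 0 now means "no limit", so the
  // timeline renders every cycle instead of truncating at a fixed count.
  unsigned effectiveTimelineCycles(unsigned MaxTimelineCycles,
                                   unsigned TotalCycles) {
    if (MaxTimelineCycles == 0)
      return TotalCycles;
    return std::min(MaxTimelineCycles, TotalCycles);
  }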
Differential Revision: https://reviews.llvm.org/D104846
When instructions are issued to the underlying pipeline resources, the
mca::ResourceManager should also check for the presence of extra uses induced by
the explicit consumption of multiple partially overlapping group resources.
Fixes PR50725
This patch fixes the logic that checks for variadic register definitions.
Before llvm-svn 348114 (commit 4cf35b4ab0), it was not possible to explicitly
mark variadic operands as definitions. By default, variadic operands of an
MCInst were always assumed to be uses. A number of ad-hoc checks were
introduced in the InstrBuilder to fix the processing of variadic register
operands of ARM ldm/stm variants.
This patch simply replaces those old (and buggy) checks with a much simpler (and
correct) check for MCID::Flag::VariadicOpsAreDefs.
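For context, a minimal sketch of the kind of check involved (simplified and
with illustrative names; the real logic lives in llvm-mca's InstrBuilder):

  #include "llvm/MC/MCInstrDesc.h"

  // Illustrative only: decide whether operand OpIdx should be treated as a
  // register definition. Explicit defs come first; variadic operands (those
  // beyond the declared operand list) are defs only when the opcode sets
  // MCID::Flag::VariadicOpsAreDefs.
  static bool isRegisterDef(const llvm::MCInstrDesc &MCDesc, unsigned OpIdx) {
    if (OpIdx < MCDesc.getNumDefs())
      return true;
    return OpIdx >= MCDesc.getNumOperands() && MCDesc.variadicOpsAreDefs();
  }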
Conservatively use the instruction latency to compute the last write-back cycle.
Before this patch, the last write-back cycle computation was incorrect for store
instructions that didn't declare any register writes.
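Roughly, the idea is the following (hypothetical names, not the actual
llvm-mca code):

  #include "llvm/ADT/ArrayRef.h"
  #include <algorithm>

  // Hypothetical sketch: the cycle at which an instruction's last write-back
  // completes. Instructions with no register writes (e.g. plain stores)
  // conservatively fall back to the instruction latency instead of reporting
  // their issue cycle.
  unsigned lastWriteBackCycle(unsigned IssueCycle, unsigned InstrLatency,
                              llvm::ArrayRef<unsigned> WriteLatencies) {
    if (WriteLatencies.empty())
      return IssueCycle + InstrLatency;
    unsigned MaxLatency = 0;
    for (unsigned L : WriteLatencies)
      MaxLatency = std::max(MaxLatency, L);
    return IssueCycle + MaxLatency;
  }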
Match what's documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate.
Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.
Match what's documented in the Intel AOM - the non-immediate variants of the PSLL*/PSRA*/PSRL* shift instructions require BOTH ports - this was being incorrectly modelled as EITHER port.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
Match what's documented in the Intel AOM - the XMM variant of PSHUFB requires BOTH ports - this was being incorrectly modelled as EITHER port.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
Match what's documented in the Intel AOM - fadd/fcmp use Port1 and fmul uses Port0, but in many cases BOTH ports are required - this was being incorrectly modelled as EITHER port.
Discovered while investigating the correct fptoui costs to fix the regressions in D101555.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
In order to create the code regions for llvm-mca to analyze, llvm-mca creates an
AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions().
Within this function, both an MCAsmParser and MCTargetAsmParser are created so
that MCAsmParser::Run() can be used to create the code regions for us.
These parser classes were created for llvm-mc so they are designed to emit code
with an MCStreamer and MCTargetStreamer that are expected to be set up and passed
into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any
code, an MCStreamerWrapper class gets created instead and passed into the
MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides
many of the emit methods to just do nothing. The exception is the
emitInstruction() method, which calls Regions.addInstruction(Inst).
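As a rough sketch of that wrapper idea (simplified from what lives in
llvm-mca's CodeRegionGenerator; the signatures reflect the LLVM of that time
and are illustrative rather than the exact upstream code):

  #include "CodeRegion.h" // llvm-mca's CodeRegion.h (tools/llvm-mca)
  #include "llvm/MC/MCInst.h"
  #include "llvm/MC/MCStreamer.h"

  // Illustrative sketch: an MCStreamer that discards everything except
  // instructions, which it forwards to the CodeRegions being built.
  class MCStreamerWrapper final : public llvm::MCStreamer {
    llvm::mca::CodeRegions &Regions;

  public:
    MCStreamerWrapper(llvm::MCContext &Ctx, llvm::mca::CodeRegions &R)
        : llvm::MCStreamer(Ctx), Regions(R) {}

    // The only callback that does real work.
    void emitInstruction(const llvm::MCInst &Inst,
                         const llvm::MCSubtargetInfo &) override {
      Regions.addInstruction(Inst);
    }

    // Pure-virtual MCStreamer hooks, overridden to do nothing.
    bool emitSymbolAttribute(llvm::MCSymbol *, llvm::MCSymbolAttr) override {
      return true;
    }
    void emitCommonSymbol(llvm::MCSymbol *, uint64_t, unsigned) override {}
    void emitZerofill(llvm::MCSection *, llvm::MCSymbol *, uint64_t, unsigned,
                      llvm::SMLoc) override {}
  };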
This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build
our code regions; however, there are a few directives which rely on the
MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the
MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because
llvm-mca doesn’t set up an MCTargetStreamer, when the parser encounters one of
those directives, a segfault will occur.
On x86, any one of these 7 directives will cause this segfault if it appears in
the input assembly to llvm-mca:
.cv_fpo_proc
.cv_fpo_setframe
.cv_fpo_pushreg
.cv_fpo_stackalloc
.cv_fpo_stackalign
.cv_fpo_endprologue
.cv_fpo_endproc
I haven’t looked at other targets, but I wouldn’t be surprised if some of the
other ones also have certain directives which could result in this same
segfault.
My proposed solution is to simply initialize an MCTargetStreamer after we
initialize the MCStreamerWrapper. The MCTargetStreamer requires an ostream
object, but we don’t actually want any of these directives to be emitted
anywhere, so I use an ostream created with the nulls() function. Since this
needs to happen after the MCStreamerWrapper has been initialized, it needs to
happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The
MCTargetStreamer also needs an MCInstPrinter, which is easiest to initialize
within the main() function of llvm-mca. So this MCInstPrinter gets constructed
within main() and then passed into the parseCodeRegions() function as a parameter.
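A minimal sketch of that fix, assuming the TheTarget, Str (the
MCStreamerWrapper) and IP (the MCInstPrinter) objects set up by the
surrounding code; the createAsmTargetStreamer signature shown matches the
LLVM of that time and is illustrative rather than authoritative:

  #include "llvm/Support/FormattedStream.h"
  #include "llvm/Support/raw_ostream.h"

  // Inside AsmCodeRegionGenerator::parseCodeRegions(), after the
  // MCStreamerWrapper (Str) has been created. The target streamer writes
  // through a formatted_raw_ostream; since we never want these directives
  // emitted anywhere, back it with nulls().
  llvm::formatted_raw_ostream NullStream(llvm::nulls());

  // The created target streamer registers itself with Str, so the parser's
  // directive handlers find a valid MCTargetStreamer instead of dereferencing
  // a null pointer.
  TheTarget.createAsmTargetStreamer(Str, NullStream, IP,
                                    /*IsVerboseAsm=*/true);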
(If you feel like it would be appropriate and possible to create the
MCInstPrinter within the parseCodeRegions() function, then feel free to modify
my solution. That would stop us from having to pass it into the function and
would limit its scope / lifetime.)
My solution stops the segfault from happening and still passes all of the
current (expected) llvm-mca tests. I also added a new test for x86 that checks
for this segfault on an input that includes one of the .cv_fpo directives (this
test fails without my solution, but passes with it).
As far as I can tell, all of the functions that I modified are only called from
within llvm-mca so there shouldn’t be any worries about breaking other tools.
Differential Revision: https://reviews.llvm.org/D102709
Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - vector integer multiplies are pipelined - all Port0, throughput = 2 @ 128bits, 1 @ 64bits.
Noticed while checking reduction costs - now that we can use in-order models in llvm-mca, the atom model is the "worst case scenario" we have in x86.
While the btver2 model states that this pattern is a zero-cycle zero-idiom
on Jaguar, it does not appear to be the case on Znver3,
where it measures as not being recognized as a dep-breaking zero-idiom,
let alone a zero-cycle one.
Unlike its legacy SSE XMM XORPS version, which measures as being 1-cycle,
this one is certainly a zero-cycle instruction, in addition to both of them
being dependency breaking.
As confirmed by exegesis measurements and ref docs.
While both the SOG and Agner insist that it is zero-cycle,
I cannot confirm that claim. While it clearly breaks the dependency,
I cannot come up with a snippet, or measurement approach,
to end up with IPC bigger than 4, which, to me, means that it actually
consumes an execution resource of an FP unit for a cycle.
As confirmed by exegesis measurements and ref docs.
It does actually execute.
While there, bump the latency for MULX32rr; that seems to match measurements.
Sometimes the disassembler picks _REV variants of instructions
over the plain ones, which in this case exposed an issue
that the _REV variants aren't being modelled as optimizable moves.
I've verified this with llvm-exegesis.
This is not limited to zero registers.
Refs:
AMD SOG 19h, 2.9.4 Zero Cycle Move
The processor is able to execute certain register to register
mov operations with zero cycle delay.
Agner,
22.13 Instructions with no latency
Register-to-register move instructions are resolved at
the register rename stage without using any execution units.
These instructions have zero latency. It is possible to do six such
register renamings per clock cycle, and it is even possible to
rename the same register multiple times in one clock cycle.