llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	ded8866f4a	[X86][Atom] Fix vector fp<->int resource/throughputs Match what's documented in the Intel AOM - almost all the conversion instructions require BOTH ports (apart from the MMX cvtpi2ps/cvtpi2ps instructions which we already override) - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-07-07 16:52:34 +01:00
Serge Pavlov	b36d214bed	[X86] Add description of FXAM instruction Previously this instruction could be used only in assembler. This change makes it available for compiler also. Scheduling information was copied from FTST instruction, hopefully this can be a satisfactory approximation. Differential Revision: https://reviews.llvm.org/D104853	2021-06-25 12:26:51 +07:00
Andrea Di Biagio	70b37f4c03	[MCA][InstrBuilder] Always check for implicit uses of resource units (PR50725). When instructions are issued to the underlying pipeline resources, the mca::ResourceManager should also check for the presence of extra uses induced by the explicit consumption of multiple partially overlapping group resources. Fixes PR50725	2021-06-16 14:51:12 +01:00
Simon Pilgrim	630820bafc	[X86][SLM] Adjust XMM non-PMULLD throughput costs to half rate. Match what's reported in the costs table, Agner's tables and the Intel AOM	2021-06-09 13:51:40 +01:00
Simon Pilgrim	21aec4fdc5	[X86][SLM] Fix vector PSHUFB + variable shift resource/throughputs Match what's documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate. Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.	2021-05-26 11:14:21 +01:00
Simon Pilgrim	66978466ba	[X86][Atom] Fix vector variable shift resource/throughputs Match what's documented in the Intel AOM - the non-immediate variants of the PSLL/PSRA/PSRL* shift instructions require BOTH ports - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-26 10:30:59 +01:00
Simon Pilgrim	57250f2f3c	[X86][Atom] Fix vector PSHUFB resource/throughputs Match what's documented in the Intel AOM - the XMM variant of PSHUFB requires BOTH ports - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-25 17:31:45 +01:00
Simon Pilgrim	a26288e803	[X86][Atom] Fix vector fadd/fcmp/fmul resource/throughputs Match what's documented in the Intel AOM - fadd/fcmp use Port1 and fmul uses Port1, but in many cases BOTH ports are required - this was being incorrectly modelled as EITHER port. Discovered while investigating the correct fptoui costs to fix the regressions in D101555. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-20 18:56:58 +01:00
Andrea Di Biagio	9acabe8b6f	[MCA] Unbreak the buildbots by passing flag -mcpu=generic to the new test added by commit `e5d59db469`. This should unbreak buildbot clang-ppc64le-linux-lnt.	2021-05-19 19:12:33 +01:00
Patrick Holland	e5d59db469	[MCA] llvm-mca MCTargetStreamer segfault fix In order to create the code regions for llvm-mca to analyze, llvm-mca creates an AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions(). Within this function, both an MCAsmParser and MCTargetAsmParser are created so that MCAsmParser::Run() can be used to create the code regions for us. These parser classes were created for llvm-mc so they are designed to emit code with an MCStreamer and MCTargetStreamer that are expected to be set up and passed into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any code, an MCStreamerWrapper class gets created instead and passed into the MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides many of the emit methods to just do nothing. The exception is the emitInstruction() method which calls Regions.addInstruction(Inst). This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build our code regions; however, there are a few directives which rely on the MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because llvm-mca doesn’t set up an MCTargetStreamer, when the parser encounters one of those directives, a segfault will occur. In x86, each one of these 7 directives will cause this segfault if they exist in the input assembly to llvm-mca: .cv_fpo_proc .cv_fpo_setframe .cv_fpo_pushreg .cv_fpo_stackalloc .cv_fpo_stackalign .cv_fpo_endprologue .cv_fpo_endproc I haven’t looked at other targets, but I wouldn’t be surprised if some of the other ones also have certain directives which could result in this same segfault. My proposed solution is to simply initialize an MCTargetStreamer after we initialize the MCStreamerWrapper. The MCTargetStreamer requires an ostream object, but we don’t actually want any of these directives to be emitted anywhere, so I use an ostream created with the nulls() function. Since this needs to happen after the MCStreamerWrapper has been initialized, it needs to happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The MCTargetStreamer also needs an MCInstPrinter which is easiest to initialize within the main() function of llvm-mca. So this MCInstPrinter gets constructed within main() then passed into the parseCodeRegions() function as a parameter. (If you feel like it would be appropriate and possible to create the MCInstPrinter within the parseCodeRegions() function, then feel free to modify my solution. That would stop us from having to pass it into the function and would limit its scope / lifetime.) My solution stops the segfault from happening and still passes all of the current (expected) llvm-mca tests. I also added a new test for x86 that checks for this segfault on an input that includes one of the .cv_fpo directives (this test fails without my solution, but passes with it). As far as I can tell, all of the functions that I modified are only called from within llvm-mca so there shouldn’t be any worries about breaking other tools. Differential Revision: https://reviews.llvm.org/D102709	2021-05-19 18:36:10 +01:00
Simon Pilgrim	b14f9a1ebd	[X86][Atom] Fix vector integer shift by immediate resource/throughputs Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-19 14:39:40 +01:00
Simon Pilgrim	f9b1208681	[X86][Atom] Fix vector integer multiplication resource/throughputs Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - vector integer multiplies are pipelined - all Port0, throughput = 2 @ 128bits, 1 @ 64bits. Noticed while checking reduction costs - now that we can use in-order models in llvm-mca, the atom model is the "worst case scenario" we have in x86.	2021-05-15 14:25:48 +01:00
Roman Lebedev	990e806b36	[NFC][X86][MCA] Add pseudo-zero-idiom vperm2f128/vperm2i128 tests - don't break deps While the btver2 model states that this pattern is a zero-cycle zero-idiom on Jaguar, it does not appear to be the case on Znver3: there it measures as not being recognized as a dep-breaking zero-idiom, let alone a zero-cycle one.	2021-05-14 20:23:05 +03:00
Roman Lebedev	1fc1c88704	[X86] AMD Zen 3: same-reg AVX YMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As measured by exegesis, and confirmed by ref docs.	2021-05-14 20:23:05 +03:00
Roman Lebedev	2f8572d8e2	[X86] AMD Zen 3: same-reg AVX XMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As measured by exegesis, and confirmed by ref docs.	2021-05-14 20:23:04 +03:00
Roman Lebedev	f8f7c765a0	[X86] AMD Zen 3: same-reg SSE XMM PCMPGT{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom As measured by exegesis, and confirmed by ref docs.	2021-05-14 20:23:04 +03:00
Roman Lebedev	d2fb4bfba8	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPCMPGT{B,W,D,Q} tests	2021-05-14 20:23:04 +03:00
Roman Lebedev	094b493a3a	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPCMPGT{B,W,D,Q} tests	2021-05-14 20:23:04 +03:00
Roman Lebedev	1c0ac0b0f2	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PCMPGT{B,W,D,Q} tests	2021-05-14 20:23:03 +03:00
Roman Lebedev	26eeb6e650	[X86] AMD Zen 3: same-reg AVX YMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:03 +03:00
Roman Lebedev	41a5dcdf87	[X86] AMD Zen 3: same-reg AVX XMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:03 +03:00
Roman Lebedev	6733fe5c0d	[X86] AMD Zen 3: same-reg SSE XMM PSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such.	2021-05-14 20:23:03 +03:00
Roman Lebedev	9e9c80c250	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBUS{B,W} tests	2021-05-14 20:23:03 +03:00
Roman Lebedev	b6a0449b34	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBUS{B,W} tests	2021-05-14 20:23:02 +03:00
Roman Lebedev	128d9c6bbd	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBUS{B,W} tests	2021-05-14 20:23:02 +03:00
Roman Lebedev	555e1d2987	[X86] AMD Zen 3: same-reg AVX YMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:02 +03:00
Roman Lebedev	012417c980	[X86] AMD Zen 3: same-reg AVX XMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:02 +03:00
Roman Lebedev	29c4f892fe	[X86] AMD Zen 3: same-reg SSE XMM PSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such.	2021-05-14 20:23:02 +03:00
Roman Lebedev	0e20d1f0ef	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBS{B,W} tests	2021-05-14 20:23:01 +03:00
Roman Lebedev	14e48cf8ee	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBS{B,W} tests	2021-05-14 20:23:01 +03:00
Roman Lebedev	4673af527e	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBS{B,W} tests	2021-05-14 20:23:01 +03:00
Roman Lebedev	93f2642871	[X86] AMD Zen 3: same-reg AVX YMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:23:01 +03:00
Roman Lebedev	7a45b96e04	[X86] AMD Zen 3: same-reg AVX XMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:23:01 +03:00
Roman Lebedev	1ea8be214f	[X86] AMD Zen 3: same-reg SSE XMM PSUB{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:23:00 +03:00
Roman Lebedev	bbd2117c34	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUB{B,W,D,Q} tests	2021-05-14 20:23:00 +03:00
Roman Lebedev	d08909d1cb	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUB{B,W,D,Q} tests	2021-05-14 20:23:00 +03:00
Roman Lebedev	a6f5351443	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUB{B,W,D,Q} tests	2021-05-14 20:23:00 +03:00
Roman Lebedev	ce22f53916	[X86] AMD Zen 3: same-reg AVX YMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 20:23:00 +03:00
Roman Lebedev	44c2b4fe91	[X86] AMD Zen 3: same-reg AVX XMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 20:23:00 +03:00
Roman Lebedev	a72cacb53f	[X86] AMD Zen 3: same-reg SSE XMM PANDN is a 1-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:22:59 +03:00
Roman Lebedev	9acc589e5a	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPANDN tests	2021-05-14 20:22:59 +03:00
Roman Lebedev	a3617138c2	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPANDN tests	2021-05-14 20:22:59 +03:00
Roman Lebedev	3f235a0b84	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PANDN tests	2021-05-14 20:22:59 +03:00
Roman Lebedev	1d73c2b8cf	[X86] AMD Zen 3: same-reg AVX YMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 20:22:59 +03:00
Roman Lebedev	31669b5073	[X86] AMD Zen 3: same-reg AVX XMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 20:22:58 +03:00
Roman Lebedev	498bf365f4	[X86] AMD Zen 3: same-reg SSE XMM PXOR is a 1-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:22:58 +03:00
Roman Lebedev	3009f8a383	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPXOR tests	2021-05-14 20:22:58 +03:00
Roman Lebedev	d58d020b6c	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPXOR tests	2021-05-14 20:22:58 +03:00
Roman Lebedev	0f7a595095	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PXOR tests	2021-05-14 20:22:58 +03:00
Roman Lebedev	4af4afe014	[X86] AMD Zen 3: same-reg AVX YMM VANDNPD is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:24 +03:00

489 Commits