llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	f73334c46d	[AArch64] Set the latency of Cortex-A55 stores to 1 This sets the latency of stores to 1 in the Cortex-A55 scheduling model, to better match the values given in the software optimization guide. The latency of a store in normal llvm scheduling does not appear to have a lot of uses. If the store has no outputs then the latency is somewhat meaningless (and pre/post increment update operands use the WriteAdr write for those operands instead). The one place it does alter things is the latency between a store and the end of the scheduling region, which can in turn have an effect on the critical path length. As a result a latency of 1 is more correct and offers ever-so-slightly better scheduling of instructions near the end of the block. They are marked as RetireOOO to keep llvm-mca from introducing stalls where none would exist. Differential Revision: https://reviews.llvm.org/D105541	2021-07-12 13:39:35 +01:00
Andrea Di Biagio	4fe0fcd1c0	[llvm-mca][JSON] Teach the PipelinePrinter how to deal with anonymous code regions (PR51008) This patch addresses the last remaining problems reported in PR51008. Previous fixes for PR51008 worked under the wrong assumption that code regions are always named (except maybe for the default region, which was automatically named "main"). In reality, it is quite common for users to declare multiple anonymous regions. So we cannot really use the region name as the key string of a JSON object. In practice, code region names are completely optional. Using "main" for the default region was also problematic because there can be another region with that same name. This patch fixes these issues by introducing a json::array of regions. Each region has a "Name" field, which would default to the empty string for anonymous regions. Added a few more tests to verify that the JSON file format is still valid, and that multiple anonymous regions all appear in the final output.	2021-07-10 13:57:52 +01:00
Andrea Di Biagio	d919bca875	[llvm-mca][JSON] Further refactoring of the JSON printing logic. This patch renames object "Resources" to "TargetInfo". Moved the getJSONTargetInfo method from class InstructionView to the PipelinePrinter. Removed uses of std::stringstream. Removed unused method View::printViewJSON().	2021-07-10 12:38:19 +01:00
Andrea Di Biagio	10cb036223	[llvm-mca] Refactor the logic that prints JSON files. Moved most of the printing logic into the PipelinePrinter. This patch also fixes the JSON output when flag -instruction-tables is specified.	2021-07-09 22:56:39 +01:00
Marcos Horro	b11d31eb73	[llvm-mca] Fix JSON format for multiple regions Instead of printing each region individually when using JSON format, this patch creates a JSON object which is updated with the values of each region, printing them at the end. New test is added for JSON output with multiple regions. Bug: https://bugs.llvm.org/show_bug.cgi?id=51008 Reviewed By: andreadb Differential Revision: https://reviews.llvm.org/D105618	2021-07-09 18:04:16 +02:00
Patrick Holland	d38b9f1f31	Revert "[MCA] [AMDGPU] Adding an implementation to AMDGPUCustomBehaviour for handling s_waitcnt instructions." Build failures when building with shared libraries. Reverting until I can fix. Differential Revision: https://reviews.llvm.org/D104730	2021-07-07 20:48:42 -07:00
Patrick Holland	af3baf1761	[MCA] [AMDGPU] Adding an implementation to AMDGPUCustomBehaviour for handling s_waitcnt instructions. This commit also makes some slight changes to the scheduling model for AMDGPU to set the RetireOOO flag for all scheduling classes. This flag is only used by llvm-mca and allows instructions to retire out of order. See the differential link below for a deeper explanation of everything. Differential Revision: https://reviews.llvm.org/D104730	2021-07-07 14:17:54 -07:00
Simon Pilgrim	ded8866f4a	[X86][Atom] Fix vector fp<->int resource/throughputs Match what's documented in the Intel AOM - almost all the conversion instructions require BOTH ports (apart from the MMX cvtpi2ps/cvtpi2ps instructions which we already override) - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-07-07 16:52:34 +01:00
Marcos Horro	aa13e4fe7e	[llvm-mca] Fix JSON output (PR50922) Based on the discussion in PR50922, minor changes have been done to properly output a valid JSON. Removed "not implemented" keys. Differential Revision: https://reviews.llvm.org/D105064	2021-07-01 12:53:20 +01:00
Serge Pavlov	b36d214bed	[X86] Add description of FXAM instruction Previously this instruction could be used only in assembler. This change makes it available for compiler also. Scheduling information was copied from FTST instruction, hopefully this can be a satisfactory approximation. Differential Revision: https://reviews.llvm.org/D104853	2021-06-25 12:26:51 +07:00
Jay Foad	beebe5a056	[MCA] Allow unlimited cycles in the timeline view Change --max-timeline-cycles=0 to mean no limit on the number of cycles. Use this in AMDGPU tests to show all instructions in the timeline view instead of having it arbitrarily truncated. Differential Revision: https://reviews.llvm.org/D104846	2021-06-24 12:54:57 +01:00
Andrea Di Biagio	70b37f4c03	[MCA][InstrBuilder] Always check for implicit uses of resource units (PR50725). When instructions are issued to the underlying pipeline resources, the mca::ResourceManager should also check for the presence of extra uses induced by the explicit consumption of multiple partially overlapping group resources. Fixes PR50725	2021-06-16 14:51:12 +01:00
Andrea Di Biagio	beb5213a2e	[MCA][InstrBuilder] Check for the presence of flag VariadicOpsAreDefs. This patch fixes the logic that checks for variadic register definitions. Before llvm-svn 348114 (commit `4cf35b4ab0`), it was not possible to explicitly mark variadic operands as definitions. By default, variadic operands of an MCInst were always assumed to be uses. A number of ad-hoc checks were introduced in the InstrBuilder to fix the processing of variadic register operands of ARM ldm/stm variants. This patch simply replaces those old (and buggy) checks with a much simpler (and correct) check for MCID::Flag::VariadicOpsAreDefs.	2021-06-15 09:52:38 +01:00
Simon Pilgrim	630820bafc	[X86][SLM] Adjust XMM non-PMULLD throughput costs to half rate. Match what's reported in the costs table, Agner's tables and the Intel AOM	2021-06-09 13:51:40 +01:00
Andrea Di Biagio	5f500d73cd	[MCA] Add a test for PR50483. NFC	2021-05-26 15:52:11 +01:00
Andrea Di Biagio	63cc9fd579	[MCA][InOrderIssueStage] Fix LastWriteBackCycle computation. Conservatively use the instruction latency to compute the last write-back cycle. Before this patch, the last write cycle computation was incorrect for store instructions that didn't declare any register writes.	2021-05-26 14:17:43 +01:00
Simon Pilgrim	21aec4fdc5	[X86][SLM] Fix vector PSHUFB + variable shift resource/throughputs Match what's documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate. Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.	2021-05-26 11:14:21 +01:00
Simon Pilgrim	66978466ba	[X86][Atom] Fix vector variable shift resource/throughputs Match what's documented in the Intel AOM - the non-immediate variants of the PSLL/PSRA/PSRL* shift instructions require BOTH ports - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-26 10:30:59 +01:00
Simon Pilgrim	57250f2f3c	[X86][Atom] Fix vector PSHUFB resource/throughputs Match what's documented in the Intel AOM - the XMM variant of PSHUFB requires BOTH ports - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-25 17:31:45 +01:00
Simon Pilgrim	a26288e803	[X86][Atom] Fix vector fadd/fcmp/fmul resource/throughputs Match whats documented in the Intel AOM - these are all fadd/fcmp use Port1 and fmul uses Port1, but in many cases BOTH ports are required - this was being incorrectly modelled as EITHER port. Discovered while investigating the correct fptoui costs to fix the regressions in D101555. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-20 18:56:58 +01:00
Andrea Di Biagio	9acabe8b6f	[MCA] Unbreak the buildbots by passing flag -mcpu=generic to the new test added by commit `e5d59db469`. This should unbreak buildbot clang-ppc64le-linux-lnt.	2021-05-19 19:12:33 +01:00
Patrick Holland	e5d59db469	[MCA] llvm-mca MCTargetStreamer segfault fix In order to create the code regions for llvm-mca to analyze, llvm-mca creates an AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions(). Within this function, both an MCAsmParser and MCTargetAsmParser are created so that MCAsmParser::Run() can be used to create the code regions for us. These parser classes were created for llvm-mc so they are designed to emit code with an MCStreamer and MCTargetStreamer that are expected to be setup and passed into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any code, an MCStreamerWrapper class gets created instead and passed into the MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides many of the emit methods to just do nothing. The exception is the emitInstruction() method which calls Regions.addInstruction(Inst). This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build our code regions, however there are a few directives which rely on the MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because llvm-mca doesn’t setup an MCTargetStreamer, when the parser encounters one of those directives, a segfault will occur. In x86, each one of these 7 directives will cause this segfault if they exist in the input assembly to llvm-mca: .cv_fpo_proc .cv_fpo_setframe .cv_fpo_pushreg .cv_fpo_stackalloc .cv_fpo_stackalign .cv_fpo_endprologue .cv_fpo_endproc I haven’t looked at other targets, but I wouldn’t be surprised if some of the other ones also have certain directives which could result in this same segfault. My proposed solution is to simply initialize an MCTargetStreamer after we initialize the MCStreamerWrapper. The MCTargetStreamer requires an ostream object, but we don’t actually want any of these directives to be emitted anywhere, so I use an ostream created with the nulls() function. Since this needs to happen after the MCStreamerWrapper has been initialized, it needs to happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The MCTargetStreamer also needs an MCInstPrinter which is easiest to initialize within the main() function of llvm-mca. So this MCInstPrinter gets constructed within main() then passed into the parseCodeRegions() function as a parameter. (If you feel like it would be appropriate and possible to create the MCInstPrinter within the parseCodeRegions() function, then feel free to modify my solution. That would stop us from having to pass it into the function and would limit its scope / lifetime.) My solution stops the segfault from happening and still passes all of the current (expected) llvm-mca tests. I also added a new test for x86 that checks for this segfault on an input that includes one of the .cv_fpo directives (this test fails without my solution, but passes with it). As far as I can tell, all of the functions that I modified are only called from within llvm-mca so there shouldn’t be any worries about breaking other tools. Differential Revision: https://reviews.llvm.org/D102709	2021-05-19 18:36:10 +01:00
Simon Pilgrim	b14f9a1ebd	[X86][Atom] Fix vector integer shift by immediate resource/throughputs Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-19 14:39:40 +01:00
Simon Pilgrim	f9b1208681	[X86][Atom] Fix vector integer multiplication resource/throughputs Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - vector integer multiplies are pipelined - all Port0, throughput = 2 @ 128bits, 1 @ 64bits. Noticed while checking reduction costs - now that we can use in-order models in llvm-mca, the atom model is the "worst case scenario" we have in x86.	2021-05-15 14:25:48 +01:00
Roman Lebedev	990e806b36	[NFC][X86][MCA] Add sudo-zero-idiom vperm2f128/vperm2i128 tests - don't break deps While btver2 model states that this pattern is a zero-cycle zero-idiom on Jaguar, it does not appear to be the case on Znver3, here it measures as not being recognized as dep-breaking zero-idiom, let alone a zero-cycle one.	2021-05-14 20:23:05 +03:00
Roman Lebedev	1fc1c88704	[X86] AMD Zen 3: same-reg AVX YMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As measured by exegesis, and confirmed by ref docs.	2021-05-14 20:23:05 +03:00
Roman Lebedev	2f8572d8e2	[X86] AMD Zen 3: same-reg AVX XMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As measured by exegesis, and confirmed by ref docs.	2021-05-14 20:23:04 +03:00
Roman Lebedev	f8f7c765a0	[X86] AMD Zen 3: same-reg SSE XMM PCMPGT{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom As measured by exegesis, and confirmed by ref docs.	2021-05-14 20:23:04 +03:00
Roman Lebedev	d2fb4bfba8	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPCMPGT{B,W,D,Q} tests	2021-05-14 20:23:04 +03:00
Roman Lebedev	094b493a3a	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPCMPGT{B,W,D,Q} tests	2021-05-14 20:23:04 +03:00
Roman Lebedev	1c0ac0b0f2	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PCMPGT{B,W,D,Q} tests	2021-05-14 20:23:03 +03:00
Roman Lebedev	26eeb6e650	[X86] AMD Zen 3: same-reg AVX YMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:03 +03:00
Roman Lebedev	41a5dcdf87	[X86] AMD Zen 3: same-reg AVX XMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:03 +03:00
Roman Lebedev	6733fe5c0d	[X86] AMD Zen 3: same-reg SSE XMM PSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such.	2021-05-14 20:23:03 +03:00
Roman Lebedev	9e9c80c250	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBUS{B,W} tests	2021-05-14 20:23:03 +03:00
Roman Lebedev	b6a0449b34	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBUS{B,W} tests	2021-05-14 20:23:02 +03:00
Roman Lebedev	128d9c6bbd	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBUS{B,W} tests	2021-05-14 20:23:02 +03:00
Roman Lebedev	555e1d2987	[X86] AMD Zen 3: same-reg AVX YMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:02 +03:00
Roman Lebedev	012417c980	[X86] AMD Zen 3: same-reg AVX XMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:02 +03:00
Roman Lebedev	29c4f892fe	[X86] AMD Zen 3: same-reg SSE XMM PSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such.	2021-05-14 20:23:02 +03:00
Roman Lebedev	0e20d1f0ef	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBS{B,W} tests	2021-05-14 20:23:01 +03:00
Roman Lebedev	14e48cf8ee	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBS{B,W} tests	2021-05-14 20:23:01 +03:00
Roman Lebedev	4673af527e	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBS{B,W} tests	2021-05-14 20:23:01 +03:00
Roman Lebedev	93f2642871	[X86] AMD Zen 3: same-reg AVX YMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:23:01 +03:00
Roman Lebedev	7a45b96e04	[X86] AMD Zen 3: same-reg AVX XMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:23:01 +03:00
Roman Lebedev	1ea8be214f	[X86] AMD Zen 3: same-reg SSE XMM PSUB{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:23:00 +03:00
Roman Lebedev	bbd2117c34	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUB{B,W,D,Q} tests	2021-05-14 20:23:00 +03:00
Roman Lebedev	d08909d1cb	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUB{B,W,D,Q} tests	2021-05-14 20:23:00 +03:00
Roman Lebedev	a6f5351443	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUB{B,W,D,Q} tests	2021-05-14 20:23:00 +03:00
Roman Lebedev	ce22f53916	[X86] AMD Zen 3: same-reg AVX YMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 20:23:00 +03:00

587 Commits