llvm-project

Commit Graph

Author	SHA1	Message	Date
Yuta Mukai	3f561996bf	[AArch64] Fix and add A64FX scheduling resource/latency info 1. Missing instruction information (FTSSEL, FMSB, PFIRST and RDFFR) is added and CompleteModel is set to one. 2. Information for pseudo SVE instructions is added. Those instructions are present at the time of scheduling. 3. Resource and latency information for SVE instructions is modified to be more accurate. For example, the description for CMPEQ, which consumes one cycle each of unit FLA and PPR, is as follows. ``` Previous: def A64FXGI01 : ProcResGroup<[A64FXIPFLA, A64FXIPPR]>; def A64FXWrite_4Cyc_GI01 : SchedWriteRes<[A64FXGI01]> {... Modified: def A64FXGI0 : ProcResGroup<[A64FXIPFLA]>; def A64FXGI1 : ProcResGroup<[A64FXIPPR]>; def A64FXWrite_CMP : SchedWriteRes<[A64FXGI0, A64FXGI1]> {... ``` Reference: A64FX Microarchitecture Manual (Table 16-3) https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.7.pdf Reviewed By: dmgreen, kawashima-fj Differential Revision: https://reviews.llvm.org/D131165	2022-08-09 10:53:40 +09:00
David Green	408378a0b3	[AArch64] Tone down the number of repeated fmov N2 scheduling tests. NFC	2022-08-05 08:11:57 +01:00
Cullen Rhodes	767b26a4e2	[MCA] Support multiple comma-separated -mattr features Reviewed By: myhsu Differential Revision: https://reviews.llvm.org/D129479	2022-07-12 08:20:11 +00:00
Cullen Rhodes	d1c51d45f0	[AArch64] Use Neoverse N2 sched model as default for: - Cortex-A710 - Cortex-X2 - Neoverse-V1 - Neoverse-512tvb Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D129203	2022-07-08 13:34:13 +00:00
Cullen Rhodes	03af9ba680	[AArch64] Initial sched model for Neoverse N2 The optimization guide can be found here: https://developer.arm.com/documentation/PJDOC-466751330-18256/latest/ Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D128631	2022-07-08 09:39:13 +00:00
Jay Foad	d393538c7f	[AMDGPU] Add a GFX11 MCA test This mostly just tests that DPFP is 1/32 rate on GFX11, instead of 1/16 rate as on GFX10.	2022-06-14 13:47:29 +01:00
Simon Pilgrim	d384a4c530	[X86] Adjust vector test costs to match SoG (Issue #54889) znver1/2 models were incorrectly modelling the latency/throughput/uops and znver1 ymm variants also require double pumping. Now matches what I can decipher from the AMD SoG, Agner and instlatx64 numbers vs the llvm-exegesis report provided by @fabian-r	2022-05-31 09:14:06 +01:00
Simon Pilgrim	14cc4674bf	[X86] Adjust vector fp test costs to match int test costs znver1/2 models were missing the vtestps/pd overrides to match the vptest integer equivalents. Noticed while investigating Issue #54889	2022-05-30 09:50:15 +01:00
Simon Pilgrim	1956f28037	[X86] Adjust vector extend to ymm to match SoG (Issue #54889) znver1 ymm variants of VPMOVSX/VPMOVZX instructions require double pumping. Now matches AMD SoG, Agner and instlatx64 numbers. Thanks to @fabian-r for the report	2022-05-30 08:58:56 +01:00
Simon Pilgrim	c99690462e	[X86] Adjust vector shift costs to match SoG (Issue #54889) znver1/2 models were incorrectly modelling the fpupipe (should be pipe2 for shift-by-scalar-amount and pipe1 for shift-by-element-amount) and znver1 ymm variants also require double pumping. Now matches AMD SoG, Agner and instlatx64 numbers. Thanks to @fabian-r for the report	2022-05-29 17:55:39 +01:00
Simon Pilgrim	896557e129	[X86] Adjust fadd costs to match SoG znver1/2 models were incorrectly modelling these on fpupipe 0 instead of 2/3 and znver1 ymm variants also require double pumping. Now matches AMD SoG, Agner and instlatx64 numbers. Thanks to @fabian-r for the report	2022-05-15 21:28:29 +01:00
Simon Pilgrim	6824cf1ab7	[X86] Set some more plausible latencies for horizontal add/subs on znver1 These are all microcoded/multi-pipe nightmares on Ryzen, but we shouldn't just be using the WriteMicrocoded class which is for REALLY bad microcoded nightmares - instead use the same approximate latencies as znver2 (Agner and uops.info both suggest similar values) - and make sure we use the FPU defs for both Fixes #53242	2022-05-08 15:48:42 +01:00
Simon Pilgrim	c7662dc3e5	[X86] MOVDDUP has the same sched behaviour as MOVSHDUP/MOVSLDUP on Skylake Fixes an old TODO - confirmed on Agner + uops.info	2022-05-02 12:50:37 +01:00
Simon Pilgrim	a305d8f44e	[X86] Adjust fsetcc/fmin/fmax costs to match SoG (Issue #54889) znver1/2 models were incorrectly modelling these as 3 cycle latency instructions on the wrong pipe and znver1 ymm variants also require double pumping. Now matches AMD SoG, Agner and instlatx64 numbers. Thanks to @fabian-r for the report	2022-04-14 13:27:33 +01:00
Simon Pilgrim	058a33d3c9	[X86] Account for high uop/resource usage in BSF/BSR instructions znver1/2 models were incorrectly modelling these as single uop instructions, instead of the microcoded nightmares they really are. Now matches AMD SoG, Agner and instlatx64 numbers. Fixes #54811	2022-04-11 11:20:09 +01:00
Simon Pilgrim	5626bd4289	[X86] Fix SLM scheduler model for PMULLD (PR37059) Adjust the PMULLD entry to match the Intel AoM numbers - PMULLD is a uop nightmare on SLM and we should model it as such. We had reports of internal regressions the last time this was attempted (rG13a0f83a05ff), but no public repros, and tests I did last year when I had access to a SLM box failed to see anything. My hunch is that the more aggressive PMULLD -> PMADDWD folds we now perform might have helped. We can revisit this again if we ever receive an actual repro. Fixes #36407	2022-04-08 10:07:06 +01:00
Simon Pilgrim	0c9c92ffc0	[X86][XOP] Tidyup VPHADD/VPHSUB unary horizontal ops default schedule class Based off Agner and AMD SoG tables, the XOP VPHADD/VPHSUB unary horizontal ops are as fast as basic arithmetic ops, not the slower SSSE3 binary horizontal add/sub ops. This also matches what the bdver2 model already lists. Noticed while investigating reduction add optimizations.	2022-03-03 12:07:48 +00:00
David Green	61b616755a	Partially revert "[SchedModels][CortexA55] Add ASIMD integer instructions" The Cortex-A55 scheduling model is used for -mcpu=generic, meaning it can have a wider effect than just the A55. The changes to the A55 scheduling model seems to have caused performance regressions on Cortex-A510 device which have latencies closer to the original and different forwarding paths. This partially reverts the changes from D117003, at least until we can do something to improve Cortex-A510. According to my results, this improves the A510 results without altering the A55 very much.	2022-02-28 10:58:52 +00:00
Pavel Kosov	37fa99eda0	[SchedModels][CortexA55] Add ASIMD integer instructions Depends on D114642 Original review https://reviews.llvm.org/D112201 OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D117003	2022-02-17 13:41:57 +03:00
Craig Topper	56d6ccd4cb	[X86] Update register RCL/RCR by 1 and immediate scheduling for Intel CPUs Most Intel CPU scheduler files lumped the immediate and 1 instructions together, but uops.info shows they are quite different. For the most part the by 1 instructions were pretty accurate to the uops.info data except the latency was 3 instead of 2 as uops.info indicates. The by immediate instructions need 7 or 8 uops and have higher latency. It looks like the 8-bit by immediate instructions may need even more uops, but I just lumped them with the 16/32/64. Noticed while checking out PR53648. So mostly I cared about the by 1 instructions. Reviewed By: RKSimon, pengfei Differential Revision: https://reviews.llvm.org/D119217	2022-02-08 09:20:20 -08:00
Simon Pilgrim	6eb8fc9244	[X86] Add some missing dependency-breaking zero idiom patterns to scheduler models Many of the x86 scheduler models are not accounting for their microarch's ability to handle dependency-breaking zero idioms (pxor xmm0,xmm0 etc.), which is causing some notable differences when comparing llvm-mca reports to iaca, uops.info etc. These are based on the Intel AoMs and Agner's docs which list the instructions handled on each cpu model - there may be more, although tbh the xor/pxor/xorps/xorpd are by far the most commonly encountered. Once this is in place we also need to review missing support for 'allones' idioms and reg-reg move elimination, but this needs fixing first. @lebedev.ri The Barcelona test changes are due to the cpu still being tagged as using the SandyBridge model, if/when you get back to D63628 these will need to be addressed. Based on an original patch by @andreadb (Andrea Di Biagio) Differential Revision: https://reviews.llvm.org/D117497	2022-01-19 11:29:33 +00:00
Simon Pilgrim	8ea579203d	[MCA][X86] Add missing zero-idioms test file coverage atom/slm have no/limited zero-idioms handling but we should test all the common instructions anyhow znver1/znver2 were just missing - I've copied the Haswell tests for consistent test coverage	2022-01-17 16:04:39 +00:00
Patrick Holland	85e6e748d4	[MCA] Switching from conservatively guessing which instructions are memory-barrier instructions to providing targets and developers a convenient way to explicitly declare which instructions are memory-barriers. Differential Revision: https://reviews.llvm.org/D116779	2022-01-11 13:50:14 -08:00
Pavel Kosov	34a91d7748	[SchedModels][CortexA55] Fix scheduling of FP loads Patch fixes scheduling of FP load instructions with pre/post increment adding WriteAdr for address operand. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D116361 OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg	2022-01-10 10:14:45 +03:00
Simon Pilgrim	a0a0eb192e	[X86] Use WriteVecMove scheduler classes for VPMOVM2* instructions These match the port behaviour of reg-reg predicated xmm/ymm/zmm moves Fixes #34958	2021-12-27 13:21:29 +00:00
Simon Pilgrim	29475e0286	[X86] Add scheduler classes for zmm vector reg-reg move instructions Basic zmm reg-reg moves (with predication) are more port limited than xmm/ymm moves, so we need to add a separate class for them. We still appear to be missing move-elimination patterns for most of the intel models, which looks to be one of the main diffs for basic codegen analysis between llvm-mca and uops.info Load/stores are a bit messier and might be better handled as overrides.	2021-12-27 12:13:42 +00:00
Simon Pilgrim	948ae472a6	[MCA][X86] Add AVX512 vector move instruction test coverage	2021-12-27 11:43:39 +00:00
Simon Pilgrim	67cce1ceee	[X86] Adjust some IceLake fp shuffle schedule classes (PR48110) The IceLake scheduler model is still mainly a copy of the SkylakeServer model. This patch adjusts the fp shuffle classes to account for most instructions now working on Port 1 as well as Port 5. This is based off Agner + uops.info as well as the PR48110 report. Differential Revision: https://reviews.llvm.org/D115752	2021-12-19 13:00:11 +00:00
Simon Pilgrim	74d1fc742a	[X86] Adjust some IceLake integer shuffle schedule classes (PR48110) The IceLake scheduler model is still mainly a copy of the SkylakeServer model. This patch adjusts the integer shuffle classes to account for most instructions now working on Port 1 as well as Port 5. This is based off Agner + uops.info as well as the PR48110 report. Differential Revision: https://reviews.llvm.org/D115547	2021-12-14 18:56:13 +00:00
Simon Pilgrim	d9655eec05	[MCA][X86] Add AVX512 subvector broadcast instruction test coverage	2021-12-13 18:48:25 +00:00
Simon Pilgrim	fe1b5b56c6	[MCA][X86] Add AVX512 movddup/movshdup/movsldup instruction test coverage As noted on D115547	2021-12-13 18:04:56 +00:00
Simon Pilgrim	b04c646711	[MCA][X86] Add AVX512 broadcast instruction test coverage As noted on D115547	2021-12-13 17:45:16 +00:00
Simon Pilgrim	9ad5969b5e	[X86][Atom] Fix CVT uops + port usage Fix overrides to use both ports. Update the uops counts + port usage based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner reports as well.	2021-12-12 22:57:53 +00:00
Simon Pilgrim	4c1d248397	[MCA][X86] Fix duplicated cvtsi2ss/cvtsi2sd i32 + i64 folded tests Specify the integer width to ensure we're testing the correct instruction	2021-12-12 22:48:45 +00:00
Simon Pilgrim	c02f9791c6	[X86][AVX512] Remove xmm->xmm vpmovsx/vpmovzx rm overrides The XMM evex cases have the same behaviour as the SSE41 versions, which already uses WriteShuffleX.Folded	2021-12-12 16:08:10 +00:00
Simon Pilgrim	fc02ceb12a	[X86][AVX512] Use WriteShuffleX for xmm->xmm extensions The XMM evex cases have the same behaviour as the SSE41 versions, which already uses WriteShuffleX	2021-12-12 15:22:32 +00:00
Simon Pilgrim	7f09aee0f6	[MCA][X86] Add missing VPMOVSX/VPMOVZX from AVX512 tests	2021-12-10 18:12:57 +00:00
Simon Pilgrim	6fae235885	[MCA][X86] Add missing ALIGND/ALIGNQ from AVX512F/AVX512VL tests	2021-12-10 15:59:52 +00:00
Simon Pilgrim	b025b062d6	[MCA][X86] Add missing PALIGNR from AVX512BW/AVX512BWVL tests	2021-12-10 15:59:52 +00:00
Simon Pilgrim	ebcc92ccda	[MCA][X86] Add missing PSLLDQ/PSRLDQ from AVX512BW/AVX512BWVL tests	2021-12-10 15:59:51 +00:00
Simon Pilgrim	550bf36732	[MCA][X86] Add missing PACKSS/PACKUS from AVX512BW/AVX512BWVL tests	2021-12-10 15:59:51 +00:00
Simon Pilgrim	80ce01c6fd	[MCA][X86] Add missing PSHUFLW from AVX512BWVL tests	2021-12-10 14:02:37 +00:00
Andrew Savonichev	420300c0d8	[MCA] Remove the warning about experimental support for in-order CPU There are not a lot of bug reports for this feature, so let's mark it stable. Differential Revision: https://reviews.llvm.org/D114701	2021-12-07 15:27:51 +03:00
Roman Lebedev	2f364f6f0d	[NFC][X86][MCA] Add forgotten test coverage for AVX512's VPMOVM2[BWDQ] / VPMOV[BWDQ]2M	2021-11-20 13:09:18 +03:00
Simon Pilgrim	0bb32b1b21	[X86][SLM] Fix BitTest+Set uops + port usage Both ports are required for BitTest ops. Update the uops counts + port usage based off the most recent llvm-exegesis captures and what Intel AoM / Agner reports as well.	2021-10-17 18:13:15 +01:00
Simon Pilgrim	5ed5df4802	[X86][SLM] Fix uops for PCMPISTR/PCMPISTR instructions Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.	2021-10-17 18:13:14 +01:00
Simon Pilgrim	680afaaa5d	[X86][SLM] Fix uops for PCLMULQDQ Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.	2021-10-17 18:13:14 +01:00
Simon Pilgrim	498c7236bc	[X86][SLM] +1uop for PSHUFBrm xmm Extra 1uop for folded pshufb ops, based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.	2021-10-17 18:13:14 +01:00
Simon Pilgrim	7cae0daee6	[X86][Atom] Fix BSR/BSF uops + port usage Both ports are required for BitScan ops. Update the uops counts + port usage based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner reports as well.	2021-10-02 19:09:44 +01:00
Simon Pilgrim	8e7f6039fa	[X86] Atom SSE shift-by-variable take 2uops/3uops not 1uop Based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner / InstLatX64 reports as well.	2021-10-02 12:28:41 +01:00

677 Commits