llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	f51170bffd	[X86] Fix SLM ldmxcsr/stmxcsr schedule classes Fix a long standing FIXME comment using a mixture of llvm-exegesis and Agner numbers	2022-11-28 17:43:17 +00:00
Simon Pilgrim	c65d5d4aec	[X86] Remove unnecessary (V)?PBLENDW(Y)?rm overrides The znver1/znver2 overrides shouldn't need 2uops for the xmm case (but znver1 should double-pump for the ymm case). Found with the help of D138359	2022-11-28 16:32:55 +00:00
Simon Pilgrim	026df9514e	[X86] Remove unnecessary VBLENDWYrr overrides The znver2 override already matched the WriteBlendY class exactly, and the znver1 override wasn't accounting for ymm double-pumping. Found with the help of D138359	2022-11-27 16:54:47 +00:00
Simon Pilgrim	2285ba9acc	[X86] Fix uops counts for SLM extract/extract-store instructions Matches Intel AoM + Agner	2022-11-27 16:16:36 +00:00
Simon Pilgrim	746cf4f13f	[X86] Synchronise scheduler classes of VPERM2F128/VBROADCASTF128/VEXTRACTF128/VINSERTF128 with I128 equivalents znver1/znver2 has barely any difference in behaviour between the AVX1/2 variants of these instructions - it looks like it was a copy+paste mistake to miss the AVX2 integer domain instructions in the overrides. Having said that the override numbers don't appear to match the numbers in the AMD 17h SoGs very well - for instance vperm2f128/vperm2i128 might be microcoded from the AMD sense of >3 uops, but it doesn't have a 100cy latency..... These will need to be further addressed.	2022-11-21 17:15:47 +00:00
Simon Pilgrim	89365b159e	[X86] IceLakeServer - PACKS instructions take latency 3cy This appears to be a slow down vs Skylake (which the model was copied off) - confirmed with uops.info / instlatx64 Noticed as D138359 was reporting that many of the PACKS overrides were redundant, but were in fact incorrect	2022-11-20 19:28:35 +00:00
Simon Pilgrim	7de156d1cc	[MCA][X86] Add missing test coverage for BWI instructions	2022-11-20 17:19:58 +00:00
Simon Pilgrim	421bdc119a	[MCA][X86] Add test coverage for IFMA instructions	2022-11-20 17:19:58 +00:00
Simon Pilgrim	6a8fabf5c3	[MCA][X86] Add test coverage for XSAVE instructions	2022-11-20 13:56:04 +00:00
Simon Pilgrim	9148aeac00	[X86] Remove unnecessary string instruction overrides from znver1/znver2 models Reported by D138359 - they were being overridden as WriteMicrocoded despite already being declared WriteMicrocoded It also fixes a rather funny instregex mismatch that was matching the movsldup shuffle by mistake	2022-11-20 12:57:44 +00:00
Simon Pilgrim	357f1c4ef1	[X86] Improve LOOP/LOOPE/LOOPNE schedule on SandyBridge model D138359 was reporting that this override was superfluous, but it had never been setup - I took the numbers from uops.info (I couldn't find an estimate in Intel docs).	2022-11-20 12:13:02 +00:00
Simon Pilgrim	420d02bb55	[MCA][X86] Add test coverage for LOOP/LOOPE/LOOPNE instructions These were missed for some reason - only noticed this while investigating a FIXME in the SandyBridge model Also sync the znver2/znver3 tests which had been missed when LOCK test coverage was added	2022-11-20 11:35:21 +00:00
Simon Pilgrim	13fd7373b6	[X86] znver2 - (V)EXTRACTPSrr takes 2 uops D138359 was reporting that the EXTRACTPSrr override was unnecessary, however the AMD SoG and Agner both confirm that both the rr and rm versions take 2uops (matching znver1)	2022-11-20 09:24:55 +00:00
Simon Pilgrim	474e41f1b9	[MCA][X86] Add test coverage for BF16 instructions	2022-11-19 21:46:23 +00:00
Simon Pilgrim	ba5714d773	[MCA][X86] Add test coverage for VP2INTERSECT instructions NOTE: For IceLakeServer we actually test TigerLake as that's the only target that supports it (we do something similar for F16C on IvyBridge in the SandyBridge tests).	2022-11-19 21:46:23 +00:00
Simon Pilgrim	420d0d3aa6	[MCA][X86] Add test coverage for VAES instructions	2022-11-19 21:02:19 +00:00
Simon Pilgrim	aae08b1d37	[MCA][X86] Add test coverage for BITALG instructions	2022-11-19 12:04:45 +00:00
Simon Pilgrim	91deae999a	[MCA][X86] Add test coverage for VPCLMULQDQ instructions	2022-11-18 21:22:10 +00:00
Simon Pilgrim	ffe05b8f57	[MCA][X86] Add missing IceLake test coverage for VPOPCNTDQ instructions	2022-11-18 20:58:29 +00:00
Simon Pilgrim	4c854120c2	[MCA][X86] Add test coverage for AVX512CD instructions	2022-11-18 20:58:29 +00:00
Simon Pilgrim	c6a838e9c8	[MCA][X86] Add test coverage for VBMI instructions	2022-11-16 16:58:26 +00:00
Simon Pilgrim	896271dbea	[MCA][X86] Ensure the avx512 gfni tests use the upper xmm/ymm registers Ensure we're testing the avx512vl gfni instructions and not the avx gfni instructions	2022-11-15 11:06:59 +00:00
Simon Pilgrim	7e78685752	[MCA][X86] Ensure the avx512 vnni tests use the upper xmm/ymm registers Ensure we're testing the avx512vl vnni instructions and not the avx vnni instructions	2022-11-14 16:29:31 +00:00
Simon Pilgrim	d7208b0404	[MCA][X86] Add test coverage for VBMI2 instructions	2022-11-14 16:29:31 +00:00
Simon Pilgrim	e5120a43d5	[X86] Update WriteMPSAD class and remove VMPSADBWrri override AMD 15h SoG + Agner both indicate there's no difference between MPSADBWrri + VMPSADBWrri - I can't find any data on the folded variant so I've kept the existing numbers Removes the last X86 override for WriteMPSAD/WritePSADBW classes - removing a further 3 entries from every sched class table	2022-11-13 15:19:37 +00:00
Simon Pilgrim	6a99f23845	[MCA][X86] Add test coverage for VDBPSADBW instructions	2022-11-13 15:19:36 +00:00
Simon Pilgrim	313a4aef7f	[X86] Fix scheduler tag for GFNI YMM instructions These were hardcoded to XMM width	2022-11-13 14:10:09 +00:00
Simon Pilgrim	e19cb9c57f	[X86] Cleanup CVTPD2PS schedule values The znver1/znver2 schedules for CVTPD2PS were incorrectly double pumping the xmm-load variant instead of the ymm variants (znver1 only) Also, the xmm-load variant was incorrectly using FP03 instead of just FP3 Confirmed by the AMD SoG 17h tables, Agner + uops.info Another step towards removing a lot of unnecessary overrides from all the x86 scheduler models - these should hopefully be convertible into regular WriteCvtPD2I classes soon.	2022-11-13 11:13:30 +00:00
Simon Pilgrim	4a28b7ba98	[X86] IceLakeModel - conversion instructions don't use Port015 Fixes a lot of throughput mismatches - the more complicated conversion instructions use ICXPort5+ICXPort01, not ICXPort5+ICXPort015 (ICXPort015 is mainly used for basic Logic + blend ops) Fixing this should allow us to remove a lot of unnecessary scheduler overrides from IceLakeModel Confirmed by both Agner + uops.info	2022-11-12 18:19:32 +00:00
Simon Pilgrim	cd3cced8aa	[MCA][X86] Add test coverage for VNNI instructions	2022-11-12 17:38:29 +00:00
Simon Pilgrim	0c64d465b4	[MCA][X86] Add missing AVX-GFNI YMM test coverage	2022-11-12 17:37:09 +00:00
Simon Pilgrim	9ec1c83957	[X86] Always classify gf2p8affineqb/gf2p8affineinvqb instructions with SchedWriteVecIMul There was a mismatch between the AVX512 and SSE/AVX versions	2022-11-12 17:20:07 +00:00
Simon Pilgrim	cbe5b2dd91	[MCA][X86] Add test coverage for GFNI instructions	2022-11-12 17:14:06 +00:00
Simon Pilgrim	fca63649ce	[X86] Replace unnecessary SKL CVTSI2SS/CVTSI2SD overrides with better base class defs The folded patterns were missing entirely - confirmed by both Agner + uops.info	2022-11-12 14:29:45 +00:00
Simon Pilgrim	2be46b33d3	[MCA][X86][AVX512] Add test coverage for unsigned<->fp conversion instructions	2022-11-12 13:45:16 +00:00
Simon Pilgrim	07c8f3dd0f	[X86] SkylakeServerModel - conversion instructions don't use Port015 Fixes a lot of throughput mismatches - the more complicated conversion instructions use SKXPort5+SKXPort01, not SKXPort5+SKXPort015 (SKXPort015 is mainly used for basic Logic + blend ops) Fixing this should allow us to remove a lot of unnecessary scheduler overrides from SkylakeServerModel Confirmed by both Agner + uops.info	2022-11-12 12:39:59 +00:00
Simon Pilgrim	b31a5d7270	[X86] Replace unnecessary SKL CVTPD2DQ overrides with better base class defs Also fixes some AVX missing folded instructions	2022-11-12 12:15:56 +00:00
Simon Pilgrim	30498cf7c4	[X86] SkylakeClientModel - conversion instructions don't use Port015 Fixes a lot of throughput mismatches - the more complicated conversion instructions use SKLPort5+SKLPort01, not SKLPort5+SKLPort015 (SKLPort015 is mainly used for basic Logic + blend ops) Fixing this should allow us to remove a lot of unnecessary scheduler overrides from SkylakeClientModel Confirmed by both Agner + uops.info	2022-11-10 12:42:51 +00:00
Simon Pilgrim	810b8fdff9	[X86] Replace unnecessary CVTPS2PI/CVTPS2DQ overrides with better base class defs Broadwell/Haswell were completely overriding the WriteCvtPD2I class defs - we can remove those overrides entirely by just choosing better class defs. Also fixes the scheduler for a missing YMM folded case - confirmed with Agner + uops.info that the port usage is correct	2022-11-09 17:08:45 +00:00
Simon Pilgrim	471f2cff8d	[X86] CVTTSS2SI64rm has the same scheduler def as (V)CVTSS2SI64rm None of Haswell/Broadwell/Skylake/Icelake treat CVTTSS2SI64rm differently from CVTSS2SI64rm (or the AVX variants) Confirmed with Agner, uops.info and Intel AoM	2022-11-08 14:35:39 +00:00
Simon Pilgrim	5c0cb75787	[X86] Folded MOVDDUPrm has the same sched behaviour as MOVSHDUPrm/MOVSLDUPrm on Haswell/IceLake There can be a difference for MOVDDUPrr but not the load folded broadcast that is purely on Port23 Fixes an old TODO (inherited from SkylakeServer which was fixed at `c7662dc3e5`) Confirmed on Agner + uops.info	2022-11-07 15:17:32 +00:00
Simon Pilgrim	4e56aa252f	[X86] Schedule scalar movsx/movzx load+extend ops as WriteLoad instead of WriteALULd Although some very old x86 hardware would perform the extension as a later stage, every target we have a scheduler for always performs this as part of the load-op (avoid ALU pipes etc.). If anyone wants to model very old hardware they can always override this. This patch just tags these as WriteLoad directly and removes unnecessary overrides - this cleans up some latency/throughput tests as they aren't being badly modelled as folded ALU ops	2022-11-06 14:32:05 +00:00
Simon Pilgrim	08fe55b346	[X86] Fix scalar load latencies for WriteLoad scheduler class Znver1/Znver2 were using vector load latency values (which is what WriteFLoad/WriteVecLoad are for) instead of the scalar load latency value TBH I'm not sure clflush/clzero/prefetch ops should be tagged as WriteLoad but at least this makes us more consistent	2022-11-06 14:03:59 +00:00
Simon Pilgrim	6fff3babb4	Revert rG244331ae833aaf33503bbd36890e704afb66a237 "[X86] Fix scalar load latencies for WriteLoad scheduler class" Forgot to update tests outside the llvm-mca test folder :-(	2022-11-06 13:16:23 +00:00
Simon Pilgrim	244331ae83	[X86] Fix scalar load latencies for WriteLoad scheduler class Atom was missing a load latency value (so was defaulting to 1cy) Znver1/Znver2 were using vector load latency values (which is what WriteFLoad/WriteVecLoad are for) instead of the scalar load latency value TBH I'm not sure clflush/clzero/prefetch ops should be tagged as WriteLoad but at least this makes us more consistent	2022-11-06 12:22:10 +00:00
Simon Pilgrim	edf885531e	[X86] Replace unnecessary int2float and float2double overrides with better base class defs Broadwell/Haswell were completely overriding the class defs - we can remove those overrides entirely by just choosing better class defs (plus a fix for missing mmx folded load).	2022-11-05 19:07:01 +00:00
Simon Pilgrim	23ba5bc528	[MCA][X86] Add more avx512 cvt instructions test coverage	2022-11-05 17:28:29 +00:00
Simon Pilgrim	2c79186bce	[X86] Cleanup WriteCvtSD2SS/WriteCvtPD2PS overrides The WriteCvtSD2SS/WriteCvtPD2PS* classes were mostly unused as the models were needlessly overriding all instructions - in some cases the folded pattern overrides were entirely missing (but I've confirmed they just have an additional Port23 use) There were a couple of typos (confirmed with Agner/uops.info) - Skylake/Icelake uses Port5+Port01 for XMM/YMM, Skylake uses Port5+Port05 for ZMM but Icelake uses Port5+Port0	2022-11-05 15:47:05 +00:00
Simon Pilgrim	0b7f327800	[X86] Fix cvtss2si64/cvttss2si64 typo in SkylakeClient SS2SI64 conversions use Port0/Port01/Port5 (with/without truncation), but SS2SI32 only uses Port0/Port01 like SD2SI32/SD2SI64	2022-11-05 14:35:41 +00:00
Simon Pilgrim	b781ca4df6	[X86] Fix override for CVTPD2PS/CVTPD2DQ/CVTTPD2DQ AVX variants These were lost when they were converted from instregex to instrs	2022-11-05 13:57:07 +00:00

752 Commits