Commit Graph

752 Commits

Author SHA1 Message Date
Simon Pilgrim f51170bffd [X86] Fix SLM ldmxcsr/stmxcsr schedule classes
Fix a long-standing FIXME comment using a mixture of llvm-exegesis and Agner numbers
2022-11-28 17:43:17 +00:00
Simon Pilgrim c65d5d4aec [X86] Remove unnecessary (V)?PBLENDW(Y)?rm overrides
The znver1/znver2 overrides shouldn't need 2 uops for the xmm case (but znver1 should double-pump for the ymm case).

Found with the help of D138359
2022-11-28 16:32:55 +00:00
Simon Pilgrim 026df9514e [X86] Remove unnecessary VBLENDWYrr overrides
The znver2 override already matched the WriteBlendY class exactly, and the znver1 override wasn't accounting for ymm double-pumping.

Found with the help of D138359
2022-11-27 16:54:47 +00:00
Simon Pilgrim 2285ba9acc [X86] Fix uops counts for SLM extract/extract-store instructions
Matches Intel AoM + Agner
2022-11-27 16:16:36 +00:00
Simon Pilgrim 746cf4f13f [X86] Synchronise scheduler classes of VPERM2F128/VBROADCASTF128/VEXTRACTF128/VINSERTF128 with I128 equivalents
znver1/znver2 have barely any difference in behaviour between the AVX1/2 variants of these instructions - it looks like it was a copy+paste mistake to miss the AVX2 integer domain instructions in the overrides.

Having said that, the override numbers don't appear to match the numbers in the AMD 17h SoGs very well - for instance, vperm2f128/vperm2i128 might be microcoded in the AMD sense of >3 uops, but they don't have a 100cy latency. These will need to be addressed further.
2022-11-21 17:15:47 +00:00
Simon Pilgrim 89365b159e [X86] IceLakeServer - PACKS instructions take latency 3cy
This appears to be a slowdown vs Skylake (which the model was copied from) - confirmed with uops.info / instlatx64

Noticed as D138359 was reporting that many of the PACKS overrides were redundant, when they were in fact incorrect
2022-11-20 19:28:35 +00:00
Simon Pilgrim 7de156d1cc [MCA][X86] Add missing test coverage for BWI instructions 2022-11-20 17:19:58 +00:00
Simon Pilgrim 421bdc119a [MCA][X86] Add test coverage for IFMA instructions 2022-11-20 17:19:58 +00:00
Simon Pilgrim 6a8fabf5c3 [MCA][X86] Add test coverage for XSAVE instructions 2022-11-20 13:56:04 +00:00
Simon Pilgrim 9148aeac00 [X86] Remove unnecessary string instruction overrides from znver1/znver2 models
Reported by D138359 - they were being overridden as WriteMicrocoded despite already being declared WriteMicrocoded

It also fixes a rather funny instregex mismatch that was matching the movsldup shuffle by mistake
2022-11-20 12:57:44 +00:00
Simon Pilgrim 357f1c4ef1 [X86] Improve LOOP/LOOPE/LOOPNE schedule on SandyBridge model
D138359 was reporting that this override was superfluous, but it had never been set up - I took the numbers from uops.info (I couldn't find an estimate in Intel docs).
2022-11-20 12:13:02 +00:00
Simon Pilgrim 420d02bb55 [MCA][X86] Add test coverage for LOOP/LOOPE/LOOPNE instructions
These were missed for some reason - only noticed this while investigating a FIXME in the SandyBridge model

Also sync the znver2/znver3 tests which had been missed when LOCK test coverage was added
2022-11-20 11:35:21 +00:00
Simon Pilgrim 13fd7373b6 [X86] znver2 - (V)EXTRACTPSrr takes 2 uops
D138359 was reporting that the EXTRACTPSrr override was unnecessary; however, the AMD SoG and Agner both confirm that both the rr and rm versions take 2 uops (matching znver1)
2022-11-20 09:24:55 +00:00
Simon Pilgrim 474e41f1b9 [MCA][X86] Add test coverage for BF16 instructions 2022-11-19 21:46:23 +00:00
Simon Pilgrim ba5714d773 [MCA][X86] Add test coverage for VP2INTERSECT instructions
NOTE: For IceLakeServer we actually test TigerLake as that's the only target that supports it (we do something similar for F16C on IvyBridge in the SandyBridge tests).
2022-11-19 21:46:23 +00:00
Simon Pilgrim 420d0d3aa6 [MCA][X86] Add test coverage for VAES instructions 2022-11-19 21:02:19 +00:00
Simon Pilgrim aae08b1d37 [MCA][X86] Add test coverage for BITALG instructions 2022-11-19 12:04:45 +00:00
Simon Pilgrim 91deae999a [MCA][X86] Add test coverage for VPCLMULQDQ instructions 2022-11-18 21:22:10 +00:00
Simon Pilgrim ffe05b8f57 [MCA][X86] Add missing IceLake test coverage for VPOPCNTDQ instructions 2022-11-18 20:58:29 +00:00
Simon Pilgrim 4c854120c2 [MCA][X86] Add test coverage for AVX512CD instructions 2022-11-18 20:58:29 +00:00
Simon Pilgrim c6a838e9c8 [MCA][X86] Add test coverage for VBMI instructions 2022-11-16 16:58:26 +00:00
Simon Pilgrim 896271dbea [MCA][X86] Ensure the avx512 gfni tests use the upper xmm/ymm registers
Ensure we're testing the avx512vl gfni instructions and not the avx gfni instructions
2022-11-15 11:06:59 +00:00
Simon Pilgrim 7e78685752 [MCA][X86] Ensure the avx512 vnni tests use the upper xmm/ymm registers
Ensure we're testing the avx512vl vnni instructions and not the avx vnni instructions
2022-11-14 16:29:31 +00:00
Simon Pilgrim d7208b0404 [MCA][X86] Add test coverage for VBMI2 instructions 2022-11-14 16:29:31 +00:00
Simon Pilgrim e5120a43d5 [X86] Update WriteMPSAD class and remove VMPSADBWrri override
AMD 15h SoG + Agner both indicate there's no difference between MPSADBWrri + VMPSADBWrri - I can't find any data on the folded variant so I've kept the existing numbers

Removes the last X86 override for WriteMPSAD/WritePSADBW classes - removing a further 3 entries from every sched class table
2022-11-13 15:19:37 +00:00
Simon Pilgrim 6a99f23845 [MCA][X86] Add test coverage for VDBPSADBW instructions 2022-11-13 15:19:36 +00:00
Simon Pilgrim 313a4aef7f [X86] Fix scheduler tag for GFNI YMM instructions
These were hardcoded to XMM width
2022-11-13 14:10:09 +00:00
Simon Pilgrim e19cb9c57f [X86] Cleanup CVTPD2PS schedule values
The znver1/znver2 schedules for CVTPD2PS were incorrectly double pumping the xmm-load variant instead of the ymm variants (znver1 only)

Also, the xmm-load variant was incorrectly using FP03 instead of just FP3

Confirmed by the AMD SoG 17h tables, Agner + uops.info

Another step towards removing a lot of unnecessary overrides from all the x86 scheduler models - these should hopefully be convertible into regular WriteCvtPD2I classes soon.
2022-11-13 11:13:30 +00:00
Simon Pilgrim 4a28b7ba98 [X86] IceLakeModel - conversion instructions don't use Port015
Fixes a lot of throughput mismatches - the more complicated conversion instructions use ICXPort5+ICXPort01, not ICXPort5+ICXPort015 (ICXPort015 is mainly used for basic Logic + blend ops)

Fixing this should allow us to remove a lot of unnecessary scheduler overrides from IceLakeModel

Confirmed by both Agner + uops.info
2022-11-12 18:19:32 +00:00
Simon Pilgrim cd3cced8aa [MCA][X86] Add test coverage for VNNI instructions 2022-11-12 17:38:29 +00:00
Simon Pilgrim 0c64d465b4 [MCA][X86] Add missing AVX-GFNI YMM test coverage 2022-11-12 17:37:09 +00:00
Simon Pilgrim 9ec1c83957 [X86] Always classify gf2p8affineqb/gf2p8affineinvqb instructions with SchedWriteVecIMul
There was a mismatch between the AVX512 and SSE/AVX versions
2022-11-12 17:20:07 +00:00
Simon Pilgrim cbe5b2dd91 [MCA][X86] Add test coverage for GFNI instructions 2022-11-12 17:14:06 +00:00
Simon Pilgrim fca63649ce [X86] Replace unnecessary SKL CVTSI2SS/CVTSI2SD overrides with better base class defs
The folded patterns were missing entirely - confirmed by both Agner + uops.info
2022-11-12 14:29:45 +00:00
Simon Pilgrim 2be46b33d3 [MCA][X86][AVX512] Add test coverage for unsigned<->fp conversion instructions 2022-11-12 13:45:16 +00:00
Simon Pilgrim 07c8f3dd0f [X86] SkylakeServerModel - conversion instructions don't use Port015
Fixes a lot of throughput mismatches - the more complicated conversion instructions use SKXPort5+SKXPort01, not SKXPort5+SKXPort015 (SKXPort015 is mainly used for basic Logic + blend ops)

Fixing this should allow us to remove a lot of unnecessary scheduler overrides from SkylakeServerModel

Confirmed by both Agner + uops.info
2022-11-12 12:39:59 +00:00
Simon Pilgrim b31a5d7270 [X86] Replace unnecessary SKL CVTPD2DQ overrides with better base class defs
Also fixes some missing AVX folded instructions
2022-11-12 12:15:56 +00:00
Simon Pilgrim 30498cf7c4 [X86] SkylakeClientModel - conversion instructions don't use Port015
Fixes a lot of throughput mismatches - the more complicated conversion instructions use SKLPort5+SKLPort01, not SKLPort5+SKLPort015 (SKLPort015 is mainly used for basic Logic + blend ops)

Fixing this should allow us to remove a lot of unnecessary scheduler overrides from SkylakeClientModel

Confirmed by both Agner + uops.info
2022-11-10 12:42:51 +00:00
Simon Pilgrim 810b8fdff9 [X86] Replace unnecessary CVTPS2PI/CVTPS2DQ overrides with better base class defs
Broadwell/Haswell were completely overriding the WriteCvtPD2I class defs - we can remove those overrides entirely by just choosing better class defs.

Also fixes the scheduler for a missing YMM folded case - confirmed with Agner + uops.info that the port usage is correct
2022-11-09 17:08:45 +00:00
Simon Pilgrim 471f2cff8d [X86] CVTTSS2SI64rm has the same scheduler def as (V)CVTSS2SI64rm
None of Haswell/Broadwell/Skylake/Icelake treat CVTTSS2SI64rm differently from CVTSS2SI64rm (or the AVX variants)

Confirmed with Agner, uops.info and Intel AoM
2022-11-08 14:35:39 +00:00
Simon Pilgrim 5c0cb75787 [X86] Folded MOVDDUPrm has the same sched behaviour as MOVSHDUPrm/MOVSLDUPrm on Haswell/IceLake
There can be a difference for MOVDDUPrr but not the load folded broadcast that is purely on Port23

Fixes an old TODO (inherited from SkylakeServer which was fixed at c7662dc3e5)

Confirmed on Agner + uops.info
2022-11-07 15:17:32 +00:00
Simon Pilgrim 4e56aa252f [X86] Schedule scalar movsx/movzx load+extend ops as WriteLoad instead of WriteALULd
Although some very old x86 hardware would perform the extension at a later stage, every target we have a scheduler for always performs this as part of the load-op (avoiding the ALU pipes etc.). If anyone wants to model very old hardware, they can always override this.

This patch just tags these as WriteLoad directly and removes unnecessary overrides - this cleans up some latency/throughput tests, as they are no longer being badly modelled as folded ALU ops
2022-11-06 14:32:05 +00:00
Simon Pilgrim 08fe55b346 [X86] Fix scalar load latencies for WriteLoad scheduler class
Znver1/Znver2 were using vector load latency values (which is what WriteFLoad*/WriteVecLoad* are for) instead of the scalar load latency value

TBH I'm not sure clflush/clzero/prefetch ops should be tagged as WriteLoad but at least this makes us more consistent
2022-11-06 14:03:59 +00:00
Simon Pilgrim 6fff3babb4 Revert rG244331ae833aaf33503bbd36890e704afb66a237 "[X86] Fix scalar load latencies for WriteLoad scheduler class"
Forgot to update tests outside the llvm-mca test folder :-(
2022-11-06 13:16:23 +00:00
Simon Pilgrim 244331ae83 [X86] Fix scalar load latencies for WriteLoad scheduler class
Atom was missing a load latency value (so was defaulting to 1cy)

Znver1/Znver2 were using vector load latency values (which is what WriteFLoad*/WriteVecLoad* are for) instead of the scalar load latency value

TBH I'm not sure clflush/clzero/prefetch ops should be tagged as WriteLoad but at least this makes us more consistent
2022-11-06 12:22:10 +00:00
Simon Pilgrim edf885531e [X86] Replace unnecessary int2float and float2double overrides with better base class defs
Broadwell/Haswell were completely overriding the class defs - we can remove those overrides entirely by just choosing better class defs (plus a fix for missing mmx folded load).
2022-11-05 19:07:01 +00:00
Simon Pilgrim 23ba5bc528 [MCA][X86] Add more avx512 cvt instructions test coverage 2022-11-05 17:28:29 +00:00
Simon Pilgrim 2c79186bce [X86] Cleanup WriteCvtSD2SS/WriteCvtPD2PS overrides
The WriteCvtSD2SS/WriteCvtPD2PS* classes were mostly unused as the models were needlessly overriding all instructions - in some cases the folded pattern overrides were entirely missing (but I've confirmed they just have an additional Port23 use)

There were a couple of typos (confirmed with Agner/uops.info) - Skylake/Icelake uses Port5+Port01 for XMM/YMM, Skylake uses Port5+Port05 for ZMM but Icelake uses Port5+Port0
2022-11-05 15:47:05 +00:00
Simon Pilgrim 0b7f327800 [X86] Fix cvtss2si64/cvttss2si64 typo in SkylakeClient
SS2SI64 conversions use Port0/Port01/Port5 (with/without truncation), but SS2SI32 only uses Port0/Port01 like SD2SI32/SD2SI64
2022-11-05 14:35:41 +00:00
Simon Pilgrim b781ca4df6 [X86] Fix override for CVTPD2PS/CVTPD2DQ/CVTTPD2DQ AVX variants
These were lost when they were converted from instregex to instrs
2022-11-05 13:57:07 +00:00