Commit Graph

92 Commits

Author SHA1 Message Date
Simon Pilgrim 0c9c92ffc0 [X86][XOP] Tidyup VPHADD/VPHSUB unary horizontal ops default schedule class
Based off Agner and AMD SoG tables, the XOP VPHADD/VPHSUB unary horizontal ops are as fast as basic arithmetic ops, not the slower SSSE3 binary horizontal add/sub ops. This also matches what the bdver2 model already lists.

Noticed while investigating reduction add optimizations.
2022-03-03 12:07:48 +00:00
Craig Topper 56d6ccd4cb [X86] Update register RCL/RCR by 1 and immediate scheduling for Intel CPUs
Most Intel CPU scheduler files lumped the immediate and 1 instructions
together, but uops.info shows they are quite different.

For the most part the by 1 instructions were pretty accurate to the uops.info
data except the latency was 3 instead of 2 as uops.info indicates.

The by immediate instructions need 7 or 8 uops and have higher latency.

It looks like the 8-bit by immediate instructions may need even more
uops, but I just lumped them with the 16/32/64.

Noticed while checking out PR53648. So mostly I cared about the by 1
instructions.

Reviewed By: RKSimon, pengfei

Differential Revision: https://reviews.llvm.org/D119217
2022-02-08 09:20:20 -08:00
Simon Pilgrim a0a0eb192e [X86] Use WriteVecMove scheduler classes for VPMOVM2* instructions
These match the port behaviour of reg-reg predicated xmm/ymm/zmm moves

Fixes #34958
2021-12-27 13:21:29 +00:00
Simon Pilgrim 948ae472a6 [MCA][X86] Add AVX512 vector move instruction test coverage 2021-12-27 11:43:39 +00:00
Simon Pilgrim d9655eec05 [MCA][X86] Add AVX512 subvector broadcast instruction test coverage 2021-12-13 18:48:25 +00:00
Simon Pilgrim fe1b5b56c6 [MCA][X86] Add AVX512 movddup/movshdup/movsldup instruction test coverage
As noted on D115547
2021-12-13 18:04:56 +00:00
Simon Pilgrim b04c646711 [MCA][X86] Add AVX512 broadcast instruction test coverage
As noted on D115547
2021-12-13 17:45:16 +00:00
Simon Pilgrim 4c1d248397 [MCA][X86] Fix duplicated cvtsi2ss/cvtsi2sd i32 + i64 folded tests
Specify the integer width to ensure we're testing the correct instruction
2021-12-12 22:48:45 +00:00
Simon Pilgrim fc02ceb12a [X86][AVX512] Use WriteShuffleX for xmm->xmm extensions
The XMM evex cases have the same behaviour as the SSE41 versions, which already uses WriteShuffleX
2021-12-12 15:22:32 +00:00
Simon Pilgrim 7f09aee0f6 [MCA][X86] Add missing VPMOVSX/VPMOVZX from AVX512 tests 2021-12-10 18:12:57 +00:00
Simon Pilgrim 6fae235885 [MCA][X86] Add missing ALIGND/ALIGNQ from AVX512F/AVX512VL tests 2021-12-10 15:59:52 +00:00
Simon Pilgrim b025b062d6 [MCA][X86] Add missing PALIGNR from AVX512BW/AVX512BWVL tests 2021-12-10 15:59:52 +00:00
Simon Pilgrim ebcc92ccda [MCA][X86] Add missing PSLLDQ/PSRLDQ from AVX512BW/AVX512BWVL tests 2021-12-10 15:59:51 +00:00
Simon Pilgrim 550bf36732 [MCA][X86] Add missing PACKSS/PACKUS from AVX512BW/AVX512BWVL tests 2021-12-10 15:59:51 +00:00
Simon Pilgrim 80ce01c6fd [MCA][X86] Add missing PSHUFLW from AVX512BWVL tests 2021-12-10 14:02:37 +00:00
Roman Lebedev 2f364f6f0d
[NFC][X86][MCA] Add forgotten test coverage for AVX512's VPMOVM2[BWDQ] / VPMOV[BWDQ]2M 2021-11-20 13:09:18 +03:00
Serge Pavlov b36d214bed [X86] Add description of FXAM instruction
Previously this instruction could be used only in assembler. This change
makes it available for compiler also. Scheduling information was copied
from FTST instruction, hopefully this can be a satisfactory approximation.

Differential Revision: https://reviews.llvm.org/D104853
2021-06-25 12:26:51 +07:00
Craig Topper 0cbceed27c [TableGen][ARM][X86] Detect combining IntrReadMem and IntrWriteMem.
These properties aren't additive. They are closer to ReadOnly and
WriteOnly. The default is ReadWrite. ReadMem cancels the write property and
WriteMem cancels the read property. Combining them leaves neither.

This patch checks that when we process WriteMem, the Mod flag is
still set. And for ReadMem we check that the Ref flag set still set.

I've updated 2 target intrinsics that were combining these properties.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D93571
2020-12-19 14:56:17 -08:00
Craig Topper f47b07315a [X86] Teach assembler to accept vmsave/vmload/vmrun/invlpga/skinit with or without the fixed register operands
These instructions read their inputs from fixed registers rather
than using a modrm byte. We shouldn't require the user to list them
when parsing assembly. This matches the GNU assembler.

This patch adds InstAliases so we can accept either form. It also
changes the printing code to use the form without registers. This
will change the behavior of llvm-objdump, but should be consistent
with binutils objdump. This also matches what we already do in LLVM for
clzero and monitorx which also used fixed registers.

I need to add and improve tests before this can be commited. The
disassembler tests exist, but weren't checking the fixed register
so they pass before and after this change.

Fixes https://github.com/ClangBuiltLinux/linux/issues/1216

Differential Revision: https://reviews.llvm.org/D93524
2020-12-19 11:01:55 -08:00
Craig Topper 7f3da48885 [X86] Remove X86ISD::MWAITX_DAG. Just match the intrinsic to the custom inserter pseudo instruction during isel. 2020-10-03 18:44:53 -07:00
Craig Topper 31c40f2d6b [X86] Add mayLoad/mayStore flags to some X87 instructions that don't have isel patterns to infer them from.
Should remove part of the differences in D81833 due to some
some of these getting isel patterns.
2020-06-23 23:40:30 -07:00
Craig Topper 465f5648ee [X86] Remove the mayLoad and mayStore flags from vzeroupper/vzeroall.
But leave the hasUnmodelledSideEffects flag.
2020-05-08 12:47:20 -07:00
Craig Topper c636f694c0 [X86] Add more avx512 instrutions to llvm-mca resource tests 2020-02-16 16:59:36 -08:00
Craig Topper c6bdd8e731 [X86] Improve the gather scheduler models for SkylakeClient and SkylakeServer
The load ports need a cycle for each potentially loaded element just like Haswell and Skylake. Unlike Haswell and Broadwell, the number of uops does not scale with the number of elements. Instead the load uops run for multiple cycles.

I've taken the latency number from the uops.info. The port binding for the non-load uops is taken from the original IACA data I have.

Differential Revision: https://reviews.llvm.org/D74000
2020-02-05 13:26:47 -08:00
Simon Pilgrim 8616bd417f [X86] Fix missing load latencies (PR36894)
We weren't account for load latencies in the SSE42/AES/CLMUL schedule classes
2020-02-05 11:53:16 +00:00
Roman Lebedev 31019dfdf5
[NFC][MCA] Re-autogenerate all check lines in all X86 MCA tests
Some whitespace issues have crept in,
and some znver2 check lines were missing..
2020-01-26 22:17:26 +03:00
Ganesh Gopalasubramanian 3408940f73 [X86] AMD Znver2 (Rome) Scheduler enablement
The patch gives out the details of the znver2 scheduler model.
There are few improvements with respect to execution units, latencies and
throughput when compared with znver1.
The tests that were present for znver1 for llvm-mca tool were replicated.
The latencies, execution units, timeline and throughput information are updated for znver2.

Reviewers: craig.topper, Simon Pilgrim

Differential Revision: https://reviews.llvm.org/D66088
2020-01-10 00:44:59 +05:30
Roman Lebedev a5e65c1cf7 [MCA] Show aggregate over Average Wait times for the whole snippet (PR43219)
Summary:
As disscused in https://bugs.llvm.org/show_bug.cgi?id=43219,
i believe it may be somewhat useful to show //some// aggregates
over all the sea of statistics provided.

Example:
```
Average Wait times (based on the timeline view):
[0]: Executions
[1]: Average time spent waiting in a scheduler's queue
[2]: Average time spent waiting in a scheduler's queue while ready
[3]: Average time elapsed from WB until retire stage

      [0]    [1]    [2]    [3]
0.     3     1.0    1.0    4.7       vmulps     %xmm0, %xmm1, %xmm2
1.     3     2.7    0.0    2.3       vhaddps    %xmm2, %xmm2, %xmm3
2.     3     6.0    0.0    0.0       vhaddps    %xmm3, %xmm3, %xmm4
       3     3.2    0.3    2.3       <total>
```
I.e. we average the averages.

Reviewers: andreadb, mattd, RKSimon

Reviewed By: andreadb

Subscribers: gbedwell, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68714

llvm-svn: 374361
2019-10-10 14:46:21 +00:00
Simon Pilgrim 6a3dc3e15c [MCA][X86] Add tests for LOCK variants of standard X86 arithmetic ops
D66424 adds the base support for LOCK so we should be able to add special case support for all these cases in future patches

llvm-svn: 369367
2019-08-20 11:13:20 +00:00
Andrea Di Biagio bf989187c3 [X86] Move scheduling tests for CMPXCHG to the corresponding resources-x86_64.s files. NFC
In D66424 it has been requested to move all the new tests added by r369278 into
resources-x86_64.s. That is because only the 8b/16 ops should be tested by
resources-cmpxchg.s. This partially reverts r369278.

llvm-svn: 369288
2019-08-19 18:20:30 +00:00
Andrea Di Biagio ecbaba672e [X86] Added extensive scheduling model tests for all the CMPXCHG variants. NFC
Addresses a review comment in D66424

llvm-svn: 369279
2019-08-19 17:07:26 +00:00
Craig Topper 29688f4da0 [X86] Limit vpermil2pd/vpermil2ps immediates to 4 bits in the assembly parser.
The upper 4 bits of the immediate byte are used to encode a
register. We need to limit the explicit immediate to fit in the
remaining 4 bits.

Fixes PR42899.

llvm-svn: 368123
2019-08-07 05:34:27 +00:00
Clement Courbet 4ef7c2868a [X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr
Summary:
llvm.x86.sse.stmxcsr only writes to memory.
llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE.

Reviewers: craig.topper, RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62896

llvm-svn: 363773
2019-06-19 08:44:31 +00:00
Craig Topper d10a200ceb [X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing.
We require d/q suffixes on the memory form of these instructions to disambiguate the memory size.
We don't require it on the register forms, but need to support parsing both with and without it.

Previously we always printed the d/q suffix on the register forms, but it's redundant and
inconsistent with gcc and objdump.

After this patch we should support the d/q for parsing, but not print it when its unneeded.

llvm-svn: 360085
2019-05-06 21:39:51 +00:00
Simon Pilgrim 9d99372f73 [llvm-mca][x86] Fix MMX PMOVMSKB test
This is defined as part of SSE1, XMM PMOVMSKB doesn't appear until SSE2

llvm-svn: 359477
2019-04-29 18:24:30 +00:00
Andrea Di Biagio 43003f0fec [MCA] Fix typo in AVX2 gather tests. NFC
llvm-svn: 359397
2019-04-28 10:54:45 +00:00
Craig Topper c2b35ebc1d [X86] Remove the _alt forms of (V)CMP instructions. Use a combination of custom printing and custom parsing to achieve the same result and more
Similar to previous change done for VPCOM and VPCMP

Differential Revision: https://reviews.llvm.org/D59468

llvm-svn: 356384
2019-03-18 17:59:59 +00:00
Craig Topper 12509d87f3 [X86] Remove the _alt forms of XOP VPCOM instructions. Use a combination of custom printing and custom parsing to achieve the same result and more
Previously we had a regular form of the instruction used when the immediate was 0-7. And _alt form that allowed the full 8 bit immediate. Codegen would always use the 0-7 form since the immediate was always checked to be in range. Assembly parsing would use the 0-7 form when a mnemonic like vpcomtrueb was used. If the immediate was specified directly the _alt form was used. The disassembler would prefer to use the 0-7 form instruction when the immediate was in range and the _alt form otherwise. This way disassembly would print the most readable form when possible.

The assembly parsing for things like vpcomtrueb relied on splitting the mnemonic into 3 pieces. A "vpcom" prefix, an immediate representing the "true", and a suffix of "b". The tablegenerated printing code would similarly print a "vpcom" prefix, decode the immediate into a string, and then print "b".

The _alt form on the other hand parsed and printed like any other instruction with no specialness.

With this patch we drop to one form and solve the disassembly printing issue by doing custom printing when the immediate is 0-7. The parsing code has been tweaked to turn "vpcomtrueb" into "vpcomb" and then the immediate for the "true" is inserted either before or after the other operands depending on at&t or intel syntax.

I'd rather not do the custom printing, but I tried using an InstAlias for each possible mnemonic for all 8 immediates for all 16 combinations of element size, signedness, and memory/register. The code emitted into printAliasInstr ended up checking the number of operands, the register class of each operand, and the immediate for all 256 aliases. This was repeated for both the at&t and intel printer. Despite a lot of common checks between all of the aliases, when compiled with clang at least this commonality was not well optimized. Nor do all the checks seem necessary. Since I want to do a similar thing for vcmpps/pd/ss/sd which have 32 immediate values and 3 encoding flavors, 3 register sizes, etc. This didn't seem to scale well for clang binary size. So custom printing seemed a better trade off.

I also considered just using the InstAlias for the matching and not the printing. But that seemed like it would add a lot of extra rows to the matcher table. Especially given that the 32 immediates for vpcmpps have 46 strings associated with them.

Differential Revision: https://reviews.llvm.org/D59398

llvm-svn: 356343
2019-03-17 21:21:37 +00:00
Simon Pilgrim 3f37538b86 [llvm-mca][X86] Add ADC/SBB with zero test cases
Some targets have fast-path handling for these patterns that we should model.

llvm-svn: 355498
2019-03-06 12:51:16 +00:00
Craig Topper bf7593ec4a [X86] Print all register forms of x87 fadd/fsub/fdiv/fmul as having two arguments where on is %st.
All of these instructions consume one encoded register and the other register is %st. They either write the result to %st or the encoded register. Previously we printed both arguments when the encoded register was written. And we printed one argument when the result was written to %st. For the stack popping forms the encoded register is always the destination and we didn't print both operands. This was inconsistent with gcc and objdump and just makes the output assembly code harder to read.

This patch changes things to always print both operands making us consistent with gcc and objdump. The parser should still be able to handle the single register forms just as it did before. This also matches the GNU assembler behavior.

llvm-svn: 353061
2019-02-04 17:28:18 +00:00
Craig Topper 7a2944efe1 [X86] Print %st(0) as %st when its implicit to the instruction. Continue printing it as %st(0) when its encoded in the instruction.
This is a step back from the change I made in r352985. This appears to be more consistent with gcc and objdump behavior.

llvm-svn: 353015
2019-02-04 04:15:10 +00:00
Craig Topper f77b858dc3 Revert r352985 "[X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly"
Looking into gcc and objdump behavior more this was overly aggressive. If the register is encoded in the instruction we should print %st(0), if its implicit we should print %st.

I'll be making a more directed change in a future patch.

llvm-svn: 353013
2019-02-04 04:15:02 +00:00
Craig Topper 5a570dd437 [X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly
Summary:
When calculating clobbers for MS style inline assembly we fail if the asm clobbers stack top because we print st(0) and try to pass it through the gcc register name check. This was found with when I attempted to make a emms/femms clobber all ST registers. If you use emms/femms in MS inline asm we would try to use st(0) as the clobber name but clang would think that wasn't a valid clobber name.

This also matches what objdump disassembly prints. It's also what is printed by gcc -S.

Reviewers: RKSimon, rnk, efriedma, spatel, andreadb, lebedev.ri

Reviewed By: rnk

Subscribers: eraman, gbedwell, lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D57621

llvm-svn: 352985
2019-02-03 07:53:39 +00:00
Simon Pilgrim c9d33907ef [llvm-mca][X86] Add some missing DQI tests
Match more of the coverage of test\CodeGen\X86\avx512-schedule.ll as discussed on D57244 

llvm-svn: 352273
2019-01-26 13:00:46 +00:00
Simon Pilgrim d36f7730cd [llvm-mca][X86] Add missing shuffle tests
Match the coverage of test\CodeGen\X86\avx512-shuffle-schedule.ll so we can get rid of -print-schedule (and fix PR37160) without losing schedule tests

llvm-svn: 352179
2019-01-25 09:17:30 +00:00
Simon Pilgrim 922b540643 [llvm-mca][X86] Tidyup avx512 placeholder tests
Ensure we keep avx512f/bw/dq + vl versions separate, add example broadcast tests - this should allow us to better the test coverage of test\CodeGen\X86\avx512-schedule.ll

llvm-svn: 351848
2019-01-22 17:52:15 +00:00
Simon Pilgrim 8e11254132 [llvm-mca][X86] Add VPOPCNTDQ tests
Matches test coverage of test\CodeGen\X86\avx512vpopcntdq-schedule.ll

llvm-svn: 351842
2019-01-22 17:19:44 +00:00
Simon Pilgrim 90fa50d928 [llvm-mca][X86] Add missing CLWB/CLZERO/FSGSBASE/LWP/MWAITX/RDPID/SHA tests
We're getting pretty close to matching/exceeding test coverage of the test\CodeGen\X86\*-schedule.ll files, which should allow us to get rid of -print-schedule and fix PR37160

llvm-svn: 351836
2019-01-22 16:39:28 +00:00
Simon Pilgrim fc4b1e841e [llvm-mca][X86] Add missing enter/leave, invlpg/invlpga, rdmsr/wrmsr, rdpmc and rdtsc/rdtscp tests
llvm-svn: 351835
2019-01-22 16:29:26 +00:00
Simon Pilgrim 4e03b2496d [llvm-mca][X86] Add missing mfence/pinsrw tests
llvm-svn: 351831
2019-01-22 16:01:08 +00:00