Commit Graph

60 Commits

Author SHA1 Message Date
Clement Courbet 4ef7c2868a [X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr
Summary:
llvm.x86.sse.stmxcsr only writes to memory.
llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE.

Reviewers: craig.topper, RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62896

llvm-svn: 363773
2019-06-19 08:44:31 +00:00
Craig Topper d10a200ceb [X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing.
We require d/q suffixes on the memory form of these instructions to disambiguate the memory size.
We don't require it on the register forms, but need to support parsing both with and without it.

Previously we always printed the d/q suffix on the register forms, but it's redundant and
inconsistent with gcc and objdump.

After this patch we should support the d/q for parsing, but not print it when its unneeded.

llvm-svn: 360085
2019-05-06 21:39:51 +00:00
Simon Pilgrim 9d99372f73 [llvm-mca][x86] Fix MMX PMOVMSKB test
This is defined as part of SSE1, XMM PMOVMSKB doesn't appear until SSE2

llvm-svn: 359477
2019-04-29 18:24:30 +00:00
Andrea Di Biagio 43003f0fec [MCA] Fix typo in AVX2 gather tests. NFC
llvm-svn: 359397
2019-04-28 10:54:45 +00:00
Craig Topper c2b35ebc1d [X86] Remove the _alt forms of (V)CMP instructions. Use a combination of custom printing and custom parsing to achieve the same result and more
Similar to previous change done for VPCOM and VPCMP

Differential Revision: https://reviews.llvm.org/D59468

llvm-svn: 356384
2019-03-18 17:59:59 +00:00
Craig Topper 12509d87f3 [X86] Remove the _alt forms of XOP VPCOM instructions. Use a combination of custom printing and custom parsing to achieve the same result and more
Previously we had a regular form of the instruction used when the immediate was 0-7. And _alt form that allowed the full 8 bit immediate. Codegen would always use the 0-7 form since the immediate was always checked to be in range. Assembly parsing would use the 0-7 form when a mnemonic like vpcomtrueb was used. If the immediate was specified directly the _alt form was used. The disassembler would prefer to use the 0-7 form instruction when the immediate was in range and the _alt form otherwise. This way disassembly would print the most readable form when possible.

The assembly parsing for things like vpcomtrueb relied on splitting the mnemonic into 3 pieces. A "vpcom" prefix, an immediate representing the "true", and a suffix of "b". The tablegenerated printing code would similarly print a "vpcom" prefix, decode the immediate into a string, and then print "b".

The _alt form on the other hand parsed and printed like any other instruction with no specialness.

With this patch we drop to one form and solve the disassembly printing issue by doing custom printing when the immediate is 0-7. The parsing code has been tweaked to turn "vpcomtrueb" into "vpcomb" and then the immediate for the "true" is inserted either before or after the other operands depending on at&t or intel syntax.

I'd rather not do the custom printing, but I tried using an InstAlias for each possible mnemonic for all 8 immediates for all 16 combinations of element size, signedness, and memory/register. The code emitted into printAliasInstr ended up checking the number of operands, the register class of each operand, and the immediate for all 256 aliases. This was repeated for both the at&t and intel printer. Despite a lot of common checks between all of the aliases, when compiled with clang at least this commonality was not well optimized. Nor do all the checks seem necessary. Since I want to do a similar thing for vcmpps/pd/ss/sd which have 32 immediate values and 3 encoding flavors, 3 register sizes, etc. This didn't seem to scale well for clang binary size. So custom printing seemed a better trade off.

I also considered just using the InstAlias for the matching and not the printing. But that seemed like it would add a lot of extra rows to the matcher table. Especially given that the 32 immediates for vpcmpps have 46 strings associated with them.

Differential Revision: https://reviews.llvm.org/D59398

llvm-svn: 356343
2019-03-17 21:21:37 +00:00
Simon Pilgrim 3f37538b86 [llvm-mca][X86] Add ADC/SBB with zero test cases
Some targets have fast-path handling for these patterns that we should model.

llvm-svn: 355498
2019-03-06 12:51:16 +00:00
Craig Topper bf7593ec4a [X86] Print all register forms of x87 fadd/fsub/fdiv/fmul as having two arguments where on is %st.
All of these instructions consume one encoded register and the other register is %st. They either write the result to %st or the encoded register. Previously we printed both arguments when the encoded register was written. And we printed one argument when the result was written to %st. For the stack popping forms the encoded register is always the destination and we didn't print both operands. This was inconsistent with gcc and objdump and just makes the output assembly code harder to read.

This patch changes things to always print both operands making us consistent with gcc and objdump. The parser should still be able to handle the single register forms just as it did before. This also matches the GNU assembler behavior.

llvm-svn: 353061
2019-02-04 17:28:18 +00:00
Craig Topper 7a2944efe1 [X86] Print %st(0) as %st when its implicit to the instruction. Continue printing it as %st(0) when its encoded in the instruction.
This is a step back from the change I made in r352985. This appears to be more consistent with gcc and objdump behavior.

llvm-svn: 353015
2019-02-04 04:15:10 +00:00
Craig Topper f77b858dc3 Revert r352985 "[X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly"
Looking into gcc and objdump behavior more this was overly aggressive. If the register is encoded in the instruction we should print %st(0), if its implicit we should print %st.

I'll be making a more directed change in a future patch.

llvm-svn: 353013
2019-02-04 04:15:02 +00:00
Craig Topper 5a570dd437 [X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly
Summary:
When calculating clobbers for MS style inline assembly we fail if the asm clobbers stack top because we print st(0) and try to pass it through the gcc register name check. This was found with when I attempted to make a emms/femms clobber all ST registers. If you use emms/femms in MS inline asm we would try to use st(0) as the clobber name but clang would think that wasn't a valid clobber name.

This also matches what objdump disassembly prints. It's also what is printed by gcc -S.

Reviewers: RKSimon, rnk, efriedma, spatel, andreadb, lebedev.ri

Reviewed By: rnk

Subscribers: eraman, gbedwell, lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D57621

llvm-svn: 352985
2019-02-03 07:53:39 +00:00
Simon Pilgrim c9d33907ef [llvm-mca][X86] Add some missing DQI tests
Match more of the coverage of test\CodeGen\X86\avx512-schedule.ll as discussed on D57244 

llvm-svn: 352273
2019-01-26 13:00:46 +00:00
Simon Pilgrim d36f7730cd [llvm-mca][X86] Add missing shuffle tests
Match the coverage of test\CodeGen\X86\avx512-shuffle-schedule.ll so we can get rid of -print-schedule (and fix PR37160) without losing schedule tests

llvm-svn: 352179
2019-01-25 09:17:30 +00:00
Simon Pilgrim 922b540643 [llvm-mca][X86] Tidyup avx512 placeholder tests
Ensure we keep avx512f/bw/dq + vl versions separate, add example broadcast tests - this should allow us to better the test coverage of test\CodeGen\X86\avx512-schedule.ll

llvm-svn: 351848
2019-01-22 17:52:15 +00:00
Simon Pilgrim 8e11254132 [llvm-mca][X86] Add VPOPCNTDQ tests
Matches test coverage of test\CodeGen\X86\avx512vpopcntdq-schedule.ll

llvm-svn: 351842
2019-01-22 17:19:44 +00:00
Simon Pilgrim 90fa50d928 [llvm-mca][X86] Add missing CLWB/CLZERO/FSGSBASE/LWP/MWAITX/RDPID/SHA tests
We're getting pretty close to matching/exceeding test coverage of the test\CodeGen\X86\*-schedule.ll files, which should allow us to get rid of -print-schedule and fix PR37160

llvm-svn: 351836
2019-01-22 16:39:28 +00:00
Simon Pilgrim fc4b1e841e [llvm-mca][X86] Add missing enter/leave, invlpg/invlpga, rdmsr/wrmsr, rdpmc and rdtsc/rdtscp tests
llvm-svn: 351835
2019-01-22 16:29:26 +00:00
Simon Pilgrim 4e03b2496d [llvm-mca][X86] Add missing mfence/pinsrw tests
llvm-svn: 351831
2019-01-22 16:01:08 +00:00
Simon Pilgrim 05198a9b8a [llvm-mca][X86] Add missing monitor/mwait tests
These technically should be under a MONITOR cpuid bit, but we tag them as SSE3 so I've done that here as well.

llvm-svn: 351829
2019-01-22 15:48:16 +00:00
Simon Pilgrim 9b3a2f96a1 [llvm-mca][X86] Add missing vperm2i128 tests
llvm-svn: 351828
2019-01-22 14:54:24 +00:00
Simon Pilgrim 1d8d6c3bfb [llvm-mca][X86] Add missing tzcntw tests
llvm-svn: 351827
2019-01-22 14:53:52 +00:00
Simon Pilgrim c4e2776f3b [llvm-mca][x86] Add RDRAND/RDSEED instruction resource tests
llvm-svn: 348622
2018-12-07 18:29:47 +00:00
Clement Courbet e6b727e552 [X86] Fix VZEROUPPER scheduling info on SNB,HSW,BDW,SXL,SKX.
Summary:
Starting from SNB, VZEROUPPER is handled by the renamer and uses no proc resources.
After HSW, it also has zero latency.

This fixes PR35606.

To reproduce:
Uops:
  llvm-exegesis -mode=uops -opcode-name=VZEROUPPER
Latency:
  echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper' | /tmp/llvm-exegesis -mode=latency -snippets-file=-
  echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper\naddps %xmm0, %xmm1' | /tmp/llvm-exegesis -mode=latency -snippets-file=-

Reviewers: RKSimon, craig.topper, andreadb

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D54107

llvm-svn: 346482
2018-11-09 09:49:06 +00:00
Clement Courbet a933fb237e [X86][Sched] Update scheduling information for VZEROALL on HWS, BDW, SKX, SNB.
Summary:
    While looking at PR35606, I found out that the scheduling info is incorrect.

    One can check that it's really a P5+P6 and not a 2*P56 with:
    echo -e 'vzeroall\nvandps %xmm1, %xmm2, %xmm3' | ./bin/llvm-exegesis -mode=uops -snippets-file=-
    (vandps executes on P5 only)

    Reviewers: craig.topper, RKSimon

    Subscribers: llvm-commits

    Differential Revision: https://reviews.llvm.org/D52541

llvm-svn: 343447
2018-10-01 08:37:48 +00:00
Simon Pilgrim b1108399bd [LLVM-MCA][X86] Add missing VCMPESTR/VCMPESTR tests
llvm-svn: 343421
2018-09-30 18:19:00 +00:00
Simon Pilgrim 20623f2343 [LLVM-MCA][X86] Add some AVX512 tests
These are going to be necessary to check I don't mess up when I start cleaning up all the remaining vector integer overrides

llvm-svn: 343414
2018-09-30 17:01:59 +00:00
Simon Pilgrim b56be79e0c Revert rL342916: [X86] Remove shift/rotate by CL memory (RMW) overrides
As suggested by Craig Topper - I'm going to look at cleaning up the RMW sequences instead.

The uops are slightly different to the register variant, so requires a +1uop tweak

llvm-svn: 342969
2018-09-25 13:01:26 +00:00
Simon Pilgrim 0b4ad7596f [X86] Remove shift/rotate by CL memory (RMW) overrides
The uops are slightly different to the register variant, so requires a +1uop tweak

llvm-svn: 342916
2018-09-24 20:11:50 +00:00
Simon Pilgrim 00865a48d1 [X86] Split WriteIMul into 8/16/32/64 implementations (PR36931)
Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases.

This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions.

llvm-svn: 342892
2018-09-24 15:21:57 +00:00
Simon Pilgrim 19952add7c [X86] Added missing RCL/RCR schedule overrides to the generic SNB model
The SandyBridge model was missing schedule values for the RCL/RCR values - instead using the (incredibly optimistic) WriteShift (now WriteRotate) defaults.

I've added overrides with more realistic (slow) values, based on a mixture of Agner/instlatx64 numbers and what later Intel models do as well.

This is necessary to allow WriteRotate to be updated to remove other rotate overrides.

It'd probably be a good idea to investigate a WriteRotateCarry class at some point but its not high priority given the unusualness of these instructions.

llvm-svn: 342842
2018-09-23 17:40:24 +00:00
Andrew V. Tischenko 62f7a3207b [X86] Improved sched model for X86 CMPXCHG* instructions.
Differential Revision: https://reviews.llvm.org/D50070 

llvm-svn: 341024
2018-08-30 06:26:00 +00:00
Andrea Di Biagio a2eee47450 [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView.
This patch adds two new fields to the perf report generated by the SummaryView.
Fields are now logically organized into two small groups; only the second group
contains throughput indicators.

Example:
```
Iterations:        100
Instructions:      300
Total Cycles:      414
Total uOps:        700

Dispatch Width:    4
uOps Per Cycle:    1.69
IPC:               0.72
Block RThroughput: 4.0
```

This patch also updates the docs for llvm-mca.
Due to the nature of this change, several tests in the tools/llvm-mca directory
were affected, and had to be updated using script `update_mca_test_checks.py`.

llvm-svn: 340946
2018-08-29 17:56:39 +00:00
Andrew V. Tischenko 1fe3375620 [X86] MCA tests for XCHG*, XADD* and CMPXCHG* instructions
Differential Revision: https://reviews.llvm.org/D49912

llvm-svn: 339145
2018-08-07 14:36:43 +00:00
Simon Pilgrim b911d6721d [llvm-mca][x86] Add CMPXCHG instruction resource tests
I've put CMPXCHG8B/CMPXCHG16B in the same file, even though technically they are under separate CPUID bits all targets seem to support both (or neither).

llvm-svn: 338595
2018-08-01 17:25:11 +00:00
Simon Pilgrim 5c4fb14e07 [llvm-mca][x86] Add PREFETCHW instruction resource tests
These aren't just available via 3DNow! so test for them separately as well.

llvm-svn: 338584
2018-08-01 16:34:39 +00:00
Simon Pilgrim dcfa732b2f [llvm-mca][x86] Add PCLMUL instruction resource tests
Renamed the btver2 file that already contained them - the other targets were only testing the AVX versions

llvm-svn: 338583
2018-08-01 16:25:50 +00:00
Simon Pilgrim 34ac6533f4 [llvm-mca][x86] Add SET/TEST instruction resource tests
llvm-svn: 338576
2018-08-01 15:29:47 +00:00
Simon Pilgrim e364e57ac9 [llvm-mca][x86] Add LEA instruction resource tests
We already added these to btver2, now add them to other targets, even though none of their models treat them specially (yet).

llvm-svn: 338565
2018-08-01 14:25:33 +00:00
Simon Pilgrim 6754913e95 [llvm-mca][x86] Add more x86-64 system instruction resource tests
CPUID, IN/OUT, INS/OUTS, INT, PAUSE, SCAS, UD2, XLAT

llvm-svn: 338563
2018-08-01 14:18:09 +00:00
Simon Pilgrim 5f41ab79c0 [llvm-mca][x86] Add CLFLUSHOPT instruction resource tests
llvm-svn: 338550
2018-08-01 13:34:17 +00:00
Simon Pilgrim bd014f4d91 [llvm-mca][x86] Add CMPS/LODS/MOVS/STOS string instruction resource tests
llvm-svn: 338532
2018-08-01 13:14:45 +00:00
Simon Pilgrim 18d025a732 [llvm-mca][x86] Add STC + STD instruction resource tests
llvm-svn: 338514
2018-08-01 11:00:11 +00:00
Simon Pilgrim 1f4b9cb6fe [llvm-mca][x86] Add 32-bit instruction resource tests
These aren't exhaustive, but cover some instructions that are only available in 32-bit mode (where would we be without good BCD math performance?).

llvm-svn: 338404
2018-07-31 17:33:08 +00:00
Simon Pilgrim 5e729dcc03 [llvm-mca][x86] Add movsx/movzx instructions to general x86_64 resource tests
llvm-svn: 337586
2018-07-20 17:43:42 +00:00
Simon Pilgrim 03164dfa5e [llvm-mca][x86] Add extend, carry-flag and CMP instructions to general x86_64 resource tests
llvm-svn: 337306
2018-07-17 17:47:35 +00:00
Simon Pilgrim 92da01fed9 [llvm-mca][x86] Add MOVBE resource tests to all supporting targets
SNB doesn't support MOVBE but the numbers in Generic (which use the SNB model) look sane.

llvm-svn: 337305
2018-07-17 17:41:45 +00:00
Simon Pilgrim 94049e8b15 [llvm-mca][x86] Add BSWAP resource tests
llvm-svn: 337302
2018-07-17 17:10:47 +00:00
Andrea Di Biagio 483db141e3 [X86] Fix MayLoad/HasSideEffect flag for (V)MOVLPSrm instructions.
Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was
inferred directly from the "default" pattern associated with the instruction
definition.

r336728 removed special node X86Movlps, and all the patterns associated to it.
Now instruction (V)MOVLPSrm doesn't have a pattern associated to it, and the
'mayLoad/hasSideEffects' flags are left unset.

When the instruction info is emitted by tablegen, method
CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a
pattern, and flags are undefined. So, it conservatively sets the
"hasSideEffects" flag for it.

As a consequence, we were losing the 'mayLoad' flag, and we were gaining a
'hasSideEffect' flag in its place.
This patch fixes the issue (originally reported by Michael Holmen).

The mca tests show the differences in the instruction info flags.  Instructions
that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm.

Differential Revision: https://reviews.llvm.org/D49182

llvm-svn: 336818
2018-07-11 15:27:50 +00:00
Andrea Di Biagio d2e2c053cf [llvm-mca] Use a different character to flag instructions with side-effects in the Instruction Info View. NFC
This makes easier to identify changes in the instruction info flags.  It also
helps spotting potential regressions similar to the one recently introduced at
r336728.

Using the same character to mark MayLoad/MayStore/HasSideEffects is problematic
for llvm-lit. When pattern matching substrings, llvm-lit consumes tabs and
spaces. A change in position of the flag marker may not trigger a test failure.

This patch only changes the character used for flag `hasSideEffects`. The reason
why I didn't touch other flags is because I want to avoid spamming the mailing
because of the massive diff due to the numerous tests affected by this change.

In future, each instruction flag should be associated with a different character
in the Instruction Info View.

llvm-svn: 336797
2018-07-11 12:44:44 +00:00
Roman Lebedev 0e58dee284 [MCA][X86][NFC] Add BSF/BSR resource tests
Reviewers: RKSimon, andreadb, courbet

Reviewed By: RKSimon

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D48997

llvm-svn: 336510
2018-07-08 09:50:14 +00:00