Commit Graph

64 Commits

Author SHA1 Message Date
Simon Pilgrim 6a3dc3e15c [MCA][X86] Add tests for LOCK variants of standard X86 arithmetic ops
D66424 adds the base support for LOCK so we should be able to add special case support for all these cases in future patches

llvm-svn: 369367
2019-08-20 11:13:20 +00:00
Andrea Di Biagio bf989187c3 [X86] Move scheduling tests for CMPXCHG to the corresponding resources-x86_64.s files. NFC
In D66424 it has been requested to move all the new tests added by r369278 into
resources-x86_64.s. That is because only the 8b/16 ops should be tested by
resources-cmpxchg.s. This partially reverts r369278.

llvm-svn: 369288
2019-08-19 18:20:30 +00:00
Andrea Di Biagio ecbaba672e [X86] Added extensive scheduling model tests for all the CMPXCHG variants. NFC
Addresses a review comment in D66424

llvm-svn: 369279
2019-08-19 17:07:26 +00:00
Craig Topper 29688f4da0 [X86] Limit vpermil2pd/vpermil2ps immediates to 4 bits in the assembly parser.
The upper 4 bits of the immediate byte are used to encode a
register. We need to limit the explicit immediate to fit in the
remaining 4 bits.

Fixes PR42899.

llvm-svn: 368123
2019-08-07 05:34:27 +00:00
Clement Courbet 4ef7c2868a [X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr
Summary:
llvm.x86.sse.stmxcsr only writes to memory.
llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE.

Reviewers: craig.topper, RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62896

llvm-svn: 363773
2019-06-19 08:44:31 +00:00
Craig Topper d10a200ceb [X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing.
We require d/q suffixes on the memory form of these instructions to disambiguate the memory size.
We don't require it on the register forms, but need to support parsing both with and without it.

Previously we always printed the d/q suffix on the register forms, but it's redundant and
inconsistent with gcc and objdump.

After this patch we should support the d/q for parsing, but not print it when its unneeded.

llvm-svn: 360085
2019-05-06 21:39:51 +00:00
Simon Pilgrim 9d99372f73 [llvm-mca][x86] Fix MMX PMOVMSKB test
This is defined as part of SSE1, XMM PMOVMSKB doesn't appear until SSE2

llvm-svn: 359477
2019-04-29 18:24:30 +00:00
Andrea Di Biagio 43003f0fec [MCA] Fix typo in AVX2 gather tests. NFC
llvm-svn: 359397
2019-04-28 10:54:45 +00:00
Craig Topper c2b35ebc1d [X86] Remove the _alt forms of (V)CMP instructions. Use a combination of custom printing and custom parsing to achieve the same result and more
Similar to previous change done for VPCOM and VPCMP

Differential Revision: https://reviews.llvm.org/D59468

llvm-svn: 356384
2019-03-18 17:59:59 +00:00
Craig Topper 12509d87f3 [X86] Remove the _alt forms of XOP VPCOM instructions. Use a combination of custom printing and custom parsing to achieve the same result and more
Previously we had a regular form of the instruction used when the immediate was 0-7. And _alt form that allowed the full 8 bit immediate. Codegen would always use the 0-7 form since the immediate was always checked to be in range. Assembly parsing would use the 0-7 form when a mnemonic like vpcomtrueb was used. If the immediate was specified directly the _alt form was used. The disassembler would prefer to use the 0-7 form instruction when the immediate was in range and the _alt form otherwise. This way disassembly would print the most readable form when possible.

The assembly parsing for things like vpcomtrueb relied on splitting the mnemonic into 3 pieces. A "vpcom" prefix, an immediate representing the "true", and a suffix of "b". The tablegenerated printing code would similarly print a "vpcom" prefix, decode the immediate into a string, and then print "b".

The _alt form on the other hand parsed and printed like any other instruction with no specialness.

With this patch we drop to one form and solve the disassembly printing issue by doing custom printing when the immediate is 0-7. The parsing code has been tweaked to turn "vpcomtrueb" into "vpcomb" and then the immediate for the "true" is inserted either before or after the other operands depending on at&t or intel syntax.

I'd rather not do the custom printing, but I tried using an InstAlias for each possible mnemonic for all 8 immediates for all 16 combinations of element size, signedness, and memory/register. The code emitted into printAliasInstr ended up checking the number of operands, the register class of each operand, and the immediate for all 256 aliases. This was repeated for both the at&t and intel printer. Despite a lot of common checks between all of the aliases, when compiled with clang at least this commonality was not well optimized. Nor do all the checks seem necessary. Since I want to do a similar thing for vcmpps/pd/ss/sd which have 32 immediate values and 3 encoding flavors, 3 register sizes, etc. This didn't seem to scale well for clang binary size. So custom printing seemed a better trade off.

I also considered just using the InstAlias for the matching and not the printing. But that seemed like it would add a lot of extra rows to the matcher table. Especially given that the 32 immediates for vpcmpps have 46 strings associated with them.

Differential Revision: https://reviews.llvm.org/D59398

llvm-svn: 356343
2019-03-17 21:21:37 +00:00
Simon Pilgrim 3f37538b86 [llvm-mca][X86] Add ADC/SBB with zero test cases
Some targets have fast-path handling for these patterns that we should model.

llvm-svn: 355498
2019-03-06 12:51:16 +00:00
Craig Topper bf7593ec4a [X86] Print all register forms of x87 fadd/fsub/fdiv/fmul as having two arguments where on is %st.
All of these instructions consume one encoded register and the other register is %st. They either write the result to %st or the encoded register. Previously we printed both arguments when the encoded register was written. And we printed one argument when the result was written to %st. For the stack popping forms the encoded register is always the destination and we didn't print both operands. This was inconsistent with gcc and objdump and just makes the output assembly code harder to read.

This patch changes things to always print both operands making us consistent with gcc and objdump. The parser should still be able to handle the single register forms just as it did before. This also matches the GNU assembler behavior.

llvm-svn: 353061
2019-02-04 17:28:18 +00:00
Craig Topper 7a2944efe1 [X86] Print %st(0) as %st when its implicit to the instruction. Continue printing it as %st(0) when its encoded in the instruction.
This is a step back from the change I made in r352985. This appears to be more consistent with gcc and objdump behavior.

llvm-svn: 353015
2019-02-04 04:15:10 +00:00
Craig Topper f77b858dc3 Revert r352985 "[X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly"
Looking into gcc and objdump behavior more this was overly aggressive. If the register is encoded in the instruction we should print %st(0), if its implicit we should print %st.

I'll be making a more directed change in a future patch.

llvm-svn: 353013
2019-02-04 04:15:02 +00:00
Craig Topper 5a570dd437 [X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly
Summary:
When calculating clobbers for MS style inline assembly we fail if the asm clobbers stack top because we print st(0) and try to pass it through the gcc register name check. This was found with when I attempted to make a emms/femms clobber all ST registers. If you use emms/femms in MS inline asm we would try to use st(0) as the clobber name but clang would think that wasn't a valid clobber name.

This also matches what objdump disassembly prints. It's also what is printed by gcc -S.

Reviewers: RKSimon, rnk, efriedma, spatel, andreadb, lebedev.ri

Reviewed By: rnk

Subscribers: eraman, gbedwell, lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D57621

llvm-svn: 352985
2019-02-03 07:53:39 +00:00
Simon Pilgrim c9d33907ef [llvm-mca][X86] Add some missing DQI tests
Match more of the coverage of test\CodeGen\X86\avx512-schedule.ll as discussed on D57244 

llvm-svn: 352273
2019-01-26 13:00:46 +00:00
Simon Pilgrim d36f7730cd [llvm-mca][X86] Add missing shuffle tests
Match the coverage of test\CodeGen\X86\avx512-shuffle-schedule.ll so we can get rid of -print-schedule (and fix PR37160) without losing schedule tests

llvm-svn: 352179
2019-01-25 09:17:30 +00:00
Simon Pilgrim 922b540643 [llvm-mca][X86] Tidyup avx512 placeholder tests
Ensure we keep avx512f/bw/dq + vl versions separate, add example broadcast tests - this should allow us to better the test coverage of test\CodeGen\X86\avx512-schedule.ll

llvm-svn: 351848
2019-01-22 17:52:15 +00:00
Simon Pilgrim 8e11254132 [llvm-mca][X86] Add VPOPCNTDQ tests
Matches test coverage of test\CodeGen\X86\avx512vpopcntdq-schedule.ll

llvm-svn: 351842
2019-01-22 17:19:44 +00:00
Simon Pilgrim 90fa50d928 [llvm-mca][X86] Add missing CLWB/CLZERO/FSGSBASE/LWP/MWAITX/RDPID/SHA tests
We're getting pretty close to matching/exceeding test coverage of the test\CodeGen\X86\*-schedule.ll files, which should allow us to get rid of -print-schedule and fix PR37160

llvm-svn: 351836
2019-01-22 16:39:28 +00:00
Simon Pilgrim fc4b1e841e [llvm-mca][X86] Add missing enter/leave, invlpg/invlpga, rdmsr/wrmsr, rdpmc and rdtsc/rdtscp tests
llvm-svn: 351835
2019-01-22 16:29:26 +00:00
Simon Pilgrim 4e03b2496d [llvm-mca][X86] Add missing mfence/pinsrw tests
llvm-svn: 351831
2019-01-22 16:01:08 +00:00
Simon Pilgrim 05198a9b8a [llvm-mca][X86] Add missing monitor/mwait tests
These technically should be under a MONITOR cpuid bit, but we tag them as SSE3 so I've done that here as well.

llvm-svn: 351829
2019-01-22 15:48:16 +00:00
Simon Pilgrim 9b3a2f96a1 [llvm-mca][X86] Add missing vperm2i128 tests
llvm-svn: 351828
2019-01-22 14:54:24 +00:00
Simon Pilgrim 1d8d6c3bfb [llvm-mca][X86] Add missing tzcntw tests
llvm-svn: 351827
2019-01-22 14:53:52 +00:00
Simon Pilgrim c4e2776f3b [llvm-mca][x86] Add RDRAND/RDSEED instruction resource tests
llvm-svn: 348622
2018-12-07 18:29:47 +00:00
Clement Courbet e6b727e552 [X86] Fix VZEROUPPER scheduling info on SNB,HSW,BDW,SXL,SKX.
Summary:
Starting from SNB, VZEROUPPER is handled by the renamer and uses no proc resources.
After HSW, it also has zero latency.

This fixes PR35606.

To reproduce:
Uops:
  llvm-exegesis -mode=uops -opcode-name=VZEROUPPER
Latency:
  echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper' | /tmp/llvm-exegesis -mode=latency -snippets-file=-
  echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper\naddps %xmm0, %xmm1' | /tmp/llvm-exegesis -mode=latency -snippets-file=-

Reviewers: RKSimon, craig.topper, andreadb

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D54107

llvm-svn: 346482
2018-11-09 09:49:06 +00:00
Clement Courbet a933fb237e [X86][Sched] Update scheduling information for VZEROALL on HWS, BDW, SKX, SNB.
Summary:
    While looking at PR35606, I found out that the scheduling info is incorrect.

    One can check that it's really a P5+P6 and not a 2*P56 with:
    echo -e 'vzeroall\nvandps %xmm1, %xmm2, %xmm3' | ./bin/llvm-exegesis -mode=uops -snippets-file=-
    (vandps executes on P5 only)

    Reviewers: craig.topper, RKSimon

    Subscribers: llvm-commits

    Differential Revision: https://reviews.llvm.org/D52541

llvm-svn: 343447
2018-10-01 08:37:48 +00:00
Simon Pilgrim b1108399bd [LLVM-MCA][X86] Add missing VCMPESTR/VCMPESTR tests
llvm-svn: 343421
2018-09-30 18:19:00 +00:00
Simon Pilgrim 20623f2343 [LLVM-MCA][X86] Add some AVX512 tests
These are going to be necessary to check I don't mess up when I start cleaning up all the remaining vector integer overrides

llvm-svn: 343414
2018-09-30 17:01:59 +00:00
Simon Pilgrim b56be79e0c Revert rL342916: [X86] Remove shift/rotate by CL memory (RMW) overrides
As suggested by Craig Topper - I'm going to look at cleaning up the RMW sequences instead.

The uops are slightly different to the register variant, so requires a +1uop tweak

llvm-svn: 342969
2018-09-25 13:01:26 +00:00
Simon Pilgrim 0b4ad7596f [X86] Remove shift/rotate by CL memory (RMW) overrides
The uops are slightly different to the register variant, so requires a +1uop tweak

llvm-svn: 342916
2018-09-24 20:11:50 +00:00
Simon Pilgrim 00865a48d1 [X86] Split WriteIMul into 8/16/32/64 implementations (PR36931)
Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases.

This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions.

llvm-svn: 342892
2018-09-24 15:21:57 +00:00
Simon Pilgrim 19952add7c [X86] Added missing RCL/RCR schedule overrides to the generic SNB model
The SandyBridge model was missing schedule values for the RCL/RCR values - instead using the (incredibly optimistic) WriteShift (now WriteRotate) defaults.

I've added overrides with more realistic (slow) values, based on a mixture of Agner/instlatx64 numbers and what later Intel models do as well.

This is necessary to allow WriteRotate to be updated to remove other rotate overrides.

It'd probably be a good idea to investigate a WriteRotateCarry class at some point but its not high priority given the unusualness of these instructions.

llvm-svn: 342842
2018-09-23 17:40:24 +00:00
Andrew V. Tischenko 62f7a3207b [X86] Improved sched model for X86 CMPXCHG* instructions.
Differential Revision: https://reviews.llvm.org/D50070 

llvm-svn: 341024
2018-08-30 06:26:00 +00:00
Andrea Di Biagio a2eee47450 [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView.
This patch adds two new fields to the perf report generated by the SummaryView.
Fields are now logically organized into two small groups; only the second group
contains throughput indicators.

Example:
```
Iterations:        100
Instructions:      300
Total Cycles:      414
Total uOps:        700

Dispatch Width:    4
uOps Per Cycle:    1.69
IPC:               0.72
Block RThroughput: 4.0
```

This patch also updates the docs for llvm-mca.
Due to the nature of this change, several tests in the tools/llvm-mca directory
were affected, and had to be updated using script `update_mca_test_checks.py`.

llvm-svn: 340946
2018-08-29 17:56:39 +00:00
Andrew V. Tischenko 1fe3375620 [X86] MCA tests for XCHG*, XADD* and CMPXCHG* instructions
Differential Revision: https://reviews.llvm.org/D49912

llvm-svn: 339145
2018-08-07 14:36:43 +00:00
Simon Pilgrim b911d6721d [llvm-mca][x86] Add CMPXCHG instruction resource tests
I've put CMPXCHG8B/CMPXCHG16B in the same file, even though technically they are under separate CPUID bits all targets seem to support both (or neither).

llvm-svn: 338595
2018-08-01 17:25:11 +00:00
Simon Pilgrim 5c4fb14e07 [llvm-mca][x86] Add PREFETCHW instruction resource tests
These aren't just available via 3DNow! so test for them separately as well.

llvm-svn: 338584
2018-08-01 16:34:39 +00:00
Simon Pilgrim dcfa732b2f [llvm-mca][x86] Add PCLMUL instruction resource tests
Renamed the btver2 file that already contained them - the other targets were only testing the AVX versions

llvm-svn: 338583
2018-08-01 16:25:50 +00:00
Simon Pilgrim 34ac6533f4 [llvm-mca][x86] Add SET/TEST instruction resource tests
llvm-svn: 338576
2018-08-01 15:29:47 +00:00
Simon Pilgrim e364e57ac9 [llvm-mca][x86] Add LEA instruction resource tests
We already added these to btver2, now add them to other targets, even though none of their models treat them specially (yet).

llvm-svn: 338565
2018-08-01 14:25:33 +00:00
Simon Pilgrim 6754913e95 [llvm-mca][x86] Add more x86-64 system instruction resource tests
CPUID, IN/OUT, INS/OUTS, INT, PAUSE, SCAS, UD2, XLAT

llvm-svn: 338563
2018-08-01 14:18:09 +00:00
Simon Pilgrim 5f41ab79c0 [llvm-mca][x86] Add CLFLUSHOPT instruction resource tests
llvm-svn: 338550
2018-08-01 13:34:17 +00:00
Simon Pilgrim bd014f4d91 [llvm-mca][x86] Add CMPS/LODS/MOVS/STOS string instruction resource tests
llvm-svn: 338532
2018-08-01 13:14:45 +00:00
Simon Pilgrim 18d025a732 [llvm-mca][x86] Add STC + STD instruction resource tests
llvm-svn: 338514
2018-08-01 11:00:11 +00:00
Simon Pilgrim 1f4b9cb6fe [llvm-mca][x86] Add 32-bit instruction resource tests
These aren't exhaustive, but cover some instructions that are only available in 32-bit mode (where would we be without good BCD math performance?).

llvm-svn: 338404
2018-07-31 17:33:08 +00:00
Simon Pilgrim 5e729dcc03 [llvm-mca][x86] Add movsx/movzx instructions to general x86_64 resource tests
llvm-svn: 337586
2018-07-20 17:43:42 +00:00
Simon Pilgrim 03164dfa5e [llvm-mca][x86] Add extend, carry-flag and CMP instructions to general x86_64 resource tests
llvm-svn: 337306
2018-07-17 17:47:35 +00:00
Simon Pilgrim 92da01fed9 [llvm-mca][x86] Add MOVBE resource tests to all supporting targets
SNB doesn't support MOVBE but the numbers in Generic (which use the SNB model) look sane.

llvm-svn: 337305
2018-07-17 17:41:45 +00:00