16957 Commits

Author SHA1 Message Date
Simon Pilgrim f7f84a0ca3 [X86][SandyBridge] Strip unnecessary MOVQ/CVT instruction instrw overrides.
llvm-svn: 330505
2018-04-21 14:03:40 +00:00
Simon Pilgrim 02fc375a22 [X86] Strip unnecessary MMX instruction instrw overrides from scheduler models.
llvm-svn: 330503
2018-04-21 12:15:42 +00:00
Simon Pilgrim c0f654f18e [X86] Strip unnecessary x87 instruction instrw overrides from scheduler models.
llvm-svn: 330501
2018-04-21 11:25:02 +00:00
Simon Pilgrim d14d2e7b18 [X86] Add WriteFSign/WriteFLogic scheduler classes
Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.

This unearthed a couple of things that are also handled in this patch:

(1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
(2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
(3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.

Differential Revision: https://reviews.llvm.org/D45629

llvm-svn: 330480
2018-04-20 21:16:05 +00:00
Craig Topper 173d59b62e [X86][SandyBridge] Remove duplicate InstRWs from Sandy Bridge scheduler model.
llvm-svn: 330465
2018-04-20 18:55:40 +00:00
Gabor Buella 31fa8025ba [X86] WaitPKG instructions
Three new instructions:

umonitor - Sets up a linear address range to be
monitored by hardware and activates the monitor.
The address range should be a writeback memory
caching type.

umwait - A hint that allows the processor to
stop instruction execution and enter an
implementation-dependent optimized state
until occurrence of a class of events.

tpause - Directs the processor to enter an
implementation-dependent optimized state
until the TSC reaches the value in EDX:EAX.

Also modifying the description of the mfence
instruction, as the rep prefix (0xF3) was allowed
before, which would conflict with umonitor during
disassembly.

Before:
$ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble
.text
mfence

After:
$ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble
.text
umonitor        %rax

Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45253

llvm-svn: 330462
2018-04-20 18:42:47 +00:00
Simon Pilgrim df8fa6d734 [X86][BtVer2] Cleanup some old FIXMEs from the model. NFCI.
llvm-svn: 330428
2018-04-20 13:12:04 +00:00
Simon Pilgrim 2f522ef13d [X86] Tag CLDEMOTE instruction with WriteLoad scheduling class
Same as other cacheline instructions

llvm-svn: 330424
2018-04-20 12:54:53 +00:00
Craig Topper bc895a3afc [X86] Enable popcnt false dependency breaking on Silvermont and Goldmont.
Silvermont and Goldmont have the same issue on popcnt as Sandy Bridge, Haswell, Broadwell, and Skylake. Believe it is fixed in Goldmont Plus.

llvm-svn: 330358
2018-04-19 19:25:24 +00:00
Simon Pilgrim 4ba057dbd1 [X86][SLM] Fix typo using SandyBridge resources.
Luckily this was on instructions not supported on Silvermont....

llvm-svn: 330351
2018-04-19 18:01:52 +00:00
Craig Topper b5f2659130 [X86] Correct the scheduling data for register forms of XCHG and XADD on Intel CPUs.
The XCHG16rr/XCHG32rr/XCHG64rr instructions should be 3 uops just like XCHG8rr. I believe they're just implemented as 3 move uops with a temporary register.

XADD is probably 2 moves and an add also using a temporary register.

Change the latency for both from 2 cycles to 3 cycles. Only 2 of the uops are serialized in their execution, the move into the temporary and the move out of the temporary. The move from one GPR to the other should be able to go in parallel with this if there are ALU resources available.
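
A rough C++ model of the uop decomposition guessed at above. This is only an illustration of the commit message's reasoning, not a documented microarchitectural fact, and the names `xchgModel` and `xaddModel` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of XCHG r1, r2 as three move uops through a temporary.
inline void xchgModel(uint64_t &A, uint64_t &B) {
  uint64_t Tmp = A; // uop 1: move into the temporary
  A = B;            // uop 2: GPR-to-GPR move, independent of the temporary
  B = Tmp;          // uop 3: move out of the temporary (depends on uop 1)
}

// Hypothetical model of XADD dst, src as two moves and an add:
// dst receives the sum, src receives the old value of dst.
inline void xaddModel(uint64_t &Dst, uint64_t &Src) {
  uint64_t Tmp = Dst; // move the old destination into the temporary
  Dst = Dst + Src;    // add
  Src = Tmp;          // move the old destination into the source register
}
```

In this model only the moves into and out of the temporary form a dependency chain, which is the serialization the message refers to.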

llvm-svn: 330349
2018-04-19 18:00:17 +00:00
Simon Pilgrim 5e492d29a3 [X86] Merge some MMX instregex
There's a lot more but I'd prefer focussing on removing unnecessary InstRWs first.

llvm-svn: 330347
2018-04-19 17:32:10 +00:00
Simon Pilgrim f21ace6cdd [X86][BtVer2] Remove SSE4A EXTRQ/EXTRQI InstRW overrides.
These are already handled identically by WriteALU.

llvm-svn: 330332
2018-04-19 14:38:36 +00:00
Alexander Ivchenko e8fed1546e Lowering x86 adds/addus/subs/subus intrinsics (llvm part)
This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations. The patch also includes folding
of previously missing saturation patterns so that IR emits the same
machine instructions as the intrinsics.
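
For orientation, a minimal C++ sketch of what per-lane saturating arithmetic looks like when written as ordinary add/compare/select code. The exact IR patterns the patch matches are not given in this message, so the forms below are an assumption, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Unsigned saturating add on one byte lane: add, compare, select.
inline uint8_t addusModel(uint8_t A, uint8_t B) {
  uint8_t Sum = static_cast<uint8_t>(A + B); // wraps modulo 256
  return Sum < A ? std::numeric_limits<uint8_t>::max() : Sum;
}

// Unsigned saturating subtract on one byte lane: clamp at zero.
inline uint8_t subusModel(uint8_t A, uint8_t B) {
  return A > B ? static_cast<uint8_t>(A - B) : uint8_t{0};
}

// Signed saturating add on one byte lane: widen, add, clamp.
inline int8_t addsModel(int8_t A, int8_t B) {
  int Sum = int{A} + int{B};
  if (Sum > 127)
    return 127;
  if (Sum < -128)
    return -128;
  return static_cast<int8_t>(Sum);
}
```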

Patch by tkrupa

Differential Revision: https://reviews.llvm.org/D44785

llvm-svn: 330322
2018-04-19 12:13:30 +00:00
Simon Pilgrim 3c06617f0e [X86][FMA] Remove FMA reg-reg InstRW scheduler overrides.
These are all already handled identically by WriteFMA.

llvm-svn: 330319
2018-04-19 11:37:26 +00:00
Simon Pilgrim 33dede9075 [X86][BtVer2] Remove 128-bit F16C InstRW overrides.
These are already handled identically by WriteCvtF2F.

llvm-svn: 330318
2018-04-19 11:16:33 +00:00
Craig Topper f846e2d1b1 [X86] Scrub scheduling information for MUL/IMUL on Intel CPUs.
This removes a bunch of unnecessary InstRW overrides. It also cleans up the missing information from the Sandy Bridge model. Other fixes to other models.

llvm-svn: 330308
2018-04-19 05:34:05 +00:00
Bob Haarman cb80a3fce0 Fix data race in X86FloatingPoint.cpp ASSERT_SORTED
Summary:
ASSERT_SORTED checks if a table is sorted, and uses a boolean to
prevent the check from being run again if it was earlier determined
that the table is in fact sorted. Unsynchronized reads and writes of
that boolean triggered ThreadSanitizer's data race detection. This
change rewrites the code to use std::atomic<bool> instead.
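
A minimal sketch of the pattern described, assuming a hypothetical table and helper name; the real macro and tables in X86FloatingPoint.cpp are not reproduced here.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <iterator>

// Hypothetical table standing in for the real opcode tables.
static const int ExampleTable[] = {1, 3, 5, 7};

// Returns true if the table is sorted. The "already checked" flag is a
// std::atomic<bool>, so concurrent readers and writers do not race on it.
// Two threads may still both run the check the first time, which is harmless.
inline bool checkTableSortedOnce() {
  static std::atomic<bool> Checked(false);
  if (!Checked.load()) {
    if (!std::is_sorted(std::begin(ExampleTable), std::end(ExampleTable)))
      return false;
    Checked.store(true);
  }
  return true;
}
```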

Fixes PR36922.

Reviewers: rnk

Reviewed By: rnk

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D45742

llvm-svn: 330301
2018-04-18 23:04:09 +00:00
Craig Topper ebf52e80c1 [X86] Correct the Defs, Uses, hasSideEffects, mayLoad, mayStore for XCHG and XADD instructions.
I don't think we emit any of these from codegen except for using XCHG16ar as 2 byte NOP.

llvm-svn: 330298
2018-04-18 22:07:53 +00:00
Craig Topper 04244cbf45 [X86] Fix the Uses/Defs,mayLoad,mayStore,hasSideEffects flags for the CMPXCHG instructions.
The compiler only emits the locked version of these which use different instruction definitions. The versions fixed here are only used by the assembler/disassembler.

llvm-svn: 330287
2018-04-18 20:15:00 +00:00
Chandler Carruth ccd3ecb95a [x86] Switch EFLAGS copy lowering to use reg-reg form of testing for
a zero register.

Previously I tried this and saw LLVM unable to transform this to fold
with memory operands such as spill slot rematerialization. However, it
clearly works as shown in this patch. We turn these into `cmpb $0,
<mem>` when useful for folding a memory operand without issue. This form
has no disadvantage compared to `testb $-1, <mem>`. So overall, this is
likely no worse and may be slightly smaller in some cases due to the
`testb %reg, %reg` form.

Differential Revision: https://reviews.llvm.org/D45475

llvm-svn: 330269
2018-04-18 15:52:50 +00:00
Chandler Carruth 1f87618f8f [x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite uses
across basic blocks in the limited cases where it is very straight
forward to do so.

This will also be useful for other places where we do some limited
EFLAGS propagation across CFG edges and need to handle copy rewrites
afterward. I think this is rapidly approaching the maximum we can and
should be doing here. Everything else begins to require either heroic
analysis to prove how to do PHI insertion manually, or somehow managing
arbitrary PHI-ing of EFLAGS with general PHI insertion. Neither of these
seem at all promising so if those cases come up, we'll almost certainly
need to rewrite the parts of LLVM that produce those patterns.

We do now require dominator trees in order to reliably diagnose patterns
that would require PHI nodes. This is a bit unfortunate but it seems
better than the completely mysterious crash we would get otherwise.

Differential Revision: https://reviews.llvm.org/D45673

llvm-svn: 330264
2018-04-18 15:13:16 +00:00
Craig Topper dfccafe18a [X86][Broadwell] Remove some unnecessary InstRW overrides and add some FIXMEs.
llvm-svn: 330241
2018-04-18 06:41:25 +00:00
Craig Topper 513e11bb70 [X86] Give CMOV 2 cycle latency on SLM.
llvm-svn: 330239
2018-04-18 06:04:30 +00:00
Craig Topper 8704612481 [X86] Don't crash on bad operand modifiers in inline assembly
Summary: Previously if a modifier was placed on a non-GPR register class we would hit an assert or crash.

Reviewers: echristo

Reviewed By: echristo

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D45751

llvm-svn: 330238
2018-04-18 05:15:24 +00:00
Keith Wyss 3d86823f3d [XRay] Typed event logging intrinsic
Summary:
Add an LLVM intrinsic for type discriminated event logging with XRay.
Similar to the existing intrinsic for custom events, but also accepts
a type tag argument to allow plugins to be aware of different types
and semantically interpret logged events they know about without
choking on those they don't.

Relies on a symbol defined in compiler-rt patch D43668. I may wait
to submit until I can demo everything working together, including
a still-to-come clang patch.

Reviewers: dberris, pelikan, eizan, rSerge, timshen

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45633

llvm-svn: 330219
2018-04-17 21:30:29 +00:00
Craig Topper e56a2fc5e7 [X86] Add separate scheduling class for PSADBW instruction.
llvm-svn: 330204
2018-04-17 19:35:19 +00:00
Craig Topper 655e1db722 [X86] Remove unnecessary InstRW overrides. Add some FIXMEs/TODOs.
llvm-svn: 330203
2018-04-17 19:35:14 +00:00
Simon Pilgrim 86e3c26924 [X86] Add FP comparison scheduler classes
Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd

Differential Revision: https://reviews.llvm.org/D45656

llvm-svn: 330179
2018-04-17 07:22:44 +00:00
Gabor Buella 8f1646b579 [X86] Introduce archs: goldmont-plus & tremont
Using Goldmont's cost tables for these two upcoming
atom archs.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45612

llvm-svn: 330109
2018-04-16 07:47:35 +00:00
Craig Topper 53f9558903 [X86] Use uint32_t instead of unsigned in GetLo32XForm for readability. NFC
GetLo8XForm right next to it uses uint8_t so uint32_t is consistent.

llvm-svn: 330104
2018-04-15 19:11:24 +00:00
Simon Pilgrim b8adf558f8 [X86][MMX] Set PAVG/PHADD/PMIN/PMAX/PSIGN instructions to use same scheduler classes as SSE/AVX
llvm-svn: 330085
2018-04-14 13:06:38 +00:00
Hiroshi Inoue ae17900997 [NFC] fix trivial typos in document and comments
"not not" -> "not" etc

llvm-svn: 330083
2018-04-14 08:59:00 +00:00
Craig Topper 95f421cfbf [X86] Add the bizarro movsww and movzww mnemonics for the disassembler.
The destination size of the movzx/movsx instruction is controlled by the normal operand size mechanisms. Only the input type is fixed.

This means that a 0x66 prefix on the encoding for zext/sext 16->32 should really produce a 16->16 instruction. Functionally this is equivalent to a GR16->GR16 move since bits 16 and above will be preserved. So nothing is actually extended.
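
A small C++ model of the register effect described above, treating the destination as a 32-bit value. The helper names are hypothetical and this is only a sketch of the semantics, not of the disassembler change.

```cpp
#include <cassert>
#include <cstdint>

// Models a 16-bit write to a 32-bit register: only bits 15:0 change.
inline uint32_t write16(uint32_t Dst, uint16_t Value) {
  return (Dst & 0xFFFF0000u) | Value;
}

// movzwl: zero-extend 16 -> 32, the whole 32-bit destination is overwritten.
inline uint32_t movzwlModel(uint32_t /*Dst*/, uint16_t Src) { return Src; }

// "movzww": same encoding with a 0x66 prefix, so the destination is 16 bits
// wide and the instruction degenerates to a GR16 -> GR16 move.
inline uint32_t movzwwModel(uint32_t Dst, uint16_t Src) {
  return write16(Dst, Src);
}
```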

llvm-svn: 330078
2018-04-13 23:57:54 +00:00
Tim Northover 271d3d2771 MachO: trap unreachable instructions
Debuggability is more important than saving 4 bytes by letting us fall
through to nonsense.

llvm-svn: 330073
2018-04-13 22:25:20 +00:00
Simon Pilgrim 0e74e50401 [X86] Remove remaining itinerary support from instructions and target (PR37093)
llvm-svn: 330035
2018-04-13 15:37:56 +00:00
Simon Pilgrim a3a9d81231 [X86] Generalize X86FixupLEAs to work with TargetSchedModel
Similar to rL329834, don't rely on itinerary scheduler model to determine latencies for LEA thresholds, use the generic TargetSchedModel::computeInstrLatency call.

llvm-svn: 330030
2018-04-13 15:09:39 +00:00
Simon Pilgrim 01637c473f Remove comment reference to itineraries. NFCI.
llvm-svn: 330025
2018-04-13 14:42:48 +00:00
Simon Pilgrim fe3d59e98b [X86][AVX512] UNPCKL/H PS and PD should be scheduled with WriteFShuffle not WriteFAdd
llvm-svn: 330023
2018-04-13 14:41:05 +00:00
Simon Pilgrim 21e89795cc [X86] Remove remaining OpndItins/SizeItins from all instruction defs (PR37093)
llvm-svn: 330022
2018-04-13 14:36:59 +00:00
Simon Pilgrim e0c7868ded Remove comment references to itineraries. NFCI.
llvm-svn: 330021
2018-04-13 14:31:57 +00:00
Simon Pilgrim 963bf4de2b Remove out of date comment. NFCI.
llvm-svn: 330019
2018-04-13 14:24:06 +00:00
Simon Pilgrim ae0c2711b6 [X86] Remove OpndItins/SizeItins from all sse instruction defs (PR37093)
llvm-svn: 330013
2018-04-13 12:50:31 +00:00
Hiroshi Inoue 372ffa15cb [NFC] fix trivial typos in comments
"the the" -> "the", "we we" -> "we", etc

llvm-svn: 330006
2018-04-13 11:37:06 +00:00
Gabor Buella 604be4424b [X86] Introduce cldemote instruction
Hint to hardware to move the cache line containing the
address to a more distant level of the cache without
writing back to memory.

Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45256

llvm-svn: 329992
2018-04-13 07:35:08 +00:00
Craig Topper 254ed028a4 [X86] Remove the pmuldq/pmuludq intrinsics and replace with native IR.
This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics.

We lost some InstCombine SimplifyDemandedBits optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well.

llvm-svn: 329990
2018-04-13 06:07:18 +00:00
Simon Pilgrim 1f070c334c [X86] Remove unused MoveLoadStoreItins/ShiftOpndItins schedule class wrappers.
Was being used to move around empty/unused itineraries...

llvm-svn: 329970
2018-04-12 22:57:34 +00:00
Simon Pilgrim 6551d405dc [X86] Remove x86 InstrItinClass entries (PR37093)
This removes the last of the x86 schedule itineraries, I'm intending to cleanup the remaining uses of NoItinerary/OpndItins/etc. before resolving PR37093.

llvm-svn: 329967
2018-04-12 22:44:47 +00:00
Simon Pilgrim 0e45634f4e [X86] Remove InstrItinClass entries from all x86 instruction defs (PR37093)
llvm-svn: 329953
2018-04-12 20:47:34 +00:00
Simon Pilgrim e9376b9fdc [X86] Remove InstrItinClass entries from SSE/AVX instructions defs (PR37093)
llvm-svn: 329945
2018-04-12 19:59:35 +00:00
Simon Pilgrim 577ae24feb [X86] Remove explicit SSE/AVX schedule itineraries from defs (PR37093)
llvm-svn: 329940
2018-04-12 19:25:07 +00:00
Simon Pilgrim 35935c0632 [X86] Remove remaining gpr schedule itineraries (PR37093)
llvm-svn: 329938
2018-04-12 18:46:15 +00:00
Gabor Buella 297c138798 [X86] Introduce LLVM wbinvd intrinsic
A previously missing intrinsic for an old instruction.

Reviewers: craig.topper, echristo

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45312

llvm-svn: 329936
2018-04-12 18:38:18 +00:00
Simon Pilgrim dec781c141 [X86] Remove gpr shift/extension schedule itineraries (PR37093)
llvm-svn: 329933
2018-04-12 18:25:38 +00:00
Simon Pilgrim 8904a86f65 [X86] Remove AES/CLMUL/CRC32/LDDQU/MOVNT/POPCNT/SHA schedule itineraries (PR37093)
llvm-svn: 329912
2018-04-12 14:31:42 +00:00
Simon Pilgrim 294556d40e [X86] Remove remaining system/special schedule itineraries (PR37093)
llvm-svn: 329906
2018-04-12 12:43:49 +00:00
Simon Pilgrim 0cd0fbd8c5 [X86] Remove system/control schedule itineraries (PR37093)
llvm-svn: 329903
2018-04-12 12:09:24 +00:00
Simon Pilgrim 69e0e8e3d4 [X86] Remove CMOV/SETCC schedule itineraries (PR37093)
llvm-svn: 329898
2018-04-12 11:01:40 +00:00
Simon Pilgrim 10e3bdaaa8 [X86] Remove MMX/3DNow schedule itineraries (PR37093)
llvm-svn: 329896
2018-04-12 10:49:57 +00:00
Simon Pilgrim 32d368147f [X86] Remove X87 schedule itineraries (PR37093)
First of a number of commits to remove x86 schedule itineraries entirely - approved off-line with @craig.topper

llvm-svn: 329893
2018-04-12 10:27:37 +00:00
Simon Pilgrim 7b88d09e75 [X86] Remove unused itinerary argument from FMA3/FMA4/XOP instructions. NFCI.
llvm-svn: 329862
2018-04-11 23:24:38 +00:00
Gabor Buella 2ef36f3571 [X86] Describe wbnoinvd instruction
Similar to the wbinvd instruction, except this
one does not invalidate caches. Ring 0 only.
The encoding matches a wbinvd instruction with
an F3 prefix.

Reviewers: craig.topper, zvi, ashlykov

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D43816

llvm-svn: 329847
2018-04-11 20:01:57 +00:00
Simon Pilgrim 8fc2b49620 [X86][Atom] Convert Atom scheduler model to SchedRW (PR32431)
Atom is the only x86 target that still uses schedule itineraries; if we can remove this then we can begin the work on removing x86 itineraries. I've also found that it will help with PR36550.

I've focussed on matching the existing model as closely as possible (relying on the schedule tests). PR36895 indicated a lot of these were incorrect, but we can just as easily fix these after this patch as before. Hopefully we can get llvm-exegesis to help here.

There are a few instructions that rely on itinerary scheduling (mainly push/pop/return) of multiple resource stages, but I don't think any of these are show stoppers.

There are also a few codegen changes that seem related to the post-ra scheduler acting a little differently, I haven't tracked these down but they don't seem critical.

NOTE: I don't have access to any Atom hardware, so this hasn't been tested in the wild.

Differential Revision: https://reviews.llvm.org/D45486

llvm-svn: 329837
2018-04-11 18:23:01 +00:00
Simon Pilgrim 7f321d8c24 [X86] Generalize X86PadShortFunction to work with TargetSchedModel
Pre-commit for D45486, don't rely on itinerary scheduler model to determine latencies for padding, use the generic TargetSchedModel::computeInstrLatency call.

Also, replace hard coded (atom specific) 2*uop creation per padding cycle with a version based on the scheduler model's issue width.

Differential Revision: https://reviews.llvm.org/D45486

llvm-svn: 329834
2018-04-11 18:05:17 +00:00
Simon Pilgrim 89c8a10f7c [X86] Add variable shuffle schedule classes
Split variable index shuffles from immediate index shuffles

WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.)
WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.)

WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.)
WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.)

Differential Revision: https://reviews.llvm.org/D45404

llvm-svn: 329806
2018-04-11 13:49:19 +00:00
Craig Topper 9507fa358c [X86] Remove 128/256-bit masked pmaddubsw and pmaddwd intrinsics. Replace 512-bit masked intrinsic with unmasked intrinsic and a select.
The 128/256-bit versions were no longer used by clang. It uses the legacy SSE/AVX2 version and a select. The 512-bit was changed to the same for consistency.

llvm-svn: 329774
2018-04-11 04:55:04 +00:00
Craig Topper ee2c1dea4d [X86] In X86FlagsCopyLowering, when rewriting a memory setcc we need to emit an explicit MOV8mr instruction.
Previously the code only knew how to handle setcc to a register.

This should fix a crash in the chromium build.

llvm-svn: 329771
2018-04-11 01:09:10 +00:00
Sriraman Tallam d693093a65 GOTPCREL references must always use RIP.
With -fno-plt, global value references can use GOTPCREL and RIP must be used.

Differential Revision: https://reviews.llvm.org/D45460

llvm-svn: 329765
2018-04-10 22:50:05 +00:00
Gabor Buella 213edc4a15 [X86] Split up -march=icelake to -client & -server
Reviewers: craig.topper, zvi, echristo

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45055

llvm-svn: 329742
2018-04-10 18:59:13 +00:00
Craig Topper 442428540a [X86] Change the name string for the newly added DF flag register to 'dirflag' to match the clobber name supported by clang for MS inline assembly.
This should fix the failure found by Chromium reported here https://bugs.chromium.org/p/chromium/issues/detail?id=831158

The test case will be added in clang.

llvm-svn: 329734
2018-04-10 18:21:04 +00:00
Simon Pilgrim 95f941117c Fix whitespace indentation. NFCI.
llvm-svn: 329704
2018-04-10 14:21:33 +00:00
Gabor Buella 3eab22d896 [X86] Disable SGX for Skylake Server
Reviewers: craig.topper, zvi, echristo

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45057

llvm-svn: 329700
2018-04-10 13:58:57 +00:00
Andrea Di Biagio 486358c153 [X86][Broadwell] HWPort5 should not be added to BroadwellModelProcResources.
The BroadwellModelProcResources had an entry for HWPort5, which is a Haswell
resource, and not a Broadwell processor resource. That entry was added to the
Broadwell model because variable blends were consuming it.

This was clearly a typo (the resource name should have been BWPort5), which
unfortunately was never caught before. It was not reported as an error because
HWPort5 is a resource defined by the Haswell model. It has been found when
testing some code with llvm-mca: the list of resources in the resource pressure
view was odd.

This patch fixes the issue; now variable blend instructions consume 2 cycles on
BWPort5 instead of HWPort5. This is enough to get rid of the extra (spurious)
entry in the BroadWellModelProcResources table.

llvm-svn: 329686
2018-04-10 10:49:41 +00:00
Clement Courbet b449379eae [MC][TableGen] Add optional libpfm counter names for ProcResUnits.
Summary:
Subtargets can define the libpfm counter names that can be used to
measure cycles and uops issued on ProcResUnits.
This allows making llvm-exegesis available on more targets.
Fixes PR36984.

Reviewers: gchatelet, RKSimon, andreadb, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45360

llvm-svn: 329675
2018-04-10 08:16:37 +00:00
Chandler Carruth 0ca3bd0729 [x86] Model the direction flag (DF) separately from the rest of EFLAGS.
This cleans up a number of operations that only claimed to use EFLAGS
due to using DF. But no instructions which we think of as setting EFLAGS
actually modify DF (other than things like popf) and so this needlessly
creates uses of EFLAGS that aren't really there.

In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD,
and the whole-flags writes (WRFLAGS and POPF) need to model this.

I've also somewhat cleaned up some of the flag management instruction
definitions to be in the correct .td file.

Adding this extra register also uncovered a failure to use the correct
datatype to hold X86 registers, and I've corrected that as necessary
here.

Differential Revision: https://reviews.llvm.org/D45154

llvm-svn: 329673
2018-04-10 06:40:51 +00:00
Craig Topper 7e42af87a6 [X86] Prevent folding loads with 64-bit ANDs with immediates that fit in 32-bits.
Prefer to use the 32-bit AND with immediate instead.

Primarily I'm doing this to ensure that immediates created by shrinkAndImmediate will always get absorbed into the AND. But I do believe this would be a reduction in the number of uops that need to execute. Ideally we should shrink the 'and' and the 'load' during DAG combine to re-enable the fold.
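
A short C++ check of the identity that makes the narrower AND legal, as I read the message: when the immediate has no bits set above bit 31, a 32-bit AND followed by the implicit zero extension gives the same 64-bit result. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if the immediate has no bits set above bit 31.
inline bool fitsInUnsigned32(uint64_t Imm) { return (Imm >> 32) == 0; }

// Reference: the full 64-bit AND.
inline uint64_t and64(uint64_t X, uint64_t Imm) { return X & Imm; }

// 32-bit AND whose result is zero-extended to 64 bits, mirroring how a
// 32-bit register write clears the upper half on x86-64.
inline uint64_t and32ZeroExtended(uint64_t X, uint64_t Imm) {
  assert(fitsInUnsigned32(Imm) && "immediate must fit in 32 bits");
  uint32_t Lo = static_cast<uint32_t>(X) & static_cast<uint32_t>(Imm);
  return Lo; // implicit zero extension to 64 bits
}
```

Note that an immediate such as 0x80000000 fits in 32 bits unsigned but not as a sign-extended 32-bit value, which is presumably why the 32-bit form is preferred here.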

Fixes PR37063.

llvm-svn: 329667
2018-04-10 03:44:15 +00:00
Chandler Carruth 19618fc639 [x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues.
The key idea is to lower COPY nodes populating EFLAGS by scanning the
uses of EFLAGS and introducing dedicated code to preserve the necessary
state in a GPR. In the vast majority of cases, these uses are cmovCC and
jCC instructions. For such cases, we can very easily save and restore
the necessary information by simply inserting a setCC into a GPR where
the original flags are live, and then testing that GPR directly to feed
the cmov or conditional branch.

However, things are a bit more tricky if arithmetic is using the flags.
This patch handles the vast majority of cases that seem to come up in
practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of
partially preserved EFLAGS as LLVM doesn't currently model that at all.

There are a large number of operations that technically observe EFLAGS
currently but shouldn't in this case -- they typically are using DF.
Currently, they will not be handled by this approach. However, I have
never seen this issue come up in practice. It is already pretty rare to
have these patterns come up in practical code with LLVM. I had to resort
to writing MIR tests to cover most of the logic in this pass already.
I suspect even with its current amount of coverage of arithmetic users
of EFLAGS it will be a significant improvement over the current use of
pushf/popf. It will also produce substantially faster code in most of
the common patterns.

This patch also removes all of the old lowering for EFLAGS copies, and
the hack that forced us to use a frame pointer when EFLAGS copies were
found anywhere in a function so that the dynamic stack adjustment wasn't
a problem. None of this is needed as we now lower all of these copies
directly in MI and without requiring stack adjustments.

Lots of thanks to Reid who came up with several aspects of this
approach, and Craig who helped me work out a couple of things tripping
me up while working on this.

Differential Revision: https://reviews.llvm.org/D45146

llvm-svn: 329657
2018-04-10 01:41:17 +00:00
Vlad Tsyrklevich 0cdc6ec535 ShadowCallStack/x86_64: Ignore pseudo-machine instructions
llvm-svn: 329656
2018-04-10 01:31:01 +00:00
Craig Topper 47b2f9d836 [X86] Don't use Lower512IntUnary to split bitcasts with v32i16/v64i8 types on targets without AVX512BW.
LowerIntUnary as its name says has an assert for integer types. But for the bitcast case one side might be an FP type.

Rather than making sure the function really works for fp types and renaming it, just do really basic splitting directly. The LowerIntUnary has the advantage that it can peek through BUILD_VECTOR because every other call is during Lowering. But these calls are during legalization and will be followed by a DAG combine round.

Revert some changes to LowerVectorIntUnary that were originally made just to make these two calls work even in pure integer cases.

This was found purely by compiling the avx512f-builtins.c test from clang so I've copied over the offending function from that.

llvm-svn: 329616
2018-04-09 20:37:14 +00:00
Craig Topper 0c2a12cb3e [X86] Revert the SLM part of r328914.
While it appears to be correct information based on Intel's optimization manual and Agner's data, it causes perf regressions on a couple of the benchmarks in our internal list.

llvm-svn: 329593
2018-04-09 17:07:40 +00:00
Simon Pilgrim e5ed5e2cba [X86][MMX] Fix missing itinerary for PALIGNR
llvm-svn: 329568
2018-04-09 13:52:33 +00:00
Simon Pilgrim 140fee078f [X86][MMX] Fix missing itinerary for MOVQ2DQ instruction format
llvm-svn: 329567
2018-04-09 13:42:14 +00:00
Simon Pilgrim abf3611332 [X86][MMX] Fix missing itinerary for CVTPI2PS
llvm-svn: 329565
2018-04-09 13:27:47 +00:00
Simon Pilgrim 0047efdd1e [X86][MMX] Fix flipped reg/mem typo in MMX_MISC_FUNC_ITINS
The RR/RM itineraries were the wrong way around

llvm-svn: 329561
2018-04-09 13:02:07 +00:00
Simon Pilgrim 6131286553 [X86][SSE] Fix f32 mul/div itinerary groups typo
The RM folded itineraries were incorrectly using the f64 version.

llvm-svn: 329556
2018-04-09 10:45:53 +00:00
Sanjay Patel 0d7df36c66 [TargetSchedule] shrink interface for init(); NFCI
The TargetSchedModel is always initialized using the TargetSubtargetInfo's 
MCSchedModel and TargetInstrInfo, so we don't need to extract those and 
pass 3 parameters to init().

Differential Revision: https://reviews.llvm.org/D44789

llvm-svn: 329540
2018-04-08 19:56:04 +00:00
Craig Topper b7baa358f6 [X86] Add SchedWrites for CMOV and SETCC. Use them to remove InstRWs.
Summary:
Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops.

This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs.

The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc.

Reviewers: RKSimon, andreadb, GGanesh

Reviewed By: RKSimon

Subscribers: GGanesh, llvm-commits

Differential Revision: https://reviews.llvm.org/D45380

llvm-svn: 329539
2018-04-08 17:53:18 +00:00
Craig Topper c362f42b6a [X86][Znver1] Remove InstRWs for BLENDVPS/PD
Summary:
This removes the InstRWs for BLENDVPS/PD in favor of WriteFVarBlend. The latency listed was 3 cycles but WriteFVarBlend is defined as 1 cycle latency. The 1 cycle latency matches Agner Fog's data.

The patterns were missing the VEX forms which is why there are no test changes. We don't test "-mcpu=znver1 -mattr=-avx"

Reviewers: RKSimon, GGanesh

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44841

llvm-svn: 329538
2018-04-08 17:53:15 +00:00
Mandeep Singh Grang 68a151a13c [X86] Change std::sort to llvm::sort in response to r327219
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.

To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.

Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer to the comments section in D44363 for a list of all the required patches.
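
A minimal sketch of the idea behind such a wrapper, assuming a hypothetical `shuffledSort` helper; the actual llvm::sort implementation from r327219 is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <random>
#include <vector>

// Shuffle the range before sorting it. If the comparator leaves the order
// of equal-keyed elements unspecified, different runs will now produce
// visibly different results instead of accidentally stable ones.
template <typename Iter, typename Compare>
void shuffledSort(Iter First, Iter Last, Compare Comp) {
  std::mt19937 Gen(std::random_device{}());
  std::shuffle(First, Last, Gen);
  std::sort(First, Last, Comp);
}
```

With a comparator that defines a total order the result is unchanged, so well-behaved callers are unaffected.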

Reviewers: chandlerc, craig.topper, RKSimon

Reviewed By: chandlerc, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44874

llvm-svn: 329534
2018-04-08 16:42:52 +00:00
Simon Pilgrim 86588fc809 [X86][Btver2] Add vector extract costs
llvm-svn: 329524
2018-04-08 11:26:26 +00:00
Craig Topper ef37aebc96 [X86] Combine vXi64 multiplies to MULDQ/MULUDQ during DAG combine instead of lowering.
Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget
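
A per-lane C++ model of the two instructions, to show when a 64-bit multiply can be replaced by them. The precise conditions the combine checks are not stated in this message, so treat the operand requirements below as an assumption; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// PMULUDQ on one 64-bit lane: multiply the low 32 bits of each operand as
// unsigned values, producing a full 64-bit product.
inline uint64_t pmuludqLane(uint64_t A, uint64_t B) {
  return static_cast<uint64_t>(static_cast<uint32_t>(A)) *
         static_cast<uint32_t>(B);
}

// PMULDQ on one 64-bit lane: multiply the low 32 bits of each operand as
// signed values, producing a full 64-bit product.
inline int64_t pmuldqLane(int64_t A, int64_t B) {
  return static_cast<int64_t>(static_cast<int32_t>(A)) *
         static_cast<int32_t>(B);
}
```

If the upper 32 bits of both operands are known to be zero (or, for the signed form, both operands are sign-extended from 32 bits), these match the ordinary 64-bit multiply.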

llvm-svn: 329510
2018-04-07 19:09:52 +00:00
Simon Pilgrim 80ce1dde44 [CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets
llvm-svn: 329498
2018-04-07 13:24:33 +00:00
Craig Topper c50570fb4f [X86] Add appropriate ReadAfterLd for the register input to memory forms of ADC/SBB.
llvm-svn: 329424
2018-04-06 17:12:18 +00:00
Craig Topper b9d298ecf2 [X86] Remove InstRWs for basic arithmetic instructions from Sandy Bridge scheduler model.
We can get this right through WriteALU and friends now.

llvm-svn: 329417
2018-04-06 16:29:31 +00:00
Craig Topper f0d042619b [X86] Attempt to model basic arithmetic instructions in the Haswell/Broadwell/Skylake scheduler models without InstRWs
Summary:
This patch removes InstRW overrides for basic arithmetic/logic instructions. To do this I've added the store address port to RMW and used a WriteSequence to make the latency additive. It does not cover ADC/SBB because they have different latency.

Apparently we were inconsistent about whether the store has latency or not, hence the test changes.

I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5.

Reviewers: RKSimon, andreadb

Reviewed By: andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45351

llvm-svn: 329416
2018-04-06 16:16:48 +00:00
Craig Topper f131b60049 [X86] Add an extra store address cycle to WriteRMW in the Sandy Bridge/Broadwell/Haswell/Skylake scheduler model.
Even though the address was calculated for the load, it's calculated again for the store.

llvm-svn: 329415
2018-04-06 16:16:46 +00:00
Craig Topper 22d25a08ae [X86] Merge itineraries for CLC, CMC, and STC.
These are very simple flag setting instructions that appear to only be a single uop. They're unlikely to need this separation.

llvm-svn: 329414
2018-04-06 16:16:43 +00:00
Simon Pilgrim 09eeb3a8b9 [X86][SandyBridge] Add (V)DPPS memory fold latencies
Noticed this during D44654

llvm-svn: 329389
2018-04-06 11:25:21 +00:00
Simon Pilgrim 8a83f16ccd [X86][SandyBridge] SBWriteResPair +5cy Memory Folds
As mentioned on D44647, this patch increases the default memory latency to +5cy, which more closely matches what most custom cases are doing for reg-mem instructions.

I've bumped LoadLatency, ReadAfterLd and WriteLoad values to 5cy to be consistent.

As Sandy Bridge is currently our default generic model, this affects a lot of scheduling tests...

Differential Revision: https://reviews.llvm.org/D44654

llvm-svn: 329388
2018-04-06 11:00:51 +00:00
Simon Pilgrim fd1f4fe54e [X86][SkylakeServer] Merge 2 InstRW entries to the same sched group. NFCI.
llvm-svn: 329386
2018-04-06 10:16:36 +00:00