Commit Graph

17692 Commits

Author SHA1 Message Date
Simon Pilgrim 1c1335a10d [X86][BMI1] Fix BLSI/BLSMSK/BLSR BMI1 scheduling on btver2
These have the same behaviour as tzcnt on btver2 - confirmed with AMD 16h SOG, Agner and instlatx64.

llvm-svn: 342235
2018-09-14 13:31:14 +00:00
Simon Pilgrim 6a47cdbdec [X86][BMI1] Add scheduler class for BLSI/BLSMSK/BLSR BMI1 instructions
llvm-svn: 342234
2018-09-14 13:09:56 +00:00
Nirav Dave 59ad1c8457 [X86] Fix register resizings for inline assembly register operands.
When replacing a named register input to the appropriately sized
sub/super-register. In the case of a 64-bit value being assigned to a
register in 32-bit mode, match GCC's assignment.

Reviewers: eli.friedman, craig.topper

Subscribers: nickdesaulniers, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D51502

llvm-svn: 342175
2018-09-13 20:33:56 +00:00
Nirav Dave 2060a16dfd [X86] Cleanup pair returns. NFCI.
llvm-svn: 342174
2018-09-13 20:33:27 +00:00
Craig Topper f107123a88 [X86] Type legalize v2i32 div/rem by scalarizing rather than promoting
Summary:
Previously we type legalized v2i32 div/rem by promoting to v2i64. But we don't support div/rem of vectors so op legalization would then scalarize it using i64 scalar ops since it doesn't know about the original promotion. 64-bit scalar divides on Intel hardware are known to be slow and in 32-bit mode they require a libcall.

This patch switches type legalization to do the scalarizing itself using i32.

It looks like the division by power of 2 optimization is still kicking in and leaving the code as a vector. The division by other constant optimization doesn't kick in pre type legalization since it ignores illegal types. And previously, after type legalization we scalarized the v2i64 since we don't have v2i64 MULHS/MULHU support.

Another option might be to widen v2i32 to v4i32 so we could do division by constant optimizations, but we'd have to be careful to only do that for constant divisors or we risk scalaring to 4 scalar divides.

Reviewers: RKSimon, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51325

llvm-svn: 342114
2018-09-13 06:13:37 +00:00
Craig Topper 2262613532 [X86] Remove isel patterns for ADCX instruction
There's no advantage to this instruction unless you need to avoid touching other flag bits. It's encoding is longer, it can't fold an immediate, it doesn't write all the flags.

I don't think gcc will generate this instruction either.

Fixes PR38852.

Differential Revision: https://reviews.llvm.org/D51754

llvm-svn: 342059
2018-09-12 15:47:34 +00:00
Craig Topper dc32e91bc6 [X86] Teach X86SelectionDAGInfo::EmitTargetCodeForMemcpy about GNUX32
Summary:
In GNUX23, is64BitMode returns true, but pointers are 32-bits. So we shouldn't copy pointer values into RSI/RDI since the widths don't match.

Fixes PR38865 despite what the title says. I think the llvm_unreachable in the copyPhysReg code tricked the optimizer and made the fatal error trigger.

Reviewers: rnk, efriedma, MatzeB, echristo

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51893

llvm-svn: 342015
2018-09-12 01:57:22 +00:00
Craig Topper 8238580aae [X86] Prefer unpckhpd over movhlps in isel for fake unary cases
In r337348, I changed lowering to prefer X86ISD::UNPCKL/UNPCKH opcodes over MOVLHPS/MOVHLPS for v2f64 {0,0} and {1,1} shuffles when we have SSE2. This enabled the removal of a bunch of weirdly bitcasted isel patterns in r337349. To avoid changing the tests I placed a gross hack in isel to still emit movhlps instructions for fake unary unpckh nodes. A similar hack was not needed for unpckl and movlhps because we do execution domain switching for those. But unpckh and movhlps have swapped operand order.

This patch removes the hack.

This is a code size increase since unpckhpd requires a 0x66 prefix and movhlps does not. But if that's a big concern we should be using movhlps for all unpckhpd opcodes and let commuteInstruction turnit into unpckhpd when its an advantage.

Differential Revision: https://reviews.llvm.org/D49499

llvm-svn: 341973
2018-09-11 17:57:27 +00:00
Craig Topper cc9efaffad [X86] Teach X86FastISel::X86SelectRet to use EAX for the sret pointer in GNUX32
GNUX32 uses 32-bit pointers despite is64BitMode being true. So we should use EAX to return the value.

Fixes ones of the failures from PR38865.

Differential Revision: https://reviews.llvm.org/D51940

llvm-svn: 341972
2018-09-11 17:57:23 +00:00
Craig Topper d7362a3e5f [X86] Correct the one use check from r341915.
The one use check should be on the bitcast, not the input to the bitcast.

llvm-svn: 341956
2018-09-11 16:05:03 +00:00
Craig Topper 844f035e1e [X86] In combineMOVMSK, look through int->fp bitcasts before callling SimplifyDemandedBits.
MOVMSKPS and MOVMSKPD both take FP types, but likely the operations before it are on integer types with just a int->fp bitcast between them. If the bitcast isn't used by anything else and doesn't change the element width we can look through it to simplify the integer ops.

llvm-svn: 341915
2018-09-11 08:20:02 +00:00
Erich Keane 911ddd6db5 Move FeatureAES from SLM, WSM and SNB to GLM and SKL
Complements https://reviews.llvm.org/D51510 and matches
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01940.html

GoldmontProc already has FeatureAES.

Patch By: thiagomacieira

Differential Revision: https://reviews.llvm.org/D51565

llvm-svn: 341861
2018-09-10 21:12:19 +00:00
Craig Topper a5ae613c15 [X86] Mark the ISD::SETLT/SETLE condition codes as illegal for v32i16/v64i8 to match the other vector types.
I'm having a hard time finding a test case for this, but we should be consistent here. The fact that we canonicalize all zeros and all ones constants to vXi32 and all other constants to loads makes this hard to hit the easy DAG combine infinite loop we get for some of the other types.

llvm-svn: 341859
2018-09-10 20:31:27 +00:00
Craig Topper 3823516103 [X86] Custom type legalize (v2i32 (fp_to_uint v2f64))) without avx512vl by widening to v4i32 and v4f64 instead of v8i32 and v8f64. Make it aware of x86-experimental-vector-widening-legalization
We have isel patterns for v4i32/v4f64 that artificially widen to v8i32/v8f64 so just use that.

If x86-experimental-vector-widening-legalization is enabled, we don't need any custom legalization and can just return. I've modified the test RUN lines to cover this case.

llvm-svn: 341765
2018-09-09 20:36:36 +00:00
Craig Topper 7af5e333e7 [X86] Create paddus/psubus from narrower vectors with i8/i16 element types.
Summary:
This patch allows vectors with a power of 2 number of elements and i8/i16 element type to select paddus/psubus instructions. ReplaceNodeResults has been updated to custom widen these operations up to 128 bits like we already do for PAVG.

Another step towards fixing PR38691

Reviewers: RKSimon, spatel

Reviewed By: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51818

llvm-svn: 341753
2018-09-08 19:32:58 +00:00
Craig Topper a2c9694bc8 [X86] Mark the ADCX and ADOX instruction as commutable.
llvm-svn: 341752
2018-09-08 18:47:56 +00:00
Craig Topper c96305970d [X86] Add commuted isel pattern for the load form of ADCX instructions.
This prevents the legacy ADC instruction from being favored over ADCX when the load is in the operand 0.

llvm-svn: 341745
2018-09-08 06:31:43 +00:00
Craig Topper 5cbce81c91 [X86] Don't create ZERO_EXTEND_INREG/SIGN_EXTEND_INREG for v1iX vectors.
The generic type legalizer will scalarize vXi1 instructions getting rid of the vector entirely. Creating wider vector instructions is just going to prevent that.

llvm-svn: 341705
2018-09-07 20:56:03 +00:00
Craig Topper 39f48fdcbc [X86] Don't create X86ISD::AVG nodes from v1iX vectors.
The type legalizer will try to scalarize this and fail.

It looks like there's some other v1iX oddities out there too since we still generated some vector instructions.

llvm-svn: 341704
2018-09-07 20:56:01 +00:00
Craig Topper 4863313b35 [X86] Modify the the rdtscp intrinsic to return values instead of taking a pointer argument
Similar to what was recently done for addcarry/subborrow and has been done for rdrand/rdseed for a while. It's better to use two results and an explicit store in IR when the store isn't part of the semantics of the instruction. This allows store->load forwarding to happen in the middle end. Or the store to be removed if its never loaded.

Differential Revision: https://reviews.llvm.org/D51803

llvm-svn: 341698
2018-09-07 19:14:15 +00:00
Craig Topper 72964ae99e [X86] Change the addcarry and subborrow intrinsics to return 2 results and remove the pointer argument.
We should represent the store directly in IR instead. This gives the middle end a chance to remove it if it can see a load from the same address.

Differential Revision: https://reviews.llvm.org/D51769

llvm-svn: 341677
2018-09-07 16:58:39 +00:00
Craig Topper 313d09af51 [X86] Teach X86DAGToDAGISel::foldLoadStoreIntoMemOperand to handle loads in operand 1 of commutable operations.
Previously we only handled loads in operand 0, but nothing guarantees the load will be operand 0 for commutable operations.

Differential Revision: https://reviews.llvm.org/D51768

llvm-svn: 341675
2018-09-07 16:27:55 +00:00
Craig Topper 13148564d4 [X86] Fix some incorrect comments. NFC
llvm-svn: 341624
2018-09-07 01:29:42 +00:00
Craig Topper 2c9dede9cb [X86] Add RMW ADC patterns with load in operand 1.
ADC is commutable and the load could be in either operand, but we were only checking operand 0.

Ideally we'd mark X86adc_flag as commutable and tablegen would automatically do this, but the EFLAGS register mention is preventing it.

llvm-svn: 341606
2018-09-06 23:55:36 +00:00
Craig Topper 0fd5cdee3a [X86] Add isel patterns for commuting X86adc_flag with a load in the LHS.
The peephole pass likely gets this normally, but we should be doing it during isel.

Ideally we'd just make the X86adc_flag pattern SDNPCommutable, but the tablegen doesn't handle that when one of the operands is a register reference.

llvm-svn: 341596
2018-09-06 22:41:44 +00:00
Craig Topper 5a53760f65 [X86][Assembler] Allow %eip as a register in 32-bit mode for .cfi directives.
This basically reverts a change made in r336217, but improves the text of the error message for not allowing IP-relative addressing in 32-bit mode.

Fixes PR38826.

Patch by Iain Sandoe.

llvm-svn: 341512
2018-09-06 02:03:14 +00:00
Sander de Smalen c91b27d9ee Remove FrameAccess struct from hasLoadFromStackSlot
This removes the FrameAccess struct that was added to the interface
in D51537, since the PseudoValue from the MachineMemoryOperand
can be safely casted to a FixedStackPseudoSourceValue.

Reviewers: MatzeB, thegameg, javed.absar

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D51617

llvm-svn: 341454
2018-09-05 08:59:50 +00:00
Martin Storsjo 68df812cce [MinGW] Move code for indicating "potentially not DSO local" into shouldAssumeDSOLocal. NFC.
On Windows, if shouldAssumeDSOLocal returns false, it's either a
dllimport reference, or a reference that we should treat as non-local
and create a stub for.

Clean up AArch64Subtarget::ClassifyGlobalReference a little while
touching the flag handling relating to dllimport.

Differential Revision: https://reviews.llvm.org/D51590

llvm-svn: 341402
2018-09-04 20:56:28 +00:00
Chandler Carruth 664aa868f5 [x86/SLH] Add a real Clang flag and LLVM IR attribute for Speculative
Load Hardening.

Wires up the existing pass to work with a proper IR attribute rather
than just a hidden/internal flag. The internal flag continues to work
for now, but I'll likely remove it soon.

Most of the churn here is adding the IR attribute. I talked about this
Kristof Beyls and he seemed at least initially OK with this direction.
The idea of using a full attribute here is that we *do* expect at least
some forms of this for other architectures. There isn't anything
*inherently* x86-specific about this technique, just that we only have
an implementation for x86 at the moment.

While we could potentially expose this as a Clang-level attribute as
well, that seems like a good question to defer for the moment as it
isn't 100% clear whether that or some other programmer interface (or
both?) would be best. We'll defer the programmer interface side of this
for now, but at least get to the point where the feature can be enabled
without relying on implementation details.

This also allows us to do something that was really hard before: we can
enable *just* the indirect call retpolines when using SLH. For x86, we
don't have any other way to mitigate indirect calls. Other architectures
may take a different approach of course, and none of this is surfaced to
user-level flags.

Differential Revision: https://reviews.llvm.org/D51157

llvm-svn: 341363
2018-09-04 12:38:00 +00:00
Chandler Carruth 219888d1b2 [x86/SLH] Teach SLH to harden against the "ret2spec" attack by
implementing the proposed mitigation technique described in the original
design document.

The idea is to check after calls that the return address used to arrive
at that location is in fact the correct address. In the event of
a mis-predicted return which reaches a *valid* return but not the
*correct* return, this will detect the mismatch much like it would
a mispredicted conditional branch.

This is the last published attack vector that I am aware of in the
Spectre v1 space which is not mitigated by SLH+retpolines. However,
don't read *too* much into that: this is an area of ongoing research
where we expect more issues to be discovered in the future, and it also
makes no attempt to mitigate Spectre v4. Still, this is an important
completeness bar for SLH.

The change here is of course delightfully simple. It was predicated on
cutting support for post-instruction symbols into LLVM which was not at
all simple. Many thanks to Hal Finkel, Reid Kleckner, and Justin Bogner
who helped me figure out how to do a bunch of the complex changes
involved there.

Differential Revision: https://reviews.llvm.org/D50837

llvm-svn: 341358
2018-09-04 10:59:10 +00:00
Chandler Carruth 8d8489f513 [x86/SLH] Teach SLH to harden indirect branches and switches without
retpolines.

This implements the core design of tracing the intended target into the
target, checking it, and using that to update the predicate state. It
takes advantage of a few interesting aspects of SLH to make it a bit
easier to implement:
- We already split critical edges with conditional branches, so we can
assume those are gone.
- We already unfolded any memory access in the indirect branch
instruction itself.

I've left hard errors in place to catch if any of these somewhat subtle
invariants get violated.

There is some code that I can factor out and share with D50837 when it
lands, but I didn't want to couple landing the two patches, so I'll do
that in a follow-up cleanup commit if alright.

Factoring out the code to handle different scenarios of materializing an
address remains frustratingly hard. In a bunch of cases you want to fold
one of the cases into an immediate operand of some other instruction,
and you also have both symbols and basic blocks being used which require
different methods on the MI builder (and different operand kinds).
Still, I'll take a stab at sharing at least some of this code in
a follow-up if I can figure out how.

Differential Revision: https://reviews.llvm.org/D51083

llvm-svn: 341356
2018-09-04 10:44:21 +00:00
Andrea Di Biagio fb3d9e1449 [X86] Remove wrong ReadAdvance from multiclass sse_fp_unop_s.
A ReadAdvance was incorrectly added to the SchedReadWrite list associated with
the following SSE instructions:

sqrtss
sqrtsd
rsqrtss
rcpss

As a consequence, a wrong operand latency was computed for the register operand
used as the base address of the folded load operand.

This patch removes the wrong ReadAdvance, and updates the llvm-mca test cases.
There is still a problem with correctly modeling partial register writes on XMM
registers This other problem is currently tracked here:
https://bugs.llvm.org/show_bug.cgi?id=38813

Differential Revision: https://reviews.llvm.org/D51542

llvm-svn: 341326
2018-09-03 16:47:34 +00:00
Argyrios Kyrtzidis c30340b207 Add header guards to some headers that are missing them
Also adjust some of dsymutil's headers to put the header guards at the top,
otherwise the compiler will not recognize them as header guards.

llvm-svn: 341323
2018-09-03 16:22:05 +00:00
Sander de Smalen 6cab60fa06 Extend hasStoreToStackSlot with list of FI accesses.
For instructions that spill/fill to and from multiple frame-indices
in a single instruction, hasStoreToStackSlot and hasLoadFromStackSlot
should return an array of accesses, rather than just the first encounter
of such an access.

This better describes FI accesses for AArch64 (paired) LDP/STP
instructions.

Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D51537

llvm-svn: 341301
2018-09-03 09:15:58 +00:00
Craig Topper caf6672779 [X86] Add intrinsics for KTEST instructions.
These intrinsics use the same implementation as PTEST intrinsics, but use vXi1 vectors.

New clang builtins will be accompanying them shortly.

llvm-svn: 341259
2018-08-31 21:31:53 +00:00
Craig Topper b7bb9f0078 [X86] Add support for turning vXi1 shuffles into KSHIFTL/KSHIFTR.
This patch recognizes shuffles that shift elements and fill with zeros. I've copied and modified the shift matching code we use for normal vector registers to do this. I'm not sure if there's a good way to share more of this code without making the existing function more complex than it already is.

This will be used to enable kshift intrinsics in clang.

Differential Revision: https://reviews.llvm.org/D51401

llvm-svn: 341227
2018-08-31 17:17:21 +00:00
Andrea Di Biagio a59ec4efa0 [X86][BtVer2] Remove wrong ReadAdvance from AVX vbroadcast(ss|sd|f128) instructions.
The presence of a ReadAdvance for input operand #0 is problematic
because it changes the input latency of the register used as the base address
for the folded load.

A broadcast cannot start executing if the load address hasn't been computed yet.

In the llvm-mca example, the VBROADCASTSS is dependent on the address generated
by the LEAQ.  That means, it cannot start until LEAQ reaches the write-back
stage. If we apply ReadAdvance, then we wrongly assume that the load can start 3
cycles in advance.

Differential Revision: https://reviews.llvm.org/D51534

llvm-svn: 341222
2018-08-31 16:05:48 +00:00
Alexander Ivchenko 9d053074a1 [GlobalISel][X86] Add the support for G_FPTRUNC
Differential Revision: https://reviews.llvm.org/D49855

llvm-svn: 341202
2018-08-31 11:26:51 +00:00
Alexander Ivchenko 9b0b492653 [GlobalISel][X86_64] Support for G_FPTOSI
Differential Revision: https://reviews.llvm.org/D49183

llvm-svn: 341200
2018-08-31 11:16:58 +00:00
Alexander Ivchenko 58a5d6fde7 [GlobalIsel][X86] Support for llvm.trap intrinsic
Differential Revision: https://reviews.llvm.org/D49180

llvm-svn: 341199
2018-08-31 11:05:13 +00:00
Alexander Ivchenko 5b8418983c [NFC] Fix unused variable warning in X86RegisterBankInfo.cpp
llvm-svn: 341198
2018-08-31 10:39:54 +00:00
Alexander Ivchenko a26a364e75 [GlobalIsel][X86] Support for G_FCMP
Differential Revision: https://reviews.llvm.org/D49172

llvm-svn: 341193
2018-08-31 09:38:27 +00:00
Andrea Di Biagio b998eae2f2 [X86][BtVer2] Fix WriteFShuffle256 schedule write info.
This patch fixes the number of micro opcodes, and processor resource cycles for
the following AVX instructions:

vinsertf128rr/rm
vperm2f128rr/rm
vbroadcastf128

Tests have been regenerated using the usual scripts in the llvm/utils directory.

Differential Revision: https://reviews.llvm.org/D51492

llvm-svn: 341185
2018-08-31 08:30:47 +00:00
Martin Storsjo f010872b5c [MinGW] [X86] Pass true for the second parameter to StubValueTy for MO_COFFSTUB. NFC.
These stubs should never be emitted for internal symbols, and
nothing in AsmPrinter ever actually use this value when producing
the stubs for COFF anyway.

llvm-svn: 341177
2018-08-31 08:00:31 +00:00
Craig Topper 83e9f928ba [X86] Don't do anything in ReplaceNodeResults for (v2i32 (fptoui/fptosi v2f32)) when -x86-experimental-vector-widening-legalization is on.
We don't need to do our own widening, the generic legalizer can do it.

llvm-svn: 341174
2018-08-31 07:05:39 +00:00
Craig Topper e9af89a78b [X86] Don't custom widen (v2i32 (setcc v2f32)) when -x86-experimental-vector-widening-legalization is in effect.
We aren't doing anything than what the generic legalizer will do so just let it do it.

llvm-svn: 341172
2018-08-31 07:05:37 +00:00
Craig Topper 1a8c99e670 [X86] Weaken an overly aggressive assert.
This assert tried to check that AND constants are only on the RHS. But its possible for both operands to be constants if one is opaque which will prevent the AND from being constant folded.

Fixes PR38771

llvm-svn: 341102
2018-08-30 19:35:38 +00:00
Alexander Ivchenko af96112ec6 Make TargetInstrInfo::isCopyInstr return true for regular COPY-instructions
..Move all target-dependent checks into new isCopyInstrImpl method.

This change allows us to treat MoveReg-type instructions and generic
COPY instruction in the same way

Differential Revision: https://reviews.llvm.org/D49913

llvm-svn: 341072
2018-08-30 14:32:47 +00:00
Andrew V. Tischenko 62f7a3207b [X86] Improved sched model for X86 CMPXCHG* instructions.
Differential Revision: https://reviews.llvm.org/D50070 

llvm-svn: 341024
2018-08-30 06:26:00 +00:00
Craig Topper b7b353be60 [X86] Make Feature64Bit useful
We now only add +64bit to the CPU string for "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit. I've remove the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or user passing -mattr=+64bit.

I've changed the assert to a report_fatal_error so it's not lost in Release builds.

The test updates are to fix things that tripped the new error.

Differential Revision: https://reviews.llvm.org/D51231

llvm-svn: 341022
2018-08-30 06:01:05 +00:00