Commit Graph

3791 Commits

Author SHA1 Message Date
Sanjay Patel e6a0a23e08 fix indentation; NFC
llvm-svn: 266097
2016-04-12 18:01:48 +00:00
Manman Ren 5751814eda Swift Calling Convention: swifterror target support.
Differential Revision: http://reviews.llvm.org/D18716

llvm-svn: 265997
2016-04-11 21:08:06 +00:00
Simon Pilgrim d263fdc512 [X86][AVX512BW] Add support for v64i8 multiplies
Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets.

I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL.
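
A rough sketch of the splitting idea, in the style of the existing Lower256IntArith helper (the body below is illustrative, as it might appear inside X86ISelLowering.cpp, not the exact upstream code):

  static SDValue splitIntArith512(SDValue Op, SelectionDAG &DAG) {
    SDLoc dl(Op);
    MVT VT = Op.getSimpleValueType();
    unsigned NumElts = VT.getVectorNumElements();
    MVT HalfVT = MVT::getVectorVT(VT.getVectorElementType(), NumElts / 2);

    // Extract a 256-bit half of a 512-bit operand.
    auto Half = [&](SDValue V, unsigned Idx) {
      return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, HalfVT, V,
                         DAG.getIntPtrConstant(Idx, dl));
    };

    // Perform the operation on each half and concatenate the results.
    SDValue Lo = DAG.getNode(Op.getOpcode(), dl, HalfVT,
                             Half(Op.getOperand(0), 0),
                             Half(Op.getOperand(1), 0));
    SDValue Hi = DAG.getNode(Op.getOpcode(), dl, HalfVT,
                             Half(Op.getOperand(0), NumElts / 2),
                             Half(Op.getOperand(1), NumElts / 2));
    return DAG.getNode(ISD::CONCAT_VECTORS, dl, VT, Lo, Hi);
  }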

Differential Revision: http://reviews.llvm.org/D18937

llvm-svn: 265902
2016-04-10 17:02:48 +00:00
Craig Topper 35db8ecb50 [X86] Use for loops over types to reduce code for setting up operation actions.
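
The refactoring style is roughly the following, inside the X86TargetLowering constructor (the specific types and opcodes here are illustrative):

  // One loop per group of types that share the same actions, instead of a
  // long list of individual setOperationAction calls.
  for (MVT VT : { MVT::v16i8, MVT::v8i16, MVT::v4i32, MVT::v2i64 }) {
    setOperationAction(ISD::MUL, VT, Custom);
    setOperationAction(ISD::SRA, VT, Custom);
  }
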
llvm-svn: 265893
2016-04-10 05:39:32 +00:00
Craig Topper dcc8f49bf0 [X86] Remove unnecessary setOperationAction for SRA v2i64/v4i64 when VLX is supported. This is already done for SSE2/AVX2, which VLX implies. NFC
llvm-svn: 265892
2016-04-10 05:39:28 +00:00
Sanjay Patel 4abae4e0fa [x86] use BMI 'andn' for logic + compare ops
With BMI, we can use 'andn' to save an instruction when the result is only used in a compare.
This is related to one of the potential sequences to check 'isfinite' in:
https://llvm.org/bugs/show_bug.cgi?id=27164
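
As a rough illustration (hypothetical source, not taken from the patch), the kind of pattern that benefits:

  // The AND result feeds only the compare, so with BMI this can lower to a
  // single 'andn' (which sets EFLAGS) followed by a setcc, instead of
  // not + and/test.
  bool coveredByMask(unsigned x, unsigned mask) {
    return (x & ~mask) == 0;
  }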

Differential Revision: http://reviews.llvm.org/D18910

llvm-svn: 265875
2016-04-09 16:02:52 +00:00
Craig Topper f027107094 [X86] Use for loops over types to reduce code for setting up operation actions. NFC
llvm-svn: 265871
2016-04-09 06:31:02 +00:00
Craig Topper e801ed9e15 [X86] Remove calls to setOperationAction that set CTLZ_ZERO_UNDEF for some vector types to Expand. Expand is already set for all operations for all vector types earlier so this is redundant. NFC
llvm-svn: 265870
2016-04-09 05:53:48 +00:00
Tim Shen 0012756489 [SSP] Remove llvm.stackprotectorcheck.
This is a cleanup patch for SSP support in LLVM. There is no functional change.
llvm.stackprotectorcheck is not needed, because SelectionDAG isn't
actually lowering it in SelectBasicBlock; rather, it adds check code in
FinishBasicBlock, ignoring the position where the intrinsic is inserted
(See FindSplitPointForStackProtector()).

llvm-svn: 265851
2016-04-08 21:26:31 +00:00
Hans Wennborg e25b65bdb7 Rangeify a loop. NFC.
llvm-svn: 265846
2016-04-08 20:46:09 +00:00
Hans Wennborg 74ff770670 Remove some redundant variables from X86TargetLowering::LowerDYNAMIC_STACKALLOC
These are already defined, with the same values, a few lines up. NFC.

llvm-svn: 265845
2016-04-08 20:46:00 +00:00
Kevin B. Smith 3802c4af59 [X86]: Fix for PR27251.
Differential Revision: http://reviews.llvm.org/D18850

llvm-svn: 265690
2016-04-07 16:15:34 +00:00
Simon Pilgrim d54bae6525 [X86][SSE] Add support for VZEXT constant folding
llvm-svn: 265646
2016-04-07 07:52:45 +00:00
Ahmed Bougacha 1cf67fb9cb [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
Re-apply r265450 which caused PR27245 and was reverted in r265559
because of a wrong generalization: the fetch_and_add->add_and_fetch
combine only works in specific, but pretty common, cases:
  (icmp slt x, 0) -> (icmp sle (add x, 1), 0)
  (icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
  (icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
  (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)
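
For illustration, a hypothetical C++ source pattern that hits the first of these combines:

  #include <atomic>

  // The fetch_add result is used only by a signed compare against zero, so
  // the backend can emit 'lock add' and branch on EFLAGS instead of
  // falling back to LXADD (xadd).
  bool wasNegativeBeforeIncrement(std::atomic<int> &v) {
    return v.fetch_add(1, std::memory_order_seq_cst) < 0;
  }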

Original Message:

We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fall back to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265636
2016-04-07 02:07:10 +00:00
JF Bastien 800f87a871 NFC: make AtomicOrdering an enum class
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.

The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).

This change touches a few lines of code which can be improved later; I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.

As a follow-up I'll add:
  bool operator<(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>(AtomicOrdering, AtomicOrdering) = delete;
  bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.
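
A minimal sketch of the direction (values abbreviated; not the exact upstream definition):

  // A scoped enum stops implicit conversion to int, so ordering "strength"
  // can no longer be compared with raw <, >, <=, >= by accident; callers
  // must spell out the relation they actually mean.
  enum class AtomicOrdering {
    NotAtomic, Unordered, Monotonic, Acquire, Release,
    AcquireRelease, SequentiallyConsistent
  };

  bool isAtLeastAcquire(AtomicOrdering AO) {
    return AO == AtomicOrdering::Acquire ||
           AO == AtomicOrdering::AcquireRelease ||
           AO == AtomicOrdering::SequentiallyConsistent;
  }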

Reviewers: jyknight, reames

Subscribers: jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D18775

llvm-svn: 265602
2016-04-06 21:19:33 +00:00
Hans Wennborg 6849f8f15f Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC."
It caused ASan 32-bit tests to hang (PR27245).

llvm-svn: 265559
2016-04-06 16:44:38 +00:00
Evgeniy Stepanov dde29e2799 Faster stack-protector for Android/AArch64.
Bionic has a defined thread-local location for the stack protector
cookie. Emit a direct load instead of going through __stack_chk_guard.

llvm-svn: 265481
2016-04-05 22:41:50 +00:00
Ahmed Bougacha 50e6cd4a3a [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fall back to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265450
2016-04-05 20:02:57 +00:00
Ahmed Bougacha 629446ba03 [X86] Simplify early-exit check. NFC.
llvm-svn: 265447
2016-04-05 20:02:22 +00:00
Matthias Braun 870c34f0cf ARM, AArch64, X86: Check preserved registers for tail calls.
We can only perform a tail call to a callee that preserves all the
registers that the caller needs to preserve.

This situation happens with calling conventions like preserve_mostcc or
cxx_fast_tls. It was explicitly handled for fast_tls but failing for
preserve_most. This patch generalizes the check to any calling
convention.
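
An illustrative case in source terms (hypothetical code, assuming clang's preserve_most attribute):

  // The caller promises to preserve almost all registers, but a callee
  // using the plain C convention may clobber them, so this call in tail
  // position must not be emitted as a tail call.
  int plainCallee(int x);

  __attribute__((preserve_most)) int caller(int x) {
    return plainCallee(x + 1);
  }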

Related to rdar://24207743

Differential Revision: http://reviews.llvm.org/D18680

llvm-svn: 265329
2016-04-04 18:56:13 +00:00
Elena Demikhovsky e99c561391 AVX-512: Truncating store for i1 vectors
Implemented truncstore for KNL and skylake-avx512.
Covered vectors from v2i1 to v64i1. The value is saved bit-packed (one bit per element, not one byte) - v32i1 is saved in 4 bytes.

Differential Revision: http://reviews.llvm.org/D18740

llvm-svn: 265283
2016-04-04 07:17:47 +00:00
Simon Pilgrim 0edd3d771a [X86] Removed duplicate code.
llvm-svn: 265274
2016-04-03 20:40:35 +00:00
Simon Pilgrim cd0dfc93eb [X86][SSE] Support for MOVMSK signbit extraction instructions
Add support for lowering with the MOVMSK instruction to extract vector element signbits to a GPR.

This is an early step towards more optimal handling of vector comparison results.
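
For illustration, the underlying operation at the intrinsics level (hypothetical example, not from the patch):

  #include <immintrin.h>

  // MOVMSKPS packs the sign bit of each float element into the low bits of
  // a GPR - the same "signbits to GPR" step the lowering above targets.
  int anyNegative(__m128 v) {
    return _mm_movemask_ps(v) != 0;
  }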

Differential Revision: http://reviews.llvm.org/D18741

llvm-svn: 265266
2016-04-03 18:22:03 +00:00
Elena Demikhovsky 5e426f7356 AVX-512: Load and Extended Load for i1 vectors
Implemented load+{sign|zero}_extend for i1 vectors.
Fixed failures in i1 vector load.
Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX.

Differential Revision: http://reviews.llvm.org/D18737

llvm-svn: 265259
2016-04-03 08:41:12 +00:00
Sanjay Patel 9f413364d5 [x86] avoid intermediate splat for non-zero memsets (PR27100)
Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 -
where we noticed that an intermediate splat was being generated for memsets of
non-zero chars.

That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.

The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.

Note that the SSE1 path is not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.
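
A tiny example of the kind of source this affects (illustrative only):

  #include <cstring>

  // A fixed-size memset of a non-zero byte: the goal is a single vector
  // store of the repeated-byte constant rather than materializing it via
  // an integer multiply and then splatting.
  void fill16(char *p) {
    std::memset(p, 0x2a, 16);
  }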

llvm-svn: 265161
2016-04-01 17:36:45 +00:00
Sanjay Patel a05e0ff223 [x86] avoid intermediate splat for non-zero memsets (PR27100)
Follow-up to D18566 - where we noticed that an intermediate splat was being
generated for memsets of non-zero chars.

That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.

The tests that were added in the last patch are now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.
In the new tests, the splat via shuffling looks ok to me, but there might be some
room for improvement depending on uarch there.

Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.

Differential Revision: http://reviews.llvm.org/D18676

llvm-svn: 265148
2016-04-01 16:27:14 +00:00
Andrea Di Biagio 8c48841907 [x86] Remove redundant call to setTargetDAGCombine for BUILD_VECTOR node type.
Since revision 235394, we no longer perform target specific combines on
build_vector nodes. No functional change intended.

llvm-svn: 265138
2016-04-01 12:25:44 +00:00
Sanjay Patel 92d5ea5e07 [x86] use SSE/AVX ops for non-zero memsets (PR27100)
Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast
targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping
into a codegen sinkhole while trying to splat a byte into an XMM reg.

Follow-on bugs exposed by the current codegen are:
https://llvm.org/bugs/show_bug.cgi?id=27141
https://llvm.org/bugs/show_bug.cgi?id=27143

Differential Revision: http://reviews.llvm.org/D18566

llvm-svn: 265029
2016-03-31 17:30:06 +00:00
Nirav Dave 83ce54aac2 Prevent X86ISelLowering from merging volatile loads
Change isConsecutiveLoads to check that loads are non-volatile as this
is a requirement for any load merges. Propagate change to two callers.

Reviewers: RKSimon

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18546

llvm-svn: 265013
2016-03-31 13:40:55 +00:00
Craig Topper d2aa03a60a [X86] Use MVT instead of EVT in code called after legalization.
llvm-svn: 264992
2016-03-31 04:37:41 +00:00
Matthias Braun 8d41436004 CodeGen: Factor out code for tail call result compatibility check; NFC
llvm-svn: 264959
2016-03-30 22:46:04 +00:00
Simon Pilgrim c49bd2ede0 [X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros
Fix for an issue introduced in D17297, where we were breaking early out of the loop that detects consecutive loads, which could leave us thinking a consecutive load with zeros was possible.

llvm-svn: 264922
2016-03-30 20:52:24 +00:00
Simon Pilgrim b87ffe8519 [X86][XOP] BITREVERSE lowering using VPPERM
In addition to shuffling the bytes of a 128-bit vector, XOP's VPPERM can apply several useful 'permute operations' to each byte - in this case we use it to perform BITREVERSE in a single instruction.
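
For reference, what BITREVERSE computes per byte (a plain scalar equivalent, for illustration only):

  #include <cstdint>

  // Reverse the bit order within a byte; VPPERM can apply this to every
  // byte of a 128-bit vector in one instruction.
  uint8_t reverseByte(uint8_t b) {
    b = uint8_t((b & 0xF0) >> 4 | (b & 0x0F) << 4);
    b = uint8_t((b & 0xCC) >> 2 | (b & 0x33) << 2);
    b = uint8_t((b & 0xAA) >> 1 | (b & 0x55) << 1);
    return b;
  }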

llvm-svn: 264870
2016-03-30 14:14:00 +00:00
Chandler Carruth 8e06a10d1f [x86] Fix a horrible bug in our lowering of x86 floating point atomic
operations.

Specifically, we had code that tried (badly) to approximate reconstructing
all of the possible variations on addressing modes in two x86
instructions based on those in one pseudo instruction. This is not the
first bug uncovered by doing this, so stop doing it altogether.
Instead generically and pedantically copy every operand from the address
over to both new instructions, and strip kill flags from any register
operands.

This fixes a subtle bug seen in the wild where we would mysteriously
drop parts of the addressing mode, causing for example the index
argument in the added test case to just be completely ignored.

Hypothetically, this was an extremely bad miscompile because it actually
caused a predictable and leverageable write of a 64-bit quantity to an
unintended offset (the first element of the array instead of whatever
other element was intended). As a consequence, in theory this could even
have introduced security vulnerabilities.

However, this was only something that could happen with an atomic
floating point add. No other operation could trigger this bug, so it
seems extremely unlikely to have occurred widely in the wild.

But it did in fact occur, and frequently in scientific applications
which were using relaxed atomic updates of a floating point value after
adding a delta. Those would end up being quite badly miscompiled by
LLVM, which is how we found this. Of course, this often looks like
a race condition in the code, but it was actually a miscompile.
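
A hypothetical source pattern along the lines described (illustrative only):

  #include <atomic>

  // A relaxed read-modify-write of a float in an array element. The
  // indexed address on the store is the kind of operand the broken
  // expansion could silently drop.
  void addDelta(std::atomic<float> *a, long i, float delta) {
    float old = a[i].load(std::memory_order_relaxed);
    a[i].store(old + delta, std::memory_order_relaxed);
  }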

I suspect that this whole RELEASE_FADD thing was a complete mistake.
There is no such operation, and I worry that anything other than add
will get remarkably worse code generation. But that's not for this
change....

llvm-svn: 264845
2016-03-30 08:41:59 +00:00
Chandler Carruth 81c3ddeb1c [x86] Extract a helper function to compute the full addressing mode from
an x86 MachineInstr's operands. This will be super useful to fix some
bad atomics code in my next commit.

No functionality changed.

llvm-svn: 264819
2016-03-30 03:10:24 +00:00
Simon Pilgrim d3df400fa9 [X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op for all their scalar elements.
If all of a BUILD_VECTOR's source elements are the same bit operation type (AND/XOR/OR) and each has one constant operand, lower to a pair of BUILD_VECTORs and just apply the bit operation to the vectors.

The constant operands will form a constant vector, meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent.

It's not in our interest to turn this into a general-purpose vectorizer, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to at least support them.
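
Roughly the shape of the pattern being targeted (a hypothetical scalar analogue; in practice these ops come from legalization/scalarization rather than from source like this):

  // Four scalar ORs, each with one constant operand, feeding what becomes
  // a build_vector. After the change this lowers as one vector OR of a
  // non-constant build_vector against a constant vector.
  void orLanes(const int in[4], int out[4]) {
    out[0] = in[0] | 1;
    out[1] = in[1] | 2;
    out[2] = in[2] | 4;
    out[3] = in[3] | 8;
  }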

Differential Revision: http://reviews.llvm.org/D18492

llvm-svn: 264666
2016-03-28 21:33:52 +00:00
Elena Demikhovsky 83f0647d85 AVX-512: Fixed ICMP instruction selection for i1 operands
ICMP instruction selection fails on SKX and KNL for i1 operands.
I use XOR to resolve it:
(A == B) is equivalent to (A xor B) == 0

Differential Revision: http://reviews.llvm.org/D18511

llvm-svn: 264566
2016-03-28 07:47:58 +00:00
Simon Pilgrim dcdf85033c [X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targets
Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization

llvm-svn: 264517
2016-03-26 18:32:13 +00:00
Simon Pilgrim e4dbeb40c6 [X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targets
Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization

Differential Revision: http://reviews.llvm.org/D18307

llvm-svn: 264512
2016-03-26 15:44:55 +00:00
Simon Pilgrim 3eef33a806 [X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors
Currently this is mainly to prevent scalarization of integer division by constants.

Differential Revision: http://reviews.llvm.org/D18307

llvm-svn: 264511
2016-03-26 15:27:20 +00:00
Simon Pilgrim 7379a70677 [X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 multiplies.
Only pre-AVX512BW targets need to split v32i8 vectors.

llvm-svn: 264509
2016-03-26 09:44:27 +00:00
Simon Pilgrim ff7b7141cd [X86][SSE] Don't duplicate Lower256IntArith functionality in LowerMul. NFC.
LowerMul v32i8 on AVX2 needs to split the 256-bit sources to allow sign-extension back to v16i16 to occur. Since this is basically the same as Lower256IntArith we simplify by using that here instead.

llvm-svn: 264506
2016-03-26 09:29:04 +00:00
David Majnemer 020e890a19 [X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddr
We forgot to add the second machine operand to our ADJCALLSTACKDOWN,
resulting in crashes in PEI.

This fixes PR27071.

llvm-svn: 264465
2016-03-25 21:49:11 +00:00
Simon Pilgrim ac04923b0f [X86][SSE] Don't duplicate Lower256IntArith functionality in LowerShift. NFC.
LowerShift was using the same code as Lower256IntArith to split 256-bit vectors into 2 x 128-bit vectors, so now we just call Lower256IntArith.

llvm-svn: 264403
2016-03-25 14:17:54 +00:00
Elena Demikhovsky abc9c04ab7 fixed typo
llvm-svn: 264395
2016-03-25 10:08:36 +00:00
Elena Demikhovsky 95f3173ce9 AVX-512: Generate KTEST instead of TEST for i1 vectors
The KTEST instruction may be used instead of TEST in this case:

%int_sel3 = bitcast <8 x i1> %sel3 to i8
%res = icmp eq i8 %int_sel3, 0
br i1 %res, label %L2, label %L1

Differential Revision: http://reviews.llvm.org/D18444

llvm-svn: 264298
2016-03-24 15:53:45 +00:00
Simon Pilgrim 572ca71573 [X86][XOP] Support for VPPERM byte shuffle instruction
This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode.

Differential Revision: http://reviews.llvm.org/D18189

llvm-svn: 264260
2016-03-24 11:52:43 +00:00
Sanjay Patel 7876f180b5 [x86] make peekThroughBitcasts() a helper function
This should be hoisted further up so it can be used in DAGCombiner and other backends,
but I'm limiting the scope in the interest of patch minimalism.

It's not quite NFC because some of the replaced code was using an 'if' check rather
than a 'while' loop, so those cases would only look through a single bitcast.
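
The gist of the helper is a loop like this (sketch only; the exact upstream signature and location may differ):

  static SDValue peekThroughBitcasts(SDValue V) {
    // Look through any number of bitcasts, not just one.
    while (V.getOpcode() == ISD::BITCAST)
      V = V.getOperand(0);
    return V;
  }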

llvm-svn: 264186
2016-03-23 20:16:37 +00:00
Andrey Turetskiy 6a3d561ea0 [X86] Introduction of FeatureX87.
Add FeatureX87 to the X86 backend to be able to define CPUs which don't have x87.

Differential Revision: http://reviews.llvm.org/D13979

llvm-svn: 264148
2016-03-23 11:13:54 +00:00
Simon Pilgrim 25fb4177fb [X86][SSE] Reapplied: Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve sign/zero extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine, as that only properly helps when lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask patterns (fix for PR25718).

Reapplied with a fix for PR26953 (missing vector widening legalization).

Differential Revision: http://reviews.llvm.org/D17932

llvm-svn: 264062
2016-03-22 16:22:08 +00:00