Commit Graph

12972 Commits

Author SHA1 Message Date
David L Kreitzer 99775c1b6e Fixed a few typos and formatting problems. NFCI.
llvm-svn: 266135
2016-04-12 21:45:09 +00:00
Justin Bogner 32ad24d4ef X86: Avoid accessing SDValues after they've been RAUW'd
This fixes two use-after-frees in selectLEA64_32Addr. If matchAddress
matches an ADD with an AND as an operand, and that AND hits one of the
"heroic transforms" that folds masks and shifts, we end up with N
pointing to an SDNode that was deleted. Make sure we're done accessing
it before that.

Found by ASan with the recycling allocator changes in llvm.org/PR26808.

llvm-svn: 266130
2016-04-12 21:34:24 +00:00
Sanjay Patel e6a0a23e08 fix indentation; NFC
llvm-svn: 266097
2016-04-12 18:01:48 +00:00
Manman Ren 5751814eda Swift Calling Convention: swifterror target support.
Differential Revision: http://reviews.llvm.org/D18716

llvm-svn: 265997
2016-04-11 21:08:06 +00:00
Sriraman Tallam f39e190ad8 Test commit.
llvm-svn: 265976
2016-04-11 18:40:50 +00:00
Andrey Turetskiy 9df334c28e [X86] Restrict max long nop length for Lakemont.
Restrict the max length of long nops for Lakemont to 7. Experiments on MCU
benchmarks (Dhrystone, Coremark) show that this is the most optimal length.

Differential Revision: http://reviews.llvm.org/D18897

llvm-svn: 265924
2016-04-11 10:07:36 +00:00
Simon Pilgrim d263fdc512 [X86][AVX512BW] Add support for v64i8 multiplies
Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets.

I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL.

Differential Revision: http://reviews.llvm.org/D18937

llvm-svn: 265902
2016-04-10 17:02:48 +00:00
Craig Topper 35db8ecb50 [X86] Use for loops over types to reduce code for setting up operation actions.
llvm-svn: 265893
2016-04-10 05:39:32 +00:00
Craig Topper dcc8f49bf0 [X86] Remove unnecessary setOperationAction for SRA v2i64/v4i64 when VLX is suppored. This is already done for SSE2/AVX2 which VLX implies. NFC
llvm-svn: 265892
2016-04-10 05:39:28 +00:00
Davide Italiano 7aa47094b2 [MC] support TLSDESC and TLSCALL / GNU2 tls dialect
Differential Revision:  http://reviews.llvm.org/D18885

llvm-svn: 265881
2016-04-09 20:32:33 +00:00
Sanjay Patel 4abae4e0fa [x86] use BMI 'andn' for logic + compare ops
With BMI, we can use 'andn' to save an instruction when the result is only used in a compare.
This is related to one of the potential sequences to check 'isfinite' in:
https://llvm.org/bugs/show_bug.cgi?id=27164

Differential Revision: http://reviews.llvm.org/D18910

llvm-svn: 265875
2016-04-09 16:02:52 +00:00
Simon Pilgrim 1cc5712763 [X86][XOP] Support for VPPERM 2-input shuffle mask decoding
This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle.

The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp.

Differential Revision: http://reviews.llvm.org/D18441

llvm-svn: 265874
2016-04-09 14:51:26 +00:00
Craig Topper f027107094 [X86] Use for loops over types to reduce code for setting up operation actions. NFC
llvm-svn: 265871
2016-04-09 06:31:02 +00:00
Craig Topper e801ed9e15 [X86] Remove calls to setOperationAction that set CTLZ_ZERO_UNDEF for some vector types to Expand. Expand is already set for all operations for all vector types earlier so this is redundant. NFC
llvm-svn: 265870
2016-04-09 05:53:48 +00:00
Tim Shen 0012756489 [SSP] Remove llvm.stackprotectorcheck.
This is a cleanup patch for SSP support in LLVM. There is no functional change.
llvm.stackprotectorcheck is not needed, because SelectionDAG isn't
actually lowering it in SelectBasicBlock; rather, it adds check code in
FinishBasicBlock, ignoring the position where the intrinsic is inserted
(See FindSplitPointForStackProtector()).

llvm-svn: 265851
2016-04-08 21:26:31 +00:00
Hans Wennborg e25b65bdb7 Rangeify a loop. NFC.
llvm-svn: 265846
2016-04-08 20:46:09 +00:00
Hans Wennborg 74ff770670 Remove some redundant variables from X86TargetLowering::LowerDYNAMIC_STACKALLOC
These are already defined, with the same values, a few lines up. NFC.

llvm-svn: 265845
2016-04-08 20:46:00 +00:00
Kevin B. Smith e0a6fc3bcc [X86] Fix PR23155 by turning on X86FixupBWInsts by default.
Differential Revision: http://reviews.llvm.org/D18866

llvm-svn: 265830
2016-04-08 18:58:29 +00:00
Simon Pilgrim 476170384f [X86] Tidied up shuffle decode function doxygen descriptions
As discussed on D18441 - auto brief is used so we don't need /brief, we don't need to include the function name and added some missing descriptions.

llvm-svn: 265785
2016-04-08 14:17:07 +00:00
Kevin B. Smith 3802c4af59 [X86]: Fix for PR27251.
Differential Revision: http://reviews.llvm.org/D18850

llvm-svn: 265690
2016-04-07 16:15:34 +00:00
Simon Pilgrim d54bae6525 [X86][SSE] Add support for VZEXT constant folding
llvm-svn: 265646
2016-04-07 07:52:45 +00:00
Ahmed Bougacha 1cf67fb9cb [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
Re-apply r265450 which caused PR27245 and was reverted in r265559
because of a wrong generalization: the fetch_and_add->add_and_fetch
combine only works in specific, but pretty common, cases:
  (icmp slt x, 0) -> (icmp sle (add x, 1), 0)
  (icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
  (icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
  (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)

Original Message:

We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265636
2016-04-07 02:07:10 +00:00
Hans Wennborg ab16be799c Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
Third time's the charm? The previous attempt (r265345) caused ASan test
failures on X86, as broken CFI caused stack traces to not work.

This version of the patch makes sure not to merge with stack adjustments
that have CFI, and to not add merged instructions' offests to the CFI
about to be generated.

This is already covered by the lit tests; I just got the expectations
wrong previously.

llvm-svn: 265623
2016-04-07 00:05:49 +00:00
JF Bastien 800f87a871 NFC: make AtomicOrdering an enum class
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.

The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).

This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.

As a follow-up I'll add:
  bool operator<(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>(AtomicOrdering, AtomicOrdering) = delete;
  bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.

Reviewers: jyknight, reames

Subscribers: jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D18775

llvm-svn: 265602
2016-04-06 21:19:33 +00:00
Hans Wennborg 6849f8f15f Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC."
It caused ASan 32-bit tests to hang (PR27245).

llvm-svn: 265559
2016-04-06 16:44:38 +00:00
Hans Wennborg a7e396b5ef Revert "Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)""
It seems to be causing ASan tests to crash, probably due to
miscompiling the run-time somehow.

llvm-svn: 265551
2016-04-06 16:10:20 +00:00
Evgeniy Stepanov dde29e2799 Faster stack-protector for Android/AArch64.
Bionic has a defined thread-local location for the stack protector
cookie. Emit a direct load instead of going through __stack_chk_guard.

llvm-svn: 265481
2016-04-05 22:41:50 +00:00
Manman Ren f8bdd88cd9 Swift Calling Convention: add swiftcc.
Differential Revision: http://reviews.llvm.org/D17863

llvm-svn: 265480
2016-04-05 22:41:47 +00:00
Duncan P. N. Exon Smith 91d3cfed78 Revert "Fix Clang-tidy modernize-deprecated-headers warnings in remaining files; other minor fixes."
This reverts commit r265454 since it broke the build.  E.g.:

  http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_build/22413/

llvm-svn: 265459
2016-04-05 20:45:04 +00:00
Eugene Zelenko 1760dc2a23 Fix Clang-tidy modernize-deprecated-headers warnings in remaining files; other minor fixes.
Some Include What You Use suggestions were used too.

Use anonymous namespaces in source files.

Differential revision: http://reviews.llvm.org/D18778

llvm-svn: 265454
2016-04-05 20:19:49 +00:00
Ahmed Bougacha 50e6cd4a3a [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265450
2016-04-05 20:02:57 +00:00
Ahmed Bougacha 629446ba03 [X86] Simplify early-exit check. NFC.
llvm-svn: 265447
2016-04-05 20:02:22 +00:00
Sanjay Patel 4c7d094451 fix typo; NFC
llvm-svn: 265442
2016-04-05 19:27:39 +00:00
Hans Wennborg a47a692341 Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
The original commit miscompiled things on 32-bit Windows, e.g. a Clang
boostrap. It turns out that mergeSPUpdates() was a bit too generous in
what it interpreted as a stack adjustment, causing the following code:

        addl    $12, %esp
        leal    -4(%ebp), %esp

To be "optimized" into simply:

        addl    $8, %esp

This commit tightens up mergeSPUpdates() and includes a new test
(test14 in movtopush.ll) for this situation.

llvm-svn: 265345
2016-04-04 21:02:46 +00:00
Matthias Braun 870c34f0cf ARM, AArch64, X86: Check preserved registers for tail calls.
We can only perform a tail call to a callee that preserves all the
registers that the caller needs to preserve.

This situation happens with calling conventions like preserver_mostcc or
cxx_fast_tls. It was explicitely handled for fast_tls and failing for
preserve_most. This patch generalizes the check to any calling
convention.

Related to rdar://24207743

Differential Revision: http://reviews.llvm.org/D18680

llvm-svn: 265329
2016-04-04 18:56:13 +00:00
Derek Schuff 1dbf7a571f Add MachineFunctionProperty checks for AllVRegsAllocated for target passes
Summary:
This adds the same checks that were added in r264593 to all
target-specific passes that run after register allocation.

Reviewers: qcolombet

Subscribers: jyknight, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18525

llvm-svn: 265313
2016-04-04 17:09:25 +00:00
Elena Demikhovsky e99c561391 AVX-512: Truncating store for i1 vectors
Implemented truncstore for KNL and skylake-avx512.
Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes.

Differential Revision: http://reviews.llvm.org/D18740

llvm-svn: 265283
2016-04-04 07:17:47 +00:00
Simon Pilgrim 0edd3d771a [X86] Removed duplicate code.
llvm-svn: 265274
2016-04-03 20:40:35 +00:00
Simon Pilgrim cd0dfc93eb [X86][SSE] Support for MOVMSK signbit extraction instructions
Add support for lowering with the MOVMSK instruction to extract vector element signbits to a GPR.

This is an early step towards more optimal handling of vector comparison results.

Differential Revision: http://reviews.llvm.org/D18741

llvm-svn: 265266
2016-04-03 18:22:03 +00:00
Simon Pilgrim 20d1d4f045 [X86] Tidied up X86ISD instruction nodes. NFCI.
Tidied up comments, stripped trailing whitespace, split apart nodes that aren't related.

No change in ordering although there is definitely some scope for it.

llvm-svn: 265263
2016-04-03 14:14:32 +00:00
Elena Demikhovsky 5e426f7356 AVX-512: Load and Extended Load for i1 vectors
Implemented load+{sign|zero}_extend for i1 vectors
Fixed failures in i1 vector load.
Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX.

Differential Revision: http://reviews.llvm.org/D18737

llvm-svn: 265259
2016-04-03 08:41:12 +00:00
Sanjay Patel 9f413364d5 [x86] avoid intermediate splat for non-zero memsets (PR27100)
Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 -
where we noticed that an intermediate splat was being generated for memsets of
non-zero chars.

That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.

The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.

Note that the SSE1 path is not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.

llvm-svn: 265161
2016-04-01 17:36:45 +00:00
Sanjay Patel a05e0ff223 [x86] avoid intermediate splat for non-zero memsets (PR27100)
Follow-up to D18566 - where we noticed that an intermediate splat was being
generated for memsets of non-zero chars.

That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.

The tests that were added in the last patch are now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.
In the new tests, the splat via shuffling looks ok to me, but there might be some
room for improvement depending on uarch there.

Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.

Differential Revision: http://reviews.llvm.org/D18676

llvm-svn: 265148
2016-04-01 16:27:14 +00:00
Andrea Di Biagio 8c48841907 [x86] Remove redundant call to setTargetDAGCombine for BUILD_VECTOR node type.
Since revision 235394, we no longer perform target specific combines on
build_vector nodes. No functional change intended.

llvm-svn: 265138
2016-04-01 12:25:44 +00:00
Andrey Turetskiy 958eb46443 [X86] Introduce Lakemont CPU.
Add a new Intel MCU CPU Lakemont, which doesn't support X87.

Differential Revision: http://reviews.llvm.org/D18650

llvm-svn: 265128
2016-04-01 10:16:15 +00:00
Michael Kuperstein 7bab713188 Use range-based for loops. NFC.
llvm-svn: 265105
2016-04-01 03:45:08 +00:00
Hans Wennborg 649159df3c Follow-up to r265036: I got these iterators mixed up
llvm-svn: 265076
2016-03-31 23:55:16 +00:00
Hans Wennborg 132cd62121 Revert r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
I think it might have caused these build breakages:
http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio

llvm-svn: 265046
2016-03-31 20:27:30 +00:00
Hans Wennborg e97fb414e8 [X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)
For code such as:

  void f(int, int);
  void g() {
      f(1, 2);
  }

compiled for 32-bit X86 Linux, Clang would previously generate:

  subl    $12, %esp
  subl    $8, %esp
  pushl   $2
  pushl   $1
  calll   f
  addl    $16, %esp
  addl    $12, %esp
  retl

This patch fixes that by merging adjacent stack adjustments in
eliminateCallFramePseudoInstr().

Differential Revision: http://reviews.llvm.org/D18627

llvm-svn: 265039
2016-03-31 19:26:24 +00:00
Hans Wennborg e1a2e90ffa Change eliminateCallFramePseudoInstr() to return an iterator
This will become necessary in a subsequent change to make this method
merge adjacent stack adjustments, i.e. it might erase the previous
and/or next instruction.

It also greatly simplifies the calls to this function from Prolog-
EpilogInserter. Previously, that had a bunch of logic to resume iteration
after the call; now it just continues with the returned iterator.

Note that this changes the behaviour of PEI a little. Previously,
it attempted to re-visit the new instruction created by
eliminateCallFramePseudoInstr(). That code was added in r36625,
but I can't see any reason for it: the new instructions will obviously
not be pseudo instructions, they will not have FrameIndex operands,
and we have already accounted for the stack adjustment.

Differential Revision: http://reviews.llvm.org/D18627

llvm-svn: 265036
2016-03-31 18:33:38 +00:00