Commit Graph

8821 Commits

Author SHA1 Message Date
Craig Topper 24c3a2395f [AVX-512] Improve lowering of zero_extend of v4i1 to v4i32 and v2i1 to v2i64 with VLX, but no DQ or BW support.
llvm-svn: 291747
2017-01-12 06:49:12 +00:00
Craig Topper 69ab67b279 [AVX-512] Improve lowering of sign_extend of v4i1 to v4i32 and v2i1 to v2i64 when avx512vl is available, but not avx512dq.
llvm-svn: 291746
2017-01-12 06:49:08 +00:00
Elad Cohen c5ba925ef2 [X86][AVX512] Fix PR31515 - Do not flip vselect condition if it's not a vXi1 mask
r289653 added a case where `vselect <cond> <vector1> <all-zeros>`
is transformed to:
`vselect xor(cond, DAG.getConstant(1, DL, CondVT) <all-zeros> <vector1>`
This was not aimed to catch cases where Cond is not a vXi1
mask but it does. Moreover, when Cond type is VxiN (N > 1)
then xor(cond, DAG.getConstant(1, DL, CondVT) != NOT(cond).
This patch changes the above to xor with allones, and avoids
entering the case for non-mask Conds.

llvm-svn: 291745
2017-01-12 06:49:03 +00:00
Craig Topper 56f9610b98 [AVX-512] Add more varied avx512 feature command lines to the avx512-cvt.ll test to show some poor codegen examples.
We're definitely doing bad things when avx512vl is enabled without avx512dq. It looks like avx512vl/dq without avx512bw may also have some issues.

llvm-svn: 291744
2017-01-12 06:49:03 +00:00
Kyle Butt efe56fed12 Revert "CodeGen: Allow small copyable blocks to "break" the CFG."
This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded.

This needs a simple probability check because there are some cases where it is
not profitable.

llvm-svn: 291695
2017-01-11 19:55:19 +00:00
Simon Pilgrim fd627bf86c [X86][XOP] Add vpermil2ps target shuffle -> insertps combine test
llvm-svn: 291690
2017-01-11 18:48:00 +00:00
Elena Demikhovsky 9d0e7c33d3 X86 CodeGen: Optimized pattern for truncate with unsigned saturation.
DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512.
And VPACKUS* instructions on SEE* targets.

Differential Revision: https://reviews.llvm.org/D28216

llvm-svn: 291670
2017-01-11 12:59:32 +00:00
Simon Pilgrim 5a81fefad3 [X86][AVX512BW] Vectorize v64i8 vector shifts
Differential Revision: https://reviews.llvm.org/D28447

llvm-svn: 291665
2017-01-11 10:36:51 +00:00
Elad Cohen 0c2601073e [X86] Fix PR30926 - Add patterns for (v)cvtsi2s{s,d} and (v)cvtsd2s{s,d}
The code emiited by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd,
(v)cvtsd2ss and (v)cvtss2sd is lowered to a code sequence that includes
redundant (v)movss/(v)movsd instructions. This patch adds patterns for
optimizing these sequences.

Differential revision: https://reviews.llvm.org/D28455

llvm-svn: 291660
2017-01-11 09:11:48 +00:00
Craig Topper 7af39837a9 Revert r291645 "[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent."
Some test appears to be hanging on the build bots.

llvm-svn: 291650
2017-01-11 04:59:25 +00:00
Craig Topper 577d258569 [DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent.
llvm-svn: 291645
2017-01-11 04:02:23 +00:00
Hans Wennborg 6573976f57 Re-commit r289955: [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367)
This was reverted because it would miscompile code where the cmp had
multiple uses. That was due to a deficiency in the existing code, which
was fixed in r291630 (see the PR for details).

This re-commit includes an extra test for the kind of code that got
miscompiled: @test_sub_1_setcc_jcc.

llvm-svn: 291640
2017-01-11 01:36:57 +00:00
Hans Wennborg 12de693747 [X86] Dont run combineSetCCAtomicArith() when the cmp has multiple uses
We would miscompile the following:

  void g(int);
  int f(volatile long long *p) {
    bool b = __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST) < 0;
    g(b ? 12 : 34);
    return b ? 56 : 78;
  }

into

  pushq   %rax
  lock            incq    (%rdi)
  movl    $12, %eax
  movl    $34, %edi
  cmovlel %eax, %edi
  callq   g(int)
  testq   %rax, %rax   <---- Bad.
  movl    $56, %ecx
  movl    $78, %eax
  cmovsl  %ecx, %eax
  popq    %rcx
  retq

because the code failed to take into account that the cmp has multiple
uses, replaced one of them, and left the other one comparing garbage.

llvm-svn: 291630
2017-01-11 00:49:54 +00:00
Justin Lebar 7d81813d76 [TM] Restore default TargetOptions in TargetMachine::resetTargetOptions.
Summary:
Previously if you had

 * a function with the fast-math-enabled attr, followed by
 * a function without the fast-math attr,

the second function would inherit the first function's fast-math-ness.

This means that mixing fast-math and non-fast-math functions in a module
was completely broken unless you explicitly annotated every
non-fast-math function with "unsafe-fp-math"="false".  This appears to
have been broken since r176986 (March 2013), when the resetTargetOptions
function was introduced.

This patch tests the correct behavior as best we can.  I don't think I
can test FPDenormalMode and NoTrappingFPMath, because they aren't used
in any backends during function lowering.  Surprisingly, I also can't
find any uses at all of LessPreciseFPMAD affecting generated code.

The NVPTX/fast-math.ll test changes are an expected result of fixing
this bug.  When FMA is disabled, we emit add as "add.rn.f32", which
prevents fma combining.  Before this patch, fast-math was enabled in all
functions following the one which explicitly enabled it on itself, so we
were emitting plain "add.f32" where we should have generated
"add.rn.f32".

Reviewers: mkuper

Subscribers: hfinkel, majnemer, jholewinski, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28507

llvm-svn: 291618
2017-01-10 23:43:04 +00:00
Kyle Butt df27aa8c89 CodeGen: Allow small copyable blocks to "break" the CFG.
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well.

Differential revision: https://reviews.llvm.org/D27742

llvm-svn: 291609
2017-01-10 23:04:30 +00:00
Douglas Yung ee787a7665 Make the test accept different OpCode values since it doesn't really care about the value.
Differential Revision: https://reviews.llvm.org/D28487

llvm-svn: 291605
2017-01-10 22:10:22 +00:00
Matt Arsenault 0b382a7cb8 DAG: Avoid OOB when legalizing vector indexing
If a vector index is out of bounds, the result is supposed to be
undefined but is not undefined behavior. Change the legalization
for indexing the vector on the stack so that an out of bounds
index does not create an out of bounds memory access.

llvm-svn: 291604
2017-01-10 22:02:30 +00:00
Michael Zuckerman bcd03e7f3b [X86][AVX512]Improving shuffle lowering by using AVX-512 EXPAND* instructions
This patch fix PR31351: https://llvm.org/bugs/show_bug.cgi?id=31351

1.  This patch adds new type of shuffle lowering
2.  We can use the expand instruction, When the shuffle pattern is as following:
    { 0*a[0]0*a[1]...0*a[n] , n >=0 where a[] elements in a ascending order}.

Reviewers: 1. igorb  
           2. guyblank  
           3. craig.topper  
           4. RKSimon 

Differential Revision: https://reviews.llvm.org/D28352

llvm-svn: 291584
2017-01-10 18:57:17 +00:00
Craig Topper d55b83128b AMD family 17h (znver1) enablement
Summary:
This patch enables the following
1. AMD family 17h architecture using "znver1" tune flag (-march, -mcpu).
2. ISAs that are enabled for "znver1" architecture.
3. Checks ADX isa from cpuid to identify "znver1" flag when -march=native is used.
4. ISAs FMA4, XOP are disabled as they are dropped from amdfam17.
5. For the time being, it uses the btver2 scheduler model.
6. Test file is updated to check this flag.

This item is linked to clang review item https://reviews.llvm.org/D28018

Patch by Ganesh Gopalasubramanian

Reviewers: RKSimon, craig.topper

Subscribers: vprasad, RKSimon, ashutosh.nema, llvm-commits

Differential Revision: https://reviews.llvm.org/D28017

llvm-svn: 291543
2017-01-10 06:01:16 +00:00
Simon Pilgrim fa32894730 [X86][AVX512VL] Added AVX512VL to 128/256 bit vector shift tests
llvm-svn: 291488
2017-01-09 22:13:51 +00:00
Matthias Braun ba7d95d425 PeepholeOptimizer: Do not replace SubregToReg(bitcast like)
While we can usually replace bitcast like instructions
(MachineInstr::isBitcast()) with a COPY this is not legal if any of the
users uses SUBREG_TO_REG to assert the upper bits of the result are
zero.

Differential Revision: https://reviews.llvm.org/D28474

llvm-svn: 291483
2017-01-09 21:38:17 +00:00
Michael Kuperstein 1559e8863e Revert r291092 because it introduces a crash.
See PR31589 for details.

llvm-svn: 291478
2017-01-09 21:04:46 +00:00
Vyacheslav Klochkov d497d36083 X86-specific path: Implemented the fusing of MUL+ADDSUB to FMADDSUB.
Differential Revision: https://reviews.llvm.org/D28087

llvm-svn: 291473
2017-01-09 20:26:17 +00:00
Simon Pilgrim 0f23b2ba1a [X86][AVX512] Enable v16i8/v32i8 vector shifts to use an extend+shift+truncate pattern.
Use the existing AVX2 v8i16 vector shift lowering for v16i8 (extending to v16i32) on AVX512 targets and v32i8 (extending to v32i16) on AVX512BW targets.

Cost model updates to follow.

llvm-svn: 291451
2017-01-09 17:20:03 +00:00
Simon Pilgrim d990cd371b [X86][AVX512DQ] Enable v16i16 vector shifts to use an extend+shift+truncate pattern.
Use the existing AVX2 v8i16 vector shift lowering for v16i16 on AVX512 targets (AVX512BW will have already have lowered with vpsravw).

Cost model updates to follow.

llvm-svn: 291445
2017-01-09 15:15:45 +00:00
Simon Pilgrim f8538572ab [X86][AVX512DQ] Added AVX512DQ to 128/256 bit vector shift tests
llvm-svn: 291444
2017-01-09 14:36:09 +00:00
Craig Topper 96ab6fd2eb [AVX-512] Change another pattern that was using BLENDM to use masked moves. A future patch will conver it back to BLENDM if its beneficial to register allocation.
llvm-svn: 291419
2017-01-09 04:19:34 +00:00
Craig Topper 6393afce97 [AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for vselects of all ones and all zeros.
Previously we emitted a VPTERNLOG and a separate masked move.

llvm-svn: 291415
2017-01-09 02:44:34 +00:00
Craig Topper f51ba1e3da [AVX-512] If avx512dq is available use vpmovm2d/vpmovm2q instead of vselect of zeroes/ones when handling sign extends of i1 without VLX.
llvm-svn: 291402
2017-01-08 21:32:30 +00:00
Craig Topper 0930a523cc [X86] Add avx512bw and avx512dq command lines to the vector compare results test.
This is preparation for improving a case with avx512dq.

llvm-svn: 291401
2017-01-08 21:32:26 +00:00
Sanjay Patel bf51c8a975 [x86] fix usage of stale operands when lowering select
I noticed this problem as part of the ongoing attempt to canonicalize min/max ops in IR.

The debug output shows nodes like this:

t4: i32 = xor t2, Constant:i32<-1>
    t21: i8 = setcc t4, Constant:i32<0>, setlt:ch
  t14: i32 = select t21, t4, Constant:i32<-1>

And because the select is holding onto the t4 (xor) node while EmitTest creates a new 
x86-specific xor node, the lowering results in:

  t4: i32 = xor t2, Constant:i32<-1>
  t25: i32,i32 = X86ISD::XOR t2, Constant:i32<-1>
t28: i32,glue = X86ISD::CMOV Constant:i32<-1>, t4, Constant:i8<15>, t25:1

Differential Revision: https://reviews.llvm.org/D28374

llvm-svn: 291392
2017-01-08 15:53:40 +00:00
Craig Topper a74e3088df [AVX-512] Remove patterns from the other VBLENDM instructions. They are all redundant with masked move instructions.
We should probably teach the two address instruction pass to turn masked moves into BLENDM when its beneficial to the register allocator.

llvm-svn: 291371
2017-01-07 22:20:34 +00:00
Craig Topper 2d02d4926b [X86] Regenerate a test to remove tab characters.
llvm-svn: 291370
2017-01-07 22:20:28 +00:00
Craig Topper da84ff3ed4 [AVX-512] Add masked forms of the alternate MOVDDUP patterns.
I'm not too sure how to get isel to select even all of the unmasked forms, but at least we have a consistent set now.

llvm-svn: 291368
2017-01-07 22:20:23 +00:00
Simon Pilgrim 935beac173 [X86][AVX2] Regenerate arithmetic tests
Fixed missing checks for tests that used a '-' in the name, which was messing with update_llc_test_checks.py

llvm-svn: 291363
2017-01-07 20:38:36 +00:00
Simon Pilgrim a1b8e2c725 [X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470)
llvm-svn: 291347
2017-01-07 15:37:50 +00:00
Simon Pilgrim 08519d7b02 [X86][SSE] Standardized triples in vector shift tests
Made no sense for them to be different and caused useless diffs in assembly remarks.

llvm-svn: 291274
2017-01-06 19:56:57 +00:00
Simon Pilgrim 9122793b15 [X86][AVX] Regenerate shuffle 128-bit tests.
The EVEX -> VEX fix means that AVX/AVX512 code is more likely the same now.

llvm-svn: 291242
2017-01-06 15:56:52 +00:00
Simon Pilgrim 10cc5d555f [X86][AVX] Regenerate tzcnt tests.
The EVEX -> VEX fix means that AVX/AVX512 code is more likely the same now.

llvm-svn: 291241
2017-01-06 15:54:23 +00:00
Craig Topper e86fb932ea [AVX-512] Add EXTRACT_SUBVECTOR support to combineBitcastForMaskedOp.
llvm-svn: 291214
2017-01-06 05:18:48 +00:00
Craig Topper 8cbac879db [AVX-512] Add more masked vector extract test cases with and without a bitcast between the select.
The ones with the bitcast need additional work to fold the mask operation properly. This will be fixed in a future commit.

llvm-svn: 291213
2017-01-06 05:18:44 +00:00
Matthias Braun 1172332203 CodeGen: Assert that liveness is up to date when reading block live-ins.
Add an assert that checks whether liveins are up to date before they are
used.

- Do not print liveins into .mir files anymore in situations where they
  are out of date anyway.
- The assert in the RegisterScavenger is superseded by the new one in
  livein_begin().
- Skip parts of the liveness updating logic in IfConversion.cpp when
  liveness isn't tracked anymore (just enough to avoid hitting the new
  assert()).

Differential Revision: https://reviews.llvm.org/D27562

llvm-svn: 291169
2017-01-05 20:01:19 +00:00
Sanjay Patel 686527c1e0 [x86] add test to show bug in select lowering; NFC
llvm-svn: 291151
2017-01-05 18:35:44 +00:00
Zvi Rackover b10f7de3b5 [X86] Add test cases that cover pr31551. NFC.
llvm-svn: 291127
2017-01-05 16:48:28 +00:00
Zvi Rackover 4b7d724d62 [X86] Optimize vector shifts with variable but uniform shift amounts
Summary:
For instructions such as PSLLW/PSLLD/PSLLQ a variable shift amount may be passed in an XMM register.
The lower 64-bits of the register are evaluated to determine the shift amount.
This patch improves the construction of the vector containing the shift amount.

Reviewers: craig.topper, delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28353

llvm-svn: 291120
2017-01-05 15:11:43 +00:00
Kristof Beyls a983e7c4a4 [GlobalISel] Add support for address-taken basic blocks
To make this work, pointers from the MachineBasicBlock to the LLVM-IR-level
basic blocks need to be initialized, as the AsmPrinter uses this link to be
able to print out labels for the basic blocks that are address-taken.

Most of the changes in this commit are about adapting existing tests to include
the basic block name that is now printed out in the MIR format, now that the
name becomes available as the link to the LLVM-IR basic block is initialized.
The relevant test change for the functionality added in this patch are the
added "(address-taken)" strings in
test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D28123

llvm-svn: 291105
2017-01-05 13:27:52 +00:00
Elena Demikhovsky 143cbc425b AVX-512: Optimized pattern for truncate with unsigned saturation.
DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512.
Differential revision: https://reviews.llvm.org/D28216

llvm-svn: 291092
2017-01-05 08:21:09 +00:00
Craig Topper eea52429cd [AVX-512] Update vextract64x4 intrinsic upgrade test cases to use a legal immediate so they test the instruction selection correctly.
llvm-svn: 291061
2017-01-05 01:34:55 +00:00
Evgeny Stupachenko c88697dc16 The patch fixes (base, index, offset) match.
Summary:
Instead of matching:
  (a + i) + 1 -> (a + i, undef, 1)
Now it matches:
  (a + i) + 1 -> (a, i, 1)

Reviewers: rengolin

Differential Revision: http://reviews.llvm.org/D26367

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 291012
2017-01-04 21:43:39 +00:00
Florian Hahn 5815f6c53c [framelowering] Skip dbg values when getting next/previous instruction.
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.

In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.

Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)

Reviewers: aprantl, MatzeB, mkuper

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D27688

llvm-svn: 290955
2017-01-04 12:08:35 +00:00