Commit Graph

28659 Commits

Author SHA1 Message Date
Sanjay Patel d5bc5ca3e4 [x86] adjust LEA tests for better coverage; NFC
The scale can 1, 2, or 3.

llvm-svn: 358539
2019-04-16 23:10:41 +00:00
Simon Pilgrim d769bb1e58 [X86][AVX] X86ISD::PERMV/PERMV3 node types can never fold index ops
Improves codegen demonstrated by D60512 - instructions represented by X86ISD::PERMV/PERMV3 can never memory fold the operand used for their index register.

This patch updates the 'isUseOfShuffle' helper into the more capable 'isFoldableUseOfShuffle' that recognises that the op is used for a X86ISD::PERMV/PERMV3 index mask and can't be folded - allowing us to use broadcast/subvector-broadcast ops to reduce the size of the mask constant pool data.

Differential Revision: https://reviews.llvm.org/D60562

llvm-svn: 358516
2019-04-16 19:18:53 +00:00
Sanjay Patel f136c46bd6 [x86] add more tests for LEA formation; NFC
Promoting the shift to the wider type should allow LEA.

llvm-svn: 358513
2019-04-16 18:58:03 +00:00
Luis Marques 20d2424016 [RISCV] Custom lower SHL_PARTS, SRA_PARTS, SRL_PARTS
When not optimizing for minimum size (-Oz) we custom lower wide shifts
(SHL_PARTS, SRA_PARTS, SRL_PARTS) instead of expanding to a libcall.

Differential Revision: https://reviews.llvm.org/D59477

llvm-svn: 358498
2019-04-16 14:38:32 +00:00
Hans Wennborg 21eb771dcb Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)
The original commit caused false positives from AddressSanitizer's
use-after-scope checks, which have now been fixed in r358478.

> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936

llvm-svn: 358483
2019-04-16 12:13:25 +00:00
Amara Emerson 02a90ea73d [AArch64][GlobalISel] Don't do extending loads combine for non-pow-2 types.
Since non-pow-2 types are going to get split up into multiple loads anyway,
don't do the [SZ]EXTLOAD combine for those and save us trouble later in
legalization.

llvm-svn: 358458
2019-04-15 22:34:08 +00:00
Craig Topper 0495f29e42 [X86] Limit the 'x' inline assembly constraint to zmm0-15 when used for a 512 type.
The 'v' constraint is used to select zmm0-31. This makes 512 bit consistent with 128/256-bit.a

llvm-svn: 358450
2019-04-15 21:06:32 +00:00
Craig Topper 77439bb128 [X86] Fix a stack folding test to have a full xmm2-31 clobber list instead of stopping at xmm15. Add an additional dependency to keep instruction below inline asm block.
llvm-svn: 358449
2019-04-15 21:06:23 +00:00
Matt Arsenault 101abd219b AMDGPU: Fix unreachable when counting register usage of SGPR96
llvm-svn: 358447
2019-04-15 20:51:12 +00:00
Matt Arsenault fbdd2a1887 AMDGPU: Fix printed format of SReg_96
These are artificial, so I think this should only come up with inline
asm comments.

llvm-svn: 358446
2019-04-15 20:42:18 +00:00
Craig Topper 3d9b47c770 [X86] Block i32/i64 for 'k' and 'Yk' in getRegForInlineAsmConstraint without avx512bw.
32 and 64 bit k-registers require avx512bw. If we don't block this properly, it leads to a crash.

llvm-svn: 358436
2019-04-15 18:39:45 +00:00
Sanjay Patel 8ae68f2648 [x86] update test checks; NFC
llvm-svn: 358432
2019-04-15 17:38:47 +00:00
Craig Topper 8e364c680f [X86] Restore the pavg intrinsics.
The pattern we replaced these with may be too hard to match as demonstrated by
PR41496 and PR41316.

This patch restores the intrinsics and then we can start focusing
on the optimizing the intrinsics.

I've mostly reverted the original patch that removed them. Though I modified
the avx512 intrinsics to not have masking built in.

Differential Revision: https://reviews.llvm.org/D60674

llvm-svn: 358427
2019-04-15 17:17:35 +00:00
Yevgeny Rouban 3992e9d229 Codegen: Fixed perf branch_weights in couple of tests. NFC.
This is need to pass future checks of perf branch_weights metadata.

llvm-svn: 358384
2019-04-15 09:30:31 +00:00
Bjorn Pettersson 60569363a5 [SelectionDAG] Use KnownBits::computeForAddSub/computeForAddCarry
Summary:
Use KnownBits::computeForAddSub/computeForAddCarry
in SelectionDAG::computeKnownBits when doing value
tracking for addition/subtraction.

This should improve the precision of the known bits,
as we only used to make a simple estimate of known
zeroes. The KnownBits support functions are also
able to deduce bits that are known to be one in the
result.

Reviewers: spatel, RKSimon, nikic, lebedev.ri

Reviewed By: nikic

Subscribers: nikic, javed.absar, lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60460

llvm-svn: 358372
2019-04-15 07:19:11 +00:00
Craig Topper abd87ff48b [X86] Regenerate checks for domain-reassignment.mir
Apparently there are some stray IMPLICIT_DEF operations that weren't in the
checks. Not sure if they've always been there or something changed at some
point.

llvm-svn: 358371
2019-04-15 05:22:47 +00:00
Amara Emerson 946b1246d6 [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only.
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.

This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.

I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.

Compile time:
Program                                        base   cse    diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test     9.04   9.12  0.8%
test-suite...Mark/mafft/pairlocalalign.test     2.68   2.66 -0.7%
test-suite...-typeset/consumer-typeset.test     5.53   5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test         5.30   5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test        25.82  25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test     6.92   6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test    34.24  34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test           6.25   6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test     1.66   1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test         13.61  13.60 -0.0%
Geomean difference                                          -0.2%

Code size:
Program                                        base     cse      diff
test-suite...-typeset/consumer-typeset.test    1315632  1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test    1313892  1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test        1439504  1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test    2936980  2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test        3478276  3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test    8082868  8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test         3870380  3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test          1434904  1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test    764528   764528   0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test    782092   782092   0.0%
Geomean difference                                              -0.9%

Differential Revision: https://reviews.llvm.org/D60580

llvm-svn: 358369
2019-04-15 05:04:20 +00:00
Craig Topper 3c57976447 [X86] Move VPTESTM matching from the isel table to custom code in X86ISelDAGToDAG.
We had many tablegen patterns for these instructions. And due to the
commutability of the patterns, tablegen expands them to even more patterns. All
together VPTESTMD patterns accounted for more the 50K of the 610K isel table.
This had gotten bad when we stopped canonicalizing AND to vXi64. This required
a pattern for every combination of bitcast input type.

This change moves the matching to custom code where it is easier to look through
the bitcasts without being concerned with the specific types.

The test changes are because we are now stricter with one use checks as its
required to make load folding legal. We now require the AND and any BITCAST to
only have a single use. This prevents forming VPTESTM and a VPAND with the same
inputs.

We now support broadcast loads for 128/256 patterns without VLX. We'll widen to
512-bit like and still fold the broadcast since the amount of memory read
doesn't change.

There are a few tests that got slightly longer because are now prefering
load + VPTESTM over XOR+VPCMPEQ for (seteq (load), allzeros). Previously we were
able to share the XOR with multiple VPTESTM instructions.

llvm-svn: 358359
2019-04-14 18:26:11 +00:00
Craig Topper b17e5ec61b [X86] Don't form masked vpcmp/vcmp/vptestm operations if the setcc node has more than one use.
We're better of emitting a single compare + kand rather than a compare for the
other use and a masked compare.

I'm looking into using custom instruction selection for VPTESTM to reduce the
ridiculous number of permutations of patterns in the isel table. Putting a one
use check on all masked compare folding makes load fold matching in the custom
code easier.

llvm-svn: 358358
2019-04-14 18:26:06 +00:00
Craig Topper 476dd06854 [X86] Update bool_reduction_v8f32 test cases from vector-compare-any_of.ll and vector-compare-all_of.ll to be proper reductions.
One of the shuffles was used twice. While the intended shuffle wasn't connected.

llvm-svn: 358346
2019-04-14 04:20:42 +00:00
Bill Wendling 191f1487b6 [X86] Use PC-relative mode for the kernel code model
Summary:
The Linux kernel uses PC-relative mode, so allow that when the code model is
"kernel".

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits, kees, nickdesaulniers

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60643

llvm-svn: 358343
2019-04-13 21:39:28 +00:00
Amara Emerson 93e58d2396 [AArch64][GlobalISel] Enable copy elision in the pre-legalizer combine and fix a crash.
This enables the simple copy combine that already exists in the CombinerHelper.
However, it exposed a bug in the GISelChangeObserver where it wouldn't clear a
set of MIs to process, and so would end up causing a crash when deleted MIs were
being added to the combiner worklist again.

Differential Revision: https://reviews.llvm.org/D60579

llvm-svn: 358318
2019-04-13 00:33:25 +00:00
Amara Emerson bdb5e4e4ca [GlobalISel] Fix a crash when handling an invalid MVT during call lowering.
This crash was introduced in r358032 as we try to construct an EVT from an MVT
in order to find the register type for the calling conv. Fall back instead of
trying to do this with an invalid MVT coming from i256.

llvm-svn: 358314
2019-04-12 22:05:46 +00:00
Amara Emerson 2806fd01a1 [AArch64][GlobalISel] Fix a crash when selecting shufflevectors with an undef mask element.
If a shufflevector's mask vector has an element with "undef" then the generic
instruction defining that element register is a G_IMPLICT_DEF instead of G_CONSTANT.
This fixes the selector to handle this case, and for now assumes that undef just means
zero. In future we'll optimize this case properly.

llvm-svn: 358312
2019-04-12 21:31:21 +00:00
Thomas Lively 9e27514996 [WebAssembly] Add mutable-globals to bleeding-edge CPU
Summary: This brings the backend in line with Clang.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60594

llvm-svn: 358310
2019-04-12 20:39:53 +00:00
Brendon Cahoon 4df216cd62 [Hexagon] Fix reuse bug in Vector Loop Carried Reuse pass
The Hexagon Vector Loop Carried Reuse pass was allowing reuse between
two shufflevectors with different masks. The reason is that the masks
are not instruction objects, so the code that checks each operand
just skipped over the operands.

This patch fixes the bug by checking if the operands are the same
when they are not instruction objects. If the objects are not the
same, then the code assumes that reuse cannot occur.

Differential Revision: https://reviews.llvm.org/D60019

llvm-svn: 358292
2019-04-12 16:37:12 +00:00
Sanjay Patel 5e4ad39af7 [DAGCombiner] narrow shuffle of concatenated vectors
// shuffle (concat X, undef), (concat Y, undef), Mask -->
// concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)

The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements.

The x86 changes look neutral or better. There's one test with an
extra instruction, but that could be reversed for a subtarget with
the right attributes. But by default, we want to avoid the 256-bit
op when possible (in my motivating benchmark, a handful of ymm ops
sprinkled into a sequence of xmm ops are triggering frequency
throttling on Haswell resulting in significantly worse perf).

Differential Revision: https://reviews.llvm.org/D60545

llvm-svn: 358291
2019-04-12 16:31:56 +00:00
Simon Pilgrim 6c8f4ada36 [X86][SSE] Recognise vXi1 boolean anyof/allof reduction patterns
Currently combineHorizontalPredicateResult only handles anyof/allof reduction patterns of legal types, which can be tricky to match as type legalization of bools can introduce bitcasts/truncs/extensions.

This patch extends combineHorizontalPredicateResult to recognise vXi1 bool reductions as well and uses the existing combineBitcastvxi1 helper to create the MOVMSK necessary to then compare the signmask result.

This ensures the accuracy of the reduction costs added in D60403 which assume the MOVMSK generation.

Differential Revision: https://reviews.llvm.org/D60610

llvm-svn: 358286
2019-04-12 14:22:57 +00:00
Hans Wennborg 4e6b857922 Revert r358268 "[DebugInfo] DW_OP_deref_size in PrologEpilogInserter."
It causes clang to crash while building Chromium. See https://crbug.com/952230
for reproducer.

> The PrologEpilogInserter need to insert a DW_OP_deref_size before
> prepending a memory location expression to an already implicit
> expression to avoid having the existing expression act on the memory
> address instead of the value behind it.
>
> The reason for using DW_OP_deref_size and not plain DW_OP_deref is that
> big-endian targets need to read the right size as simply truncating a
> larger read would yield the wrong result (LSB bytes are not at the lower
> address).
>
> Differential Revision: https://reviews.llvm.org/D59687

llvm-svn: 358281
2019-04-12 12:54:52 +00:00
Kang Zhang 2446f843ae [PowerPC] Add initialization for some ppc passes
Summary:

Some llc debug options need pass-name as the parameters.
But if we use the pass-name ppc-early-ret, we will get below error:
llc test.ll -stop-after ppc-early-ret
LLVM ERROR: "ppc-early-ret" pass is not registered.
Below pass-names have the pass is not registered error:
ppc-ctr-loops
ppc-ctr-loops-verify
ppc-loop-preinc-prep
ppc-toc-reg-deps
ppc-vsx-copy
ppc-early-ret
ppc-vsx-fma-mutate
ppc-vsx-swaps
ppc-reduce-cr-ops
ppc-qpx-load-splat
ppc-branch-coalescing
ppc-branch-select

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D60248

llvm-svn: 358271
2019-04-12 09:59:40 +00:00
Markus Lavin 138c76129b [DebugInfo] DW_OP_deref_size in PrologEpilogInserter.
The PrologEpilogInserter need to insert a DW_OP_deref_size before
prepending a memory location expression to an already implicit
expression to avoid having the existing expression act on the memory
address instead of the value behind it.

The reason for using DW_OP_deref_size and not plain DW_OP_deref is that
big-endian targets need to read the right size as simply truncating a
larger read would yield the wrong result (LSB bytes are not at the lower
address).

Differential Revision: https://reviews.llvm.org/D59687

llvm-svn: 358268
2019-04-12 08:23:55 +00:00
Eric Christopher b6926bdcff Revert "[PowerPC] Add initialization for some ppc passes"
This reverts commit 6f8f98ce8d as it
is breaking nearly every bot.

llvm-svn: 358260
2019-04-12 07:16:58 +00:00
Craig Topper 3b1239d2a8 [TargetLowering][X86] Teach SimplifyDemandedBits to use ShrinkDemandedOp on ISD::SHL nodes.
If the upper bits of the SHL result aren't used, we might be able to use a narrower shift. For example, on X86 this can turn a 64-bit into 32-bit enabling a smaller encoding.

Differential Revision: https://reviews.llvm.org/D60358

llvm-svn: 358257
2019-04-12 06:49:28 +00:00
Kang Zhang 6f8f98ce8d [PowerPC] Add initialization for some ppc passes
Summary:

Some llc debug options need pass-name as the parameters.
But if we use the pass-name ppc-early-ret, we will get below error:
llc test.ll -stop-after ppc-early-ret
LLVM ERROR: "ppc-early-ret" pass is not registered.
Below pass-names have the pass is not registered error:
ppc-ctr-loops
ppc-ctr-loops-verify
ppc-loop-preinc-prep
ppc-toc-reg-deps
ppc-vsx-copy
ppc-early-ret
ppc-vsx-fma-mutate
ppc-vsx-swaps
ppc-reduce-cr-ops
ppc-qpx-load-splat
ppc-branch-coalescing
ppc-branch-select

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D60248

llvm-svn: 358256
2019-04-12 06:35:15 +00:00
Zi Xuan Wu ac79ef8f0e [PowerPC] More precise exploitation of P9 maddld instruction when operands are constant
There are 3 operands of maddld, (add (mul %1, %2), %3) and sometimes
they are constant. If there is constant operand, it takes extra li to 
materialize the operand, and one more extra register too. So it's not 
profitable to use maddld to optimize mul-add pattern.

Differential Revision: https://reviews.llvm.org/D60181

llvm-svn: 358253
2019-04-12 05:21:31 +00:00
Brendon Cahoon 57c3d4bed3 [Pipeliner] Fix incorrect loop carried dependence calculation
The isLoopCarriedDep function does not correctly compute loop
carried dependences when the array index offset is negative
or the stride is smallar than the access size.

Patch by Denis Antrushin.

Differential Revision: https://reviews.llvm.org/D60135

llvm-svn: 358233
2019-04-11 21:57:51 +00:00
Amara Emerson 7e9355f870 [AArch64][GlobalISel] Flesh out vector load/store support for more types.
Some of these were legalizing into smaller vector types unnecessarily,
others were simply not supported yet.

llvm-svn: 358223
2019-04-11 20:40:01 +00:00
Amara Emerson b956051415 [AArch64][GlobalISel] Legalization and ISel support for load/stores of vectors of pointers.
Loads and store of values with type like <2 x p0> currently don't get imported
because SelectionDAG has no knowledge of pointer types. To leverage the existing
support for vector load/stores, we can bitcast the value to have s64 element
types instead. We do this as a custom legalization.

This patch also adds support for general loads of <2 x s64>, and relaxes some
type conditions on selecting G_BITCAST.

Differential Revision: https://reviews.llvm.org/D60534

llvm-svn: 358221
2019-04-11 20:32:24 +00:00
Aaron Smith 994023a3f1 [DebugInfo] Combine Trivial and NonTrivial flags
Summary:
Companion to https://reviews.llvm.org/D59347


Reviewers: rnk, zturner, probinson, dblaikie, deadalnix

Subscribers: aprantl, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59348

llvm-svn: 358220
2019-04-11 20:25:10 +00:00
Craig Topper 68a5d619a4 [X86] Restrict vselect handling in scalarizeExtEltFP to only case to pre type legalization where the setcc result type is vXi1.
If the vector setcc has been legalized then we will need to convert a vector boolean of 0 or -1 to a scalar boolean of 0 or 1.

The added test case previously crashed in 32-bit mode by creating a setcc with an i64 condition that type legalization couldn't expand.

llvm-svn: 358218
2019-04-11 19:57:44 +00:00
Craig Topper a3635b94c4 [X86] Add 32-bit command line to extractelement-fp.ll so I can add a test case for a 32-bit only crasher. NFC
This is a bit ugly for ABI reasons about how floats/doubles are returned.

llvm-svn: 358217
2019-04-11 19:57:24 +00:00
Craig Topper 586fad50ac [X86] Add patterns for using movss/movsd for atomic load/store of f32/64. Remove atomic fadd pseudos use isel patterns instead.
This patch adds patterns for turning bitcasted atomic load/store into movss/sd.

It also removes the pseudo instructions for atomic RMW fadd. Instead just adding isel patterns for folding an atomic load into addss/sd. And relying on the new movss/sd store pattern to handle the write part.

This also makes the fadd patterns use VEX and EVEX instructions when AVX or AVX512F are enabled.

Differential Revision: https://reviews.llvm.org/D60394

llvm-svn: 358215
2019-04-11 19:19:52 +00:00
Craig Topper f7e548c076 Recommit r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2"
With correct test checks this time.

If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integ

This matches what gcc and icc do for this case and removes an existing FIXME.

llvm-svn: 358214
2019-04-11 19:19:42 +00:00
Craig Topper 8200880c9a Revert r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2"
I seem to have messed up the test checks.

llvm-svn: 358212
2019-04-11 19:04:38 +00:00
Craig Topper 1c2dfc3100 [X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2
If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integer and store it to a stack temporary. From there we can do two 32-bit loads to get the value into integer registers without worrying about atomicness.

This matches what gcc and icc do for this case and removes an existing FIXME.

Differential Revision: https://reviews.llvm.org/D60156

llvm-svn: 358211
2019-04-11 18:40:21 +00:00
Craig Topper 1fe5a9963d [X86] Pre-commit i64 volatile test case for D60156. NFC
llvm-svn: 358210
2019-04-11 18:40:08 +00:00
Simon Pilgrim 40b647ae8e [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV3 mask support
Completes SimplifyDemandedVectorElts's basic variable shuffle mask support which should help D60512 + D60562 

llvm-svn: 358186
2019-04-11 15:29:15 +00:00
Simon Pilgrim a41275a398 [X86][AVX] Tweak X86ISD::VPERMV3 demandedelts test
Original test was too dependent on the order of the combines that could cause the inserted element being demanded after all

llvm-svn: 358182
2019-04-11 15:09:03 +00:00
Simon Pilgrim 34686b6e97 [X86][AVX] Add X86ISD::VPERMV3 demandedelts test
llvm-svn: 358175
2019-04-11 14:48:46 +00:00
Simon Pilgrim 8a25154fa7 [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV mask support
llvm-svn: 358174
2019-04-11 14:35:45 +00:00
Simon Pilgrim b237b54c2d [X86][AVX] Add X86ISD::VPERMV demandedelts test
llvm-svn: 358173
2019-04-11 14:26:32 +00:00
Sanjay Patel c0f4a35e68 [DAGCombiner][x86] scalarize inserted vector FP ops
// bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) -->
// build_vec ...undef, (bo x, y), undef...

The lifetime of the nodes in these examples is different for variables versus constants,
but they are all build vectors briefly, so I'm proposing to catch them in this form to
handle all of the leading examples in the motivating test file.

Before we have build vectors, we might have insert_vector_element. After that, we might
have scalar_to_vector and constant pool loads.

It's going to take more work to ensure that FP vector operands are getting simplified
with undef elements, so this transform can apply more widely. In a non-loose FP environment,
we are likely simplifying FP elements to NaN values rather than undefs.

We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors
yet.

Differential Revision: https://reviews.llvm.org/D60514

llvm-svn: 358172
2019-04-11 14:21:57 +00:00
Diogo N. Sampaio 8ddfd46c61 [AArch64] Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64
Summary:  Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64

Reviewers: pbarrio, DavidSpickett, LukeGeeson

Reviewed By: LukeGeeson

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60259

llvm-svn: 358171
2019-04-11 14:19:43 +00:00
Simon Pilgrim 6f3866c6fb [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMILPV mask support
llvm-svn: 358170
2019-04-11 14:15:01 +00:00
Simon Pilgrim 886e32e0f2 [X86][AVX] Add X86ISD::VPERMILPV demandedelts tests
llvm-svn: 358168
2019-04-11 14:09:35 +00:00
Simon Pilgrim cb5218ad48 [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMIL2 mask support
llvm-svn: 358167
2019-04-11 14:04:19 +00:00
Simon Pilgrim 7021dec26e [X86][XOP] Add X86ISD::VPERMIL2 demandedelts test
llvm-svn: 358166
2019-04-11 13:52:43 +00:00
Simon Pilgrim e468cc7f14 [X86] SimplifyDemandedVectorElts - add VPPERM support
We need to add support for all variable shuffle mask ops, but VPPERM is the only one that already has test coverage.

llvm-svn: 358165
2019-04-11 13:30:38 +00:00
Shiva Chen 7cc03bd064 [RISCV] Put data smaller than eight bytes to small data section
Because of gp = sdata_start_address + 0x800, gp with signed twelve-bit offset
could covert most of the small data section. Linker relaxation could transfer
the multiple data accessing instructions to a gp base with signed twelve-bit
offset instruction.

Differential Revision: https://reviews.llvm.org/D57493

llvm-svn: 358150
2019-04-11 04:59:13 +00:00
Amara Emerson 213e0bde04 [AArch64][GlobalISel] Make <2 x p0> = G_BUILD_VECTOR legal.
The existing isel support already works for p0 once the legalizer accepts it.

llvm-svn: 358144
2019-04-10 23:06:14 +00:00
Amara Emerson a7ff111b04 [AArch64][GlobalISel] Add legalizer support for <8 x s16> and <16 x s8> G_ADD.
llvm-svn: 358143
2019-04-10 23:06:11 +00:00
Amara Emerson ae878dab03 [AArch64][GlobalISel] Scalarize vector SDIV.
llvm-svn: 358142
2019-04-10 23:06:08 +00:00
Craig Topper 10048060f6 [X86] Add SSE1 command line to atomic-fp.ll and atomic-non-integer.ll. NFC
llvm-svn: 358141
2019-04-10 22:35:32 +00:00
Craig Topper a3ee7e2b3e [X86] Autogenerate complete checks. NFC
llvm-svn: 358140
2019-04-10 22:35:24 +00:00
Craig Topper 61f31cbcb2 [X86] Teach foldMaskedShiftToScaledMask to look through an any_extend from i32 to i64 between the and & shl
foldMaskedShiftToScaledMask tries to reorder and & shl to enable the shl to fold into an LEA. But if there is an any_extend between them it doesn't work.

This patch modifies the code to look through any_extend from i32 to i64 when the and mask only uses bits that weren't from the extended part.

This will prevent a regression from D60358 caused by 64-bit SHL being narrowed to 32-bits when their upper bits aren't demanded.

Differential Revision: https://reviews.llvm.org/D60532

llvm-svn: 358139
2019-04-10 21:42:08 +00:00
David Green deb3342018 [ARM] Add an extra test for constant hoist. NFC
llvm-svn: 358128
2019-04-10 19:18:58 +00:00
Craig Topper cacb70c94b [X86] Add test case for LEA formation regression seen with D60358. NFC
If we have an (add X, (and (aext (shl Y, C1)), C2)), we can pull the shift through and+aext to fold into an LEA with the.
Assuming C1 is small enough and C2 masks off all of the extend bits.

This pattern showed up in D60358. And we need to handle it to prevent a regression.

llvm-svn: 358124
2019-04-10 19:09:06 +00:00
David Green 4e3fd7757a [ARM] Add an extra constant hoisting test. NFC
This adds a simple extra test for constant hoisting to show it's
usefulness with constant addresses like those seen in memory
mapped registers in embedded systems.

llvm-svn: 358114
2019-04-10 18:05:57 +00:00
David Green 0861c87b06 Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg
Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
seeing through to the constant in other blocks. Revert this patch while we come
up with a better way to handle that.

I will try to follow this up with some better tests.

llvm-svn: 358113
2019-04-10 18:00:41 +00:00
Matt Arsenault 7187272b2b GlobalISel: Support legalizing G_CONSTANT with irregular breakdown
llvm-svn: 358109
2019-04-10 17:27:53 +00:00
Craig Topper 35fe07916a [AArch64] Teach getTestBitOperand to look through ANY_EXTENDS
This patch teach getTestBitOperand to look through ANY_EXTENDs when the extended bits aren't used. The test case changed here is based what D60358 did to test16 in tbz-tbnz.ll. So this patch will avoid that regression.

Differential Revision: https://reviews.llvm.org/D60482

llvm-svn: 358108
2019-04-10 17:27:29 +00:00
Matt Arsenault 9e0eeba569 GlobalISel: Handle odd breakdowns for bit ops
llvm-svn: 358105
2019-04-10 17:07:56 +00:00
Simon Pilgrim 37d8d55823 [X86][AVX] getTargetConstantBitsFromNode - extract bits from X86ISD::SUBV_BROADCAST
llvm-svn: 358096
2019-04-10 16:24:47 +00:00
Diogo N. Sampaio aae424a2d2 [AArch64] Add lowering pattern for scalar fp16 facge and facgt
Summary: The fp16 scalar version of facge and facgt requires a custom patter matching, as the result type is not the same width of the operands.

Reviewers: olista01, javed.absar, pbarrio

Reviewed By: javed.absar

Subscribers: kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60212

llvm-svn: 358083
2019-04-10 13:34:18 +00:00
Diogo N. Sampaio 651463e4a8 [ARM] [FIX] Add missing f16 vector operations lowering
Summary:
Add missing <8xhalf> shufflevectors pattern, when using concat_vector dag node.
As well, allows <8xhalf> and <4xhalf> vldup1 operations.

These instructions are required for v8.2a fp16 lowering of vmul_n_f16, vmulq_n_f16 and vmulq_lane_f16 intrinsics.

Reviewers: olista01, pbarrio, LukeGeeson, efriedma

Reviewed By: efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60319

llvm-svn: 358081
2019-04-10 13:28:06 +00:00
Diana Picus b6e83b98f9 [ARM GlobalISel] Select G_FCONSTANT for VFP3
Make it possible to TableGen code for FCONSTS and FCONSTD.

We need to make two changes to the TableGen descriptions of vfp_f32imm
and vfp_f64imm respectively:
* add GISelPredicateCode to check that the immediate fits in 8 bits;
* extract the SDNodeXForms into separate definitions and create a
GISDNodeXFormEquiv and a custom renderer function for each of them.

There's a lot of boilerplate to get the actual value of the immediate,
but it basically just boils down to calling ARM_AM::getFP32Imm or
ARM_AM::getFP64Imm.

llvm-svn: 358063
2019-04-10 09:14:32 +00:00
Diana Picus 3533ad6801 [ARM GlobalISel] Select G_FCONSTANT into pools
Put all floating point constants into constant pools and load their
values from there.

llvm-svn: 358062
2019-04-10 09:14:24 +00:00
Diana Picus 165846b031 [ARM GlobalISel] Map G_FCONSTANT
llvm-svn: 358061
2019-04-10 09:14:16 +00:00
Jim Lin a49c95e02a [Sparc] Fix incorrect MI insertion position for spilling f128.
Summary:
Obviously, new built MI (sethi+add or sethi+xor+add) for constructing large offset
should be inserted before new created MI for storing even register into memory.
So the insertion position should be *StMI instead of II.

before fixed:

std %f0, [%g1+80]
sethi 4, %g1        <<<
add %g1, %sp, %g1   <<< this two instructions should be put before "std %f0, [%g1+80]".
sethi 4, %g1
add %g1, %sp, %g1
std %f2, [%g1+88]

after fixed:

sethi 4, %g1
add %g1, %sp, %g1
std %f0, [%g1+80]
sethi 4, %g1
add %g1, %sp, %g1
std %f2, [%g1+88]

Reviewers: venkatra, jyknight

Reviewed By: jyknight

Subscribers: jyknight, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60397

llvm-svn: 358042
2019-04-10 01:56:32 +00:00
Amara Emerson 9bf092d719 [AArch64][GlobalISel] Add isel support for vector G_ICMP and G_ASHR & G_SHL
The selection for G_ICMP is unfortunately not currently importable from SDAG
due to the use of custom SDNodes. To support this, this selection method has an
opcode table which has been generated by a script, indexed by various
instruction properties. Ideally in future we will have a GISel native selection
patterns that we can write in tablegen to improve on this.

For selection of some types we also need support for G_ASHR and G_SHL which are
generated as a result of legalization. This patch also adds support for them,
generating the same code as SelectionDAG currently does.

Differential Revision: https://reviews.llvm.org/D60436

llvm-svn: 358035
2019-04-09 21:22:43 +00:00
Amara Emerson 888dd5d198 [AArch64][GlobalISel] Legalize vector G_ICMP.
Selection support will be coming in a later patch.

Differential Revision: https://reviews.llvm.org/D60435

llvm-svn: 358034
2019-04-09 21:22:40 +00:00
Amara Emerson 92d74f19cf [AArch64][GlobalISel] Add legalization for some vector G_SHL and G_ASHR.
This is needed for some future support for vector ICMP.

Differential Revision: https://reviews.llvm.org/D60433

llvm-svn: 358033
2019-04-09 21:22:37 +00:00
Amara Emerson 2b523f8162 [GlobalISel][AArch64] Allow CallLowering to handle types which are normally
required to be passed as different register types. E.g. <2 x i16> may need to
be passed as a larger <2 x i32> type, so formal arg lowering needs to be able
truncate it back. Likewise, when dealing with returns of these types, they need
to be widened in the appropriate way back.

Differential Revision: https://reviews.llvm.org/D60425

llvm-svn: 358032
2019-04-09 21:22:33 +00:00
Craig Topper ba55a40fd0 [AArch64] Add test case to show missed opportunity to remove a shift before tbnz when the shift has been zero extended from i32 to i64. NFC
This pattern showed up in D60358 and it was suggested I had a test and fix that separately.

llvm-svn: 358030
2019-04-09 19:23:37 +00:00
Craig Topper 61e77b11d1 [DAGCombiner][X86][SystemZ] Canonicalize SSUBO with immediate RHS to SADDO by negating the immediate.
This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate.

Differential Revision: https://reviews.llvm.org/D60020

llvm-svn: 358027
2019-04-09 18:33:56 +00:00
Simon Pilgrim d7cc0ec581 [TargetLowering] SimplifyDemandedBits - add ISD::INSERT_SUBVECTOR support
llvm-svn: 358019
2019-04-09 16:52:21 +00:00
Stanislav Mekhanoshin 913ba8eeb4 Revert LIS handling in MachineDCE
One of out of tree targets has regressed with this patch. Reverting
it for now and let liveness to be fully reconstructed in case pass
was used after the LIS is created to resolve the regression.

Differential Revision: https://reviews.llvm.org/D60466

llvm-svn: 358015
2019-04-09 16:13:53 +00:00
Simon Pilgrim 345eacd555 [TargetLowering] SimplifyDemandedBits - call SimplifyDemandedBits in bitcast handling
When bitcasting from a source op to a larger bitwidth op, split the demanded bits and OR them on top of one another and demand those merged bits in the SimplifyDemandedBits call on the source op.

llvm-svn: 357992
2019-04-09 10:27:59 +00:00
Tom Stellard 206b9927f8 AMDGPU/GlobalISel: Implement call lowering for shaders returning values
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, llvm-commits

Differential Revision: https://reviews.llvm.org/D57166

llvm-svn: 357964
2019-04-09 02:26:03 +00:00
Chen Zheng 19ce6719bc [PowerPC] initialize SchedModel according to platform.
Differential Revision: https://reviews.llvm.org/D60177

llvm-svn: 357962
2019-04-09 01:25:25 +00:00
Simon Pilgrim 9f74df7d5b [TargetLowering] SimplifyDemandedBits - use DemandedElts in bitcast handling
Be more selective in the SimplifyDemandedBits -> SimplifyDemandedVectorElts bitcast call based on the demanded elts.

llvm-svn: 357942
2019-04-08 20:59:38 +00:00
Simon Pilgrim 86844a865e [X86][AVX] Add PR34380 shuffle test cases
llvm-svn: 357914
2019-04-08 14:05:42 +00:00
Sanjay Patel 50c3b290ed [x86] make 8-bit shl undesirable
I was looking at a potential DAGCombiner fix for 1 of the regressions in D60278, and it caused severe regression test pain because x86 TLI lies about the desirability of 8-bit shift ops.

We've hinted at making all 8-bit ops undesirable for the reason in the code comment:

// TODO: Almost no 8-bit ops are desirable because they have no actual
//       size/speed advantages vs. 32-bit ops, but they do have a major
//       potential disadvantage by causing partial register stalls.

...but that leads to massive diffs and exposes all kinds of optimization holes itself.

Differential Revision: https://reviews.llvm.org/D60286

llvm-svn: 357912
2019-04-08 13:58:50 +00:00
Craig Topper afb6b42691 [X86] Split floating point tests out of atomic-mi.ll into atomic-fp.ll. Add avx and avx512f command lines. NFC
llvm-svn: 357882
2019-04-08 01:54:27 +00:00
Craig Topper 8aeefe3149 [X86] Add avx and avx512f command lines to atomic-non-integer.ll. NFC
llvm-svn: 357881
2019-04-08 01:54:24 +00:00
Craig Topper 424417da79 [X86] Use (SUBREG_TO_REG (MOV32rm)) for extloadi64i8/extloadi64i16 when the load is 4 byte aligned or better and not volatile.
Summary:
Previously we would use MOVZXrm8/MOVZXrm16, but those are longer encodings.

This is similar to what we do in the loadi32 predicate.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60341

llvm-svn: 357875
2019-04-07 19:19:44 +00:00
Nikita Popov 3db93ac5d6 Reapply [ValueTracking] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.

Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 357870
2019-04-07 17:22:16 +00:00
Simon Pilgrim 07adb6abda [X86][SSE] SimplifyDemandedBitsForTargetNode - Add initial PACKSS support
In the case where we only want the sign bit (e.g. when using PACKSS truncation of comparison results for MOVMSK) then we can just demand the sign bit of the source operands.

This makes use of the fact that PACKSS saturates out of range values to the min/max int values - so the sign bit is always preserved.

Differential Revision: https://reviews.llvm.org/D60333

llvm-svn: 357859
2019-04-07 10:40:01 +00:00
Craig Topper 399102b464 [X86] When converting (x << C1) AND C2 to (x AND (C2>>C1)) << C1 during isel, try using andl over andq by favoring 32-bit unsigned immediates.
llvm-svn: 357848
2019-04-06 19:00:11 +00:00
Craig Topper f9b9f8d2e4 [X86] Use a signed mask in foldMaskedShiftToScaledMask to enable a shorter immediate encoding.
This function reorders AND and SHL to enable the SHL to fold into an LEA. The
upper bits of the AND will be shifted out by the SHL so it doesn't matter what
mask value we use for these bits. By using sign bits from the original mask in
these upper bits we might enable a shorter immediate encoding to be used.

llvm-svn: 357846
2019-04-06 18:00:50 +00:00
Craig Topper 82448bc09e [X86] Add test cases to show missed opportunities to use a sign extended 8 or 32 bit immediate AND when reversing SHL+AND to form an LEA.
When we shift the AND mask over we should shift in sign bits instead of zero bits. The scale in the LEA will shift these bits out so it doesn't matter whether we mask the bits off or not. Using sign bits will potentially allow a sign extended immediate to be used.

Also add some other test cases for cases that are currently optimal.

llvm-svn: 357845
2019-04-06 18:00:45 +00:00
Craig Topper 9d7379c250 [X86] Autogenerate complete checks. NFC
llvm-svn: 357844
2019-04-06 18:00:41 +00:00
Simon Pilgrim ec28615f7f [X86] Add AVX-target expandload and compressstore tests
llvm-svn: 357842
2019-04-06 14:40:52 +00:00
Simon Pilgrim d23611f9ad [X86] Split expandload and compressstore tests
llvm-svn: 357840
2019-04-06 14:14:54 +00:00
Simon Pilgrim 18a8a64c9f [X86][SSE] Add more exhaustive masked load/store tests
Reordered/renamed some existing tests to match the cleaned up order

llvm-svn: 357839
2019-04-06 14:01:37 +00:00
Francis Visoiu Mistrih 9d9d1b6b2b [X86] Enable tail calls for CallingConv::Swift
It's currently only enabled on AArch64 (enabled in r281376).

llvm-svn: 357809
2019-04-05 20:18:25 +00:00
Francis Visoiu Mistrih ab051a378c [X86] Preserve operand flag when expanding TCRETURNri
The expansion of TCRETURNri(64) would not keep operand flags like
undef/renamable/etc. which can result in machine verifier issues.

Also add plumbing to be able to use `-run-pass=x86-pseudo`.

llvm-svn: 357808
2019-04-05 20:18:21 +00:00
Stanislav Mekhanoshin c8f78f8dd3 [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs
Detect dead lanes can create some dead defs. Then RenameIndependentSubregs
will break a REG_SEQUENCE which may use these dead defs. At this point
a dead instruction can be removed but we do not run a DCE anymore.

MachineDCE was only running before live variable analysis. The patch
adds a mean to preserve LiveIntervals and SlotIndexes in case it works
past this.

Differential Revision: https://reviews.llvm.org/D59626

llvm-svn: 357805
2019-04-05 20:11:32 +00:00
Craig Topper 80aa2290fb [X86] Merge the different Jcc instructions for each condition code into single instructions that store the condition code as an operand.
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes.

Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon

Reviewed By: RKSimon

Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60228

llvm-svn: 357802
2019-04-05 19:28:09 +00:00
Craig Topper 7323c2bf85 [X86] Merge the different SETcc instructions for each condition code into single instructions that store the condition code as an operand.
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes.

Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri

Reviewed By: andreadb

Subscribers: hiraditya, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60138

llvm-svn: 357801
2019-04-05 19:27:49 +00:00
Craig Topper e0bfeb5f24 [X86] Merge the different CMOV instructions for each condition code into single instructions that store the condition code as an immediate.
Summary:
Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models.

This avoids needing an isel pattern for each condition code. And it removes
translation switches for converting between CMOV instructions and condition
codes.

Now the printer, encoder and disassembler take care of converting the immediate.
We use InstAliases to handle the assembly matching. But we print using the
asm string in the instruction definition. The instruction itself is marked
IsCodeGenOnly=1 to hide it from the assembly parser.

This does complicate the scheduler models a little since we can't assign the
A and BE instructions to a separate class now.

I plan to make similar changes for SETcc and Jcc.

Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60041

llvm-svn: 357800
2019-04-05 19:27:41 +00:00
Clement Courbet 1d8c9dfe03 [ExpandMemCmp][NFC] Add tests for `memcmp(p, q, n) < 0` case.
llvm-svn: 357767
2019-04-05 15:03:25 +00:00
Simon Pilgrim 17586cda4a [SelectionDAG] Add fcmp UNDEF handling to SelectionDAG::FoldSetCC
Second half of PR40800, this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).

This involves a lot of tweaking to reduced tests as bugpoint loves to reduce fcmp arguments to undef........

Differential Revision: https://reviews.llvm.org/D60006

llvm-svn: 357765
2019-04-05 14:56:21 +00:00
Matt Arsenault 4ed6ccab9b AMDGPU/GlobalISel: Fix non-power-of-2 select
llvm-svn: 357762
2019-04-05 14:03:04 +00:00
Sanjay Patel 50a8652785 [DAGCombiner][x86] scalarize splatted vector FP ops
There are a variety of vector patterns that may be profitably reduced to a
scalar op when scalar ops are performed using a subset (typically, the
first lane) of the vector register file.

For x86, this is true for float/double ops and element 0 because
insert/extract is just a sub-register rename.

Other targets should likely enable the hook in a similar way.

Differential Revision: https://reviews.llvm.org/D60150

llvm-svn: 357760
2019-04-05 13:32:17 +00:00
Simon Pilgrim faa5b939f0 [X86][AVX] Add PR34584 masked store test cases
llvm-svn: 357757
2019-04-05 11:34:30 +00:00
Simon Pilgrim 329e63b915 [X86] Add SSE/AVX1/AVX2 masked trunc+store tests
llvm-svn: 357756
2019-04-05 11:22:28 +00:00
Roger Ferrer Ibanez e011e4f89c [RISCV] Implement adding a displacement to a BlockAddress
Recent change rL357393 uses MachineInstrBuilder::addDisp to add a based on a
BlockAddress but this case was not implemented.

This patch adds the missing case and a test for RISC-V that exercises the new
case.

Differential Revision: https://reviews.llvm.org/D60136

llvm-svn: 357752
2019-04-05 08:40:57 +00:00
Piotr Sobczak 0376ac1d94 [SelectionDAG] Compute known bits of CopyFromReg
Summary:
Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual reg used has one def only.

This can be particularly useful when calling isBaseWithConstantOffset()
with the ISD::CopyFromReg argument, as more optimizations may get enabled
in the result.

Also add a missing truncation on X86, found by testing of this patch.

Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa

Reviewers: bogner, craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59535

llvm-svn: 357745
2019-04-05 07:44:09 +00:00
Craig Topper 94f1772b1e [X86] Promote i16 SRA instructions to i32
We already promote SRL and SHL to i32.

This will introduce sign extends sometimes which might be harder to deal with than the zero we use for promoting SRL. I ran this through some of our internal benchmark lists and didn't see any major regressions.

I think there might be some DAG combine improvement opportunities in the test changes here.

Differential Revision: https://reviews.llvm.org/D60278

llvm-svn: 357743
2019-04-05 06:32:50 +00:00
Serguei Katkov c39636cc2c [FastISel] Fix crash for gc.relocate lowring
Lowering safepoint checks that all gc.relocaes observed in safepoint
must be lowered. However Fast-Isel is able to skip dead gc.relocate.

To resolve this issue we just ignore dead gc.relocate in the check.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60184

llvm-svn: 357742
2019-04-05 05:41:08 +00:00
James Y Knight a040174418 Revert [X86] When using Win64 ABI, exit with error if SSE is disabled for varargs
It unnecessarily breaks previously-working code which used varargs,
but didn't pass any float/double arguments (such as EDK2).

Also revert the fixup on top of that:
Revert [X86] Fix a test from r357317

This reverts r357317 (git commit d413f41de6)
This reverts r357380 (git commit 7af32444b9)

llvm-svn: 357718
2019-04-04 19:05:48 +00:00
Sam Clegg 2a7cac932b [WebAssembly] Add new explicit relocation types for PIC relocations
See https://github.com/WebAssembly/tool-conventions/pull/106

Differential Revision: https://reviews.llvm.org/D59907

llvm-svn: 357710
2019-04-04 17:43:50 +00:00
Sanjay Patel 17648b848e [x86] eliminate unnecessary broadcast of horizontal op
This is another pattern that comes up if we more aggressively
scalarize FP ops.

llvm-svn: 357703
2019-04-04 14:46:13 +00:00
Jonas Paulsson c56ffed304 [SystemZ] Bugfix in isFusableLoadOpStorePattern()
This function is responsible for checking the legality of fusing an instance
of load -> op -> store into a single operation. In the SystemZ backend the
check was incomplete and a test case emerged with a cycle in the instruction
selection DAG as a result.

Instead of using the NodeIds to determine node relationships,
hasPredecessorHelper() now is used just like in the X86 backend. This handled
the failing tests and as well gave a few additional transformations on
benchmarks.

The SystemZ isFusableLoadOpStorePattern() is now a very near copy of the X86
function, and it seems this could be made a utility function in common code
instead.

Review: Ulrich Weigand
https://reviews.llvm.org/D60255

llvm-svn: 357688
2019-04-04 12:12:35 +00:00
Diana Picus 153c3887e4 [ARM GlobalISel] Support DBG_VALUE
Make sure we can map and select DBG_VALUE.

llvm-svn: 357681
2019-04-04 10:24:51 +00:00
Craig Topper 3649c20884 [X86] Use INSERT_SUBREG rather than SUBREG_TO_REG when creating LEA64_32 during isel.
SUBREG_TO_REG is supposed to be used to assert that we know the upper bits are
zero. But that isn't the case here. We've done no analysis of the inputs.

llvm-svn: 357673
2019-04-04 05:00:18 +00:00
Serguei Katkov fb44846e37 [FastISel] Fix the crash in gc.result lowering
The Fast ISel has a fallback to SelectionDAGISel in case it cannot handle the instruction.
This works as follows:
Using reverse order, try to select instruction using Fast ISel, if it cannot handle instruction it fallbacks to SelectionDAGISel
for these instructions if it is a call and continue fast instruction selections.

However if unhandled instruction is not a call or statepoint related instruction it fallbacks to SelectionDAGISel for all remaining
instructions in basic block.

However gc.result instruction is missed and as a result it is possible that gc.result is processed earlier than statepoint
causing breakage invariant the gc.results should be handled after statepoint.

Test is updated because in the current form fast-isel cannot handle ret instruction (due to i1 ret type without explicit ext)
and as a result test does not check fast-isel at all.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60182

llvm-svn: 357672
2019-04-04 04:19:56 +00:00
David L. Jones 8b8a02175a Revert r357452 - 'SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)'
This revision causes tests to fail under ASAN. Since the cause of the failures
is not clear (could be ASAN, could be a Clang bug, could be a bug in this
revision), the safest course of action seems to be to revert while investigating.

llvm-svn: 357667
2019-04-04 02:27:57 +00:00
Craig Topper 051bd16faf [X86] Remove CustomInserters for RDPKRU/WRPKRU. Use some custom lowering and new ISD opcodes instead.
These inserters inserted some instructions to zero some registers and copied from virtual registers to physical registers.

This change instead inserts the zeros directly into the DAG at lowering time using new ISD opcodes
that take the extra zeroes as inputs. The zeros will then go through isel on their own to select
the MOV32r0 pseudo. Then we just need to mention the physical registers directly
in the isel patterns and the isel table and InstrEmitter will take care of inserting the necessary
copies to/from physical registers.

llvm-svn: 357659
2019-04-04 00:28:49 +00:00
Craig Topper 52cac4b79f [X86] Remove CustomInserter pseudos for MONITOR/MONITORX/CLZERO. Use custom instruction selection instead.
This custom inserter existed so we could do a weird thing where we pretended that the instructions support
a full address mode instead of taking a pointer in EAX/RAX. I think was largely so we could be pointer
size agnostic in the isel pattern.

To make this work we would then put the address into an LEA into EAX/RAX in front of the instruction after
isel. But the LEA is overkill when we just have a base pointer. So we end up using the LEA as a slower MOV
instruction.

With this change we now just do custom selection during isel instead and just assign the incoming address
of the intrinsic into EAX/RAX based on its size. After the intrinsic is selected, we can let isel take
care of selecting an LEA or other operation to do any address computation needed in this basic block.

I've also split the instruction into a 32-bit mode version and a 64-bit mode version so the implicit
use is properly sized based on the pointer. Without this we get comments in the assembly output about
killing eax and defing rax or vice versa depending on whether we define the instruction to use EAX/RAX.

llvm-svn: 357652
2019-04-03 23:28:30 +00:00
Craig Topper 477008bd50 [X86] Remove dead CHECK lines for a test. NFC
llvm-svn: 357651
2019-04-03 23:28:18 +00:00
Craig Topper 437b45a1f8 [X86] Autogenerate checks. NFC
llvm-svn: 357650
2019-04-03 23:28:11 +00:00
Sanjay Patel c9a012e4ea [x86] fold shuffles of h-ops that have an undef operand
If an operand is undef, we can assume it's the same as the
other operand.

llvm-svn: 357644
2019-04-03 22:40:35 +00:00
Sanjay Patel 61b5e3c6a9 [x86] eliminate movddup of horizontal op
This pattern would show up as a regression if we more
aggressively convert vector FP ops to scalar ops.

There's still a missed optimization for the v4f64 legal
case (AVX) because we create that h-op with an undef operand.
We should probably just duplicate the operands for that
pattern to avoid trouble.

llvm-svn: 357642
2019-04-03 22:15:29 +00:00
Sanjay Patel 0b874c7c60 [x86] add another test for disguised h-op; NFC
llvm-svn: 357636
2019-04-03 21:10:55 +00:00
Matt Arsenault 396653f8a1 AMDGPU: Split block for si_end_cf
Relying on no spill or other code being inserted before this was
precarious. It relied on code diligently checking isBasicBlockPrologue
which is likely to be forgotten.

Ideally this could be done earlier, but this doesn't work because of
phis. Any other instruction can't be placed before them, so we have to
accept the position being incorrect during SSA.

This avoids regressions in the fast register allocator rewrite from
inverting the direction.

llvm-svn: 357634
2019-04-03 20:53:20 +00:00
Sanjay Patel 8c9ceecdc6 [x86] add test for disguised horizontal op; NFC
llvm-svn: 357630
2019-04-03 20:34:22 +00:00
Krzysztof Parzyszek 4841643a1d [X86] Extend boolean arguments to inline-asm according to getBooleanType
Differential Revision: https://reviews.llvm.org/D60208

llvm-svn: 357615
2019-04-03 17:43:14 +00:00
Simon Pilgrim 15919ad306 [X86][AVX] combineHorizontalPredicateResult - split any/allof v16i16/v32i8 reduction on AVX1
Perform the 2 x 128-bit lo/hi OR/AND on the vectors before calling PMOVMSKB on the 128-bit result.

llvm-svn: 357611
2019-04-03 17:28:34 +00:00
Simon Pilgrim 9e28dddf55 [X86][AVX] combineHorizontalPredicateResult - support v16i16/v32i8 reduction on AVX1
Use getPMOVMSKB helper which splits v32i8 MOVMSK calls on pre-AVX2 targets.

llvm-svn: 357608
2019-04-03 17:17:13 +00:00
Jessica Paquette e794121cd0 [AArch64][GlobalISel] Legalize G_FEXP2
Same as G_EXP. Add a test, and update legalizer-info-validation.mir and
f16-instructions.ll.

Differential Revision: https://reviews.llvm.org/D60165

llvm-svn: 357605
2019-04-03 16:58:32 +00:00
Sanjay Patel 8055034666 [x86] make stack folding tests immune to unrelated transforms; NFC
llvm-svn: 357604
2019-04-03 16:33:24 +00:00
Ulrich Weigand 35dfd1b7df [SystemZ] Improve codegen for certain SADDO-immediate cases
When performing an add-with-overflow with an immediate in the
range -2G ... -4G, code currently loads the immediate into a
register, which generally takes two instructions.

In this particular case, it is preferable to load the negated
immediate into a register instead, which always only requires
one instruction, and then perform a subtract.

llvm-svn: 357597
2019-04-03 15:09:19 +00:00
Sanjay Patel 281cf28329 [x86] remove duplicate tests
Accidentally double-committed these.

llvm-svn: 357593
2019-04-03 14:45:45 +00:00
Sanjay Patel 393458f3ed [x86] add negative tests for FP scalarization; NFC
These go with the proposal in D60150.

llvm-svn: 357592
2019-04-03 14:41:28 +00:00
Sanjay Patel 04848090cd [x86] add tests with constants for FP scalarization; NFC
llvm-svn: 357591
2019-04-03 14:41:24 +00:00
Sanjay Patel eb5ffc7842 [x86] add tests with constants for FP scalarization; NFC
llvm-svn: 357587
2019-04-03 14:36:47 +00:00
Petar Avramovic afa3afa384 [MIPS GlobalISel] Select floating point arithmetic operations
Select 32 and 64 bit floating point add, sub, mul and div for MIPS32.

Differential Revision: https://reviews.llvm.org/D60191

llvm-svn: 357584
2019-04-03 14:12:59 +00:00
Sanjay Patel 00dae6b22d [DAGCombiner] loosen restrictions for moving shuffles after vector binop
There are 3 changes to make this correspond to the same transform in instcombine:
1. Remove the legality check - we can't create anything less legal than we started with.
2. Ease the use restriction, so we only bail out if both operands have >1 use.
3. Ease the use restriction for binops with a repeated operand (eg, mul x, x).

As discussed in D60150, there's a scalarization opportunity that will be made
easier by allowing this transform more generally.

llvm-svn: 357580
2019-04-03 13:42:06 +00:00
Simon Pilgrim 143279e61f [X86] Regenerate LEA codegen tests
llvm-svn: 357573
2019-04-03 12:33:16 +00:00
Clement Courbet 26a8ed3ac9 [X86] Make the post machine scheduler macrofusion-aware.
Summary:
Given that X86 does not use this currently, this is an NFC. I'll
experiment with enabling and will report numbers.

Reviewers: andreadb, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60185

llvm-svn: 357568
2019-04-03 09:37:30 +00:00
Clement Courbet 5bfa946d69 [X86][NFC] Add tests for misched macro-fusion.
llvm-svn: 357565
2019-04-03 08:21:54 +00:00
Hans Wennborg 94b867dc7c Revert r357256 "[DAGCombine] Improve Lifetime node chains."
As it caused a pathological compile-time regressionin V8, see PR41352.

> Improve both start and end lifetime nodes chain dependencies.
>
> Reviewers: courbet
>
> Reviewed By: courbet
>
> Subscribers: hiraditya, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D59795

This also reverts the follow-up r357309:

> [DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop.
>
> Avoid EXPENSIVE_CHECK failure. NFCI.

llvm-svn: 357563
2019-04-03 07:41:58 +00:00
Chen Zheng 4178c15330 [PowerPC]add testcase for ppcctrloops pass shortloop check
llvm-svn: 357560
2019-04-03 03:11:34 +00:00
Matt Arsenault f426ddbfc7 AMDGPU: Assume ECC is enabled by default if supported
The test should really be checking for the property directly in the
code object headers, but there are problems with this. I don't see
this directly represented in the text form, and for the binary
emission this is depending on a function level subtarget feature to
emit a global flag.

llvm-svn: 357558
2019-04-03 01:58:57 +00:00
Craig Topper 16683a3ef8 [X86] Update the test case for v4i1 bitselect in combine-bitselect.ll to not have an infinite loop in IR.
In fact we don't even need a loop at all. I backed out the bug fix this was testing for and verified that this new case hit the same issue.

This should stop D59626 from deleting some of this code by realizing it was dead due to the loop.

llvm-svn: 357544
2019-04-03 00:05:03 +00:00
Craig Topper ca9eb68541 [X86] Autogenerate complete checks. NFC
llvm-svn: 357543
2019-04-03 00:04:57 +00:00
Matt Arsenault 2065680b47 AMDGPU: Don't use the default cpu in a few tests
Avoids unnecessary test changes in a future commit.

llvm-svn: 357539
2019-04-03 00:00:58 +00:00
Jessica Paquette ed23352379 [GlobalISel] Add IRTranslator support for llvm.stacksave and llvm.stackrestore
Also update arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D60140

llvm-svn: 357538
2019-04-02 22:46:31 +00:00
Stanislav Mekhanoshin ea2e227926 X86: regenerate speculative-load-hardening-indirect.ll tests. NFC.
llvm-svn: 357537
2019-04-02 22:44:46 +00:00
Sanjay Patel 8e6d41aeb2 [x86] add more tests for FP scalarization; NFC
llvm-svn: 357523
2019-04-02 20:24:06 +00:00
Jessica Paquette 22c6215c7e [AArch64][GlobalISel] Select llvm.aarch64.stlxr(i64, i64*)
This adds partial instruction selection support for llvm.aarch64.stlxr. It also
factors out selection for G_INTRINSIC_W_SIDE_EFFECTS into its own function. The
new function removes the restriction that the intrinsic ID on the
G_INTRINSIC_W_SIDE_EFFECTS be on operand 0.

Also add a test, and add a GISel line to arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D60100

llvm-svn: 357518
2019-04-02 19:57:26 +00:00
Craig Topper 0d3a533270 [X86] Allow FixupLEAs to form INC/DEC under OptSize not just MinSize
This matches our usual INC/DEC heuristic used during isel.

llvm-svn: 357497
2019-04-02 17:13:03 +00:00
Jonas Paulsson f76fe45426 [SystemZ] Improve instruction selection of 64 bit shifts and rotates.
For shift and rotate instructions that only use the last 6 bits of the shift
amount, a shift amount of (x*64-s) can be substituted with (-s). This saves
one instruction and a register:

  lhi     %r1, 64
  sr      %r1, %r3
  sllg    %r2, %r2, 0(%r1)
  =>
  lcr     %r1, %r3
  sllg    %r2, %r2, 0(%r1)

Review: Ulrich Weigand
llvm-svn: 357481
2019-04-02 15:36:30 +00:00
Simon Atanasyan 2634a141fd [mips] Use AltOrders to prevent using odd FP-registers
To disable using of odd floating-point registers (O32 ABI and
-mno-odd-spreg command line option) such registers and their
super-registers added to the set of reserved registers. In general, it
works. But there is at least one problem - in case of enabled machine
verifier pass some floating-point tests failed because live ranges of
register units that are reserved is not empty and verification pass
failed with "Live segment doesn't end at a valid instruction" error
message.

There is D35985 patch which tries to solve the problem by explicit
removing of register units. This solution did not get approval.

I would like to use another approach for prevent using odd floating
point registers - define `AltOrders` and `AltOrderSelect` for MIPS
floating point register classes. Such `AltOrders` contains reduced set
of registers. At first glance, such solution does not break any test
cases and allows enabling machine instruction verification for all MIPS
test cases.

Differential Revision: http://reviews.llvm.org/D59799

llvm-svn: 357472
2019-04-02 13:57:32 +00:00
Simon Pilgrim 64bd87ad4b [X86][AVX] Add test case showing failure to fold broadcast load if its also used as a scalar
llvm-svn: 357465
2019-04-02 10:31:00 +00:00
Sander de Smalen 7f23e0a62f Enforce StackID definition in PEI
There are various places in LLVM where the definition of StackID is not
properly honoured, for example in PEI where objects with a StackID > 0 are
allocated on the default stack (StackID0). This patch enforces that PEI
only considers allocating objects to StackID 0.

Reviewers: arsenm, thegameg, MatzeB

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D60062

llvm-svn: 357460
2019-04-02 09:46:52 +00:00
Hans Wennborg b669fea42f SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)
The code was previously checking that candidates for sinking had exactly
one use or were a store instruction (which can't have uses). This meant
we could sink call instructions only if they had a use.

That limitation seemed a bit arbitrary, so this patch changes it to
"instruction has zero or one use" which seems more natural and removes
the need to special-case stores.

Differential revision: https://reviews.llvm.org/D59936

llvm-svn: 357452
2019-04-02 08:01:38 +00:00
Craig Topper 536383a354 [X86] Add test cases to fixup-lea.ll for optsize and no size optimization. Add +/-slow-incdec command lines
We only form inc/dec in FixupLEAs under minsize today, but all other locations in the compiler for inc/dec with optsize.

llvm-svn: 357446
2019-04-02 00:54:22 +00:00
Craig Topper c133015975 [X86] Autogenerate complete checks. NFC
llvm-svn: 357445
2019-04-02 00:54:15 +00:00
Michael Liao 9bef688bc2 [AMDGPU] Add more test cases of D59608.
Summary: - Add more test cases.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60071

llvm-svn: 357442
2019-04-02 00:36:37 +00:00
Eli Friedman 3813fe0bda [ARM] Optimize expressions like "return x != 0;" for Thumb1.
There's an existing optimization for x != C, but somehow it was missing
a special case for 0.

While I'm here, also cleaned up the code/comments a bit: the second
value produced by the MERGE_VALUES was actually dead, since a CMOV only
produces one result.

Differential Revision: https://reviews.llvm.org/D59616

llvm-svn: 357437
2019-04-02 00:01:23 +00:00
Eli Friedman 73af6ef2e7 [ARM] Don't try to create "push {r12, lr}" in Thumb1 at -Oz.
It's a little tricky to make this issue show up because
prologue/epilogue emission normally likes to push at least two
registers... but it doesn't when lr is force-spilled due to function
length.  Not sure if that really makes sense, but I decided not to touch
it for now.

Differential Revision: https://reviews.llvm.org/D59385

llvm-svn: 357436
2019-04-01 23:55:57 +00:00
Jessica Paquette e44c20a68d [AArch64][GlobalISe] Select STRQui for stores into v264s instead of scalarizing
This improves selection for vector stores into v2s64s. Before we just
scalarized them, but we can just use a STRQui instead.

Differential Revision: https://reviews.llvm.org/D60083

llvm-svn: 357432
2019-04-01 22:19:13 +00:00
Bixia Zheng 6c21ccd245 [NVPTX] Fix the codegen for llvm.round.
Summary:
Previously, we translate llvm.round to PTX cvt.rni, which rounds to the
even interger when the source is equidistant between two integers. This
is not correct as llvm.round should round away from zero. This change
replaces llvm.round with a round away from zero implementation through
target specific custom lowering.

Modify a few affected tests to not check for cvt.rni. Instead, we check
for the use of a few constants used in implementing round. We are also
adding CUDA runnable tests to check for the values produced by
llvm.round to test-suites/External/CUDA.

Reviewers: tra

Subscribers: jholewinski, sanjoy, jlebar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59947

llvm-svn: 357407
2019-04-01 16:10:26 +00:00
Neil Henning 0a30f33ce2 [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.
This change incorporates an effort by Connor Abbot to change how we deal
with WWM operations potentially trashing valid values in inactive lanes.

Previously, the SIFixWWMLiveness pass would work out which registers
were being trashed within WWM regions, and ensure that the register
allocator did not have any values it was depending on resident in those
registers if the WWM section would trash them. This worked perfectly
well, but would cause sometimes severe register pressure when the WWM
section resided before divergent control flow (or at least that is where
I mostly observed it).

This fix instead runs through the WWM sections and pre allocates some
registers for WWM. It then reserves these registers so that the register
allocator cannot use them. This results in a significant register
saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just
this change!).

Differential Revision: https://reviews.llvm.org/D59295

llvm-svn: 357400
2019-04-01 15:19:52 +00:00
Alex Bradbury da20f5ca74 [RISCV] Generate address sequences suitable for mcmodel=medium
This patch adds an implementation of a PC-relative addressing sequence to be
used when -mcmodel=medium is specified. With absolute addressing, a 'medium'
codemodel may cause addresses to be out of range. This is because while
'medium' implies a 2 GiB addressing range, this 2 GiB can be at any offset as
opposed to 'small', which implies the first 2 GiB only.

Note that LLVM/Clang currently specifies code models differently to GCC, where
small and medium imply the same functionality as GCC's medlow and medany
respectively.

Differential Revision: https://reviews.llvm.org/D54143
Patch by Lewis Revill.

llvm-svn: 357393
2019-04-01 14:42:56 +00:00
Clement Courbet 7e062c9b1f [X86] Make post-ra scheduling macrofusion-aware.
Subscribers: MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59688

llvm-svn: 357384
2019-04-01 13:48:50 +00:00
Clement Courbet d9f6ee1c3c [X86MacroFusion][NFC] Add more tests.
In preparation for D59688.

llvm-svn: 357381
2019-04-01 13:18:34 +00:00
Krasimir Georgiev 7af32444b9 [X86] Fix a test from r357317
Summary:
The missing `<` causes the lld command to override the test file, which fails in
environments marking the test files as readonly.

Reviewers: bkramer

Reviewed By: bkramer

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60060

llvm-svn: 357380
2019-04-01 11:42:54 +00:00
Simon Pilgrim e8c3136994 [X86][SSE] Add fcmp constant folding tests
Initial test coverage for D60006

llvm-svn: 357379
2019-04-01 10:54:04 +00:00
Luis Marques 3091884e25 [RISCV] Add seto pattern expansion
Adds a `seto` pattern expansion. Without it the lowerings of `fcmp one` and 
`fcmp ord` would be inefficient due to an unoptimized double negation.

Differential Revision: https://reviews.llvm.org/D59699

llvm-svn: 357378
2019-04-01 09:54:14 +00:00
Sanjay Patel e1bc360fc6 [x86] allow movmsk with 2-element reductions
One motivation for making this change is that the lack of using movmsk is likely
a main source of perf difference between clang and gcc on the C-Ray benchmark as
shown here:
https://www.phoronix.com/scan.php?page=article&item=gcc-clang-2019&num=5
...but this change alone isn't enough to solve that problem.

The 'all-of' examples show what is likely the worst case trade-off: we end up with
an extra instruction (or 2 if we count the 'xor' register clearing). The 'any-of'
examples look clearly better using movmsk because we've traded 2 vector instructions
for 2 scalar instructions, and movmsk may have better timing than the generic 'movq'.

If we examine the llvm-mca output for these cases, it appears that even though the
'all-of' movmsk variant looks worse on paper, it would perform better on both
Haswell and Jaguar.

  $ llvm-mca -mcpu=haswell no_movmsk.s -timeline
  Iterations:        100
  Instructions:      400
  Total Cycles:      504
  Total uOps:        400

  Dispatch Width:    4
  uOps Per Cycle:    0.79
  IPC:               0.79
  Block RThroughput: 1.0

  $ llvm-mca -mcpu=haswell movmsk.s -timeline
  Iterations:        100
  Instructions:      600
  Total Cycles:      358
  Total uOps:        600

  Dispatch Width:    4
  uOps Per Cycle:    1.68
  IPC:               1.68
  Block RThroughput: 1.5

  $ llvm-mca -mcpu=btver2 no_movmsk.s -timeline
  Iterations:        100
  Instructions:      400
  Total Cycles:      407
  Total uOps:        400

  Dispatch Width:    2
  uOps Per Cycle:    0.98
  IPC:               0.98
  Block RThroughput: 2.0

  $ llvm-mca -mcpu=btver2 movmsk.s -timeline
  Iterations:        100
  Instructions:      600
  Total Cycles:      311
  Total uOps:        600

  Dispatch Width:    2
  uOps Per Cycle:    1.93
  IPC:               1.93
  Block RThroughput: 3.0

Finally, there may be CPUs where movmsk is horribly slow (old AMD small cores?), but if
that's true, then we're also almost certainly making the wrong transform already for
reductions with >2 elements, so that should be fixed independently.

Differential Revision: https://reviews.llvm.org/D59997

llvm-svn: 357367
2019-03-31 15:11:34 +00:00
Simon Pilgrim ec56621a5c [SystemZ] Remove fcmp undef from reduced test
Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @uweigand (Ulrich Weigand)

llvm-svn: 357355
2019-03-30 20:24:26 +00:00
Simon Pilgrim 513e6b9d58 [MIPS] Remove fcmp undef from reduced test
Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @atanasyan (Simon Atanasyan)

llvm-svn: 357354
2019-03-30 20:16:16 +00:00
Craig Topper e4a0fc7d75 [X86] Teach isel for RMW binops to handle negate
Negate updates flags like a subtract. We should be able to use the flags from the RMW form of negate when we have (store (X86ISD::SUB 0, load A), A)

Differential Revision: https://reviews.llvm.org/D60007

llvm-svn: 357353
2019-03-30 18:59:17 +00:00
Alex Bradbury 0b2803ee65 [RISCV] Add codegen support for ilp32f, ilp32d, lp64f, and lp64d ("hard float") ABIs
This patch adds support for the RISC-V hard float ABIs, building on top of
rL355771, which added basic target-abi parsing and MC layer support. It also
builds on some re-organisations and expansion of the upstream ABI and calling
convention tests which were recently committed directly upstream.

A number of aspects of the RISC-V float hard float ABIs require frontend
support (e.g. flattening of structs and passing int+fp for fp+fp structs in a
pair of registers), and will be addressed in a Clang patch.

As can be seen from the tests, it would be worthwhile extending
RISCVMergeBaseOffsets to handle constant pool as well as global accesses.

Differential Revision: https://reviews.llvm.org/D59357

llvm-svn: 357352
2019-03-30 17:59:30 +00:00
Simon Pilgrim 10c9032c02 [X86][SSE] detectAVGPattern - Match zext(or(x,y)) 'add like' patterns (PR41316)
Fixes PR41316 where the expanded PAVG intrinsic had had one of its ADDs turned into an OR due to its operands having no conflicting bits.

llvm-svn: 357351
2019-03-30 17:12:29 +00:00
Alex Bradbury b5498cbf64 [RISCV] Add RV64 CHECK lines to test/CodeGen/RISCV/vararg.ll and prepare for hard float tests
vararg.ll previously missed RV64 tests. This patch also prepares for using
vararg.ll to test handling of varargs for the ilp32f/ilp32d/lp64f/lp64d hard
float ABIs. In these ABIs, varargs are passed as in either the ilp32 or lp64
ABI. Due to some slight codegen differences, different check lines are needed
for when RV32D is enabled.

llvm-svn: 357350
2019-03-30 15:53:38 +00:00
Simon Pilgrim cfdf09ba7d [X86][SSE] Add PAVG test case from PR41316
llvm-svn: 357346
2019-03-30 13:53:11 +00:00
Heejin Ahn c4ac74fb49 [WebAssembly] Fix unwind destination mismatches in CFG stackify
Summary:
Linearing the control flow by placing `try`/`end_try` markers can create
mismatches in unwind destinations. This patch resolves these mismatches
by wrapping those instructions with an incorrect unwind destination with
a nested `try`/`catch`/`end_try` and branching to the right destination
within the new catch block.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, chrib, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D48345

llvm-svn: 357343
2019-03-30 11:04:48 +00:00
Heejin Ahn e9fd9073e4 [WebAssembly] Run ExplicitLocals pass after CFGStackify
Summary:
While this does not change any final output, this will greatly simplify
ixing unwind destination mismatches in CFGStackify (D48345), because we
have to create some new registers there.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59652

llvm-svn: 357342
2019-03-30 09:29:57 +00:00
Alex Bradbury 9681b01c21 [RISCV] Add DAGCombine for (SplitF64 (ConstantFP x))
The SplitF64 node is used on RV32D to convert an f64 directly to a pair of i32
(necessary as bitcasting to i64 isn't legal). When performed on a ConstantFP,
this will result in a FP load from the constant pool followed by a store to
the stack and two integer loads from the stack (necessary as there is no way
to directly move between f64 FPRs and i32 GPRs on RV32D). It's always cheaper
to just materialise integers for the lo and hi parts of the FP constant, so do
that instead.

llvm-svn: 357341
2019-03-30 09:15:47 +00:00
Alex Bradbury 98b8ecde64 [RISCV][NFC] Remove floating point operations from test/CodeGen/RISCV/vararg.ll
This minimises differences in output when compiling with hardware floating
point support, which will be done in a future patch (to demonstrate the same
vararg calling convention is used).

llvm-svn: 357339
2019-03-30 05:24:42 +00:00
Heejin Ahn 7e7aad1510 [WebAssembly] Optimize the number of routing blocks in FixIrreducibleCFG
Summary:
Currently we create a routing block to the dispatch block for every
predecessor of every entry. So the total number of routing blocks
created will be (# of preds) * (# of entries). But we don't need to do
this: we need at most 2 routing blocks per loop entry, one for when the
predecessor is inside the loop and one for it is outside the loop. (We
can't merge these into one because this will creates another loop cycle
between blocks inside and blocks outside) This patch fixes this and
creates at most 2 routing blocks per entry.

This also renames variable `Split` to `Routing`, which I think is a bit
clearer.

Reviewers: kripken

Subscribers: sunfish, dschuff, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59462

llvm-svn: 357337
2019-03-30 01:31:11 +00:00
Thomas Lively 5f0c4c67bb [WebAssembly] Add mutable globals feature
Summary:
This feature is not actually used for anything in the WebAssembly
backend, but adding it allows users to get it into the target features
sections of their objects, which makes these objects
future-compatible.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jdoerfert, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60013

llvm-svn: 357321
2019-03-29 22:00:18 +00:00
Jessica Paquette d3ffd47df9 [GlobalISel][AArch64] Add isel support for G_INSERT_VECTOR_ELT on v2s32s
This adds support for v2s32 vector inserts, and updates the selection +
regbankselect tests for G_INSERT_VECTOR_ELT.

Differential Revision: https://reviews.llvm.org/D59910

llvm-svn: 357318
2019-03-29 21:39:36 +00:00
Amara Emerson d413f41de6 [X86] When using Win64 ABI, exit with error if SSE is disabled for varargs
We need XMM registers to handle varargs with the Win64 ABI. Before we would
silently generate bad code resulting in an assertion failure elsewhere in the
backend.

llvm-svn: 357317
2019-03-29 21:30:51 +00:00
Heejin Ahn 67f74aceab [WebAssembly] Handle END_LOOP in unreachable BB in CFGStackify
Summary:
This fixes crashes when a BB in which an END_LOOP is to be placed is
unreachable and does not have any predecessors. Fixes PR41307.

Reviewers: dschuff

Subscribers: yurydelendik, sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60004

llvm-svn: 357303
2019-03-29 19:36:51 +00:00
Matt Arsenault 055e4dce45 AMDGPU: Remove dx10-clamp from subtarget features
Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and Introduce
a new amdgpu-dx10-clamp attribute to override this if desired.

Also introduce a new amdgpu-ieee attribute to match.

The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
definining the attribute in the generic Attributes.td.

Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.

llvm-svn: 357302
2019-03-29 19:14:54 +00:00
Simon Pilgrim d395bc1cc2 [Hexagon] Remove fcmp undef from reduced tests
Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @kparzysz (Krzysztof Parzyszek)

llvm-svn: 357301
2019-03-29 19:14:52 +00:00
Craig Topper 103fbbbfca [X86] Add test cases showing failure to use RMW form of negate when only flags are used. NFC
llvm-svn: 357300
2019-03-29 19:09:37 +00:00
Simon Pilgrim 759cbee744 [SystemZ] Regenerate double constant comparison test
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357295
2019-03-29 18:23:08 +00:00
Simon Pilgrim 05e2621342 [MIPS] Regenerate double constant comparison test
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357294
2019-03-29 18:22:18 +00:00
Simon Pilgrim a3fb3d5583 [ARM] Regenerate execute-only float comparison tests
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357293
2019-03-29 18:21:19 +00:00
Simon Pilgrim dee8a14389 [AArch64] Regenerate half precision tests
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357286
2019-03-29 17:46:06 +00:00
Nirav Dave fe59e14031 [DAGCombine] Prune unnused nodes.
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.

Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight

Reviewed By: jyknight

Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58070

llvm-svn: 357283
2019-03-29 17:35:56 +00:00
Simon Pilgrim b4b98a528b [ARM] Regenerate vector comparison tests
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357281
2019-03-29 17:35:11 +00:00
Simon Pilgrim 4e00a93558 [X86] Fix some tests using fcmp with undef arguments
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357278
2019-03-29 17:20:27 +00:00
Simon Atanasyan f26f56d6d3 [mips] Fix lowering a signed immediate for *.d MSA instructions
The `lowerMSASplatImm` function zero-extends `i32` immediates while
building constant. If target type is `i64`, negative immediate loses
the sign. As a result, for example `__builtin_msa_ldi_d(-1)` lowered
to series of instruction loads incorrect value 0xffffffff to the `$w0`
register instead of single `ldi.d $w0, -1` instruction.

The fix zero-extends unsigned immediates and signed-extend signed
immediates.

Differential Revision: http://reviews.llvm.org/D59884

llvm-svn: 357264
2019-03-29 15:15:22 +00:00
Sanjay Patel 12685d0f7c [DAGCombiner] simplify shuffle of shuffle
After investigating the examples from D59777 targeting an SSE4.1 machine,
it looks like a very different problem due to how we map illegal types (256-bit in these cases).

We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand.
We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that
generality means it is limited to patterns with a one-use constraint, and the examples here have
2 uses. We don't need any uses or legality limitations for a simplification (no new value is
created).

It looks like we miss this pattern in IR too.

In one of the zext examples here, we have shuffle masks like this:

Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7>
Shuf = vector_shuffle<4,u,6,7,u,u,u,u>

...so that's moving the high half of the 1st vector into the low half. But the high half of the
1st vector is already identical to the low half.

Differential Revision: https://reviews.llvm.org/D59961

llvm-svn: 357258
2019-03-29 14:20:38 +00:00
Nirav Dave 9259de217e [DAGCombine] Improve Lifetime node chains.
Improve both start and end lifetime nodes chain dependencies.

Reviewers: courbet

Reviewed By: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59795

llvm-svn: 357256
2019-03-29 14:09:47 +00:00
Sanjay Patel 665a385035 [DAGCombiner] fold sext into decrement
This is a sibling to rL357178 that I noticed we'd hit if we chose
an alternate transform in D59818.

  %z = zext i8 %x to i32
  %dec = add i32 %z, -1
  %r = sext i32 %dec to i64
  =>
  %z2 = zext i8 %x to i64
  %r = add i64 %z2, -1

https://rise4fun.com/Alive/kPP

The x86 vector diffs show a slight regression, so there's a chance
that we should limit this and the previous transform to scalars.

But given that we allowed vectors before, I'm matching that behavior
here. We should change both transforms together if that's the right
thing to do.

llvm-svn: 357254
2019-03-29 13:49:08 +00:00
Hans Wennborg 800b12f90a Switch lowering: exploit unreachable fall-through when lowering case range cluster
In the example below, we would previously emit two range checks, one for cases
1--3 and one for 4--6. This patch makes us exploit the fact that the
fall-through is unreachable and only one range check is necessary.

  switch i32 %i, label %default [
    i32 1,  label %bb1
    i32 2,  label %bb1
    i32 3,  label %bb1
    i32 4,  label %bb2
    i32 5,  label %bb2
    i32 6,  label %bb2
  ]
  default: unreachable

llvm-svn: 357252
2019-03-29 13:40:05 +00:00
Sanjay Patel 881bcbe094 [x86] add tests for decrement+sext; NFC
llvm-svn: 357251
2019-03-29 13:34:48 +00:00
Konstantin Zhuravlyov 2b766ed774 AMDGPU: Make sram-ecc off by default for Vega20
Differential Revision: https://reviews.llvm.org/D59718

llvm-svn: 357247
2019-03-29 12:04:18 +00:00
Simon Pilgrim aeaf7fcdde [X86] Add X86TargetLowering::isCommutativeBinOp override.
We currently just have test coverage for PMULUDQ - will add more in the future.

llvm-svn: 357244
2019-03-29 11:25:58 +00:00
Kang Zhang 05f78b35ae [PowerPC] Add the support for __builtin_setrnd()
Summary:
PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function will use the least significant two bits of integer argument to set the floating point rounding mode.
double __builtin_setrnd(int mode);
The effective values for mode are:
0 - round to nearest
1 - round to zero
2 - round to +infinity
3 - round to -infinity
Note that the mode argument will modulo 4, so if the int argument is greater than 3, it will only use the least significant two bits of the mode. Namely, builtin_setrnd(102)) is equal to builtin_setrnd(2).

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D59405

llvm-svn: 357241
2019-03-29 08:45:24 +00:00
Matt Arsenault 5fddf09187 AMDGPU/GlobalISel: Insert waterfall loop for vector indexing
The register index can only really be an SGPR. Lie that a VGPR index
is legal, and then rewrite the instruction in a waterfall loop to
handle the index.

llvm-svn: 357235
2019-03-29 03:54:56 +00:00
Zi Xuan Wu 1445b77e8c [PowerPC] Strength reduction of multiply by a constant by shift and add/sub in place
A shift and add/sub sequence combination is faster in place of a multiply by constant. 
Because the cycle or latency of multiply is not huge, we only consider such following
worthy patterns.

```
(mul x, 2^N + 1) => (add (shl x, N), x)
(mul x, -(2^N + 1)) => -(add (shl x, N), x)
(mul x, 2^N - 1) => (sub (shl x, N), x)
(mul x, -(2^N - 1)) => (sub x, (shl x, N))
```

And the cycles or latency is subtarget-dependent so that we need consider the
subtarget to determine to do or not do such transformation. 
Also data type is considered for different cycles or latency to do multiply.

Differential Revision: https://reviews.llvm.org/D58950

llvm-svn: 357233
2019-03-29 03:08:39 +00:00
Thomas Lively 3f34e1b883 [WebAssembly] Merge used feature sets, update atomics linkage policy
Summary:
It does not currently make sense to use WebAssembly features in some functions
but not others, so this CL adds an IR pass that takes the union of all used
feature sets and applies it to each function in the module. This allows us to
prevent atomics from being lowered away if some function has opted in to using
them. When atomics is not enabled anywhere, we detect whether there exists any
atomic operations or thread local storage that would be stripped and disallow
linking with objects that contain atomics if and only if atomics or tls are
stripped. When atomics is enabled, mark it as used but do not require it of
other objects in the link. These changes allow libraries that do not use atomics
to be built once and linked into both single-threaded and multithreaded
binaries.

Reviewers: aheejin, sbc100, dschuff

Subscribers: jgravelle-google, hiraditya, sunfish, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59625

llvm-svn: 357226
2019-03-29 00:14:01 +00:00
Yonghong Song 360a4e2ca6 [BPF] add proper multi-dimensional array support
For multi-dimensional array like below
  int a[2][3];
the previous implementation generates BTF_KIND_ARRAY type
like below:
  . element_type: int
  . index_type: unsigned int
  . number of elements: 6

This is not the best way to represent arrays, esp.,
when converting BTF back to headers and users will see
  int a[6];
instead.

This patch generates proper support for multi-dimensional arrays.
For "int a[2][3]", the two BTF_KIND_ARRAY types will be
generated:
  Type #n:
    . element_type: int
    . index_type: unsigned int
    . number of elements: 3
  Type #(n+1):
    . element_type: #n
    . index_type: unsigned int
    . number of elements: 2

The linux kernel already supports such a multi-dimensional
array representation properly.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D59943

llvm-svn: 357215
2019-03-28 21:59:49 +00:00
Craig Topper c25c9b4d16 [X86] Teach the isel optimization for (x << C1) op C2 to (x op (C2>>C1)) << C1 to consider cases where C2>>C1 can fit an unsigned 32-bit immediate
For 64-bit operations we should consider if the immediate can be made to fit
in an unsigned 32-bits immedate. For OR/XOR this allows us to load the immediate
with MOV32ri instead of movabsq. For AND this allows us to fold the immediate.

Differential Revision: https://reviews.llvm.org/D59867

llvm-svn: 357196
2019-03-28 18:05:37 +00:00
Petar Avramovic 1af05df3de [MIPS GlobalISel] Select float constants
Select 32 and 64 bit float constants for MIPS32.

Differential Revision: https://reviews.llvm.org/D59933

llvm-svn: 357183
2019-03-28 16:58:12 +00:00
Sanjay Patel ffa8d3def7 [DAGCombiner] fold sext into negation
As noted in D59818:
  %z = zext i8 %x to i32
  %neg = sub i32 0, %z
  %r = sext i32 %neg to i64
  =>
  %z2 = zext i8 %x to i64
  %r = sub i64 0, %z2

https://rise4fun.com/Alive/KzSR

llvm-svn: 357178
2019-03-28 15:46:02 +00:00
Sanjay Patel e781528278 [x86] add vector test for sext of negate; NFC
llvm-svn: 357177
2019-03-28 15:30:09 +00:00
Sanjay Patel 5bbf6f0bd8 [x86] avoid cmov in movmsk reduction
This is probably the least important of our movmsk problems, but I'm starting
at the bottom to reduce distractions.

We were creating a select_cc which bypasses the select and bitmask codegen
optimizations that we have now. If we produce a compare+negate instead, we
allow things like neg/sbb carry bit hacks, and in all cases we avoid a cmov.
There's no partial register update danger in these sequences because we always
produce the zero-register xor ahead of the 'set' if needed.

There seems to be a missing fold for sext of a bool bit here:

negl %ecx
movslq %ecx, %rax

...but that's an independent transform.

Differential Revision: https://reviews.llvm.org/D59818

llvm-svn: 357172
2019-03-28 14:16:13 +00:00
Clement Courbet 699dc025a6 [X86MacroFusion] Handle branch fusion (AMD CPUs).
Summary:
This adds a BranchFusion feature to replace the usage of the MacroFusion
for AMD CPUs.

See D59688 for context.

Reviewers: andreadb, lebedev.ri

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59872

llvm-svn: 357171
2019-03-28 14:12:46 +00:00
Matt Arsenault a353fd572a AMDGPU: Make exec mask optimzations more resistant to block splits
Also improve the check for SALU instructions to also ignore
implicit_def and other fake instructions.

llvm-svn: 357170
2019-03-28 14:01:39 +00:00
Simon Pilgrim 38a0616c1d [DAGCombiner] Fold truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y))
If scalar truncates are free, attempt to pre-truncate build_vectors source operands.

Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering.

Differential Revision: https://reviews.llvm.org/D59654

llvm-svn: 357161
2019-03-28 11:34:21 +00:00
Diana Picus 13ef0c5309 [ARM GlobalISel] Run regbankselect test for Thumb. NFCI
This should just work, since ARM mode and Thumb2 mode are at the same
level of support now and should map the same to GPR and FPR.

llvm-svn: 357159
2019-03-28 10:57:29 +00:00
Simon Pilgrim 22be913ac0 [X85][AVX] Add missing vXi16 broadcast fold patterns
Now that D59484 has landed its easier to add these.

Added missing AVX512BW v32i16 equivalents while I was at it.

llvm-svn: 357155
2019-03-28 10:25:13 +00:00
Diana Picus 52495c472f [ARM GlobalISel] Fix G_STORE with s1
G_STORE for 1-bit values uses a STRBi12, which stores the whole byte.
Zero out the undefined bits before writing.

llvm-svn: 357154
2019-03-28 09:09:36 +00:00
Diana Picus 4d512df300 [ARM GlobalISel] Fix selection of G_SELECT
G_SELECT uses a 1-bit scalar for the condition, and is currently
implemented with a plain CMPri against 0. This means that values such as
0x1110 are interpreted as true, when instead the higher bits should be
treated as undefined and therefore ignored. Replace the CMPri with a
TSTri against 0x1, which performs an implicit AND, yielding the expected
result.

llvm-svn: 357153
2019-03-28 09:09:27 +00:00
Piotr Sobczak f896785cb7 [SelectionDAG] Add 2 tests for selection across basic blocks
Summary:
Add tests for selection across basic block boundary:
 * one test containing a buffer load, where part of the offset
   computation is placed in the predecessor of the load
 * similar test, but containing two buffer loads and shared
   computations

Please note that the behaviour being tested will be updated in
a subsequent commit.

This commit was extracted from https://reviews.llvm.org/D59535.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: jvesely, nhaehnle, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59690

llvm-svn: 357149
2019-03-28 07:06:26 +00:00
Craig Topper 929932954d [X86] Add test cases from PR27202.
llvm-svn: 357132
2019-03-27 23:12:19 +00:00
Sanjay Patel 1df0bb6264 [x86] improve AVX lowering of vector zext
If we know the 2 halves of an oversized zext-in-reg are the same,
don't create those halves independently.

I tried several different approaches to fold this, but it's difficult
to get right during legalization. In the default path, we are creating
a generic shuffle that looks like an unpack high, but it can get
transformed into a different mask (a blend), so it's not
straightforward to match that. If we try to fold after it actually
becomes an X86ISD::UNPCKH node, we can't be sure what the operand node
is - it might be a generic shuffle, or it could be some x86-specific op.

From the test output, we should be doing something like this for SSE4.1
as well, but I'd rather leave that as a follow-up since it involves
changing lowering actions.

Differential Revision: https://reviews.llvm.org/D59777

llvm-svn: 357129
2019-03-27 22:42:11 +00:00
Daniel Sanders 495156dc6a test/CodeGen/X86/codegen-prepare-replacephi.mir requires a default triple
llvm-svn: 357122
2019-03-27 20:43:47 +00:00
Nirav Dave 6b741a8038 [DAGCombiner] Teach TokenFactor pruning to peek through lifetime nodes
Summary: Lifetime nodes were inhibiting TokenFactor simplification inhibiting chain-based optimizations.

Reviewers: courbet, jyknight

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59897

llvm-svn: 357121
2019-03-27 20:37:08 +00:00
Justin Bogner b1650f0da9 [LegalizeVectorTypes] Allow single loads and stores for more short vectors
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal
or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable.
(See https://reviews.llvm.org/rL236528 for reference.)

This applies that behaviour to vector types. If the vector type is
TypePromoteInteger, the element type is going to be TypePromoteInteger
as well, which will lead to have a single promoting load rather than N
individual promoting loads. For instance, if we have a v3i1, we would
now have a load of v4i1 instead of 3 loads of i1.

Patch by Guillaume Marques. Thanks!

Differential Revision: https://reviews.llvm.org/D56201

llvm-svn: 357120
2019-03-27 20:35:56 +00:00
Nirav Dave c6dfaa0e83 Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes."
This patch appears to trigger very large compile time increases in
halide builds.

llvm-svn: 357116
2019-03-27 19:54:41 +00:00
Eli Friedman c388bfa230 [ARM] Don't confuse the scheduler for very large VLDMDIA etc.
ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the
memory operands of load and store-multiple operations.  This doesn't
really fix the problem properly, but it's enough to prevent crashing,
at least.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41231 .

Differential Revision: https://reviews.llvm.org/D59834

llvm-svn: 357109
2019-03-27 18:33:30 +00:00
Matt Arsenault 2e9ddcc30e RegPressure: Fix crash on blocks with only dbg_value
If there were only dbg_values in the block, recede would hit the
beginning of the block and try to use thet dbg_value as a real
instruction.

llvm-svn: 357105
2019-03-27 18:14:02 +00:00
Amara Emerson 381188f1f3 [GlobalISel] Fix legalizer artifact combiner from crashing with invalid dead instructions.
The artifact combiners push instructions which have been marked for deletion
onto an list for the legalizer to deal with on return. However, for trunc(ext)
combines the combiner routine recursively calls itself. When it does this the
dead instructions list may not be empty, and the other combiners don't expect
to be dealing with essentially invalid MIR (multiple vreg defs etc).

This change fixes it by ensuring that the dead instructions are processed on
entry into tryCombineInstruction.

As a result, this fix exposed a few places in tests where G_TRUNC instructions
were not being deleted even though they were dead.

Differential Revision: https://reviews.llvm.org/D59892

llvm-svn: 357101
2019-03-27 17:47:42 +00:00
Matt Arsenault 7b14b2425d Reapply "AMDGPU: Scavenge register instead of findUnusedReg"
This reapplies r356149, using the correct overload of findUnusedReg
which passes the current iterator.

This worked most of the time, because the scavenger iterator was moved
at the end of the frame index loop in PEI. This would fail if the
spill was the first instruction. This was further hidden by the fact
that the scavenger wasn't passed in for normal frame index
elimination.

llvm-svn: 357098
2019-03-27 17:31:29 +00:00
Matt Arsenault 86e4fc0504 AMDGPU: Add testcase I meant to merge into r357093
llvm-svn: 357097
2019-03-27 17:31:26 +00:00
Craig Topper 7c9afc35bc [X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRD
Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cyc latency and 0.5 cycle reciprocal throughput for any SHLD.

When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization.

This patch adds psuedo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work.

Fixes PR41055

Differential Revision: https://reviews.llvm.org/D59391

llvm-svn: 357096
2019-03-27 17:29:34 +00:00
Quentin Colombet 89daf49e5c [PeepholeOpt] Don't stop simplifying copies on sequence of subregs
This patch removes an overly conservative check that would prevent
simplifying copies when the value we were tracking would go through
several subregister indices.
Indeed, the intend of this check was to not track values whenever
we have to compose subregister, but actually what the check was
doing was bailing anytime we see a second subreg, even if that
second subreg would actually be the new source of truth (as opposed
to a part of that subreg).

Differential Revision: https://reviews.llvm.org/D59891

llvm-svn: 357095
2019-03-27 17:27:56 +00:00
Matt Arsenault 17e39100a2 AMDGPU: Enable the scavenger for large frames
Another test is needed for the case where the scavenge fail, but
there's another issue with that which needs an additional fix.

llvm-svn: 357093
2019-03-27 17:14:32 +00:00