Commit Graph

28659 Commits

Author SHA1 Message Date
Nicolai Haehnle 7edae4c403 AMDGPU: Fix LCSSA phi lowering in SILowerI1Copies
Summary:
When an LCSSA phi survives through instruction selection, the pass
ends up removing that phi entirely because it is dominated by the
logic that does the lanemask merging.

This then used to trigger an assertion when processing a dependent
phi instruction.

Change-Id: Id4949719f8298062fe476a25718acccc109113b6

Reviewers: llvm-commits

Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60999

llvm-svn: 358983
2019-04-23 13:12:52 +00:00
David Green c519d3c403 [ARM] Update check for CBZ in Ifcvt
The check for creating CBZ in constant island pass recently obtained the
ability to search backwards to find a Cmp instruction. The code in IfCvt should
mirror this to allow more conversions to the smaller form. The common code has
been pulled out into a separate function to be shared between the two places.

Differential Revision: https://reviews.llvm.org/D60090

llvm-svn: 358977
2019-04-23 12:11:26 +00:00
David Green 2f9eed6265 [ARM] Don't replicate instructions in Ifcvt at minsize
Ifcvt can replicate instructions as it converts them to be predicated. This
stops that from happening on thumb2 targets at minsize where an extra IT
instruction is likely needed.

Differential Revision: https://reviews.llvm.org/D60089

llvm-svn: 358974
2019-04-23 11:46:58 +00:00
Bjorn Pettersson f97b29be88 [DAGCombiner] Combine OR as ADD when no common bits are set
Summary:
The DAGCombiner is rewriting (canonicalizing) an ISD::ADD
with no common bits set in the operands as an ISD::OR node.

This could sometimes result in "missing out" on some
combines that normally are performed for ADD. To be more
specific this could happen if we already have rewritten an
ADD into OR, and later (after legalizations or combines)
we expose patterns that could have been optimized if we
had seen the OR as an ADD (e.g. reassociations based on ADD).

To make the DAG combiner less sensitive to if ADD or OR is
used for these "no common bits set" ADD/OR operations we
now apply most of the ADD combines also to an OR operation,
when value tracking indicates that the operands have no
common bits set.

Reviewers: spatel, RKSimon, craig.topper, kparzysz

Reviewed By: spatel

Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59758

llvm-svn: 358965
2019-04-23 10:01:08 +00:00
Javed Absar 1cdc3dbc58 [AArch64] Add support for MTE intrinsics
This patch provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
The intrinsics are described in detail in the latest
ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest
Reviewed by: David Spickett
Differential Revision: https://reviews.llvm.org/D60486

llvm-svn: 358963
2019-04-23 09:39:58 +00:00
Diogo N. Sampaio 2619f399f9 [ARM][FIX] Add missing f16.lane.vldN/vstN lowering
Summary:
Add missing D and Q lane VLDSTLane lowering
for fp16 elements.

Reviewers: efriedma, kosarev, SjoerdMeijer, ostannard

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60874

llvm-svn: 358962
2019-04-23 09:36:39 +00:00
David Green 63a2aa715a [LSR] Limit the recursion for setup cost
In some circumstances we can end up with setup costs that are very complex to
compute, even though the scevs are not very complex to create. This can also
lead to setupcosts that are calculated to be exactly -1, which LSR treats as an
invalid cost. This patch puts a limit on the recursion depth for setup cost to
prevent them taking too long.

Thanks to @reames for the report and test case.

Differential Revision: https://reviews.llvm.org/D60944

llvm-svn: 358958
2019-04-23 08:52:21 +00:00
Sanjay Patel bf8aacb715 [SelectionDAG] move splat util functions up from x86 lowering
This was supposed to be NFC, but the change in SDLoc
definitions causes instruction scheduling changes.

There's nothing x86-specific in this code, and it can
likely be used from DAGCombiner's simplifyVBinOp().

llvm-svn: 358930
2019-04-22 22:43:36 +00:00
Douglas Yung 16df7e086b Relax test to check for a valid number instead of a specific number.
llvm-svn: 358926
2019-04-22 22:31:57 +00:00
Michael Liao 389d5a3474 [AMDGPU] Fix an issue in `op_sel_hi` skipping.
Summary:
- Only apply packed literal `op_sel_hi` skipping on operands requiring
  packed literals. Even an instruction is `packed`, it may have operand
  requiring non-packed literal, such as `v_dot2_f32_f16`.

Reviewers: rampitec, arsenm, kzhuravl

Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60978

llvm-svn: 358922
2019-04-22 22:05:49 +00:00
Matt Arsenault f84ce75cd1 AMDGPU: Skip debug instructions in assert
These are inserted after branch relaxation, and for some reason it's
decided to put them in the long branch expansion block. It's probably
not great to rely on the source block address, so this should probably
be switched to being PC relative instead of relying on the block
address

llvm-svn: 358909
2019-04-22 19:14:26 +00:00
Matt Arsenault 2b6f76f05f AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sources
llvm-svn: 358894
2019-04-22 15:22:46 +00:00
Matt Arsenault 8f624abc1d GlobalISel: Legalize scalar G_EXTRACT sources
llvm-svn: 358892
2019-04-22 15:10:42 +00:00
Simon Pilgrim 6276ce0142 [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.

The AMDGPU backend needed an extra  (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions.

The X86 changes are all definite wins.

Differential Revision: https://reviews.llvm.org/D60462

llvm-svn: 358887
2019-04-22 14:04:35 +00:00
Simon Pilgrim ffd67233d4 [AMDGPU] Regenerate uitofp i8 to float conversion tests.
Prep work for D60462

llvm-svn: 358879
2019-04-22 10:19:09 +00:00
Sanjay Patel 7fa3a0eec9 [AArch64] add tests with multiple binop+splat vals; NFC
See D60890 for context.

llvm-svn: 358853
2019-04-21 15:01:19 +00:00
Craig Topper df02beb416 [X86] Add the rounding control operand to the printing for some scalar FMA instructions.
llvm-svn: 358844
2019-04-21 07:12:56 +00:00
Amara Emerson 4286652556 Revert r358800. Breaks Obsequi from the test suite.
The last attempt fixed gcc and consumer-typeset, but Obsequi seems to fail with
a different issue.

llvm-svn: 358829
2019-04-20 21:25:00 +00:00
Craig Topper 3980d1ca6b [X86] Disable argument copy elision for arguments passed via pointers
Summary:
If you pass two 1024 bit vectors in IR with AVX2 on Windows 64. Both vectors will be split in four 256 bit pieces. The four pieces of the first argument will be passed indirectly using 4 gprs. The second argument will get passed via pointers in memory.

The PartOffsets stored for the second argument are all in terms of its original 1024 bit size. So the PartOffsets for each piece are 32 bytes apart. So if we consider it for copy elision we'll only load an 8 byte pointer, but we'll move the address 32 bytes. The stack object size we create for the first part is probably wrong too.

This issue was encountered by ISPC. I'm working on getting a reduce test case, but wanted to go ahead and get feedback on the fix.

Reviewers: rnk

Reviewed By: rnk

Subscribers: dbabokin, llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60801

llvm-svn: 358817
2019-04-20 15:26:44 +00:00
Nikita Popov b75c8fc6fb [X86] Fix stack probing on x32 (PR41477)
Fix for https://bugs.llvm.org/show_bug.cgi?id=41477. On the x32 ABI
with stack probing a dynamic alloca will result in a WIN_ALLOCA_32
with a 32-bit size. The current implementation tries to copy it into
RAX, resulting in a physreg copy error. Fix this by copying to EAX
instead. Also fix incorrect opcodes or registers used in subs.

llvm-svn: 358807
2019-04-20 07:25:46 +00:00
Craig Topper 4d4b5d952e [X86] Don't turn (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) if the original AND can represented by MOVZX.
The MOVZX doesn't require an immediate to be encoded at all. Though it does use
a 2 byte opcode so its the same size as a 1 byte immediate. But it has a
separate source and dest register so can help avoid copies.

llvm-svn: 358805
2019-04-20 04:38:53 +00:00
Craig Topper 8b8264828c [X86] Turn (and (anyextend (shl X, C1), C2)) into (shl (and (anyextend X), (C1 >> C2), C2) if the AND could match a movzx.
There's one slight regression in here because we don't check that the immediate
already allowed movzx before the shift. I'll fix that next.

llvm-svn: 358804
2019-04-20 04:38:49 +00:00
Amara Emerson eac69e9377 Revert "Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores""
We were shifting the wrong component of a split load when trying to combine them
back into a single value.

llvm-svn: 358800
2019-04-19 23:54:44 +00:00
Jessica Paquette d5c69e0836 [GlobalISel][AArch64] Legalize + select G_FRINT
Exactly the same as G_FCEIL, G_FABS, etc.

Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc.

Differential Revision: https://reviews.llvm.org/D60895

llvm-svn: 358799
2019-04-19 23:41:52 +00:00
Sam Clegg a27252794e [WebAssembly] FastISel: Don't fallback to SelectionDAG after BuildMI in selectCall
My understanding is that once BuildMI has been called we can't fallback
to SelectionDAG.

This change moves the fallback for when getRegForValue() fails for
that target of an indirect call.  This was failing in -fPIC mode when
the callee is GlobalValue.

Add a test case that tickles this.

Differential Revision: https://reviews.llvm.org/D60908

llvm-svn: 358793
2019-04-19 22:43:32 +00:00
Jessica Paquette ad69af3e95 [GlobalISel] Add IRTranslator support for G_FRINT
Add it as a simple intrinsic, update arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D60893

llvm-svn: 358787
2019-04-19 21:46:12 +00:00
Jessica Paquette 627e8f8cb3 [GlobalISel] Add a G_FRINT opcode
Equivalent to SelectionDAG's frint node.

Differential Revision: https://reviews.llvm.org/D60891

llvm-svn: 358785
2019-04-19 21:44:16 +00:00
Craig Topper f919a2ece4 [X86] Add test case for D60801. NFC
llvm-svn: 358784
2019-04-19 21:39:16 +00:00
Amy Huang c774f687b6 [MS] Emit S_HEAPALLOCSITE debug info
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.

Reviewers: hans, rnk

Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60800

llvm-svn: 358783
2019-04-19 21:09:11 +00:00
Amara Emerson 36c5baef49 Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores"
This introduces some runtime failures which I'll need to investigate further.

llvm-svn: 358771
2019-04-19 17:42:13 +00:00
Jessica Paquette dfd87f6fa1 [GlobalISel][AArch64] Legalize vector G_FPOW
This instruction is legalized in the same way as G_FSIN, G_FCOS, G_FLOG10, etc.

Update legalize-pow.mir and arm64-vfloatintrinsics.ll to reflect the change.

Differential Revision: https://reviews.llvm.org/D60218

llvm-svn: 358764
2019-04-19 16:28:08 +00:00
Sanjay Patel e197c617a6 [SelectionDAG] soften splat mask assert/unreachable (PR41535)
These are general queries, so they should not die when given
a degenerate input like an all undef mask. Callers should be
able to deal with an op that will eventually be simplified away.

llvm-svn: 358761
2019-04-19 15:31:11 +00:00
Simon Pilgrim 4c09b7d921 [AMDGPU] Regenerate extractelt->truncate test.
Prep work for D60462

llvm-svn: 358746
2019-04-19 09:49:04 +00:00
Craig Topper bb769a2946 [X86] Turn (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) if the AND could match a movzx.
Could get further improvements by recognizing (i64 and (anyext (i32 shl))).

llvm-svn: 358737
2019-04-19 05:48:13 +00:00
Craig Topper 2099ccbe1f [X86] Add test cases for turning (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) when the AND could match to a movzx.
We already reorder when C1 >> C2 would allow a smaller immediate encoding.

llvm-svn: 358736
2019-04-19 05:48:09 +00:00
Sanjay Patel 5d281ac9ce [AArch64] add tests for mul-by-element; NFC
llvm-svn: 358718
2019-04-18 21:48:46 +00:00
Jessica Paquette 0aa9b453c4 [GlobalISel][AArch64] Legalize/select G_(S/Z/ANY)_EXT for v8s8s
This adds legalization for G_SEXT, G_ZEXT, and G_ANYEXT for v8s8s.

We were falling back on G_ZEXT in arm64-vabs.ll before, preventing us from
selecting the @llvm.aarch64.neon.sabd.v8i8 intrinsic.

This adds legalizer support for those 3, which gives us selection via the
importer. Update the relevant tests (legalize-ext.mir, select-int-ext.mir) and
add a GISel line to arm64-vabs.ll.

Differential Revision: https://reviews.llvm.org/D60881

llvm-svn: 358715
2019-04-18 21:15:48 +00:00
Jessica Paquette 3b5119c684 [GlobalISel][AArch64] Legalize v8s8 loads
Add legalizer support for loads of v8s8 and update legalize-load-store.mir.

Differential Revision: https://reviews.llvm.org/D60877

llvm-svn: 358714
2019-04-18 21:13:58 +00:00
Simon Pilgrim 4171a91e92 [X86] combineVectorTruncationWithPACKUS - remove split/concatenation of mask
combineVectorTruncationWithPACKUS is currently splitting the upper bit bit masking into 128-bit subregs and then concatenating them back together.

This was originally done to avoid regressions that caused existing subregs to be concatenated to the larger type just for the AND masking before being extracted again. This was fixed by @spatel (notably rL303997 and rL347356).

This also lets SimplifyDemandedBits do some further improvements before it hits the recursive depth limit.

My only annoyance with this is that we were broadcasting some xmm masks but we seem to have lost them by moving to ymm - but that's a known issue as the logic in lowerBuildVectorAsBroadcast isn't great.

Differential Revision: https://reviews.llvm.org/D60375#inline-539623

llvm-svn: 358692
2019-04-18 17:23:09 +00:00
Sanjay Patel 51fa60bcbb [x86] add tests for improved insertelement to index 0 (PR41512); NFC
Patch proposal in D60852.

llvm-svn: 358687
2019-04-18 16:58:50 +00:00
Simon Pilgrim 8f87e53462 [X86][SSE] Lower ICMP EQ(AND(X,C),C) -> SRA(SHL(X,LOG2(C)),BW-1) iff C is power-of-2.
This replaces the MOVMSK combine introduced at D52121/rL342326

(movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C))

with the more general icmp lowering so it can pick up more cases through bitcasts - notably vXi8 cases which use vXi16 shifts+masks, this patch can remove the mask and use pcmpgtb(0,x) for the sra.

Differential Revision: https://reviews.llvm.org/D60625

llvm-svn: 358651
2019-04-18 09:58:59 +00:00
Kang Zhang 009a21d2fd [PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS()
Summary:
This issue from the bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41177

When the two operands for BUILD_VECTOR are same, we will get assert error.
llvm::SDValue combineBVOfConsecutiveLoads(llvm::SDNode*, llvm::SelectionDAG&):
Assertion `!(InputsAreConsecutiveLoads && InputsAreReverseConsecutive) &&
"The loads cannot be both consecutive and reverse consecutive."' failed.

This error caused by the wrong ElemSIze when calling isConsecutiveLS(). We
should use `getScalarType().getStoreSize();` to get the ElemSize instread of
 `getScalarSizeInBits() / 8`.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D60811

llvm-svn: 358644
2019-04-18 07:24:15 +00:00
Tim Renouf 7c55c8d8c3 [AMDGPU] Avoid DAG combining assert with fneg(fadd(A,0))
fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but
creating the new fadd folds to just fneg(A). When A has multiple uses,
this confuses it and you get an assert. Fixed.

Differential Revision: https://reviews.llvm.org/D60633

Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c
llvm-svn: 358640
2019-04-18 05:27:01 +00:00
Sanjay Patel fb363a778f [x86] try to widen 'shl' as part of LEA formation
The test file has pairs of tests that are logically equivalent:
https://rise4fun.com/Alive/2zQ

%t4 = and i8 %t1, 8
%t5 = zext i8 %t4 to i16
%sh = shl i16 %t5, 2
%t6 = add i16 %sh, %t0
=>
%t4 = and i8 %t1, 8
%sh2 = shl i8 %t4, 2
%z5 = zext i8 %sh2 to i16
%t6 = add i16 %z5, %t0

...so if we can fold the shift op into LEA in the 1st pattern, then we
should be able to do the same in the 2nd pattern (unnecessary 'movzbl'
is a separate bug I think).

We don't want to do this any sooner though because that would conflict
with generic transforms that try to narrow the width of the shift.

Differential Revision: https://reviews.llvm.org/D60789

llvm-svn: 358622
2019-04-17 22:38:51 +00:00
Amara Emerson daf6e66ac5 [GlobalISel] Add legalization support for non-power-2 loads and stores
Legalize things like i24 load/store by splitting them into smaller power of 2 operations.

This matches how SelectionDAG handles these operations.

Differential Revision: https://reviews.llvm.org/D59971

llvm-svn: 358613
2019-04-17 21:30:07 +00:00
Nick Desaulniers a2077bab40 [AsmPrinter] defer %c to base class for ARM, PPC, and Hexagon. NFC
Summary:
None of these derived classes do anything that the base class cannot.
If we remove these case statements, then the base class can handle them
just fine.

Reviewers: peter.smith, echristo

Reviewed By: echristo

Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60803

llvm-svn: 358603
2019-04-17 18:22:48 +00:00
Sanjay Patel 1964962b49 [ARM] tighten test checks; NFC
llvm-svn: 358594
2019-04-17 16:51:09 +00:00
Rhys Perry c2814e12e7 AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions
Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches.

Reviewers: arsen, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60824

llvm-svn: 358592
2019-04-17 16:31:52 +00:00
Sanjay Patel 1f2c81af72 [ARM] make test checks more thorough; NFC
This will change with the proposal in D60214.
Unfortunately, the triple is not supported for auto-generation
via script, and the multiple RUN lines have diffs on this test,
but I can't tell exactly what is required by this test.
PR7162 was an assert/crash, so hopefully, this is good enough.

llvm-svn: 358587
2019-04-17 16:02:07 +00:00
Craig Topper 5ca2e04c7a [X86] Autogenerate complete checks. NFC
llvm-svn: 358556
2019-04-17 06:09:16 +00:00
Sanjay Patel d5bc5ca3e4 [x86] adjust LEA tests for better coverage; NFC
The scale can 1, 2, or 3.

llvm-svn: 358539
2019-04-16 23:10:41 +00:00
Simon Pilgrim d769bb1e58 [X86][AVX] X86ISD::PERMV/PERMV3 node types can never fold index ops
Improves codegen demonstrated by D60512 - instructions represented by X86ISD::PERMV/PERMV3 can never memory fold the operand used for their index register.

This patch updates the 'isUseOfShuffle' helper into the more capable 'isFoldableUseOfShuffle' that recognises that the op is used for a X86ISD::PERMV/PERMV3 index mask and can't be folded - allowing us to use broadcast/subvector-broadcast ops to reduce the size of the mask constant pool data.

Differential Revision: https://reviews.llvm.org/D60562

llvm-svn: 358516
2019-04-16 19:18:53 +00:00
Sanjay Patel f136c46bd6 [x86] add more tests for LEA formation; NFC
Promoting the shift to the wider type should allow LEA.

llvm-svn: 358513
2019-04-16 18:58:03 +00:00
Luis Marques 20d2424016 [RISCV] Custom lower SHL_PARTS, SRA_PARTS, SRL_PARTS
When not optimizing for minimum size (-Oz) we custom lower wide shifts
(SHL_PARTS, SRA_PARTS, SRL_PARTS) instead of expanding to a libcall.

Differential Revision: https://reviews.llvm.org/D59477

llvm-svn: 358498
2019-04-16 14:38:32 +00:00
Hans Wennborg 21eb771dcb Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)
The original commit caused false positives from AddressSanitizer's
use-after-scope checks, which have now been fixed in r358478.

> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936

llvm-svn: 358483
2019-04-16 12:13:25 +00:00
Amara Emerson 02a90ea73d [AArch64][GlobalISel] Don't do extending loads combine for non-pow-2 types.
Since non-pow-2 types are going to get split up into multiple loads anyway,
don't do the [SZ]EXTLOAD combine for those and save us trouble later in
legalization.

llvm-svn: 358458
2019-04-15 22:34:08 +00:00
Craig Topper 0495f29e42 [X86] Limit the 'x' inline assembly constraint to zmm0-15 when used for a 512 type.
The 'v' constraint is used to select zmm0-31. This makes 512 bit consistent with 128/256-bit.a

llvm-svn: 358450
2019-04-15 21:06:32 +00:00
Craig Topper 77439bb128 [X86] Fix a stack folding test to have a full xmm2-31 clobber list instead of stopping at xmm15. Add an additional dependency to keep instruction below inline asm block.
llvm-svn: 358449
2019-04-15 21:06:23 +00:00
Matt Arsenault 101abd219b AMDGPU: Fix unreachable when counting register usage of SGPR96
llvm-svn: 358447
2019-04-15 20:51:12 +00:00
Matt Arsenault fbdd2a1887 AMDGPU: Fix printed format of SReg_96
These are artificial, so I think this should only come up with inline
asm comments.

llvm-svn: 358446
2019-04-15 20:42:18 +00:00
Craig Topper 3d9b47c770 [X86] Block i32/i64 for 'k' and 'Yk' in getRegForInlineAsmConstraint without avx512bw.
32 and 64 bit k-registers require avx512bw. If we don't block this properly, it leads to a crash.

llvm-svn: 358436
2019-04-15 18:39:45 +00:00
Sanjay Patel 8ae68f2648 [x86] update test checks; NFC
llvm-svn: 358432
2019-04-15 17:38:47 +00:00
Craig Topper 8e364c680f [X86] Restore the pavg intrinsics.
The pattern we replaced these with may be too hard to match as demonstrated by
PR41496 and PR41316.

This patch restores the intrinsics and then we can start focusing
on the optimizing the intrinsics.

I've mostly reverted the original patch that removed them. Though I modified
the avx512 intrinsics to not have masking built in.

Differential Revision: https://reviews.llvm.org/D60674

llvm-svn: 358427
2019-04-15 17:17:35 +00:00
Yevgeny Rouban 3992e9d229 Codegen: Fixed perf branch_weights in couple of tests. NFC.
This is need to pass future checks of perf branch_weights metadata.

llvm-svn: 358384
2019-04-15 09:30:31 +00:00
Bjorn Pettersson 60569363a5 [SelectionDAG] Use KnownBits::computeForAddSub/computeForAddCarry
Summary:
Use KnownBits::computeForAddSub/computeForAddCarry
in SelectionDAG::computeKnownBits when doing value
tracking for addition/subtraction.

This should improve the precision of the known bits,
as we only used to make a simple estimate of known
zeroes. The KnownBits support functions are also
able to deduce bits that are known to be one in the
result.

Reviewers: spatel, RKSimon, nikic, lebedev.ri

Reviewed By: nikic

Subscribers: nikic, javed.absar, lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60460

llvm-svn: 358372
2019-04-15 07:19:11 +00:00
Craig Topper abd87ff48b [X86] Regenerate checks for domain-reassignment.mir
Apparently there are some stray IMPLICIT_DEF operations that weren't in the
checks. Not sure if they've always been there or something changed at some
point.

llvm-svn: 358371
2019-04-15 05:22:47 +00:00
Amara Emerson 946b1246d6 [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only.
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.

This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.

I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.

Compile time:
Program                                        base   cse    diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test     9.04   9.12  0.8%
test-suite...Mark/mafft/pairlocalalign.test     2.68   2.66 -0.7%
test-suite...-typeset/consumer-typeset.test     5.53   5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test         5.30   5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test        25.82  25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test     6.92   6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test    34.24  34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test           6.25   6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test     1.66   1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test         13.61  13.60 -0.0%
Geomean difference                                          -0.2%

Code size:
Program                                        base     cse      diff
test-suite...-typeset/consumer-typeset.test    1315632  1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test    1313892  1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test        1439504  1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test    2936980  2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test        3478276  3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test    8082868  8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test         3870380  3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test          1434904  1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test    764528   764528   0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test    782092   782092   0.0%
Geomean difference                                              -0.9%

Differential Revision: https://reviews.llvm.org/D60580

llvm-svn: 358369
2019-04-15 05:04:20 +00:00
Craig Topper 3c57976447 [X86] Move VPTESTM matching from the isel table to custom code in X86ISelDAGToDAG.
We had many tablegen patterns for these instructions. And due to the
commutability of the patterns, tablegen expands them to even more patterns. All
together VPTESTMD patterns accounted for more the 50K of the 610K isel table.
This had gotten bad when we stopped canonicalizing AND to vXi64. This required
a pattern for every combination of bitcast input type.

This change moves the matching to custom code where it is easier to look through
the bitcasts without being concerned with the specific types.

The test changes are because we are now stricter with one use checks as its
required to make load folding legal. We now require the AND and any BITCAST to
only have a single use. This prevents forming VPTESTM and a VPAND with the same
inputs.

We now support broadcast loads for 128/256 patterns without VLX. We'll widen to
512-bit like and still fold the broadcast since the amount of memory read
doesn't change.

There are a few tests that got slightly longer because are now prefering
load + VPTESTM over XOR+VPCMPEQ for (seteq (load), allzeros). Previously we were
able to share the XOR with multiple VPTESTM instructions.

llvm-svn: 358359
2019-04-14 18:26:11 +00:00
Craig Topper b17e5ec61b [X86] Don't form masked vpcmp/vcmp/vptestm operations if the setcc node has more than one use.
We're better of emitting a single compare + kand rather than a compare for the
other use and a masked compare.

I'm looking into using custom instruction selection for VPTESTM to reduce the
ridiculous number of permutations of patterns in the isel table. Putting a one
use check on all masked compare folding makes load fold matching in the custom
code easier.

llvm-svn: 358358
2019-04-14 18:26:06 +00:00
Craig Topper 476dd06854 [X86] Update bool_reduction_v8f32 test cases from vector-compare-any_of.ll and vector-compare-all_of.ll to be proper reductions.
One of the shuffles was used twice. While the intended shuffle wasn't connected.

llvm-svn: 358346
2019-04-14 04:20:42 +00:00
Bill Wendling 191f1487b6 [X86] Use PC-relative mode for the kernel code model
Summary:
The Linux kernel uses PC-relative mode, so allow that when the code model is
"kernel".

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits, kees, nickdesaulniers

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60643

llvm-svn: 358343
2019-04-13 21:39:28 +00:00
Amara Emerson 93e58d2396 [AArch64][GlobalISel] Enable copy elision in the pre-legalizer combine and fix a crash.
This enables the simple copy combine that already exists in the CombinerHelper.
However, it exposed a bug in the GISelChangeObserver where it wouldn't clear a
set of MIs to process, and so would end up causing a crash when deleted MIs were
being added to the combiner worklist again.

Differential Revision: https://reviews.llvm.org/D60579

llvm-svn: 358318
2019-04-13 00:33:25 +00:00
Amara Emerson bdb5e4e4ca [GlobalISel] Fix a crash when handling an invalid MVT during call lowering.
This crash was introduced in r358032 as we try to construct an EVT from an MVT
in order to find the register type for the calling conv. Fall back instead of
trying to do this with an invalid MVT coming from i256.

llvm-svn: 358314
2019-04-12 22:05:46 +00:00
Amara Emerson 2806fd01a1 [AArch64][GlobalISel] Fix a crash when selecting shufflevectors with an undef mask element.
If a shufflevector's mask vector has an element with "undef" then the generic
instruction defining that element register is a G_IMPLICT_DEF instead of G_CONSTANT.
This fixes the selector to handle this case, and for now assumes that undef just means
zero. In future we'll optimize this case properly.

llvm-svn: 358312
2019-04-12 21:31:21 +00:00
Thomas Lively 9e27514996 [WebAssembly] Add mutable-globals to bleeding-edge CPU
Summary: This brings the backend in line with Clang.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60594

llvm-svn: 358310
2019-04-12 20:39:53 +00:00
Brendon Cahoon 4df216cd62 [Hexagon] Fix reuse bug in Vector Loop Carried Reuse pass
The Hexagon Vector Loop Carried Reuse pass was allowing reuse between
two shufflevectors with different masks. The reason is that the masks
are not instruction objects, so the code that checks each operand
just skipped over the operands.

This patch fixes the bug by checking if the operands are the same
when they are not instruction objects. If the objects are not the
same, then the code assumes that reuse cannot occur.

Differential Revision: https://reviews.llvm.org/D60019

llvm-svn: 358292
2019-04-12 16:37:12 +00:00
Sanjay Patel 5e4ad39af7 [DAGCombiner] narrow shuffle of concatenated vectors
// shuffle (concat X, undef), (concat Y, undef), Mask -->
// concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)

The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements.

The x86 changes look neutral or better. There's one test with an
extra instruction, but that could be reversed for a subtarget with
the right attributes. But by default, we want to avoid the 256-bit
op when possible (in my motivating benchmark, a handful of ymm ops
sprinkled into a sequence of xmm ops are triggering frequency
throttling on Haswell resulting in significantly worse perf).

Differential Revision: https://reviews.llvm.org/D60545

llvm-svn: 358291
2019-04-12 16:31:56 +00:00
Simon Pilgrim 6c8f4ada36 [X86][SSE] Recognise vXi1 boolean anyof/allof reduction patterns
Currently combineHorizontalPredicateResult only handles anyof/allof reduction patterns of legal types, which can be tricky to match as type legalization of bools can introduce bitcasts/truncs/extensions.

This patch extends combineHorizontalPredicateResult to recognise vXi1 bool reductions as well and uses the existing combineBitcastvxi1 helper to create the MOVMSK necessary to then compare the signmask result.

This ensures the accuracy of the reduction costs added in D60403 which assume the MOVMSK generation.

Differential Revision: https://reviews.llvm.org/D60610

llvm-svn: 358286
2019-04-12 14:22:57 +00:00
Hans Wennborg 4e6b857922 Revert r358268 "[DebugInfo] DW_OP_deref_size in PrologEpilogInserter."
It causes clang to crash while building Chromium. See https://crbug.com/952230
for reproducer.

> The PrologEpilogInserter need to insert a DW_OP_deref_size before
> prepending a memory location expression to an already implicit
> expression to avoid having the existing expression act on the memory
> address instead of the value behind it.
>
> The reason for using DW_OP_deref_size and not plain DW_OP_deref is that
> big-endian targets need to read the right size as simply truncating a
> larger read would yield the wrong result (LSB bytes are not at the lower
> address).
>
> Differential Revision: https://reviews.llvm.org/D59687

llvm-svn: 358281
2019-04-12 12:54:52 +00:00
Kang Zhang 2446f843ae [PowerPC] Add initialization for some ppc passes
Summary:

Some llc debug options need pass-name as the parameters.
But if we use the pass-name ppc-early-ret, we will get below error:
llc test.ll -stop-after ppc-early-ret
LLVM ERROR: "ppc-early-ret" pass is not registered.
Below pass-names have the pass is not registered error:
ppc-ctr-loops
ppc-ctr-loops-verify
ppc-loop-preinc-prep
ppc-toc-reg-deps
ppc-vsx-copy
ppc-early-ret
ppc-vsx-fma-mutate
ppc-vsx-swaps
ppc-reduce-cr-ops
ppc-qpx-load-splat
ppc-branch-coalescing
ppc-branch-select

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D60248

llvm-svn: 358271
2019-04-12 09:59:40 +00:00
Markus Lavin 138c76129b [DebugInfo] DW_OP_deref_size in PrologEpilogInserter.
The PrologEpilogInserter need to insert a DW_OP_deref_size before
prepending a memory location expression to an already implicit
expression to avoid having the existing expression act on the memory
address instead of the value behind it.

The reason for using DW_OP_deref_size and not plain DW_OP_deref is that
big-endian targets need to read the right size as simply truncating a
larger read would yield the wrong result (LSB bytes are not at the lower
address).

Differential Revision: https://reviews.llvm.org/D59687

llvm-svn: 358268
2019-04-12 08:23:55 +00:00
Eric Christopher b6926bdcff Revert "[PowerPC] Add initialization for some ppc passes"
This reverts commit 6f8f98ce8d as it
is breaking nearly every bot.

llvm-svn: 358260
2019-04-12 07:16:58 +00:00
Craig Topper 3b1239d2a8 [TargetLowering][X86] Teach SimplifyDemandedBits to use ShrinkDemandedOp on ISD::SHL nodes.
If the upper bits of the SHL result aren't used, we might be able to use a narrower shift. For example, on X86 this can turn a 64-bit into 32-bit enabling a smaller encoding.

Differential Revision: https://reviews.llvm.org/D60358

llvm-svn: 358257
2019-04-12 06:49:28 +00:00
Kang Zhang 6f8f98ce8d [PowerPC] Add initialization for some ppc passes
Summary:

Some llc debug options need pass-name as the parameters.
But if we use the pass-name ppc-early-ret, we will get below error:
llc test.ll -stop-after ppc-early-ret
LLVM ERROR: "ppc-early-ret" pass is not registered.
Below pass-names have the pass is not registered error:
ppc-ctr-loops
ppc-ctr-loops-verify
ppc-loop-preinc-prep
ppc-toc-reg-deps
ppc-vsx-copy
ppc-early-ret
ppc-vsx-fma-mutate
ppc-vsx-swaps
ppc-reduce-cr-ops
ppc-qpx-load-splat
ppc-branch-coalescing
ppc-branch-select

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D60248

llvm-svn: 358256
2019-04-12 06:35:15 +00:00
Zi Xuan Wu ac79ef8f0e [PowerPC] More precise exploitation of P9 maddld instruction when operands are constant
There are 3 operands of maddld, (add (mul %1, %2), %3) and sometimes
they are constant. If there is constant operand, it takes extra li to 
materialize the operand, and one more extra register too. So it's not 
profitable to use maddld to optimize mul-add pattern.

Differential Revision: https://reviews.llvm.org/D60181

llvm-svn: 358253
2019-04-12 05:21:31 +00:00
Brendon Cahoon 57c3d4bed3 [Pipeliner] Fix incorrect loop carried dependence calculation
The isLoopCarriedDep function does not correctly compute loop
carried dependences when the array index offset is negative
or the stride is smallar than the access size.

Patch by Denis Antrushin.

Differential Revision: https://reviews.llvm.org/D60135

llvm-svn: 358233
2019-04-11 21:57:51 +00:00
Amara Emerson 7e9355f870 [AArch64][GlobalISel] Flesh out vector load/store support for more types.
Some of these were legalizing into smaller vector types unnecessarily,
others were simply not supported yet.

llvm-svn: 358223
2019-04-11 20:40:01 +00:00
Amara Emerson b956051415 [AArch64][GlobalISel] Legalization and ISel support for load/stores of vectors of pointers.
Loads and store of values with type like <2 x p0> currently don't get imported
because SelectionDAG has no knowledge of pointer types. To leverage the existing
support for vector load/stores, we can bitcast the value to have s64 element
types instead. We do this as a custom legalization.

This patch also adds support for general loads of <2 x s64>, and relaxes some
type conditions on selecting G_BITCAST.

Differential Revision: https://reviews.llvm.org/D60534

llvm-svn: 358221
2019-04-11 20:32:24 +00:00
Aaron Smith 994023a3f1 [DebugInfo] Combine Trivial and NonTrivial flags
Summary:
Companion to https://reviews.llvm.org/D59347


Reviewers: rnk, zturner, probinson, dblaikie, deadalnix

Subscribers: aprantl, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59348

llvm-svn: 358220
2019-04-11 20:25:10 +00:00
Craig Topper 68a5d619a4 [X86] Restrict vselect handling in scalarizeExtEltFP to only case to pre type legalization where the setcc result type is vXi1.
If the vector setcc has been legalized then we will need to convert a vector boolean of 0 or -1 to a scalar boolean of 0 or 1.

The added test case previously crashed in 32-bit mode by creating a setcc with an i64 condition that type legalization couldn't expand.

llvm-svn: 358218
2019-04-11 19:57:44 +00:00
Craig Topper a3635b94c4 [X86] Add 32-bit command line to extractelement-fp.ll so I can add a test case for a 32-bit only crasher. NFC
This is a bit ugly for ABI reasons about how floats/doubles are returned.

llvm-svn: 358217
2019-04-11 19:57:24 +00:00
Craig Topper 586fad50ac [X86] Add patterns for using movss/movsd for atomic load/store of f32/64. Remove atomic fadd pseudos use isel patterns instead.
This patch adds patterns for turning bitcasted atomic load/store into movss/sd.

It also removes the pseudo instructions for atomic RMW fadd. Instead just adding isel patterns for folding an atomic load into addss/sd. And relying on the new movss/sd store pattern to handle the write part.

This also makes the fadd patterns use VEX and EVEX instructions when AVX or AVX512F are enabled.

Differential Revision: https://reviews.llvm.org/D60394

llvm-svn: 358215
2019-04-11 19:19:52 +00:00
Craig Topper f7e548c076 Recommit r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2"
With correct test checks this time.

If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integ

This matches what gcc and icc do for this case and removes an existing FIXME.

llvm-svn: 358214
2019-04-11 19:19:42 +00:00
Craig Topper 8200880c9a Revert r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2"
I seem to have messed up the test checks.

llvm-svn: 358212
2019-04-11 19:04:38 +00:00
Craig Topper 1c2dfc3100 [X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2
If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integer and store it to a stack temporary. From there we can do two 32-bit loads to get the value into integer registers without worrying about atomicness.

This matches what gcc and icc do for this case and removes an existing FIXME.

Differential Revision: https://reviews.llvm.org/D60156

llvm-svn: 358211
2019-04-11 18:40:21 +00:00
Craig Topper 1fe5a9963d [X86] Pre-commit i64 volatile test case for D60156. NFC
llvm-svn: 358210
2019-04-11 18:40:08 +00:00
Simon Pilgrim 40b647ae8e [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV3 mask support
Completes SimplifyDemandedVectorElts's basic variable shuffle mask support which should help D60512 + D60562 

llvm-svn: 358186
2019-04-11 15:29:15 +00:00
Simon Pilgrim a41275a398 [X86][AVX] Tweak X86ISD::VPERMV3 demandedelts test
Original test was too dependent on the order of the combines that could cause the inserted element being demanded after all

llvm-svn: 358182
2019-04-11 15:09:03 +00:00
Simon Pilgrim 34686b6e97 [X86][AVX] Add X86ISD::VPERMV3 demandedelts test
llvm-svn: 358175
2019-04-11 14:48:46 +00:00
Simon Pilgrim 8a25154fa7 [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV mask support
llvm-svn: 358174
2019-04-11 14:35:45 +00:00
Simon Pilgrim b237b54c2d [X86][AVX] Add X86ISD::VPERMV demandedelts test
llvm-svn: 358173
2019-04-11 14:26:32 +00:00
Sanjay Patel c0f4a35e68 [DAGCombiner][x86] scalarize inserted vector FP ops
// bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) -->
// build_vec ...undef, (bo x, y), undef...

The lifetime of the nodes in these examples is different for variables versus constants,
but they are all build vectors briefly, so I'm proposing to catch them in this form to
handle all of the leading examples in the motivating test file.

Before we have build vectors, we might have insert_vector_element. After that, we might
have scalar_to_vector and constant pool loads.

It's going to take more work to ensure that FP vector operands are getting simplified
with undef elements, so this transform can apply more widely. In a non-loose FP environment,
we are likely simplifying FP elements to NaN values rather than undefs.

We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors
yet.

Differential Revision: https://reviews.llvm.org/D60514

llvm-svn: 358172
2019-04-11 14:21:57 +00:00
Diogo N. Sampaio 8ddfd46c61 [AArch64] Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64
Summary:  Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64

Reviewers: pbarrio, DavidSpickett, LukeGeeson

Reviewed By: LukeGeeson

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60259

llvm-svn: 358171
2019-04-11 14:19:43 +00:00
Simon Pilgrim 6f3866c6fb [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMILPV mask support
llvm-svn: 358170
2019-04-11 14:15:01 +00:00
Simon Pilgrim 886e32e0f2 [X86][AVX] Add X86ISD::VPERMILPV demandedelts tests
llvm-svn: 358168
2019-04-11 14:09:35 +00:00
Simon Pilgrim cb5218ad48 [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMIL2 mask support
llvm-svn: 358167
2019-04-11 14:04:19 +00:00
Simon Pilgrim 7021dec26e [X86][XOP] Add X86ISD::VPERMIL2 demandedelts test
llvm-svn: 358166
2019-04-11 13:52:43 +00:00
Simon Pilgrim e468cc7f14 [X86] SimplifyDemandedVectorElts - add VPPERM support
We need to add support for all variable shuffle mask ops, but VPPERM is the only one that already has test coverage.

llvm-svn: 358165
2019-04-11 13:30:38 +00:00
Shiva Chen 7cc03bd064 [RISCV] Put data smaller than eight bytes to small data section
Because of gp = sdata_start_address + 0x800, gp with signed twelve-bit offset
could covert most of the small data section. Linker relaxation could transfer
the multiple data accessing instructions to a gp base with signed twelve-bit
offset instruction.

Differential Revision: https://reviews.llvm.org/D57493

llvm-svn: 358150
2019-04-11 04:59:13 +00:00
Amara Emerson 213e0bde04 [AArch64][GlobalISel] Make <2 x p0> = G_BUILD_VECTOR legal.
The existing isel support already works for p0 once the legalizer accepts it.

llvm-svn: 358144
2019-04-10 23:06:14 +00:00
Amara Emerson a7ff111b04 [AArch64][GlobalISel] Add legalizer support for <8 x s16> and <16 x s8> G_ADD.
llvm-svn: 358143
2019-04-10 23:06:11 +00:00
Amara Emerson ae878dab03 [AArch64][GlobalISel] Scalarize vector SDIV.
llvm-svn: 358142
2019-04-10 23:06:08 +00:00
Craig Topper 10048060f6 [X86] Add SSE1 command line to atomic-fp.ll and atomic-non-integer.ll. NFC
llvm-svn: 358141
2019-04-10 22:35:32 +00:00
Craig Topper a3ee7e2b3e [X86] Autogenerate complete checks. NFC
llvm-svn: 358140
2019-04-10 22:35:24 +00:00
Craig Topper 61f31cbcb2 [X86] Teach foldMaskedShiftToScaledMask to look through an any_extend from i32 to i64 between the and & shl
foldMaskedShiftToScaledMask tries to reorder and & shl to enable the shl to fold into an LEA. But if there is an any_extend between them it doesn't work.

This patch modifies the code to look through any_extend from i32 to i64 when the and mask only uses bits that weren't from the extended part.

This will prevent a regression from D60358 caused by 64-bit SHL being narrowed to 32-bits when their upper bits aren't demanded.

Differential Revision: https://reviews.llvm.org/D60532

llvm-svn: 358139
2019-04-10 21:42:08 +00:00
David Green deb3342018 [ARM] Add an extra test for constant hoist. NFC
llvm-svn: 358128
2019-04-10 19:18:58 +00:00
Craig Topper cacb70c94b [X86] Add test case for LEA formation regression seen with D60358. NFC
If we have an (add X, (and (aext (shl Y, C1)), C2)), we can pull the shift through and+aext to fold into an LEA with the.
Assuming C1 is small enough and C2 masks off all of the extend bits.

This pattern showed up in D60358. And we need to handle it to prevent a regression.

llvm-svn: 358124
2019-04-10 19:09:06 +00:00
David Green 4e3fd7757a [ARM] Add an extra constant hoisting test. NFC
This adds a simple extra test for constant hoisting to show it's
usefulness with constant addresses like those seen in memory
mapped registers in embedded systems.

llvm-svn: 358114
2019-04-10 18:05:57 +00:00
David Green 0861c87b06 Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg
Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
seeing through to the constant in other blocks. Revert this patch while we come
up with a better way to handle that.

I will try to follow this up with some better tests.

llvm-svn: 358113
2019-04-10 18:00:41 +00:00
Matt Arsenault 7187272b2b GlobalISel: Support legalizing G_CONSTANT with irregular breakdown
llvm-svn: 358109
2019-04-10 17:27:53 +00:00
Craig Topper 35fe07916a [AArch64] Teach getTestBitOperand to look through ANY_EXTENDS
This patch teach getTestBitOperand to look through ANY_EXTENDs when the extended bits aren't used. The test case changed here is based what D60358 did to test16 in tbz-tbnz.ll. So this patch will avoid that regression.

Differential Revision: https://reviews.llvm.org/D60482

llvm-svn: 358108
2019-04-10 17:27:29 +00:00
Matt Arsenault 9e0eeba569 GlobalISel: Handle odd breakdowns for bit ops
llvm-svn: 358105
2019-04-10 17:07:56 +00:00
Simon Pilgrim 37d8d55823 [X86][AVX] getTargetConstantBitsFromNode - extract bits from X86ISD::SUBV_BROADCAST
llvm-svn: 358096
2019-04-10 16:24:47 +00:00
Diogo N. Sampaio aae424a2d2 [AArch64] Add lowering pattern for scalar fp16 facge and facgt
Summary: The fp16 scalar version of facge and facgt requires a custom patter matching, as the result type is not the same width of the operands.

Reviewers: olista01, javed.absar, pbarrio

Reviewed By: javed.absar

Subscribers: kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60212

llvm-svn: 358083
2019-04-10 13:34:18 +00:00
Diogo N. Sampaio 651463e4a8 [ARM] [FIX] Add missing f16 vector operations lowering
Summary:
Add missing <8xhalf> shufflevectors pattern, when using concat_vector dag node.
As well, allows <8xhalf> and <4xhalf> vldup1 operations.

These instructions are required for v8.2a fp16 lowering of vmul_n_f16, vmulq_n_f16 and vmulq_lane_f16 intrinsics.

Reviewers: olista01, pbarrio, LukeGeeson, efriedma

Reviewed By: efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60319

llvm-svn: 358081
2019-04-10 13:28:06 +00:00
Diana Picus b6e83b98f9 [ARM GlobalISel] Select G_FCONSTANT for VFP3
Make it possible to TableGen code for FCONSTS and FCONSTD.

We need to make two changes to the TableGen descriptions of vfp_f32imm
and vfp_f64imm respectively:
* add GISelPredicateCode to check that the immediate fits in 8 bits;
* extract the SDNodeXForms into separate definitions and create a
GISDNodeXFormEquiv and a custom renderer function for each of them.

There's a lot of boilerplate to get the actual value of the immediate,
but it basically just boils down to calling ARM_AM::getFP32Imm or
ARM_AM::getFP64Imm.

llvm-svn: 358063
2019-04-10 09:14:32 +00:00
Diana Picus 3533ad6801 [ARM GlobalISel] Select G_FCONSTANT into pools
Put all floating point constants into constant pools and load their
values from there.

llvm-svn: 358062
2019-04-10 09:14:24 +00:00
Diana Picus 165846b031 [ARM GlobalISel] Map G_FCONSTANT
llvm-svn: 358061
2019-04-10 09:14:16 +00:00
Jim Lin a49c95e02a [Sparc] Fix incorrect MI insertion position for spilling f128.
Summary:
Obviously, new built MI (sethi+add or sethi+xor+add) for constructing large offset
should be inserted before new created MI for storing even register into memory.
So the insertion position should be *StMI instead of II.

before fixed:

std %f0, [%g1+80]
sethi 4, %g1        <<<
add %g1, %sp, %g1   <<< this two instructions should be put before "std %f0, [%g1+80]".
sethi 4, %g1
add %g1, %sp, %g1
std %f2, [%g1+88]

after fixed:

sethi 4, %g1
add %g1, %sp, %g1
std %f0, [%g1+80]
sethi 4, %g1
add %g1, %sp, %g1
std %f2, [%g1+88]

Reviewers: venkatra, jyknight

Reviewed By: jyknight

Subscribers: jyknight, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60397

llvm-svn: 358042
2019-04-10 01:56:32 +00:00
Amara Emerson 9bf092d719 [AArch64][GlobalISel] Add isel support for vector G_ICMP and G_ASHR & G_SHL
The selection for G_ICMP is unfortunately not currently importable from SDAG
due to the use of custom SDNodes. To support this, this selection method has an
opcode table which has been generated by a script, indexed by various
instruction properties. Ideally in future we will have a GISel native selection
patterns that we can write in tablegen to improve on this.

For selection of some types we also need support for G_ASHR and G_SHL which are
generated as a result of legalization. This patch also adds support for them,
generating the same code as SelectionDAG currently does.

Differential Revision: https://reviews.llvm.org/D60436

llvm-svn: 358035
2019-04-09 21:22:43 +00:00
Amara Emerson 888dd5d198 [AArch64][GlobalISel] Legalize vector G_ICMP.
Selection support will be coming in a later patch.

Differential Revision: https://reviews.llvm.org/D60435

llvm-svn: 358034
2019-04-09 21:22:40 +00:00
Amara Emerson 92d74f19cf [AArch64][GlobalISel] Add legalization for some vector G_SHL and G_ASHR.
This is needed for some future support for vector ICMP.

Differential Revision: https://reviews.llvm.org/D60433

llvm-svn: 358033
2019-04-09 21:22:37 +00:00
Amara Emerson 2b523f8162 [GlobalISel][AArch64] Allow CallLowering to handle types which are normally
required to be passed as different register types. E.g. <2 x i16> may need to
be passed as a larger <2 x i32> type, so formal arg lowering needs to be able
truncate it back. Likewise, when dealing with returns of these types, they need
to be widened in the appropriate way back.

Differential Revision: https://reviews.llvm.org/D60425

llvm-svn: 358032
2019-04-09 21:22:33 +00:00
Craig Topper ba55a40fd0 [AArch64] Add test case to show missed opportunity to remove a shift before tbnz when the shift has been zero extended from i32 to i64. NFC
This pattern showed up in D60358 and it was suggested I had a test and fix that separately.

llvm-svn: 358030
2019-04-09 19:23:37 +00:00
Craig Topper 61e77b11d1 [DAGCombiner][X86][SystemZ] Canonicalize SSUBO with immediate RHS to SADDO by negating the immediate.
This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate.

Differential Revision: https://reviews.llvm.org/D60020

llvm-svn: 358027
2019-04-09 18:33:56 +00:00
Simon Pilgrim d7cc0ec581 [TargetLowering] SimplifyDemandedBits - add ISD::INSERT_SUBVECTOR support
llvm-svn: 358019
2019-04-09 16:52:21 +00:00
Stanislav Mekhanoshin 913ba8eeb4 Revert LIS handling in MachineDCE
One of out of tree targets has regressed with this patch. Reverting
it for now and let liveness to be fully reconstructed in case pass
was used after the LIS is created to resolve the regression.

Differential Revision: https://reviews.llvm.org/D60466

llvm-svn: 358015
2019-04-09 16:13:53 +00:00
Simon Pilgrim 345eacd555 [TargetLowering] SimplifyDemandedBits - call SimplifyDemandedBits in bitcast handling
When bitcasting from a source op to a larger bitwidth op, split the demanded bits and OR them on top of one another and demand those merged bits in the SimplifyDemandedBits call on the source op.

llvm-svn: 357992
2019-04-09 10:27:59 +00:00
Tom Stellard 206b9927f8 AMDGPU/GlobalISel: Implement call lowering for shaders returning values
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, llvm-commits

Differential Revision: https://reviews.llvm.org/D57166

llvm-svn: 357964
2019-04-09 02:26:03 +00:00
Chen Zheng 19ce6719bc [PowerPC] initialize SchedModel according to platform.
Differential Revision: https://reviews.llvm.org/D60177

llvm-svn: 357962
2019-04-09 01:25:25 +00:00
Simon Pilgrim 9f74df7d5b [TargetLowering] SimplifyDemandedBits - use DemandedElts in bitcast handling
Be more selective in the SimplifyDemandedBits -> SimplifyDemandedVectorElts bitcast call based on the demanded elts.

llvm-svn: 357942
2019-04-08 20:59:38 +00:00
Simon Pilgrim 86844a865e [X86][AVX] Add PR34380 shuffle test cases
llvm-svn: 357914
2019-04-08 14:05:42 +00:00
Sanjay Patel 50c3b290ed [x86] make 8-bit shl undesirable
I was looking at a potential DAGCombiner fix for 1 of the regressions in D60278, and it caused severe regression test pain because x86 TLI lies about the desirability of 8-bit shift ops.

We've hinted at making all 8-bit ops undesirable for the reason in the code comment:

// TODO: Almost no 8-bit ops are desirable because they have no actual
//       size/speed advantages vs. 32-bit ops, but they do have a major
//       potential disadvantage by causing partial register stalls.

...but that leads to massive diffs and exposes all kinds of optimization holes itself.

Differential Revision: https://reviews.llvm.org/D60286

llvm-svn: 357912
2019-04-08 13:58:50 +00:00
Craig Topper afb6b42691 [X86] Split floating point tests out of atomic-mi.ll into atomic-fp.ll. Add avx and avx512f command lines. NFC
llvm-svn: 357882
2019-04-08 01:54:27 +00:00
Craig Topper 8aeefe3149 [X86] Add avx and avx512f command lines to atomic-non-integer.ll. NFC
llvm-svn: 357881
2019-04-08 01:54:24 +00:00
Craig Topper 424417da79 [X86] Use (SUBREG_TO_REG (MOV32rm)) for extloadi64i8/extloadi64i16 when the load is 4 byte aligned or better and not volatile.
Summary:
Previously we would use MOVZXrm8/MOVZXrm16, but those are longer encodings.

This is similar to what we do in the loadi32 predicate.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60341

llvm-svn: 357875
2019-04-07 19:19:44 +00:00
Nikita Popov 3db93ac5d6 Reapply [ValueTracking] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.

Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 357870
2019-04-07 17:22:16 +00:00
Simon Pilgrim 07adb6abda [X86][SSE] SimplifyDemandedBitsForTargetNode - Add initial PACKSS support
In the case where we only want the sign bit (e.g. when using PACKSS truncation of comparison results for MOVMSK) then we can just demand the sign bit of the source operands.

This makes use of the fact that PACKSS saturates out of range values to the min/max int values - so the sign bit is always preserved.

Differential Revision: https://reviews.llvm.org/D60333

llvm-svn: 357859
2019-04-07 10:40:01 +00:00
Craig Topper 399102b464 [X86] When converting (x << C1) AND C2 to (x AND (C2>>C1)) << C1 during isel, try using andl over andq by favoring 32-bit unsigned immediates.
llvm-svn: 357848
2019-04-06 19:00:11 +00:00
Craig Topper f9b9f8d2e4 [X86] Use a signed mask in foldMaskedShiftToScaledMask to enable a shorter immediate encoding.
This function reorders AND and SHL to enable the SHL to fold into an LEA. The
upper bits of the AND will be shifted out by the SHL so it doesn't matter what
mask value we use for these bits. By using sign bits from the original mask in
these upper bits we might enable a shorter immediate encoding to be used.

llvm-svn: 357846
2019-04-06 18:00:50 +00:00
Craig Topper 82448bc09e [X86] Add test cases to show missed opportunities to use a sign extended 8 or 32 bit immediate AND when reversing SHL+AND to form an LEA.
When we shift the AND mask over we should shift in sign bits instead of zero bits. The scale in the LEA will shift these bits out so it doesn't matter whether we mask the bits off or not. Using sign bits will potentially allow a sign extended immediate to be used.

Also add some other test cases for cases that are currently optimal.

llvm-svn: 357845
2019-04-06 18:00:45 +00:00
Craig Topper 9d7379c250 [X86] Autogenerate complete checks. NFC
llvm-svn: 357844
2019-04-06 18:00:41 +00:00
Simon Pilgrim ec28615f7f [X86] Add AVX-target expandload and compressstore tests
llvm-svn: 357842
2019-04-06 14:40:52 +00:00
Simon Pilgrim d23611f9ad [X86] Split expandload and compressstore tests
llvm-svn: 357840
2019-04-06 14:14:54 +00:00
Simon Pilgrim 18a8a64c9f [X86][SSE] Add more exhaustive masked load/store tests
Reordered/renamed some existing tests to match the cleaned up order

llvm-svn: 357839
2019-04-06 14:01:37 +00:00
Francis Visoiu Mistrih 9d9d1b6b2b [X86] Enable tail calls for CallingConv::Swift
It's currently only enabled on AArch64 (enabled in r281376).

llvm-svn: 357809
2019-04-05 20:18:25 +00:00
Francis Visoiu Mistrih ab051a378c [X86] Preserve operand flag when expanding TCRETURNri
The expansion of TCRETURNri(64) would not keep operand flags like
undef/renamable/etc. which can result in machine verifier issues.

Also add plumbing to be able to use `-run-pass=x86-pseudo`.

llvm-svn: 357808
2019-04-05 20:18:21 +00:00
Stanislav Mekhanoshin c8f78f8dd3 [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs
Detect dead lanes can create some dead defs. Then RenameIndependentSubregs
will break a REG_SEQUENCE which may use these dead defs. At this point
a dead instruction can be removed but we do not run a DCE anymore.

MachineDCE was only running before live variable analysis. The patch
adds a mean to preserve LiveIntervals and SlotIndexes in case it works
past this.

Differential Revision: https://reviews.llvm.org/D59626

llvm-svn: 357805
2019-04-05 20:11:32 +00:00
Craig Topper 80aa2290fb [X86] Merge the different Jcc instructions for each condition code into single instructions that store the condition code as an operand.
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes.

Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon

Reviewed By: RKSimon

Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60228

llvm-svn: 357802
2019-04-05 19:28:09 +00:00
Craig Topper 7323c2bf85 [X86] Merge the different SETcc instructions for each condition code into single instructions that store the condition code as an operand.
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes.

Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri

Reviewed By: andreadb

Subscribers: hiraditya, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60138

llvm-svn: 357801
2019-04-05 19:27:49 +00:00
Craig Topper e0bfeb5f24 [X86] Merge the different CMOV instructions for each condition code into single instructions that store the condition code as an immediate.
Summary:
Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models.

This avoids needing an isel pattern for each condition code. And it removes
translation switches for converting between CMOV instructions and condition
codes.

Now the printer, encoder and disassembler take care of converting the immediate.
We use InstAliases to handle the assembly matching. But we print using the
asm string in the instruction definition. The instruction itself is marked
IsCodeGenOnly=1 to hide it from the assembly parser.

This does complicate the scheduler models a little since we can't assign the
A and BE instructions to a separate class now.

I plan to make similar changes for SETcc and Jcc.

Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60041

llvm-svn: 357800
2019-04-05 19:27:41 +00:00
Clement Courbet 1d8c9dfe03 [ExpandMemCmp][NFC] Add tests for `memcmp(p, q, n) < 0` case.
llvm-svn: 357767
2019-04-05 15:03:25 +00:00
Simon Pilgrim 17586cda4a [SelectionDAG] Add fcmp UNDEF handling to SelectionDAG::FoldSetCC
Second half of PR40800, this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).

This involves a lot of tweaking to reduced tests as bugpoint loves to reduce fcmp arguments to undef........

Differential Revision: https://reviews.llvm.org/D60006

llvm-svn: 357765
2019-04-05 14:56:21 +00:00
Matt Arsenault 4ed6ccab9b AMDGPU/GlobalISel: Fix non-power-of-2 select
llvm-svn: 357762
2019-04-05 14:03:04 +00:00
Sanjay Patel 50a8652785 [DAGCombiner][x86] scalarize splatted vector FP ops
There are a variety of vector patterns that may be profitably reduced to a
scalar op when scalar ops are performed using a subset (typically, the
first lane) of the vector register file.

For x86, this is true for float/double ops and element 0 because
insert/extract is just a sub-register rename.

Other targets should likely enable the hook in a similar way.

Differential Revision: https://reviews.llvm.org/D60150

llvm-svn: 357760
2019-04-05 13:32:17 +00:00
Simon Pilgrim faa5b939f0 [X86][AVX] Add PR34584 masked store test cases
llvm-svn: 357757
2019-04-05 11:34:30 +00:00
Simon Pilgrim 329e63b915 [X86] Add SSE/AVX1/AVX2 masked trunc+store tests
llvm-svn: 357756
2019-04-05 11:22:28 +00:00
Roger Ferrer Ibanez e011e4f89c [RISCV] Implement adding a displacement to a BlockAddress
Recent change rL357393 uses MachineInstrBuilder::addDisp to add a based on a
BlockAddress but this case was not implemented.

This patch adds the missing case and a test for RISC-V that exercises the new
case.

Differential Revision: https://reviews.llvm.org/D60136

llvm-svn: 357752
2019-04-05 08:40:57 +00:00
Piotr Sobczak 0376ac1d94 [SelectionDAG] Compute known bits of CopyFromReg
Summary:
Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual reg used has one def only.

This can be particularly useful when calling isBaseWithConstantOffset()
with the ISD::CopyFromReg argument, as more optimizations may get enabled
in the result.

Also add a missing truncation on X86, found by testing of this patch.

Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa

Reviewers: bogner, craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59535

llvm-svn: 357745
2019-04-05 07:44:09 +00:00
Craig Topper 94f1772b1e [X86] Promote i16 SRA instructions to i32
We already promote SRL and SHL to i32.

This will introduce sign extends sometimes which might be harder to deal with than the zero we use for promoting SRL. I ran this through some of our internal benchmark lists and didn't see any major regressions.

I think there might be some DAG combine improvement opportunities in the test changes here.

Differential Revision: https://reviews.llvm.org/D60278

llvm-svn: 357743
2019-04-05 06:32:50 +00:00
Serguei Katkov c39636cc2c [FastISel] Fix crash for gc.relocate lowring
Lowering safepoint checks that all gc.relocaes observed in safepoint
must be lowered. However Fast-Isel is able to skip dead gc.relocate.

To resolve this issue we just ignore dead gc.relocate in the check.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60184

llvm-svn: 357742
2019-04-05 05:41:08 +00:00
James Y Knight a040174418 Revert [X86] When using Win64 ABI, exit with error if SSE is disabled for varargs
It unnecessarily breaks previously-working code which used varargs,
but didn't pass any float/double arguments (such as EDK2).

Also revert the fixup on top of that:
Revert [X86] Fix a test from r357317

This reverts r357317 (git commit d413f41de6)
This reverts r357380 (git commit 7af32444b9)

llvm-svn: 357718
2019-04-04 19:05:48 +00:00
Sam Clegg 2a7cac932b [WebAssembly] Add new explicit relocation types for PIC relocations
See https://github.com/WebAssembly/tool-conventions/pull/106

Differential Revision: https://reviews.llvm.org/D59907

llvm-svn: 357710
2019-04-04 17:43:50 +00:00
Sanjay Patel 17648b848e [x86] eliminate unnecessary broadcast of horizontal op
This is another pattern that comes up if we more aggressively
scalarize FP ops.

llvm-svn: 357703
2019-04-04 14:46:13 +00:00
Jonas Paulsson c56ffed304 [SystemZ] Bugfix in isFusableLoadOpStorePattern()
This function is responsible for checking the legality of fusing an instance
of load -> op -> store into a single operation. In the SystemZ backend the
check was incomplete and a test case emerged with a cycle in the instruction
selection DAG as a result.

Instead of using the NodeIds to determine node relationships,
hasPredecessorHelper() now is used just like in the X86 backend. This handled
the failing tests and as well gave a few additional transformations on
benchmarks.

The SystemZ isFusableLoadOpStorePattern() is now a very near copy of the X86
function, and it seems this could be made a utility function in common code
instead.

Review: Ulrich Weigand
https://reviews.llvm.org/D60255

llvm-svn: 357688
2019-04-04 12:12:35 +00:00
Diana Picus 153c3887e4 [ARM GlobalISel] Support DBG_VALUE
Make sure we can map and select DBG_VALUE.

llvm-svn: 357681
2019-04-04 10:24:51 +00:00
Craig Topper 3649c20884 [X86] Use INSERT_SUBREG rather than SUBREG_TO_REG when creating LEA64_32 during isel.
SUBREG_TO_REG is supposed to be used to assert that we know the upper bits are
zero. But that isn't the case here. We've done no analysis of the inputs.

llvm-svn: 357673
2019-04-04 05:00:18 +00:00
Serguei Katkov fb44846e37 [FastISel] Fix the crash in gc.result lowering
The Fast ISel has a fallback to SelectionDAGISel in case it cannot handle the instruction.
This works as follows:
Using reverse order, try to select instruction using Fast ISel, if it cannot handle instruction it fallbacks to SelectionDAGISel
for these instructions if it is a call and continue fast instruction selections.

However if unhandled instruction is not a call or statepoint related instruction it fallbacks to SelectionDAGISel for all remaining
instructions in basic block.

However gc.result instruction is missed and as a result it is possible that gc.result is processed earlier than statepoint
causing breakage invariant the gc.results should be handled after statepoint.

Test is updated because in the current form fast-isel cannot handle ret instruction (due to i1 ret type without explicit ext)
and as a result test does not check fast-isel at all.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60182

llvm-svn: 357672
2019-04-04 04:19:56 +00:00
David L. Jones 8b8a02175a Revert r357452 - 'SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)'
This revision causes tests to fail under ASAN. Since the cause of the failures
is not clear (could be ASAN, could be a Clang bug, could be a bug in this
revision), the safest course of action seems to be to revert while investigating.

llvm-svn: 357667
2019-04-04 02:27:57 +00:00
Craig Topper 051bd16faf [X86] Remove CustomInserters for RDPKRU/WRPKRU. Use some custom lowering and new ISD opcodes instead.
These inserters inserted some instructions to zero some registers and copied from virtual registers to physical registers.

This change instead inserts the zeros directly into the DAG at lowering time using new ISD opcodes
that take the extra zeroes as inputs. The zeros will then go through isel on their own to select
the MOV32r0 pseudo. Then we just need to mention the physical registers directly
in the isel patterns and the isel table and InstrEmitter will take care of inserting the necessary
copies to/from physical registers.

llvm-svn: 357659
2019-04-04 00:28:49 +00:00
Craig Topper 52cac4b79f [X86] Remove CustomInserter pseudos for MONITOR/MONITORX/CLZERO. Use custom instruction selection instead.
This custom inserter existed so we could do a weird thing where we pretended that the instructions support
a full address mode instead of taking a pointer in EAX/RAX. I think was largely so we could be pointer
size agnostic in the isel pattern.

To make this work we would then put the address into an LEA into EAX/RAX in front of the instruction after
isel. But the LEA is overkill when we just have a base pointer. So we end up using the LEA as a slower MOV
instruction.

With this change we now just do custom selection during isel instead and just assign the incoming address
of the intrinsic into EAX/RAX based on its size. After the intrinsic is selected, we can let isel take
care of selecting an LEA or other operation to do any address computation needed in this basic block.

I've also split the instruction into a 32-bit mode version and a 64-bit mode version so the implicit
use is properly sized based on the pointer. Without this we get comments in the assembly output about
killing eax and defing rax or vice versa depending on whether we define the instruction to use EAX/RAX.

llvm-svn: 357652
2019-04-03 23:28:30 +00:00
Craig Topper 477008bd50 [X86] Remove dead CHECK lines for a test. NFC
llvm-svn: 357651
2019-04-03 23:28:18 +00:00
Craig Topper 437b45a1f8 [X86] Autogenerate checks. NFC
llvm-svn: 357650
2019-04-03 23:28:11 +00:00
Sanjay Patel c9a012e4ea [x86] fold shuffles of h-ops that have an undef operand
If an operand is undef, we can assume it's the same as the
other operand.

llvm-svn: 357644
2019-04-03 22:40:35 +00:00
Sanjay Patel 61b5e3c6a9 [x86] eliminate movddup of horizontal op
This pattern would show up as a regression if we more
aggressively convert vector FP ops to scalar ops.

There's still a missed optimization for the v4f64 legal
case (AVX) because we create that h-op with an undef operand.
We should probably just duplicate the operands for that
pattern to avoid trouble.

llvm-svn: 357642
2019-04-03 22:15:29 +00:00
Sanjay Patel 0b874c7c60 [x86] add another test for disguised h-op; NFC
llvm-svn: 357636
2019-04-03 21:10:55 +00:00
Matt Arsenault 396653f8a1 AMDGPU: Split block for si_end_cf
Relying on no spill or other code being inserted before this was
precarious. It relied on code diligently checking isBasicBlockPrologue
which is likely to be forgotten.

Ideally this could be done earlier, but this doesn't work because of
phis. Any other instruction can't be placed before them, so we have to
accept the position being incorrect during SSA.

This avoids regressions in the fast register allocator rewrite from
inverting the direction.

llvm-svn: 357634
2019-04-03 20:53:20 +00:00
Sanjay Patel 8c9ceecdc6 [x86] add test for disguised horizontal op; NFC
llvm-svn: 357630
2019-04-03 20:34:22 +00:00
Krzysztof Parzyszek 4841643a1d [X86] Extend boolean arguments to inline-asm according to getBooleanType
Differential Revision: https://reviews.llvm.org/D60208

llvm-svn: 357615
2019-04-03 17:43:14 +00:00
Simon Pilgrim 15919ad306 [X86][AVX] combineHorizontalPredicateResult - split any/allof v16i16/v32i8 reduction on AVX1
Perform the 2 x 128-bit lo/hi OR/AND on the vectors before calling PMOVMSKB on the 128-bit result.

llvm-svn: 357611
2019-04-03 17:28:34 +00:00
Simon Pilgrim 9e28dddf55 [X86][AVX] combineHorizontalPredicateResult - support v16i16/v32i8 reduction on AVX1
Use getPMOVMSKB helper which splits v32i8 MOVMSK calls on pre-AVX2 targets.

llvm-svn: 357608
2019-04-03 17:17:13 +00:00
Jessica Paquette e794121cd0 [AArch64][GlobalISel] Legalize G_FEXP2
Same as G_EXP. Add a test, and update legalizer-info-validation.mir and
f16-instructions.ll.

Differential Revision: https://reviews.llvm.org/D60165

llvm-svn: 357605
2019-04-03 16:58:32 +00:00
Sanjay Patel 8055034666 [x86] make stack folding tests immune to unrelated transforms; NFC
llvm-svn: 357604
2019-04-03 16:33:24 +00:00
Ulrich Weigand 35dfd1b7df [SystemZ] Improve codegen for certain SADDO-immediate cases
When performing an add-with-overflow with an immediate in the
range -2G ... -4G, code currently loads the immediate into a
register, which generally takes two instructions.

In this particular case, it is preferable to load the negated
immediate into a register instead, which always only requires
one instruction, and then perform a subtract.

llvm-svn: 357597
2019-04-03 15:09:19 +00:00
Sanjay Patel 281cf28329 [x86] remove duplicate tests
Accidentally double-committed these.

llvm-svn: 357593
2019-04-03 14:45:45 +00:00
Sanjay Patel 393458f3ed [x86] add negative tests for FP scalarization; NFC
These go with the proposal in D60150.

llvm-svn: 357592
2019-04-03 14:41:28 +00:00
Sanjay Patel 04848090cd [x86] add tests with constants for FP scalarization; NFC
llvm-svn: 357591
2019-04-03 14:41:24 +00:00
Sanjay Patel eb5ffc7842 [x86] add tests with constants for FP scalarization; NFC
llvm-svn: 357587
2019-04-03 14:36:47 +00:00
Petar Avramovic afa3afa384 [MIPS GlobalISel] Select floating point arithmetic operations
Select 32 and 64 bit floating point add, sub, mul and div for MIPS32.

Differential Revision: https://reviews.llvm.org/D60191

llvm-svn: 357584
2019-04-03 14:12:59 +00:00
Sanjay Patel 00dae6b22d [DAGCombiner] loosen restrictions for moving shuffles after vector binop
There are 3 changes to make this correspond to the same transform in instcombine:
1. Remove the legality check - we can't create anything less legal than we started with.
2. Ease the use restriction, so we only bail out if both operands have >1 use.
3. Ease the use restriction for binops with a repeated operand (eg, mul x, x).

As discussed in D60150, there's a scalarization opportunity that will be made
easier by allowing this transform more generally.

llvm-svn: 357580
2019-04-03 13:42:06 +00:00