Commit Graph

9942 Commits

Author SHA1 Message Date
Nirav Dave d0418341fd [SelectionDAGBuilder] Defer C_Register Assignments to be in line with
those of C_RegisterClass. NFCI.

llvm-svn: 351854
2019-01-22 18:57:49 +00:00
Matt Arsenault a5840c3c39 Codegen support for atomicrmw fadd/fsub
llvm-svn: 351851
2019-01-22 18:36:06 +00:00
Sanjay Patel effee52c59 [DAGCombiner] narrow vector binop with 2 insert subvector operands
vecbo (insertsubv undef, X, Z), (insertsubv undef, Y, Z) --> insertsubv VecC, (vecbo X, Y), Z

This is another step in generic vector narrowing. It's also a step towards more horizontal op 
formation specifically for x86 (although we still failed to match those in the affected tests).

The scalarization cases are also not optimal (we should be scalarizing those), but it's still 
an improvement to use a narrower vector op when we know part of the result must be constant 
because both inputs are undef in some vector lanes.

I think a similar match but checking for a constant operand might help some of the cases in 
D51553.

Differential Revision: https://reviews.llvm.org/D56875

llvm-svn: 351825
2019-01-22 14:24:13 +00:00
Sanjay Patel e713c47d49 [DAGCombiner] fix crash when converting build vector to shuffle
The regression test is reduced from the example shown in D56281.
This does raise a question as noted in the test file: do we want
to handle this pattern? I don't have a motivating example for
that on x86 yet, but it seems like we could have that pattern 
there too, so we could avoid the back-and-forth using a shuffle.

llvm-svn: 351753
2019-01-21 17:30:14 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Bjorn Pettersson d4023bd2cb [SelectionDAG] Updates for -dag-dump-verbose
Summary:
This patch makes some changes related to -dag-dump-verbose.
Main use case has been when debugging how SelectionDAG is
dealing with debug info (SDDbgValue nodes).

1) We now print the number of DbgValues that are mapped to each
   SDNode.
2) Removed duplicated printing of DebugLoc (nowadays DebugLoc is
   printed also when not using -dag-dump-verbose).
3) Renamed SDDbgValue::dump to SDDbgValue::print, and added a
   new SDDbgValue::dump that will start a new line after calling
   print.
4) SDDbgValue::print now prints "Order", and it also prints
   some additional information when kind is CONST/FRAMEIX/VREG.
5) SelectionDAG::dump() now dumps all SDDbgValue nodes after
   the list of SDNodes (both "regular" and "ByVal" SDDbgValue:s).
   Invalidated nodes are not printed.
6) Prohibit inline printing of SDNode operands that has SDDbgValue
   nodes associated to them.

Reviewers: jmorse, aprantl

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56793

llvm-svn: 351581
2019-01-18 20:06:13 +00:00
Florian Hahn dc4e154720 [SelectionDAG] Split very large token factors for chained stores to 64k chunks.
Similar to D55073. Without this change, the DAG combiner crashes on code
with more than 64k of stores in a single basic block that form parallelizable
chains.

No test case, as it would be very IR file.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D56740

llvm-svn: 351571
2019-01-18 18:37:38 +00:00
Nirav Dave 0a45bf0e55 [SelectionDAGBuilder] Cleanup InlineAsm Output generation. NFCI.
Defer inline asm's output fixup work until after we've generated the
inline asm node itself. Remove StoresToEmit, IndirectStoresToEmit, and
RetValRegs in favor of using ConstraintOperands.

llvm-svn: 351558
2019-01-18 15:57:13 +00:00
Florian Hahn d2c733b429 [SelectionDAG] Add getTokenFactor, which splits nodes with > 64k operands.
This functionality is required at multiple places which potentially
create large operand lists, like SelectionDAGBuilder or DAGCombiner.

Differential Revision: https://reviews.llvm.org/D56739

llvm-svn: 351552
2019-01-18 14:05:59 +00:00
Florian Hahn 1b81772328 [SelectionDAG] Add static getMaxNumOperands function to SDNode.
Summary:
Use this helper to make sure we use the same value at various places.
This will likely be needed at more places were we currently crash
because we use more operands than possible.

Also makes it easier to change in the future.

Reviewers: RKSimon, craig.topper, efriedma, aemerson

Reviewed By: RKSimon

Subscribers: hiraditya, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D56859

llvm-svn: 351537
2019-01-18 10:00:38 +00:00
Shiva Chen e84c729aca [ScheduleDAGRRList] Do not preschedule the node has ADJCALLSTACKDOWN parent
We should not pre-scheduled the node has ADJCALLSTACKDOWN parent,
or else, when bottom-up scheduling, ADJCALLSTACKDOWN and
ADJCALLSTACKUP may hold CallResource too long and make other
calls can't be scheduled. If there's no other available node
to schedule, the scheduler will try to rename the register by
creating copy to avoid the conflict which will fail because
CallResource is not a real physical register.

llvm-svn: 351527
2019-01-18 08:36:06 +00:00
Matt Arsenault 0cb08e448a Allow FP types for atomicrmw xchg
llvm-svn: 351427
2019-01-17 10:49:01 +00:00
Mandeep Singh Grang 33c49c0c82 [COFF, ARM64] Implement support for SEH extensions __try/__except/__finally
Summary:
This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler.

We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape.

Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric

Reviewed By: rnk, efriedma

Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53540

llvm-svn: 351370
2019-01-16 19:52:59 +00:00
Jeremy Morse 7dcea5ae3b [DebugInfo] Allow creation of DBG_VALUEs in blocks where the operand is not used
dbg.value intrinsics can appear in blocks where their operand is not used,
meaning the operand never receives an SDNode, and thus no DBG_VALUE will
be created. Get around this by looking to see whether the operand has already
been allocated a virtual register. This allows dbg.values of Phi node and
Values that are used across basic blocks to successfully be translated into
DBG_VALUEs.

Differential Revision: https://reviews.llvm.org/D56678

llvm-svn: 351358
2019-01-16 17:25:27 +00:00
Florian Hahn e94470f1cc [SelectionDAG] Update check in createOperands to reflect max() is a valid value.
The value returned by max() is the last valid value, adjust the
comparison accordingly.

The code added in D55073 creates TokenFactors with max() operands.

Reviewers: aemerson, efriedma, RKSimon, craig.topper

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D56738

llvm-svn: 351318
2019-01-16 10:06:04 +00:00
Sam Parker dd8cd6d26b [DAGCombine] Fix ReduceLoadWidth for shifted offsets
ReduceLoadWidth can trigger using a shifted mask is used and this
requires that the function return a shl node to correct for the
offset. However, the way that this was implemented meant that the
returned result could be an existing node, which would be incorrect.
This fixes the method of inserting the new node and replacing uses.

Differential Revision: https://reviews.llvm.org/D50432

llvm-svn: 351310
2019-01-16 08:40:12 +00:00
Nikita Popov d3b86b79fa Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Reapplying with updated SLPVectorizer tests.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351219
2019-01-15 18:43:41 +00:00
Nirav Dave fcb4a492db [SelectionDAG] Check membership of register in class for single
register constraints. NFCI.

Now that X86's ST(7) constraints are fixed this check can be
reinstated.

llvm-svn: 351207
2019-01-15 17:09:23 +00:00
Sanjay Patel fad5bdaf95 [DAGCombiner] reduce buildvec of zexted extracted element to shuffle
The motivating case for this is shown in the first regression test. We are 
transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'.

That's a special-case for a more general pattern as shown here. In all tests, 
we're avoiding the vector-scalar-vector moves in favor of vector ops.

We aren't producing optimal shuffle code in some cases though, so the patch is
limited to reduce regressions.

Differential Revision: https://reviews.llvm.org/D56281

llvm-svn: 351198
2019-01-15 16:11:05 +00:00
Nikita Popov 5885eec35a Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
This reverts commit r351125.

I missed test changes in an SLPVectorizer test, due to the cost model
changes. Reverting for now.

llvm-svn: 351129
2019-01-14 22:18:39 +00:00
Nikita Popov 8e9a8432a8 [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351125
2019-01-14 21:43:30 +00:00
Nirav Dave 3badfe74a2 Reland "Refactor GetRegistersForValue. NFCI."
Remove over-strictification class membership check.

llvm-svn: 351074
2019-01-14 17:09:45 +00:00
Simon Pilgrim a1bd4a6ba4 [DAGCombiner] Add (sub_sat x, x) -> 0 combine
llvm-svn: 351073
2019-01-14 15:43:34 +00:00
Simon Pilgrim fa1f518748 [DAGCombiner] Enable sub saturation constant folding
llvm-svn: 351072
2019-01-14 15:28:53 +00:00
Simon Pilgrim 7fc6882374 [DAGCombiner] Add add/sub saturation undef handling
Match ConstantFolding.cpp:
(add_sat x, undef) -> -1
(sub_sat x, undef) -> 0

llvm-svn: 351070
2019-01-14 14:16:24 +00:00
Simon Pilgrim cfa5f06dde [DAGCombiner] Enable add saturation constant folding
llvm-svn: 351060
2019-01-14 12:34:31 +00:00
Simon Pilgrim 67610926fc [DAGCombiner] Add add saturation constant folding tests.
Exposes an issue with sadd_sat for computeOverflowKind, so I've disabled it for now.

llvm-svn: 351057
2019-01-14 12:12:42 +00:00
Simon Pilgrim 3d42815cd8 [SelectionDAG] Add type sanity assertions for add/sub saturation node creation.
llvm-svn: 351055
2019-01-14 11:56:59 +00:00
Simon Pilgrim 56ba1db933 [DAGCombiner] If add_sat(x,y) can't overflow -> add(x,y)
NOTE: We need more powerful signed overflow detection in computeOverflowKind
llvm-svn: 351026
2019-01-13 22:08:26 +00:00
Simon Pilgrim 888fa8680c Fix unused variable warning. NFCI.
llvm-svn: 351025
2019-01-13 21:53:12 +00:00
Simon Pilgrim 897d4c6fe9 [DAGCombiner] Some very basic add/sub saturation combines.
Handle combines with zero and constant canonicalization for adds.

llvm-svn: 351024
2019-01-13 21:50:24 +00:00
Craig Topper 4978de36e4 [LegalizeDAG] Remove 'NeedInvert' code from expansion of BR_CC. Replace with an assert.
I accidentally triggered this code while doing some experiments and it doesn't look lke it could possibly work.

It calls 'getNOT' on a node that should be a CondCode.

I think to do this right we would need to swap the branch target and the fallthrough target. But that's not easy to do. Or we could create an explicit SetCC and feed that into a new BR_CC?

llvm-svn: 351022
2019-01-13 19:33:30 +00:00
Nikita Popov 0400e50445 [X86] Rename overly verbose method; NFC
As suggested on D56636.

llvm-svn: 351021
2019-01-13 16:41:26 +00:00
Sanjay Patel 625d5aef62 [DAGCombiner] fold insert_subvector of insert_subvector
This pattern:

    t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
  t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>

...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.

In the affected tests here, it looks like it just makes RA wiggle. But we might 
as well squash this to prevent it interfering with other pattern-matching.

Differential Revision:
https://reviews.llvm.org/D56604

llvm-svn: 351008
2019-01-12 15:12:28 +00:00
Simon Pilgrim 0d92c4debc Use getShiftAmountTy for shift amounts.
llvm-svn: 351005
2019-01-12 12:00:43 +00:00
Simon Pilgrim ca0de0363b [X86][AARCH64] Improve ISD::ABS support
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.

Differential Revision: https://reviews.llvm.org/D56544

llvm-svn: 350998
2019-01-12 09:59:32 +00:00
Pirama Arumuga Nainar cc07dabdaa [Legalizer] Use correct ValueType of SELECT_CC node during Float promotion
Summary:
When legalizing the result of a SELECT_CC node by promoting the
floating-point type, use the promoted-to type rather than the original
type.

Fix PR40273.

Reviewers: efriedma, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56566

llvm-svn: 350951
2019-01-11 18:46:02 +00:00
Martin Storsjo 114ad37c1d Revert "[SelectionDAGBuilder] Refactor GetRegistersForValue. NFCI."
This reverts commit r350841, as it actually had functional changes
and broke compilation. See PR40290.

llvm-svn: 350921
2019-01-11 07:31:17 +00:00
Sanjay Patel 9b368f39a9 [DAGCombiner] simplify code; NFC
llvm-svn: 350844
2019-01-10 16:47:42 +00:00
Nirav Dave cd18977add [SelectionDAGBuilder] Refactor GetRegistersForValue. NFCI.
llvm-svn: 350841
2019-01-10 16:25:47 +00:00
Nirav Dave 4817c0e46c [SelectionDAGBuilder] Fix formatting. NFC.
llvm-svn: 350839
2019-01-10 16:22:19 +00:00
Nirav Dave 57f2c14860 [SelectionDAGBuilder] Refactor visitInlineAsm. NFC.
llvm-svn: 350837
2019-01-10 16:18:18 +00:00
James Y Knight 62df5eed16 [opaque pointer types] Remove some calls to generic Type subtype accessors.
That is, remove many of the calls to Type::getNumContainedTypes(),
Type::subtypes(), and Type::getContainedType(N).

I'm not intending to remove these accessors -- they are
useful/necessary in some cases. However, removing the pointee type
from pointers would potentially break some uses, and reducing the
number of calls makes it easier to audit.

llvm-svn: 350835
2019-01-10 16:07:20 +00:00
Stanislav Mekhanoshin ed0d6c60af Remove check for single use in ShrinkDemandedConstant
This removes check for single use from general ShrinkDemandedConstant
to the BE because of the AArch64 regression after D56289/rL350475.

After several hours of experiments I did not come up with a testcase
failing on any other targets if check is not performed.

Moreover, direct call to ShrinkDemandedConstant is not really needed
and superceed by SimplifyDemandedBits.

Differential Revision: https://reviews.llvm.org/D56406

llvm-svn: 350684
2019-01-09 02:24:22 +00:00
Craig Topper 826f44b550 [TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. Stop using it in AMDGPU target for simplifyI24.
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.

Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.

This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.

Differential Revision: https://reviews.llvm.org/D56087

llvm-svn: 350560
2019-01-07 19:30:43 +00:00
Craig Topper 57fc891c1b [LegalizeVectorOps] Add FSHL/FSHR to the list of vector operations that should be handled.
The FSHL/FSHR nodes are handled in the expand function, but they need to also be listed in the code that queries for the operation action too.

llvm-svn: 350490
2019-01-06 07:06:35 +00:00
Stanislav Mekhanoshin 35a3a3bd11 Added single use check to ShrinkDemandedConstant
Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink
source node to demanded bits only even if there are other uses.

Differential Revision: https://reviews.llvm.org/D56289

llvm-svn: 350475
2019-01-05 19:20:00 +00:00
Craig Topper cfeb1cf9af [X86] Add INSERT_SUBVECTOR to ComputeNumSignBits
This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits.

My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common.

Differential Revision: https://reviews.llvm.org/D56283

llvm-svn: 350432
2019-01-04 20:50:59 +00:00
Sanjay Patel 9633d76a40 [DAGCombiner][x86] scalarize binop followed by extractelement
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:

// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)

We want to have this in the DAG too because as we can see in some of the test diffs (reductions), 
the pattern may not be visible in IR.

Given that this is already an IR canonicalization, any backend that would prefer a vector op over 
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.

Differential Revision: https://reviews.llvm.org/D55722

llvm-svn: 350354
2019-01-03 21:31:16 +00:00
Craig Topper 8dd7bd2cd7 [DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists.
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.

Improves the test case from PR38217. There may be additional opportunities after this.

Differential Revision: https://reviews.llvm.org/D56145

llvm-svn: 350239
2019-01-02 18:19:07 +00:00
Craig Topper 3109f3a4ab [LegalizeIntegerTypes] When promoting the result of an extract_vector_elt also promote the input type if necessary
By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined.

By creating the extract with the correct scalar type when we create legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.

This fixes the regression on X86 in D56156.

Differential Revision: https://reviews.llvm.org/D56176

llvm-svn: 350236
2019-01-02 17:58:30 +00:00
Craig Topper c562fae02b [DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them.
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.

The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.

Differential Revision: https://reviews.llvm.org/D56156

llvm-svn: 350235
2019-01-02 17:58:27 +00:00
Ayonam Ray e00606a1b2 Reversing the commit in revision 350186. Revision causes regression in 4
tests.

llvm-svn: 350187
2019-01-01 07:28:55 +00:00
Ayonam Ray c471bb2e67 Omit range checks from jump tables when lowering switches with unreachable
default

During the lowering of a switch that would result in the generation of a jump
table, a range check is performed before indexing into the jump table, for the
switch value being outside the jump table range and a conditional branch is
inserted to jump to the default block. In case the default block is
unreachable, this conditional jump can be omitted. This patch implements
omitting this conditional branch for unreachable defaults.

Review Reference: D52002

llvm-svn: 350186
2019-01-01 06:37:50 +00:00
Craig Topper ed3ffae4a4 [SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG support to computeKnownBits.
Differential Revision: https://reviews.llvm.org/D56168

llvm-svn: 350179
2018-12-31 19:09:30 +00:00
Craig Topper 802c4979ae [DAGCombiner] Add missing one use check on the shuffle in the bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform.
Found while trying out some other changes so I don't really have a test case.

llvm-svn: 350172
2018-12-31 05:40:46 +00:00
Kang Zhang 4aa6453767 [PowerPC] Fix ADDE, SUBE do not know how to promote operator
Summary:
This patch is created to fix the Bugzilla bug 39815:
https://bugs.llvm.org/show_bug.cgi?id=39815 

This patch is to support promotion integer result for the instruction ADDE, SUBE.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D56119

llvm-svn: 350161
2018-12-30 07:48:09 +00:00
Richard Trieu a87b70d1db Add vtable anchor to classes.
llvm-svn: 350142
2018-12-29 02:02:13 +00:00
Justin Lebar 49fac56ea3 [NVPTX] Allow libcalls that are defined in the current module.
The patch adds a possibility to make library calls on NVPTX.

An important thing about library functions - they must be defined within
the current module. This basically should guarantee that we produce a
valid PTX assembly (without calls to not defined functions). The one who
wants to use the libcalls is probably will have to link against
compiler-rt or any other implementation.

Currently, it's completely impossible to make library calls because of
error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can
lower ExternalSymbol to TargetExternalSymbol and verify if the function
definition is available.

Also, there was an issue with a DAG during legalisation. When we expand
instruction into libcall, the inner call-chain isn't being "integrated"
into outer chain. Since the last "data-flow" (call retval load) node is
located in call-chain earlier than CALLSEQ_END node, the latter becomes
a leaf and therefore a dead node (and is being removed quite fast).
Proposed here solution relies on another data-flow pseudo nodes
(ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and
instruction selection phases - we remove the pseudo instructions before
register scheduling phase.

Patch by Denys Zariaiev!

Differential Revision: https://reviews.llvm.org/D34708

llvm-svn: 350069
2018-12-26 19:12:31 +00:00
Craig Topper 0229da8f07 [X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ.
This is an alternative to what I attempted in D56057.

GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users.

I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.

Based on a patch that Simon Pilgrim sent me in email.

Fixes PR40142.

llvm-svn: 350059
2018-12-24 19:40:20 +00:00
George Burgess IV 610c76534f [SelectionDAGBuilder] Use ::precise LocationSizes; NFC
More migration so we can disable the implicit int -> LocationSize
conversion.

All of these are either scatter/gather'ed vector instructions, or direct
loads. Hence, they're all precise.

Perhaps if we see way more getTypeStoreSize calls, we can make a
getTypeStoreLocationSize (or similar) as a wrapper that applies this
::precise. Doesn't appear that it's a good idea to make getTypeStoreSize
return a LocationSize itself, however.

llvm-svn: 350042
2018-12-24 05:34:21 +00:00
Sanjay Patel 93f1074677 [DAGCombiner] limit shuffle to extend transform (PR40146)
It's dangerous to knowingly create an illegal vector type
no matter what stage of combining we're in.

This prevents the missed folding/scalarization seen in:
https://bugs.llvm.org/show_bug.cgi?id=40146

llvm-svn: 350034
2018-12-23 20:48:31 +00:00
Sanjay Patel 9933574ac3 [DAGCombiner] allow hoisting vector bitwise logic ahead of extends
llvm-svn: 350032
2018-12-23 19:58:16 +00:00
Sanjay Patel 4b537aaf6d [DAGCombiner] allow narrowing of add followed by truncate
trunc (add X, C ) --> add (trunc X), C'

If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type.
This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine).

This change used to show regressions for x86, but those are gone after D55494. 
This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) 
that does almost the same thing.

Differential Revision: https://reviews.llvm.org/D55866

llvm-svn: 350006
2018-12-22 17:10:31 +00:00
Sanjay Patel 47a6129e26 [DAGCombiner] simplify code leading to scalarizeExtractedVectorLoad; NFC
llvm-svn: 349958
2018-12-21 21:26:30 +00:00
Simon Pilgrim 911dce2f30 [SelectionDAG] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349907
2018-12-21 14:56:18 +00:00
Eli Friedman b1bbd5dca3 [ARM] Complete the Thumb1 shift+and->shift+shift transforms.
This saves materializing the immediate.  The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.

I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.

Differential Revision: https://reviews.llvm.org/D55630

llvm-svn: 349857
2018-12-20 23:39:54 +00:00
Simon Pilgrim b208255fe0 [SelectionDAGBuilder] Enable funnel shift building to custom rotates
This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations.

AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK.

Differential Revision: https://reviews.llvm.org/D55747

llvm-svn: 349765
2018-12-20 14:56:44 +00:00
Craig Topper bd788ce5db [DAGCombiner] Fix a place that was creating a SIGN_EXTEND with an extra operand.
llvm-svn: 349726
2018-12-20 05:28:06 +00:00
Simon Pilgrim 2ae3a91656 [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 2 of 2)
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.

This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.

I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases.

Differential Revision: https://reviews.llvm.org/D55822

llvm-svn: 349629
2018-12-19 14:09:38 +00:00
Simon Pilgrim 47ff0431e9 [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 1 of 2)
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.

This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.

Differential Revision: https://reviews.llvm.org/D55822

llvm-svn: 349628
2018-12-19 14:09:09 +00:00
Simon Pilgrim 6c95bea072 [TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero.

SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.

Thanks to @dmgreen for catching this.

Differential Revision: https://reviews.llvm.org/D55883

llvm-svn: 349625
2018-12-19 13:37:59 +00:00
Simon Pilgrim 2072b5afbe [SelectionDAG] Optional handling of UNDEF elements in matchUnaryPredicate
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs.

This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument.

I've updated SelectionDAG::simplifyShift to demonstrate its use.

Differential Revision: https://reviews.llvm.org/D55819

llvm-svn: 349616
2018-12-19 10:41:06 +00:00
Pete Cooper f86db5ce9e Rewrite objc intrinsics to runtime methods in PreISelIntrinsicLowering instead of SDAG.
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISel's.  Also we want to eventually support nonlazybind and weak linkage coming
from the front-end which we can't do in SelectionDAG.

llvm-svn: 349552
2018-12-18 22:20:03 +00:00
Nikita Popov a7d2a235bb [SelectionDAG][X86] Fix [US](ADD|SUB)SAT vector legalization, add tests
Integer result promotion needs to use the scalar size, and we need
support for result widening.

This is in preparation for D55787.

llvm-svn: 349480
2018-12-18 13:22:53 +00:00
Simon Pilgrim af6fbbf18b [TargetLowering] Fallback from SimplifyDemandedVectorElts to SimplifyDemandedBits
For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well.

llvm-svn: 349466
2018-12-18 09:33:25 +00:00
Krzysztof Parzyszek 5852aa44ae [SDAG] Clarify the origin of chain in REG_SEQUENCE in comment, NFC
llvm-svn: 349391
2018-12-17 20:30:20 +00:00
Craig Topper 15b7246935 [SelectionDAG] Fix noop detection for vectors in AssertZext/AssertSext in getNode
The assertion type is always supposed to be a scalar type. So if the result VT of the assertion is a vector, we need to get the scalar VT before we can compare them.

Similarly for the assert above it.

I don't have a test case because I don't know of any place we violate this today. A coworker found this while trying to use r347287 on the 6.0 branch without also having r336868

llvm-svn: 349390
2018-12-17 20:29:13 +00:00
JF Bastien 1811217e4d NFC: remove unused variable
D55768 removed its use.

llvm-svn: 349377
2018-12-17 19:03:24 +00:00
Simon Pilgrim 9274f17a5e [TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000)
This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen.

I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well.

Differential Revision: https://reviews.llvm.org/D55768

llvm-svn: 349374
2018-12-17 18:43:43 +00:00
Tim Northover 256a16d031 FastIsel: take care to update iterators when removing instructions.
We keep a few iterators into the basic block we're selecting while
performing FastISel. Usually this is fine, but occasionally code wants
to remove already-emitted instructions. When this happens we have to be
careful to update those iterators so they're not pointint at dangling
memory.

llvm-svn: 349365
2018-12-17 17:25:53 +00:00
Sanjay Patel f24900b934 [DAGCombiner] allow hoisting vector bitwise logic ahead of truncates
The transform performs a bitwise logic op in a wider type followed by
truncate when both inputs are truncated from the same source type:
logic_op (truncate x), (truncate y) --> truncate (logic_op x, y)

There are a bunch of other checks that should prevent doing this when 
it might be harmful.

We already do this transform for scalars in this spot. The vector 
limitation was shared with a check for the case when the operands are 
extended. I'm not sure if that limit is needed either, but that would 
be a separate patch.

Differential Revision: https://reviews.llvm.org/D55448

llvm-svn: 349303
2018-12-16 14:57:04 +00:00
Simon Pilgrim 0ef977b83d [SelectionDAG] Add FSHL/FSHR support to computeKnownBits
Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the targets shift amount type).

llvm-svn: 349298
2018-12-16 13:33:37 +00:00
Simon Pilgrim 1e1fd9c761 [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts
Differential Revision: https://reviews.llvm.org/D55600

llvm-svn: 349264
2018-12-15 11:36:36 +00:00
Krzysztof Parzyszek 6b01d35497 [SDAG] Ignore chain operand in REG_SEQUENCE when emitting instructions
llvm-svn: 349186
2018-12-14 20:14:12 +00:00
Craig Topper 257ce3871e [DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc
Summary:
If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes those causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node.

To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55459

llvm-svn: 349137
2018-12-14 08:28:24 +00:00
Sanjay Patel 093ab45d4c [DAGCombiner] clean up visitEXTRACT_VECTOR_ELT
This isn't quite NFC, but I don't know how to expose
any outward diffs from these changes. Mostly, this
was confusing because it used 'VT' to refer to the
operand type rather the usual type of the input node.

There's also a large block at the end that is dedicated 
solely to matching loads, but that wasn't obvious. This
could probably be split up into separate functions to
make it easier to see. 

It's still not clear to me when we make certain transforms 
because the legality and constant conditions are 
intertwined in a way that might be improved.

llvm-svn: 349095
2018-12-14 00:09:08 +00:00
Sanjay Patel 791ae69afe [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract; 2nd try
This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from
number of uses to an opcode test for DELETED_NODE based on existing similar code.

Differential Revision: https://reviews.llvm.org/D55655

llvm-svn: 349058
2018-12-13 17:05:01 +00:00
Sanjay Patel c56f5728ee revert rL349051: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract
This causes an address sanitizer bot failure:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/27187/steps/check-llvm%20asan/logs/stdio

llvm-svn: 349056
2018-12-13 16:32:44 +00:00
Sanjay Patel a7b115b392 [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract
Differential Revision: https://reviews.llvm.org/D55655

llvm-svn: 349051
2018-12-13 15:44:26 +00:00
Simon Pilgrim ab973a45b9 [DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner
Remove common code from custom lowering (code is still safe if somehow a zero value gets used).

llvm-svn: 349028
2018-12-13 12:23:32 +00:00
Simon Pilgrim 77fc551d1a [TargetLowering] Add ISD::ROTL/ROTR vector expansion
Move existing rotation expansion code into TargetLowering and set it up for vectors as well.

Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment.

Begun removing x86 vector rotate custom lowering to use the expansion.

llvm-svn: 349025
2018-12-13 11:20:48 +00:00
Clement Courbet 76f4ae1092 [CodeGen] Allow mempcy/memset to generate small overlapping stores.
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.

Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D55365

llvm-svn: 349016
2018-12-13 09:56:19 +00:00
Simon Pilgrim eb508f8ccb [SelectionDAG] Add a generic isSplatValue function
This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns.

It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller.

A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS).

I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection.

Differential Revision: https://reviews.llvm.org/D55426

llvm-svn: 348953
2018-12-12 18:32:29 +00:00
Simon Pilgrim f6c898e12f [TargetLowering] Add ISD::AND handling to SimplifyDemandedVectorElts
If either of the operand elements are zero then we know the result element is going to be zero (even if the other element is undef).

Differential Revision: https://reviews.llvm.org/D55558

llvm-svn: 348926
2018-12-12 13:43:07 +00:00
Leonard Chan 118e53fd63 [Intrinsic] Signed Fixed Point Multiplication Intrinsic
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D54719

llvm-svn: 348912
2018-12-12 06:29:14 +00:00
Clement Courbet 8b6434bbb9 Revert r348843 "[CodeGen] Allow mempcy/memset to generate small overlapping stores."
Breaks ARM/memcpy-inline.ll

llvm-svn: 348844
2018-12-11 13:38:43 +00:00
Clement Courbet 93b3445770 [CodeGen] Allow mempcy/memset to generate small overlapping stores.
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.

Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D55365

llvm-svn: 348843
2018-12-11 13:15:56 +00:00
Simon Pilgrim f6371f5f23 [TargetLowering] Add ISD::EXTRACT_VECTOR_ELT support to SimplifyDemandedBits
Let SimplifyDemandedBits attempt to simplify all elements of a vector extraction.

Part of PR39689.

llvm-svn: 348839
2018-12-11 11:08:40 +00:00
Simon Pilgrim fc2c9af99c [TargetLowering] Add UNDEF folding to SimplifyDemandedVectorElts
If all the demanded elements of the SimplifyDemandedVectorElts are known to be UNDEF, we can simplify to an ISD::UNDEF node.

Zero constant folding will be handled in a future patch - its a little trickier as we often have bitcasted zero values.

Differential Revision: https://reviews.llvm.org/D55511

llvm-svn: 348784
2018-12-10 18:29:46 +00:00
Simon Pilgrim c73a955370 [DAGCombiner] Remove unnecessary recursive DAGCombiner::visitINSERT_SUBVECTOR call.
As discussed on D55511, this caused an issue if the inner node deletes a node that the outer node depends upon. As it doesn't affect any lit-tests and I've only been able to expose this with the D55511 change I'm committing this now.

llvm-svn: 348781
2018-12-10 18:18:50 +00:00
Francis Visoiu Mistrih 753efe3584 [DAGCombiner] Use the result value type in visitCONCAT_VECTORS
This triggers an assert when combining concat_vectors of a bitcast of
merge_values.

With asserts disabled, it fails to select:
fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8
  0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1
    0x7ff19d000b50: f64 = Register %1
In function: d

Differential Revision: https://reviews.llvm.org/D55507

llvm-svn: 348759
2018-12-10 14:31:34 +00:00
Jeremy Morse a06b163d5c [DebugInfo] Don't drop dbg.value's of nullptr
Currently, dbg.value's of "nullptr" are dropped when entering a SelectionDAG --
apparently just because of an oversight when recognising Values that are
constant (see PR39787). This patch adds ConstantPointerNull to the list of
constants that can be turned into DBG_VALUEs.

The matter of what bit-value a null pointer constant in LLVM has was raised
in this mailing list thread:

  http://lists.llvm.org/pipermail/llvm-dev/2018-December/128234.html

Where it transpires LLVM relies on (IR) null pointers being zero valued,
thus I've baked this assumption into the patch.

Differential Revision: https://reviews.llvm.org/D55227

llvm-svn: 348753
2018-12-10 12:04:08 +00:00
Jeremy Morse 045c67769d [DebugInfo] Emit undef DBG_VALUEs when SDNodes are optimised out
This is a fix for PR39896, where dbg.value's of SDNodes that have been
optimised out do not lead to "DBG_VALUE undef" instructions being created.
Such undef instructions are necessary to terminate earlier variable
ranges, otherwise variable values leak past the point where they're valid.

The "invalidated" flag of SDDbgValue is currently being abused to mean two
things:
 * The corresponding SDNode is now invalid
 * This SDDbgValue should not be emitted
Of which there are several legitimate combinations of meaning:
 * The SDNode has been invalidated and we should emit "DBG_VALUE undef"
 * The SDNode has been invalidated but the debug data was salvaged, don't
   emit anything for this SDDbgValue
 * This SDDbgValue has been emitted

This patch introduces distinct "Emitted" and "Invalidated" fields to the
SDDbgValue class, updates users accordingly, and generates "undef"
DBG_VALUEs for invalidated records. Awkwardly, there are circumstances
where we emit SDDbgValue's twice, specifically DebugInfo/X86/dbg-addr-dse.ll
which I've preserved.

Differential Revision: https://reviews.llvm.org/D55372

llvm-svn: 348751
2018-12-10 11:20:47 +00:00
Sanjay Patel e767bf4468 [DAGCombiner] re-enable truncation of binops
This is effectively re-committing the changes from:
rL347917 (D54640)
rL348195 (D55126)
...which were effectively reverted here:
rL348604
...because the code had a bug that could induce infinite looping
or eventual out-of-memory compilation.

The bug was that this code did not guard against transforming
opaque constants. More details are in the post-commit mailing
list thread for r347917. A reduced test for that is included
in the x86 bool-math.ll file. (I wasn't able to reduce a PPC
backend test for this, but it was almost the same pattern.)

Original commit message for r347917:

The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.

Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc
sequences that don't get folded in IR.

As the TODO comments suggest, there will be regressions if we extend this (for x86,
we mostly seem to be missing LEA opportunities, but there are likely vector folds
missing too). I think those should be considered existing bugs because this is the
same transform that we do as an IR canonicalization in instcombine. We just need
more tests to make those visible independent of this patch.

llvm-svn: 348706
2018-12-08 16:07:38 +00:00
Craig Topper b4c96f5a32 [SelectionDAG] Remove ISD::ADDC/ADDE from some undef handling code in getNode. NFCI
These nodes should have two results. A real VT and a Glue. But this code would have returned Undef which would only be a single result. But we're in the single result version of getNode so these opcodes should never be seen by this function anyway.

llvm-svn: 348670
2018-12-08 00:27:34 +00:00
Pete Cooper 782a490dfb Follow-up from r348441 to add the rest of the objc ARC intrinsics.
This adds the other intrinsics used by ARC and codegen's them to their respective runtime methods.

llvm-svn: 348646
2018-12-07 21:28:47 +00:00
Sanjay Patel bc47ff86fe [DAGCombiner] split trunc from extend in hoistLogicOpWithSameOpcodeHands; NFC
This duplicates several shared checks, but we need to split
this up to fix underlying bugs in smaller steps.

llvm-svn: 348627
2018-12-07 18:51:08 +00:00
Sanjay Patel 3af4ae9735 [DAGCombiner] disable truncation of binops by default
As discussed in the post-commit thread of r347917, this
transform is fighting with an existing transform causing
an infinite loop or out-of-memory, so this is effectively 
reverting r347917 and its follow-up r348195 while we
investigate the bug.

llvm-svn: 348604
2018-12-07 15:47:52 +00:00
Sanjay Patel bb796cd61c [DAGCombiner] remove explicit calls to AddToWorkList; NFCI
As noted in the post-commit thread for rL347917:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608936.html
...we don't need to repeat these calls because the combiner does it automatically.

llvm-svn: 348597
2018-12-07 15:00:56 +00:00
Simon Pilgrim d498dee7a2 [SelectionDAG] Don't pass on DemandedElts when handling SCALAR_TO_VECTOR
Fixes an assertion:

llc: lib/CodeGen/SelectionDAG/SelectionDAG.cpp:2200: llvm::KnownBits llvm::SelectionDAG::computeKnownBits(llvm::SDValue, const llvm::APInt&, unsigned int) const: Assertion `(!Op.getValueType().isVector() || NumElts == Op.getValueType().getVectorNumElements()) && "Unexpected vector size"' failed.

Committed on behalf of: @pendingchaos (Rhys Perry)

Differential Revision: https://reviews.llvm.org/D55223

llvm-svn: 348574
2018-12-07 09:18:44 +00:00
Sanjay Patel c6441c8547 [DAGCombiner] use root SDLoc for all nodes created by logic fold
If this is not a valid way to assign an SDLoc, then we get this
wrong all over SDAG.

I don't know enough about the SDAG to explain this. IIUC, theoretically,
debug info is not supposed to affect codegen. But here it has clearly
affected 3 different targets, and the x86 change is an actual improvement.

llvm-svn: 348552
2018-12-07 00:01:57 +00:00
Sanjay Patel 86cb679851 [DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCI
We shouldn't care about the debug location for a node that
we're creating, but attaching the root of the pattern should
be the best effort. (If this is not true, then we are doing
it wrong all over the SDAG).

This is no-functional-change-intended, and there are no
regression test diffs...and that's what I expected. But
there's a similar line above this diff, where those
assumptions apparently do not hold.

llvm-svn: 348550
2018-12-06 23:53:58 +00:00
Sanjay Patel 276cef343c [DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFC
This code can still misbehave.

llvm-svn: 348547
2018-12-06 23:39:28 +00:00
Sanjay Patel 70af85b0ac [DAGCombiner] don't group bswap with casts in logic hoisting fold
This was probably organized as it was because bswap is a unary op.
But that's where the similarity to the other opcodes ends. We should
not limit this transform to scalars, and we should not try it if
either input has other uses. This is another step towards trying to
clean this whole function up to prevent it from causing infinite loops
and memory explosions. 

Earlier commits in this series:
rL348501
rL348508
rL348518

llvm-svn: 348534
2018-12-06 22:10:44 +00:00
Sanjay Patel 03a3ef2a0c [DAGCombiner] reduce indent; NFC
Unlike some of the folds in hoistLogicOpWithSameOpcodeHands()
above this shuffle transform, this has the expected hasOneUse()
checks in place.

llvm-svn: 348523
2018-12-06 20:02:47 +00:00
Andrea Di Biagio 52a2bac583 [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.
This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes:

concat_vectors( bitcast (scalar_to_vector %A), UNDEF)
    --> bitcast (scalar_to_vector %A)

This patch only partially addresses PR39257. In particular, it is enough to fix
one of the two problematic cases mentioned in PR39257. However, it is not enough
to fix the original test case posted by Craig; that particular case would
probably require a more complicated approach (and knowledge about used bits).

Before this patch, we used to generate the following code for function PR39257
(-mtriple=x86_64 , -mattr=+avx):

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vxorps  %xmm1, %xmm1, %xmm1
vblendps        $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3]
vmovaps %ymm0, (%rsi)
vzeroupper
retq

Now we generate this:

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vmovaps %ymm0, (%rsi)
vzeroupper
retq

As a side note: that VZEROUPPER is completely redundant...

I guess the vzeroupper insertion pass doesn't realize that the definition of
%xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on
%-mcpu=btver2, we don't get that vzeroupper because pass vzeroupper insertion
%pass is disabled.

Differential Revision: https://reviews.llvm.org/D55274

llvm-svn: 348522
2018-12-06 19:55:38 +00:00
Sanjay Patel bfc7ffa40f [DAGCombiner] don't hoist logic op if operands have other uses, part 2
The PPC test with 2 extra uses seems clearly better by avoiding this transform. 
With 1 extra use, we also prevent an extra register move (although that might
be an RA problem). The general rule should be to only make a change here if
it is always profitable. The x86 diffs are all neutral.

llvm-svn: 348518
2018-12-06 19:18:56 +00:00
Sanjay Patel c3717cd0d5 [DAGCombiner] don't hoist logic op if operands have other uses
The AVX512 diffs are neutral, but the bswap test shows a clear overreach in 
hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can 
increase the instruction count.

This could also fight with transforms trying to go in the opposite direction 
and possibly blow up/infinite loop. This might be enough to solve the bug 
noted here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html

I did not add the hasOneUse() checks to all opcodes because I see a perf 
regression for at least one opcode. We may decide that's irrelevant in the
face of potential compiler crashing, but I'll see if I can salvage that first.

llvm-svn: 348508
2018-12-06 18:16:32 +00:00
Sanjay Patel e9bf78fa23 [DAGCombiner] refactor function that hoists bitwise logic; NFCI
Added FIXME and TODO comments for lack of safety checks.
This function is a suspect in out-of-memory errors as discussed in
the follow-up thread to r347917:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html

llvm-svn: 348501
2018-12-06 17:08:03 +00:00
Simon Pilgrim 105a366254 DAGCombiner::visitINSERT_VECTOR_ELT - pull out repeated VT.getVectorNumElements(). NFCI.
llvm-svn: 348494
2018-12-06 15:39:25 +00:00
Pete Cooper e13d0992dc Add objc.* ARC intrinsics and codegen them to their runtime methods.
Reviewers: erik.pilkington, ahatanak

Differential Revision: https://reviews.llvm.org/D55233

llvm-svn: 348441
2018-12-06 00:52:54 +00:00
Sanjay Patel 33a448f935 [DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893)
Because we're potentially peeking through a bitcast in this transform,
we need to use overall bitwidths rather than number of elements to
determine when it's safe to proceed.

Should fix:
https://bugs.llvm.org/show_bug.cgi?id=39893

llvm-svn: 348383
2018-12-05 17:10:30 +00:00
Simon Pilgrim 8fdaf5c915 [TargetLowering] Remove ISD::ANY_EXTEND/ANY_EXTEND_VECTOR_INREG opcodes from SimplifyDemandedVectorElts
These have no test coverage and the KnownZero flags can't be guaranteed unlike SIGN/ZERO_EXTEND cases.

llvm-svn: 348361
2018-12-05 12:20:05 +00:00
Simon Pilgrim 180639afe5 [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467)
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.

Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.

Differential Revision: https://reviews.llvm.org/D54698

llvm-svn: 348353
2018-12-05 11:12:12 +00:00
Simon Pilgrim cd8a152b18 Remove superfluous comments. NFCI.
As requested in D54698.

llvm-svn: 348350
2018-12-05 10:45:44 +00:00
Simon Pilgrim d24730cdda [TargetLowering] SimplifyDemandedVectorElts - don't alter DemandedElts mask
Fix potential issue with the ISD::INSERT_VECTOR_ELT case tweaking the DemandedElts mask instead of using a local copy - so later uses of the mask use the tweaked version.....

Noticed while investigating adding zero/undef folding to SimplifyDemandedVectorElts and the altered DemandedElts mask was causing mismatches.

llvm-svn: 348348
2018-12-05 10:37:45 +00:00
Amara Emerson 814a6794ba [SelectionDAG] Split very large token factors for loads into 64k chunks.
There's a 64k limit on the number of SDNode operands, and some very large
functions with 64k or more loads can cause crashes due to this limit being hit
when a TokenFactor with this many operands is created. To fix this, create
sub-tokenfactors if we've exceeded the limit.

No test case as it requires a very large function.

rdar://45196621

Differential Revision: https://reviews.llvm.org/D55073

llvm-svn: 348324
2018-12-05 00:41:30 +00:00
Nirav Dave ce26c27b2a [SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI.
llvm-svn: 348288
2018-12-04 17:59:43 +00:00
Simon Pilgrim 0add090e24 [TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686)
PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction.

The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment.

The X87 cases don't need the strict flag but generates much nicer code with it....

Differential Revision: https://reviews.llvm.org/D53794

llvm-svn: 348251
2018-12-04 11:21:30 +00:00
Simon Pilgrim 666261cdc8 [TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes
Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes.

The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch.

llvm-svn: 348246
2018-12-04 10:41:06 +00:00
Sanjay Patel d24f63477d [DAGCombiner] narrow truncated vector binops when legal
This is the smallest vector enhancement I could find to D54640.
Here, we're allowing narrowing to only legal vector ops because we'll see
regressions without that. All of the test diffs are wins from what I can tell.
With AVX/AVX512, we can shrink ymm/zmm ops to xmm.

x86 vector multiplies are the problem case that we're avoiding due to the
patchwork ISA, and it's not clear to me if we can dance around those
regressions using TLI hooks or if we need preliminary patches to plug those
holes.

Differential Revision: https://reviews.llvm.org/D55126

llvm-svn: 348195
2018-12-03 21:57:35 +00:00
Craig Topper e35b01f8ea [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8.
Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalizdd to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but Under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction.

The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54836

llvm-svn: 348158
2018-12-03 18:26:24 +00:00
Sanjay Patel b205606d3e [SelectionDAG] fold constant with undef vector per element
This makes the SDAG behavior consistent with the way we do this in IR.
It's possible that we were getting the wrong answer before. For example,
'xor undef, undef --> 0' but 'xor undef, C' --> undef. 

But the most practical improvement is likely as shown in the tests here - 
for FP, we were overconstraining undef lanes to NaN, and that can prevent 
vector simplifications/narrowing (see D51553).

llvm-svn: 348090
2018-12-02 13:48:42 +00:00
Sanjay Patel 2daceedf92 [DAGCombiner] guard against an oversized shift crash
This change prevents the crash noted in the post-commit comments 
for rL347478 :
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html

We can't guarantee that an oversized shift amount is folded away, 
so we have to check for it.

Note that I committed an incomplete fix for that crash with:
rL347502

But as discussed here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html
...we have to try harder.

So I'm not sure how to expose the bug now (and apparently no fuzzers have found 
a way yet either).

On the plus side, we have discovered that we're missing real optimizations by 
not simplifying nodes sooner, so the earlier fix still has value, and there's 
likely more value in extending that so we can simplify more opcodes and simplify 
when doing RAUW and/or putting nodes on the combiner worklist.

Differential Revision: https://reviews.llvm.org/D54954

llvm-svn: 348089
2018-12-02 13:33:56 +00:00
Simon Pilgrim e017ed3245 [SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification
D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element.

This patch relaxes this to demanding an element if we need any bit from it.

Differential Revision: https://reviews.llvm.org/D54761

llvm-svn: 348073
2018-12-01 12:08:55 +00:00
Nicolai Haehnle a9cc92c247 AMDGPU: Fix various issues around the VirtReg2Value mapping
Summary:
The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:

1. VirtReg2Value is generated lazily; there were some cases where
   a lookup was performed before all relevant virtual registers were
   created, leading to an out-of-sync mapping. Those cases were:

  - Complex code to lower formal arguments that generated CopyFromReg
    nodes from live-in registers (fixed by never querying the mapping
    for live-in registers).

  - Code that generates CopyToReg for formal arguments that are used
    outside the entry basic block (fixed by never querying the
    mapping for Register nodes, which don't need the divergence info
    anyway).

2. For complex values that are lowered to a sequence of registers,
   all registers must be reflected in the VirtReg2Value mapping.

I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.

There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.

Reviewers: alex-t, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54340

llvm-svn: 348049
2018-11-30 22:55:29 +00:00
Sanjay Patel 1901a12e76 [SelectionDAG] fold FP binops with 2 undef operands to undef
llvm-svn: 348016
2018-11-30 18:38:52 +00:00
Than McIntosh 0e0a8a3fee [CodeGen] Prefer static frame index for STATEPOINT liveness args
Summary:
If a given liveness arg of STATEPOINT is at a fixed frame index
(e.g. a function argument passed on stack), prefer to use this
fixed location even the address is also in a register. If we use
the register it will generate a spill, which is not necessary
since the fixed frame index can be directly recorded in the stack
map.

Patch by Cherry Zhang <cherryyz@google.com>.

Reviewers: thanm, niravd, reames

Reviewed By: reames

Subscribers: cherryyz, reames, anna, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D53889

llvm-svn: 347998
2018-11-30 16:22:41 +00:00
Nicolai Haehnle 445b0b6260 TableGen/ISel: Allow PatFrag predicate code to access captured operands
Summary:
This simplifies writing predicates for pattern fragments that are
automatically re-associated or commuted.

For example, a followup patch adds patterns for fragments of the form
(add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are
automatically commuted to (add $z, (shl $x, $y)), which makes it basically
impossible to refer to $x, $y, and $z generically in the PredicateCode.

With this change, the PredicateCode can refer to $x, $y, and $z simply
as `Operands[i]`.

Test confirmed that there are no changes to any of the generated files
when building all (non-experimental) targets.

Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c

Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand

Subscribers: wdng, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D51994

llvm-svn: 347992
2018-11-30 14:15:13 +00:00
Alex Bradbury fca95cfee9 [SelectionDAG] Support result type promotion for FLT_ROUNDS_
For targets where i32 is not a legal type (e.g. 64-bit RISC-V), 
LegalizeIntegerTypes must promote the result of ISD::FLT_ROUNDS_.

Differential Revision: https://reviews.llvm.org/D53820

llvm-svn: 347986
2018-11-30 13:18:33 +00:00
Alex Bradbury bd24c7b045 [SelectionDAG] Support promotion of PREFETCH operands
For targets where i32 is not a legal type (e.g. 64-bit RISC-V), 
LegalizeIntegerTypes must promote the operands of ISD::PREFETCH.

Differential Revision: https://reviews.llvm.org/D53281

llvm-svn: 347980
2018-11-30 10:06:31 +00:00
Alex Bradbury 36e0fd1d39 [SelectionDAG] Support promotion of FRAMEADDR/RETURNADDR operands
For targets where i32 is not a legal type (e.g. 64-bit RISC-V), 
LegalizeIntegerTypes must promote the operand.

Differential Revision: https://reviews.llvm.org/D53279

llvm-svn: 347978
2018-11-30 10:02:06 +00:00
Alex Bradbury e0e62e97df [TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V
DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend 
operands when it is able to do so. For some targets this is more expensive 
than a sign-extension, which is also a valid choice. Introduce the 
isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger 
helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == 
MVT::i64, as it can be performed using a single instruction.

Differential Revision: https://reviews.llvm.org/D52978

llvm-svn: 347977
2018-11-30 09:56:54 +00:00
Sanjay Patel 8d27144251 [DAGCombiner] narrow truncated binops
The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.

Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc 
sequences that don't get folded in IR.

As the TODO comments suggest, there will be regressions if we extend this (for x86, 
we mostly seem to be missing LEA opportunities, but there are likely vector folds 
missing too). I think those should be considered existing bugs because this is the 
same transform that we do as an IR canonicalization in instcombine. We just need 
more tests to make those visible independent of this patch.

Differential Revision: https://reviews.llvm.org/D54640

llvm-svn: 347917
2018-11-29 20:58:26 +00:00
Craig Topper 129d529ab3 [SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps
I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG.

I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse.

Differential Revision: https://reviews.llvm.org/D54276

llvm-svn: 347902
2018-11-29 19:36:17 +00:00
Craig Topper b955bf382c [LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT.
SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512.

This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.

Differential Revision: https://reviews.llvm.org/D54906

llvm-svn: 347593
2018-11-26 21:12:39 +00:00
Craig Topper 923f463ef2 [SelectionDAG] Teach BaseIndexOffset::match to unwrap the base after looking through an add/or
We might find a target specific node that needs to be unwrapped after we look through an add/or. Otherwise we get inconsistent results if one pointer is just X86WrapperRIP and the other is (add X86WrapperRIP, C)

Differential Revision: https://reviews.llvm.org/D54818

llvm-svn: 347591
2018-11-26 20:16:33 +00:00
Sanjay Patel 7336e7c67a [x86] limit transform for select-of-fp-constants
This should likely be adjusted to limit this transform
further, but these diffs should be clear wins.

If we have blendv/conditional move, then we should assume 
those are cheap ops. The loads become independent of the
compare, so those can be speculated before we need to use 
the values in the blend/mov.

llvm-svn: 347526
2018-11-25 17:27:02 +00:00
Sanjay Patel 04435677d0 [SelectionDAG] move constant or splat functions to common location
rL347502 moved the null sibling, so we should group all of these
together. I'm not sure why these aren't methods of the SDValue
class itself, but that's another patch if that's possible.

llvm-svn: 347523
2018-11-25 16:09:32 +00:00
Sanjay Patel 7e119c0400 [DAG] consolidate shift simplifications
...and use them to avoid creating obviously undef values as
discussed in the post-commit thread for r347478.

The diffs in vector div/rem show that we were missing real
optimizations by creating bogus shift nodes.

llvm-svn: 347502
2018-11-23 20:05:12 +00:00
Craig Topper 0ec17884de [LegalizeVectorTypes] Don't use SplitVecOp_TruncateHelper if we're heading towards scalarizing the type.
This code takes a truncate, fp_to_int, or int_to_fp with a legal result type and an input type that needs to be split and enlarges the elements in the result type before doing the split. Then inserts a follow up truncate or fp_round after concatenating the two halves back together.

But if the input type of the original op is being split on its way to ultimately being scalarized we're just going to end up building a vector from scalars and then truncating or rounding it in the vector register. Seems kind of silly to enlarge the result element type of the operation only to end up with scalar code and then building a vector with large elements only to make the elements smaller again in the vector register. Seems better to just try to get away producing smaller result types in the scalarized code.

The X86 test case that changes is a pretty contrived test case that exists because of a bug we used to have in our AVG matching code. I think the code is better now, but its not realistic anyway.

llvm-svn: 347482
2018-11-23 02:32:13 +00:00
Craig Topper b239763384 [LegalizeVectorTypes] Have SplitVecOp_TruncateHelper fall back to SplitVecOp_UnaryOp if splitting the output type would be a legal type.
SplitVecOp_TruncateHelper tries to introduce a multilevel truncate to avoid scalarization. But if splitting the result type would still be a legal type we don't need to do that.

The comment block at the top of the function implied that this was already implemented. I looked back through the history and it doesn't look to have ever been checked.

llvm-svn: 347479
2018-11-22 22:56:52 +00:00
Sanjay Patel 3e80019275 [DAGCombiner] form 'not' ops ahead of shifts (PR39657)
We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'),
but that would not matter without this patch because DAGCombiner was 
reversing that transform. I think we need this transform in the backend 
regardless of what happens in IR to catch cases where the shift-xor 
is formed late from GEP or other ops.

https://rise4fun.com/Alive/NC1

  Name: shl
  Pre: (-1 << C2) == C1
  %shl = shl i8 %x, C2
  %r = xor i8 %shl, C1
  =>
  %not = xor i8 %x, -1
  %r = shl i8 %not, C2
  
  Name: shr
  Pre: (-1 u>> C2) == C1
  %sh = lshr i8 %x, C2
  %r = xor i8 %sh, C1
  =>
  %not = xor i8 %x, -1
  %r = lshr i8 %not, C2

https://bugs.llvm.org/show_bug.cgi?id=39657

llvm-svn: 347478
2018-11-22 19:24:10 +00:00
Sanjay Patel 20935e0ab5 [DAGCombiner] refactor select-of-FP-constants transform
This transform needs to be limited. 

We are converting to a constant pool load very early, and we 
are turning loads that are independent of the select condition 
(and therefore speculatable) into a dependent non-speculatable 
load.

We may also be transferring a condition code from an FP register
to integer to create that dependent load.

llvm-svn: 347424
2018-11-21 20:54:47 +00:00
Sanjay Patel 1c74747478 [DAGCombiner] reduce code duplication; NFC
llvm-svn: 347410
2018-11-21 20:00:32 +00:00
Simon Pilgrim 5448339889 [TargetLowering] SimplifyDemandedBits - only reduce known bits for integer constants
Avoids fuzzing crash found by Mikael Holmén.

llvm-svn: 347393
2018-11-21 14:26:19 +00:00
Nikita Popov c5b152bdef Test commit: Delete trailing space in comment
llvm-svn: 347385
2018-11-21 10:57:22 +00:00
Sanjay Patel 357053f289 [DAGCombiner] look through bitcasts when trying to narrow vector binops
This is another step in vector narrowing - a follow-up to D53784
(and hoping to eventually squash potential regressions seen in
D51553).

The x86 test diffs are wins, but the AArch64 diff is probably not.
That problem already exists independent of this patch (see PR39722), but it
went unnoticed in the previous patch because there were no regression tests
that showed the possibility.

The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling
concerns with using wider vector ops, an extra extract to reduce vector
width is the right trade-off at this level of codegen.

Differential Revision: https://reviews.llvm.org/D54392

llvm-svn: 347356
2018-11-20 22:26:35 +00:00
Simon Pilgrim 3735105961 [DAGCombine] Add calls to SimplifyDemandedVectorElts from visitINSERT_SUBVECTOR (PR37989)
This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices.

llvm-svn: 347313
2018-11-20 15:23:50 +00:00
Simon Pilgrim b356d0463e [TargetLowering] Improve SimplifyDemandedVectorElts/SimplifyDemandedBits support
For bitcast nodes from larger element types, add the ability for SimplifyDemandedVectorElts to call SimplifyDemandedBits by merging the elts mask to a bits mask.

I've raised https://bugs.llvm.org/show_bug.cgi?id=39689 to deal with the few places where SimplifyDemandedBits's lack of vector handling is a problem.

Differential Revision: https://reviews.llvm.org/D54679

llvm-svn: 347301
2018-11-20 12:02:16 +00:00
Craig Topper 4954c66430 [SelectionDAG] Compute known bits and num sign bits for live out vector registers. Use it to add AssertZExt/AssertSExt in the live in basic blocks
Summary:
We already support this for scalars, but it was explicitly disabled for vectors. In the updated test cases this allows us to see the upper bits are zero to use less multiply instructions to emulate a 64 bit multiply.

This should help with this ispc issue that a coworker pointed me to https://github.com/ispc/ispc/issues/1362

Reviewers: spatel, efriedma, RKSimon, arsenm

Reviewed By: spatel

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D54725

llvm-svn: 347287
2018-11-20 04:30:26 +00:00
Sanjay Patel a36c444471 [DAGCombiner] reduce code duplication in visitXOR; NFC
llvm-svn: 347278
2018-11-20 00:51:45 +00:00
Stanislav Mekhanoshin 54ebfe8aee Implement computeKnownBits for scalar_to_vector
Differential Revision: https://reviews.llvm.org/D54728

llvm-svn: 347274
2018-11-19 23:34:07 +00:00
Simon Pilgrim 740122fb8c [DAGCombine] SimplifyNodeWithTwoResults - ensure same legalization for LO/HI operands (PR21207)
Consistently use (!LegalOperations || isOperationLegalOrCustom) for all node pairs.

Differential Revision: https://reviews.llvm.org/D53478

llvm-svn: 347255
2018-11-19 19:37:59 +00:00
Simon Pilgrim de3605f56b [TargetLowering] expandFP_TO_UINT - improve fp16 support
As discussed on D53794, for float types with ranges smaller than the destination integer type, then we should be able to just use a regular FP_TO_SINT opcode.

I thought we'd need to provide MSA test cases for very small integer types as well (fp16 -> i8 etc.), but it turns out that promotion will kick in so they're unnecessary.

Differential Revision: https://reviews.llvm.org/D54703

llvm-svn: 347251
2018-11-19 19:16:13 +00:00
Sanjay Patel b25adf5edb [SelectionDAG] simplify vector select with undef operand(s)
llvm-svn: 347227
2018-11-19 17:06:05 +00:00
Sanjay Patel a1dca3553e [SelectionDAG] simplify select FP with undef condition
llvm-svn: 347212
2018-11-19 14:42:28 +00:00
Sanjay Patel c036d844be [SelectionDAG] add simplifySelect() to reduce code duplication; NFC
This should be extended to handle FP and vectors in follow-up patches.

llvm-svn: 347210
2018-11-19 14:35:22 +00:00
Sanjay Patel 8c0cd77bff [DAG] add undef simplifications for select nodes
Sadly, this duplicates (twice) the logic from InstSimplify. There
might be some way to at least share the DAG versions of the code, 
but copying the folds seems to be the standard method to ensure 
that we don't miss these folds. 

Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no 
way to ensure that we do these kinds of simplifications unless the 
code is repeated at node creation time and during combines.

There were other tests that would become worthless with this
improvement that I changed as pre-commits:
rL347161
rL347164
rL347165
rL347166
rL347167

I'm not sure how to salvage the remaining tests (diffs in this patch).
So the x86 tests verify that the new code is working as intended.
The AMDGPU test is actually similar to my motivating case: we have
some undef value that has survived to machine IR in an x86 test, and 
then it gets folded in some weird way, or we crash if we don't transfer
the undef flag. But we would have been better off never getting to that
point by doing these simplifications.

This will lead back to PR32023 someday...
https://bugs.llvm.org/show_bug.cgi?id=32023

llvm-svn: 347170
2018-11-18 17:36:23 +00:00
Sanjay Patel 42c22a1f87 [SelectionDAG] simplify code; NFC
llvm-svn: 347160
2018-11-18 14:39:03 +00:00
Fangrui Song 7570932977 Use llvm::copy. NFC
llvm-svn: 347126
2018-11-17 01:44:25 +00:00
Stanislav Mekhanoshin 0ff7c8309d DAG combiner: fold (select, C, X, undef) -> X
Differential Revision: https://reviews.llvm.org/D54646

llvm-svn: 347110
2018-11-16 23:13:38 +00:00
Craig Topper 9e97054211 [LegalizeVectorOps] After custom legalizing an extending load or a truncating store, make sure the custom code is also legal.
For example, on X86 we emit a sign_extend_vector_inreg from LowerLoad and without sse4.1 this node will need further legalization. Previously this sign_extend_vector_inreg was being custom lowered during DAG legalization instead of vector op legalization.

Unfortunately, this doesn't seem to matter for the output of any existing lit tests.

llvm-svn: 347094
2018-11-16 21:04:58 +00:00
Simon Pilgrim 3c8baf4f90 [TargetLowering] Cleanup more of the EXTEND demanded bits cases so that they match. NFCI.
Use the same variable names etc.

llvm-svn: 347045
2018-11-16 12:26:26 +00:00
Sam Parker ab99cfab21 [DAGCombine] Fix non-deterministic debug output
PR37970 reported non-deterministic debug output, this was caused by
iterating through a set and not a a vector.

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37970

Differential Revision: https://reviews.llvm.org/D54570

llvm-svn: 347037
2018-11-16 08:35:19 +00:00
Craig Topper bac7d9735a [LegalizeVectorTypes] Teach WidenVecRes_Convert to turn ANY_EXTEND into ANY_EXTEND_VECTOR_INREG when the input and output types need to be widened to the same width.
If we don't do it here, DAGCombine will just end up creating it from the scalar any_extend+build_vector so might as well save a step.

llvm-svn: 347034
2018-11-16 07:13:34 +00:00
Stanislav Mekhanoshin 35de877e8c Fixed DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT i1 handling
Legalizer used to request an ext load from i8 to i1 when promoting
vector element type to i8. Fixed.

Differential Revision: https://reviews.llvm.org/D54440

llvm-svn: 346795
2018-11-13 20:26:27 +00:00
Craig Topper aca8390216 [SelectionDAG][X86] Relax restriction on the width of an input to *_EXTEND_VECTOR_INREG. Use them and regular *_EXTEND to replace the X86 specific VSEXT/VZEXT opcodes
Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output.

This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes.

X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code.

I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements.

The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits.

Differential Revision: https://reviews.llvm.org/D54346

llvm-svn: 346784
2018-11-13 19:45:21 +00:00
Cameron McInally cbde0d9c7b [IR] Add a dedicated FNeg IR Instruction
The IEEE-754 Standard makes it clear that fneg(x) and
fsub(-0.0, x) are two different operations. The former is a bitwise
operation, while the latter is an arithmetic operation. This patch
creates a dedicated FNeg IR Instruction to model that behavior.

Differential Revision: https://reviews.llvm.org/D53877

llvm-svn: 346774
2018-11-13 18:15:47 +00:00
Craig Topper 0b33b468a1 [DAGCombiner] Enable tryToFoldExtendOfConstant to run after legalize vector ops
It should be ok to create a new build_vector after legal operations so long as it doesn't cause an infinite loop in DAG combiner.

Unfortunately, X86's custom constant folding in combineVSZext is hiding any test changes from this. But I'm trying to get to a point where that X86 specific code isn't necessary at all.

Differential Revision: https://reviews.llvm.org/D54285

llvm-svn: 346728
2018-11-13 01:59:32 +00:00
Nirav Dave a395e2df56 [DAGCombiner] Fix load-store forwarding of indexed loads.
Summary:
Handle extra output from index loads in cases where we wish to
forward a load value directly from a preceeding store.

Fixes PR39571.

Reviewers: peter.smith, rengolin

Subscribers: javed.absar, hiraditya, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54265

llvm-svn: 346654
2018-11-12 14:05:40 +00:00
Craig Topper d23cdbbeb2 [DAGCombiner] Make tryToFoldExtendOfConstant return an SDValue instead of an SDNode*. NFC
Removes the need to call getNode internally and to recreate an SDValue after the call.

llvm-svn: 346600
2018-11-10 23:46:03 +00:00
Sanjay Patel 0a515595a7 [x86] allow vector load narrowing with multi-use values
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.

I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.

For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.

Differential Revision: https://reviews.llvm.org/D54073

llvm-svn: 346595
2018-11-10 20:05:31 +00:00
Craig Topper f2e65f8636 [SelectionDAG] Fix a -Wparentheses warning from gcc in an assert. NFC
gcc wants parentheses around the logical OR since there is a logical AND for the string.

llvm-svn: 346564
2018-11-09 23:11:30 +00:00
Craig Topper 9a7e19b8f2 [DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars
It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input.

I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on a undef input to a shuffle. We don't care how many times undef is used.

Differential Revision: https://reviews.llvm.org/D54283

llvm-svn: 346530
2018-11-09 18:04:34 +00:00
Serge Guelton 86f8b70f1b Type safe version of MachinePassRegistry
Previous version used type erasure through a `void* (*)()` pointer,
which triggered gcc warning and implied a lot of reinterpret_cast.

This version should make it harder to hit ourselves in the foot.

Differential revision: https://reviews.llvm.org/D54203

llvm-svn: 346522
2018-11-09 17:19:45 +00:00
Alexandros Lamprineas e15c982f6d [SelectionDAG] swap select_cc operands to enable folding
The DAGCombiner tries to SimplifySelectCC as follows:

  select_cc(x, y, 16, 0, cc) -> shl(zext(set_cc(x, y, cc)), 4)

It can't cope with the situation of reordered operands:

  select_cc(x, y, 0, 16, cc)

In that case we just need to swap the operands and invert the Condition Code:

  select_cc(x, y, 16, 0, ~cc)

Differential Revision: https://reviews.llvm.org/D53236

llvm-svn: 346484
2018-11-09 11:09:40 +00:00
Craig Topper 8cca8bd4aa [SelectionDAG] Assert on the width of DemandedElts argument to computeKnownBits for all vector typed operations not just build_vector.
Fix AArch64 unit test that fails with the assertion added.

llvm-svn: 346437
2018-11-08 20:29:17 +00:00
Nirav Dave 6ce9f72f76 [DAGCombine] Improve alias analysis for chain of independent stores.
FindBetterNeighborChains simulateanously improves the chain
dependencies of a chain of related stores avoiding the generation of
extra token factors. For chains longer than the GatherAllAliasDepths,
stores further down in the chain will necessarily fail, a potentially
significant waste and preventing otherwise trivial parallelization.

This patch directly parallelize the chains of stores before improving
each store. This generally improves DAG-level parallelism.

Reviewers: courbet, spatel, RKSimon, bogner, efriedma, craig.topper, rnk

Subscribers: sdardis, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53552

llvm-svn: 346432
2018-11-08 19:14:20 +00:00
James Y Knight 72f76bf230 Add support for llvm.is.constant intrinsic (PR4898)
This adds the llvm-side support for post-inlining evaluation of the
__builtin_constant_p GCC intrinsic.

Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call
to a function where canConstantFoldTo returns true, and one of the
arguments is a struct.

Updated from patch initially by Janusz Sobczak.

Differential Revision: https://reviews.llvm.org/D4276

llvm-svn: 346322
2018-11-07 15:24:12 +00:00
Cameron McInally 9757d5d6c1 [FPEnv] Add constrained CEIL/FLOOR/ROUND/TRUNC intrinsics
Differential Revision: https://reviews.llvm.org/D53411

llvm-svn: 346141
2018-11-05 15:59:49 +00:00
Simon Pilgrim 6bd468bd8b [TargetLowering] Begin generalizing TargetLowering::expandFP_TO_SINT support. NFCI.
Prior to initial work to add vector expansion support, remove assumptions that we're working on scalar types.

llvm-svn: 346139
2018-11-05 15:49:09 +00:00
Craig Topper 8f2f2a76b9 [DAGCombiner] Use tryFoldToZero to simplify some code and make it work correctly between LegalTypes and LegalOperations.
The original code avoided creating a zero vector after type legalization, but if we're after type legalization the type we have is legal. The real hazard we need to avoid is creating a build vector after op legalization. tryFoldToZero takes care of checking for this.

llvm-svn: 346119
2018-11-05 05:53:06 +00:00
Craig Topper 8d64abddd1 [DAGCombiner] Remove an unused argument from tryFoldToZero. NFC
llvm-svn: 346118
2018-11-05 05:53:03 +00:00
Craig Topper 3292ea03d3 [DAGCombiner] Remove 'else' after return. NFC
This makes this code consistent with the nearly identical code in visitZERO_EXTEND.

llvm-svn: 346090
2018-11-04 06:56:32 +00:00
Craig Topper 1ba86188cf [SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG nodes. Move asserts into getNode.
These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers.

The rest of the patch is just changing all callers to use getNode directly.

llvm-svn: 346087
2018-11-04 02:10:18 +00:00
Craig Topper 60c202a494 [X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1
We already have custom lowering for the AVX case in LegalizeVectorOps. So its better to keep the regular extend op around as long as possible.

I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why its included here.

I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector.

For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size.

Differential Revision: https://reviews.llvm.org/D54024

llvm-svn: 346043
2018-11-02 21:09:49 +00:00
Simon Pilgrim cdcbeb4997 [DAGCombiner] Remove reduceBuildVecConvertToConvertBuildVec and rely on the vectorizers instead (PR35732)
reduceBuildVecConvertToConvertBuildVec vectorizes int2float in the DAGCombiner, which means that even if the LV/SLP has decided to keep scalar code using the cost models, this will override this.

While there are cases where vectorization is necessary in the DAG (mainly due to legalization artefacts), I don't think this is the case here, we should assume that the vectorizers know what they are doing.

Differential Revision: https://reviews.llvm.org/D53712

llvm-svn: 345964
2018-11-02 11:06:18 +00:00
Mandeep Singh Grang 547a0d765a [COFF, ARM64] Implement Intrinsic.sponentry for AArch64
Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform.

Patch by: Yin Ma (yinma@codeaurora.org)

Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma

Reviewed By: efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53996

llvm-svn: 345909
2018-11-01 23:22:25 +00:00
Craig Topper e2483020f2 [DAGCombiner] Make the isTruncateOf call from visitZERO_EXTEND work for vectors. Remove FIXME.
I'm having trouble creating a test case for the ISD::TRUNCATE part of this that shows any codegen differences. But I was able to test the setcc path which is what the test changes here cover.

llvm-svn: 345908
2018-11-01 23:21:45 +00:00
Simon Pilgrim b34a052852 [LegalizeDAG] Add generic vector CTPOP expansion (PR32655)
This patch adds support for expanding vector CTPOP instructions and removes the x86 'bitmath' lowering which replicates the same expansion.

Differential Revision: https://reviews.llvm.org/D53258

llvm-svn: 345869
2018-11-01 18:22:11 +00:00
Mandeep Singh Grang b0cdf56dd7 Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64"
This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2.

llvm-svn: 345863
2018-11-01 17:53:57 +00:00
Sanjay Patel c5fe3ce2ec [DAGCombiner] make sure we have a whole-number extract before trying to narrow a vector op (PR39511)
The test causes a crash because we were trying to extract v4f32 to v3f32, and the
narrowing factor was then 4/3 = 1 producing a bogus narrow type.

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=39511

llvm-svn: 345842
2018-11-01 15:41:12 +00:00
Mandeep Singh Grang 88ad9ac720 [COFF, ARM64] Implement Intrinsic.sponentry for AArch64
Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform.

Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma

Reviewed By: efriedma

Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D53673

llvm-svn: 345791
2018-10-31 23:16:20 +00:00
Stanislav Mekhanoshin 222e9c11f7 Check shouldReduceLoadWidth from SimplifySetCC
SimplifySetCC could shrink a load without checking for
profitability or legality of such shink with a target.

Added checks to prevent shrinking of aligned scalar loads
in AMDGPU below dword as scalar engine does not support it.

Differential Revision: https://reviews.llvm.org/D53846

llvm-svn: 345778
2018-10-31 21:24:30 +00:00
Scott Linder 92bb783cfe [SelectionDAG] Handle constant range [0,1) in lowerRangeToAssertZExt
lowerRangeToAssertZExt currently relies on something like EarlyCSE having
eliminated the constant range [0,1). At -O0 this leads to an assert.

Differential Revision: https://reviews.llvm.org/D53888

llvm-svn: 345770
2018-10-31 19:57:36 +00:00
Craig Topper eeac12af6d [SelectionDAGISel] Suppress a -Wunused-but-set-variable warning in release builds. NFC
llvm-svn: 345761
2018-10-31 18:46:15 +00:00
Simon Pilgrim 077a9adb00 Fix comment typo. NFCI.
llvm-svn: 345758
2018-10-31 18:19:52 +00:00
Simon Pilgrim 805cdcfe73 [SelectionDAG] SelectionDAGLegalize::ExpandBITREVERSE - ensure we use ShiftTy
We should be using the getShiftAmountTy value type for shift amounts.

llvm-svn: 345756
2018-10-31 18:14:14 +00:00
David Bolvansky d0080c3a5f [DAGCombiner] Fold 0 div/rem X to 0
Reviewers: RKSimon, spatel, javed.absar, craig.topper, t.p.northover

Reviewed By: RKSimon

Subscribers: craig.topper, llvm-commits

Differential Revision: https://reviews.llvm.org/D52504

llvm-svn: 345721
2018-10-31 14:18:57 +00:00
Cameron McInally 2ad870e785 [FPEnv] [FPEnv] Add constrained intrinsics for MAXNUM and MINNUM
Differential Revision: https://reviews.llvm.org/D53216

llvm-svn: 345650
2018-10-30 21:01:29 +00:00
Bjorn Pettersson fe09a20f09 [DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoad
Summary:
Normalize the offset for endianess before checking
if the store cover the load in ForwardStoreValueToDirectLoad.

Without this we missed out on some optimizations for big
endian targets. If for example having a 4 bytes store followed
by a 1 byte load, loading the least significant byte from the
store, the STCoversLD check would fail (see @test4 in
test/CodeGen/AArch64/load-store-forwarding.ll).

This patch also fixes a problem seen in an out-of-tree target.
The target has i40 as a legal type, it is big endian,
and the StoreSize for i40 is 48 bits. So when normalizing
the offset for endianess we need to take the StoreSize into
account (assuming that padding added when storing into
a larger StoreSize always is added at the most significant
end).

Reviewers: niravd

Reviewed By: niravd

Subscribers: javed.absar, kristof.beyls, llvm-commits, uabelho

Differential Revision: https://reviews.llvm.org/D53776

llvm-svn: 345636
2018-10-30 20:16:39 +00:00
Nirav Dave eedd2ccd02 [DAG] Add const variants for BaseIndexOffset functions.
llvm-svn: 345623
2018-10-30 18:26:43 +00:00
Sanjay Patel 8b207defea [DAGCombiner] narrow vector binops when extraction is cheap
Narrowing vector binops came up in the demanded bits discussion in D52912.

I don't think we're going to be able to do this transform in IR as a canonicalization 
because of the risk of creating unsupported widths for vector ops, but we already have 
a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is 
currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression 
test diffs).

This is artificially limited to not look through bitcasts because there are so many 
test diffs already, but that's marked with a TODO and is a small follow-up.

Differential Revision: https://reviews.llvm.org/D53784

llvm-svn: 345602
2018-10-30 14:14:34 +00:00
Sanjay Patel 680c9227ca [SelectionDAG] fix build warning for mismatched signs in compare; NFC
llvm-svn: 345598
2018-10-30 13:47:19 +00:00
Simon Pilgrim 858303b827 [SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodes
Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy.

This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle.	
	
Differential Revision: https://reviews.llvm.org/D53760

llvm-svn: 345578
2018-10-30 10:32:11 +00:00
David Bolvansky dfdbb038e8 [DAGCombiner] Improve X div/rem Y fold if single bit element type
Summary: Tests by @spatel, thanks

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: sdardis, atanasyan, llvm-commits, spatel

Differential Revision: https://reviews.llvm.org/D52668

llvm-svn: 345575
2018-10-30 09:07:22 +00:00
Craig Topper b293322cee [LegalizeTypes] Teach PromoteIntRes_BITCAST to better handle a bitcast with vector output type and a vector input type that needs to be widened
Summary: Previously if we had a bitcast vector output type that needs promotion and a vector input type that needs widening we would just do a stack store and load to handle the conversion. We can do a little better if we can widen the bitcast to a legal vector type the same size as the widened input type. Then we can do the bitcast between this widened type and the widened input type. Afterwards we can extract_subvector back to the original output and any_extend that. Type legalization will then circle back and handle promotion of the extract_subvector and the any_extend will just be removed. This will avoid going through the stack and allows us to remove a custom version of this legalization from X86.

Reviewers: efriedma, RKSimon

Reviewed By: efriedma

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D53229

llvm-svn: 345567
2018-10-30 03:27:15 +00:00
Leonard Chan 905abe5b5d [Intrinsic] Signed and Unsigned Saturation Subtraction Intirnsics
Add an intrinsic that takes 2 integers and perform saturation subtraction on
them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D53783

llvm-svn: 345512
2018-10-29 16:54:37 +00:00
Craig Topper 7a18b4bc51 [SelectionDAG] Fix bad indentation. NFC
llvm-svn: 345481
2018-10-28 21:24:20 +00:00
Simon Pilgrim 3497d536f7 [TargetLowering] Move i64/vXi64 to f32/vXf32 UINT_TO_FP handling to TargetLowering::expandUINT_TO_FP.
llvm-svn: 345478
2018-10-28 15:34:35 +00:00
Simon Pilgrim 9b77f0c291 [VectorLegalizer] Enable TargetLowering::expandFP_TO_UINT support.
Add vector support to TargetLowering::expandFP_TO_UINT.

This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunk this.

llvm-svn: 345473
2018-10-28 13:07:25 +00:00
Craig Topper c4b785ae1e [DAGCombiner] Better constant vector support for FCOPYSIGN.
Enable constant folding when both operands are vectors of constants.

Turn into FNEG/FABS when the RHS is a splat constant vector.

llvm-svn: 345469
2018-10-28 01:32:49 +00:00
Simon Pilgrim 3cf33fcdd6 [TargetLowering] Move LegalizeDAG FP_TO_UINT handling to TargetLowering::expandFP_TO_UINT. NFCI.
First step towards fixing PR17686 and adding vector support.

llvm-svn: 345452
2018-10-27 12:15:58 +00:00
Sanjay Patel 0eddd4730f [DAGCombiner] rearrange code in narrowExtractedVectorBinOp(); NFC
We can extend this code to handle many more cases 
if an extract is cheap, so prepping for that change.

llvm-svn: 345430
2018-10-26 21:32:04 +00:00
Craig Topper 7bf85f5c8d [LegalizeTypes] Stop DAGTypeLegalizer::getSETCCWidenedResultTy from creating illegal setccs. Add checks for valid setccs
The DAGTypeLegalizer::getSETCCWidenedResultTy was widening the MaskVT, but the code in convertMask called after getSETCCWidenedResultTy had no idea this widening had occurred. So none of the operands were widened when convertMask created new setccs with the widened VT.

This patch removes the widening and adds some asserts to getNode to validate the types of setccs to prevent issues like this in the future.

Differential Revision: https://reviews.llvm.org/D53743

llvm-svn: 345428
2018-10-26 20:59:55 +00:00
Eli Friedman 2ac1162917 [ARM] Make InstrEmitter mark CPSR defs dead for Thumb1.
The "dead" markings allow existing target-independent optimizations,
like MachineSink, to trigger more frequently. The CPSR defs would have
eventually been marked dead by LiveVariables, so this only affects
optimizations before regalloc.

The ARMBaseInstrInfo.cpp change is fixing a bug which is only visible
with this change: the transform adds a use to an otherwise dead def
of CPSR. This is covered by existing regression tests.

thumb2-tbh.ll breaks for Thumb1 due to MachineLICM changing the
generated code; I'll fix it in D53452.

Differential Revision: https://reviews.llvm.org/D53453

llvm-svn: 345420
2018-10-26 19:32:24 +00:00
Heejin Ahn 24faf859e5 Reland "[WebAssembly] LSDA info generation"
Summary:
This adds support for LSDA (exception table) generation for wasm EH.
Wasm EH mostly follows the structure of Itanium-style exception tables,
with one exception: a call site table entry in wasm EH corresponds to
not a call site but a landing pad.

In wasm EH, the VM is responsible for stack unwinding. After an
exception occurs and the stack is unwound, the control flow is
transferred to wasm 'catch' instruction by the VM, after which the
personality function is called from the compiler-generated code. (Refer
to WasmEHPrepare pass for more information on this part.)

This patch:
- Changes wasm.landingpad.index intrinsic to take a token argument, to
make this 1:1 match with a catchpad instruction
- Stores landingpad index info and catch type info MachineFunction in
before instruction selection
- Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an
exception table
- Adds WasmException class with overridden methods for table generation
- Adds support for LSDA section in Wasm object writer

Reviewers: dschuff, sbc100, rnk

Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52748

llvm-svn: 345345
2018-10-25 23:55:10 +00:00
Cameron McInally 384a74b0e6 [FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes
Replacing BinaryOperator::isFNeg(...) to avoid regressions when we
separate FNeg from the FSub IR instruction.

Differential Revision: https://reviews.llvm.org/D53650

llvm-svn: 345295
2018-10-25 18:09:33 +00:00
Simon Pilgrim f02c0f8af6 [LegalizeDAG] Remove dead SINT_TO_FP legalization code
As noticed on D52965, the SINT_TO_FP i64 to f32 legalization code has been dead for years - protected by an assert.

Differential Revision: https://reviews.llvm.org/D53703

llvm-svn: 345290
2018-10-25 17:43:36 +00:00
Simon Pilgrim 49d79a864c Missing semicolon.
llvm-svn: 345257
2018-10-25 11:38:17 +00:00
Simon Pilgrim 838eb24014 [TargetLowering] Improve vXi64 UINT_TO_FP vXf64 support (P38226)
As suggested on D52965, this patch moves the i64 to f64 UINT_TO_FP expansion code from LegalizeDAG into TargetLowering and makes it available to LegalizeVectorOps as well.

Not only does this help perform X86 lowering as a true vectorization instead of (partially vectorized) scalar conversions, it avoids the HADDPD op from the scalar code which can be slow on most targets.

The AVX512F does have the vcvtusi2sdq scalar operation but we don't unroll to use it as it seems to only help for the v2f64 case - otherwise the unrolling cost will certainly be too high. My feeling is that we should leave it to the vectorizers - and if it generates the vector UINT_TO_FP we should use it.

Differential Revision: https://reviews.llvm.org/D53649

llvm-svn: 345256
2018-10-25 11:15:57 +00:00
Thomas Lively 30f1d69115 [NFC] Rename minnan and maxnan to minimum and maximum
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.

Reviewers: arsenm, aheejin, dschuff, javed.absar

Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53112

llvm-svn: 345218
2018-10-24 22:49:55 +00:00
Thomas Lively 43bc46207a [SelectionDAG] DAG combiner for fminnan and fmaxnan
Summary: Depends on D52765.

Reviewers: aheejin, dschuff

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52768

llvm-svn: 345210
2018-10-24 22:18:54 +00:00
Tim Northover 05fe8f918b [DAG] check more operands for cycles when merging stores.
Until now, we've only checked whether merging stores would cause a cycle via
the value argument, but the address and indexed offset arguments are also
capable of creating cycles in some situations.

The addresses are all base+offset with notionally the same base, but the base
SDNode may still be different (e.g. via an indexed load in one case, and an
ISD::ADD elsewhere). This allows cycles to creep in if one of these sources
depends on another.

The indexed offset is usually undef (representing a non-indexed store), but on
some architectures (e.g. 32-bit ARM-mode ARM) it can be an arbitrary value,
again allowing dependency cycles to creep in.

llvm-svn: 345200
2018-10-24 21:36:34 +00:00
Simon Pilgrim 6f53b38fd4 [TargetLowering] Add SimplifyDemandedBitsForTargetNode callback
Add a SimplifyDemandedBitsForTargetNode callback to handle target nodes.

Differential Revision: https://reviews.llvm.org/D53643

llvm-svn: 345179
2018-10-24 19:00:56 +00:00
Simon Pilgrim c8c7451063 [LegalizeDAG] ExpandLegalINT_TO_FP - cleanup UINT_TO_FP i64 -> f32 expansion.
Use SrcVT/DestVT types and correct shift type.

Part of prep work for D52965

llvm-svn: 345158
2018-10-24 16:35:01 +00:00
Matthias Braun 4f82406c46 SelectionDAG: Reuse bigger sized constants in memset expansion.
When implementing memset's today we often see this pattern:
$x0 = MOV 0xXYXYXYXYXYXYXYXY
store $x0, ...
$w1 = MOV 0xXYXYXYXY
store $w1, ...

We first create a 64bit constant in a 64bit register with all bytes the
same and then create a 32bit constant with all bytes the same in a 32bit
register. In many targets we could just access the lower byte of the
64bit register instead.

- Ideally this would be handled by the ConstantHoist pass but it runs
  too early when memset isn't expanded yet.
- The memset expansion code already had this optimization implemented,
  however SelectionDAG constantfolding would constantfold the
  "trunc(bigconstnat)" pattern to "smallconstant".
- This patch makes the memset expansion mark the constant as Opaque and
  stop DAGCombiner from constant folding in this situation. (Similar to
  how ConstantHoisting marks things as Opaque to avoid folding
  ADD/SUB/etc.)

Differential Revision: https://reviews.llvm.org/D53181

llvm-svn: 345102
2018-10-23 23:19:23 +00:00
Simon Pilgrim 8c4796deb4 [LegalizeDAG] Share Vector/Scalar CTPOP Expansion
As suggested on D53258, this patch move the CTPOP expansion code from SelectionDAGLegalize to TargetLowering to allow it to be reused by the VectorLegalizer.

Proper vector support will be added by D53258.

llvm-svn: 345066
2018-10-23 18:28:24 +00:00
Simon Pilgrim d705ba97dd [LegalizeDAG] Share Vector/Scalar CTLZ Expansion
As suggested on D53258, this patch shares common CTLZ expansion code between VectorLegalizer and SelectionDAGLegalize by putting it in TargetLowering.

Extension to D53474

llvm-svn: 345060
2018-10-23 17:48:30 +00:00
Sanjay Patel ad12df829c [SelectionDAG] use 'match' to simplify code; NFC
Vector types are not possible here because this code only starts
matching from the scalar bool value of a conditional branch, but
this is another step towards completely removing the fake binop
queries for not/neg/fneg.

llvm-svn: 345041
2018-10-23 15:46:10 +00:00
Benjamin Kramer 1e212e8abb [LegalizeDAG] Remove unused variable
llvm-svn: 345040
2018-10-23 15:43:36 +00:00
Simon Pilgrim b975ff4700 [LegalizeDAG] Share Vector/Scalar CTTZ Expansion
As suggested on D53258, this patch demonstrates sharing common CTTZ expansion code between VectorLegalizer and SelectionDAGLegalize by putting it in TargetLowering.

I intend to move CTLZ and (scalar) CTPOP over as well and then update D53258 accordingly.

Differential Revision: https://reviews.llvm.org/D53474

llvm-svn: 345039
2018-10-23 15:37:19 +00:00
Leonard Chan 0acfc6be38 [Intrinsic] Unigned Saturation Addition Intrinsic
Add an intrinsic that takes 2 integers and perform unsigned saturation
addition on them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D53340

llvm-svn: 344971
2018-10-22 23:08:40 +00:00
Craig Topper c8e183f9ee Recommit r344877 "[X86] Stop promoting integer loads to vXi64"
I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile.

Original commit message:

Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem

I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo

I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D53306

llvm-svn: 344965
2018-10-22 22:14:05 +00:00
Matt Arsenault 687ec75d10 DAG: Change behavior of fminnum/fmaxnum nodes
Introduce new versions that follow the IEEE semantics
to help with legalization that may need quieted inputs.

There are some regressions from inserting unnecessary
canonicalizes when these are matched from fast math
fcmp + select which should be fixed in a future commit.

llvm-svn: 344914
2018-10-22 16:27:27 +00:00
Sanjay Patel e439cc2745 [DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016)
This is a late backend subset of the IR transform added with:
D52439

We can confirm that the conversion to a 'trunc' is correct by running:
$ opt -instcombine -data-layout="e"
(assuming the IR transforms are correct; change "e" to "E" for big-endian)

As discussed in PR39016:
https://bugs.llvm.org/show_bug.cgi?id=39016
...the pattern may emerge during legalization, so that's we are waiting for an 
insertelement to become a scalar_to_vector in the pattern matching here.

The DAG allows for fun variations that are not possible in IR. Result types for 
extracts and scalar_to_vector don't necessarily match input types, so that means 
we have to be a bit more careful in the transform (see code comments).

The tests show that we don't handle cases that require a shift (as we did in the
IR version). I've left that as a potential follow-up because I'm not sure if 
that's a real concern at this late stage.

Differential Revision: https://reviews.llvm.org/D53201

llvm-svn: 344872
2018-10-21 20:13:29 +00:00
Krasimir Georgiev 547d824da6 Revert "[WebAssembly] LSDA info generation"
This reverts commit r344575.
Newly introduced test eh-lsda.ll.test fails with use-after-free under
ASAN build.

llvm-svn: 344639
2018-10-16 18:50:09 +00:00
Leonard Chan 699b3b54da [Intrinsic] Signed Saturation Addition Intrinsic
Add an intrinsic that takes 2 integers and perform saturation addition on them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D53053

llvm-svn: 344629
2018-10-16 17:35:41 +00:00
Simon Pilgrim ac58636ec5 [LegalizeDAG] ExpandLegalINT_TO_FP - cleanup UINT_TO_FP i64 -> f64 expansion.
Use SrcVT/DestVT types, correct shift type and AND instead of ZERO_EXTEND_IN_REG.

Part of prep work for D52965

llvm-svn: 344602
2018-10-16 10:06:15 +00:00
Heejin Ahn 0981eaab47 [WebAssembly] LSDA info generation
Summary:
This adds support for LSDA (exception table) generation for wasm EH.
Wasm EH mostly follows the structure of Itanium-style exception tables,
with one exception: a call site table entry in wasm EH corresponds to
not a call site but a landing pad.

In wasm EH, the VM is responsible for stack unwinding. After an
exception occurs and the stack is unwound, the control flow is
transferred to wasm 'catch' instruction by the VM, after which the
personality function is called from the compiler-generated code. (Refer
to WasmEHPrepare pass for more information on this part.)

This patch:
- Changes wasm.landingpad.index intrinsic to take a token argument, to
make this 1:1 match with a catchpad instruction
- Stores landingpad index info and catch type info MachineFunction in
before instruction selection
- Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an
exception table
- Adds WasmException class with overridden methods for table generation
- Adds support for LSDA section in Wasm object writer

Reviewers: dschuff, sbc100, rnk

Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52748

llvm-svn: 344575
2018-10-16 00:09:12 +00:00
Sanjay Patel 4cf1da0e02 [SelectionDAG] allow FP binops in SimplifyDemandedVectorElts
This is intended to make the backend on par with functionality that was 
added to the IR version of SimplifyDemandedVectorElts in:
rL343727
...and the original motivation is that we need to improve demanded-vector-elements 
in several ways to avoid problems that would be exposed in D51553.

Differential Revision: https://reviews.llvm.org/D52912

llvm-svn: 344541
2018-10-15 18:05:34 +00:00
Sanjay Patel 8bd74785f0 [DAGCombiner] allow undef elts in vector fmul matching
llvm-svn: 344534
2018-10-15 16:54:07 +00:00
Sanjay Patel 89e2197c33 [DAGCombiner] refactor folds for fadd (fmul X, -2.0), Y; NFCI
The transform doesn't work if the vector constant has undef elements.

llvm-svn: 344532
2018-10-15 16:47:01 +00:00
Sanjay Patel 9e7e0fd828 [DAGCombiner] allow undef elts in vector fma matching
llvm-svn: 344528
2018-10-15 15:56:39 +00:00
Sanjay Patel 4e970ff022 [DAGCombiner] allow undef elts in vector fma matching
llvm-svn: 344525
2018-10-15 15:38:38 +00:00
Chandler Carruth edb12a838a [TI removal] Make variables declared as `TerminatorInst` and initialized
by `getTerminator()` calls instead be declared as `Instruction`.

This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).

llvm-svn: 344502
2018-10-15 10:04:59 +00:00
Simon Pilgrim a0590a4f7a [LegalizeDAG] Don't bother with final MUL+SRL stage for byte CTPOP.
The final stage of CTPOP expansion (v = (v * 0x01010101...) >> (Len - 8)) is completely pointless for the byte (Len = 8) case as it reduces to (v = (v * 0x01...) >> 0), but annoyingly this doesn't always get optimized away. 

Found while investigating generic vector CTPOP expansion (PR32655).

llvm-svn: 344477
2018-10-14 15:56:28 +00:00
Simon Pilgrim 28a143f738 Pull out repeated variables from SelectionDAGLegalize::ExpandBitCount.
The CTPOP case has been changed from VT.getSizeInBits to VT.getScalarSizeInBits - but this fits in with future work for vector support (PR32655) and doesn't affect any current (scalar) uses.

llvm-svn: 344461
2018-10-13 18:40:48 +00:00
Craig Topper 189e5b4ab6 [LegalizeTypes] Prevent an assertion from PromoteIntRes_BSWAP and PromoteIntRes_BITREVERSE if the shift amount is too large for the VT returned by getShiftAmountTy
Summary:
getShiftAmountTy for X86 returns MVT::i8. If a BSWAP or BITREVERSE is created that requires promotion and the difference between the original VT and the promoted VT is more than 255 then we won't able to create the constant.

This patch adds a check to replace the result from getShiftAmountTy to MVT::i32 if the difference won't fit. This should get legalized later when the shift is ultimately expanded since its clearly an illegal type that we're only promoting to make it a power of 2 bit width. Alternatively we could base the decision completely on the largest shift amount the promoted VT could use.

Vectors should be immune here because getShiftAmountTy always returns the incoming VT for vectors. Only the scalar shift amount can be changed by the targets.

Reviewers: eli.friedman, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53232

llvm-svn: 344460
2018-10-13 17:47:20 +00:00
Simon Pilgrim c5d7c6e5f6 [X86][SSE] Remove most of vector CTTZ custom lowering and use LegalizeDAG instead.
There is one remnant - AVX1 custom splitting of 256-bit vectors - which is due to a regression where the X86ISD::ANDNP is still performed as a YMM.

I've also tightened the CTLZ or CTPOP lowering in SelectionDAGLegalize::ExpandBitCount to require a legal CTLZ - it doesn't affect existing users and fixes an issue with AVX512 codegen.

llvm-svn: 344457
2018-10-13 16:11:15 +00:00
Simon Pilgrim 1c2051ead7 [X86][SSE] Begin removing vector CTTZ custom lowering and use LegalizeDAG instead.
Adds CTTZ vector legalization support and begins the removal of the X86/SSE custom lowering. 

llvm-svn: 344453
2018-10-13 15:16:55 +00:00
Thomas Lively 16c349d892 [Intrinsic] Add llvm.minimum and llvm.maximum instrinsic functions
Summary:
These new intrinsics have the semantics of the `minimum` and `maximum`
operations specified by the latest draft of IEEE 754-2018. Unlike
llvm.minnum and llvm.maxnum, these new intrinsics propagate NaNs and
always treat -0.0 as less than 0.0. `minimum` and `maximum` lower
directly to the existing `fminnan` and `fmaxnan` ISel DAG nodes. It is
safe to reuse these DAG nodes because before this patch were only
emitted in situations where there were known to be no NaN arguments or
where NaN propagation was correct and there were known to be no zero
arguments. I know of only four backends that lower fminnan and
fmaxnan: WebAssembly, ARM, AArch64, and SystemZ, and each of these
lowers fminnan and fmaxnan to instructions that are compatible with
the IEEE 754-2018 semantics.

Reviewers: aheejin, dschuff, sunfish, javed.absar

Subscribers: kristof.beyls, dexonsmith, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D52764

llvm-svn: 344437
2018-10-13 07:21:44 +00:00
Craig Topper a796580903 [LegalizeVectorTypes] Use TLI.getVectorIdxTy instead of DAG.getIntPtrConstant.
There's no guarantee that vector indices should use pointer types. So use the correct query method.

llvm-svn: 344428
2018-10-12 22:55:17 +00:00
Craig Topper 435e38a5df [LegalizeVectorTypes] When widening the result of a bitcast from a scalar type, use a scalar_to_vector to turn the scalar into a vector intead of a build vector full of mostly undefs.
This is more consistent with what we usually do and matches some code X86 custom emits in some cases that I think I can cleanup.

The MIPS test change just looks to be an instruction ordering change.

llvm-svn: 344422
2018-10-12 21:59:55 +00:00
Craig Topper 1bb0c6041a [LegalizeVectorTypes] When widening the operands to a concat_vectors, see if we can use the widened operand 0 if the width matches and the other operands are undef.
This saves a conversion to extracts and build_vector. We already do this when both the result and the input need to be widened to the same type.

This changed the sse-intrinsics-fast-isel test because we don't lower (insert_vector_elt (scalar_to_vector X), Y, 1) well. We turn it into (vector_shuffle (scalar_to_vector X), (scalar_to_vector Y), <0, 4, 2, 3>) losing track of the fact that the upper elts could be undef.

We should probably find a way to prevent the scalarization of the <2 x f32> load on these tests.

llvm-svn: 344404
2018-10-12 19:37:49 +00:00
Craig Topper 05f014a684 [LegalizeVectorTypes] When unrolling in WidenVecRes_Convert, make sure we use the original vector element count. Not min of the widened result type and the possibly widened input type.
If the input type is widened as well, but we still were forced to unroll, we shouldn't be considering the widened input element count. We should only create as many scalar operations as the original type called for.

This will be important for an upcoming patch.

llvm-svn: 344403
2018-10-12 19:37:47 +00:00
Simon Pilgrim c046b6856e Pull out repeated value types. NFCI.
llvm-svn: 344355
2018-10-12 15:49:19 +00:00
Simon Pilgrim b926fd7b36 Pull out repeated value types. NFCI.
llvm-svn: 344354
2018-10-12 15:48:47 +00:00
Simon Pilgrim b8339c0167 [SelectionDAG] Move VectorLegalizer::ExpandCTLZ codegen into SelectionDAGLegalize
Generalize SelectionDAGLegalize's CTLZ expansion to handle vectors - lets VectorLegalizer::ExpandCTLZ to just pass the expansion on instead of repeating the same codegen.

llvm-svn: 344349
2018-10-12 14:45:57 +00:00
Sanjay Patel 56b6660d2e [DAGCombiner] rearrange extract_element+bitcast fold; NFC
I want to add another pattern here that includes scalar_to_vector,
so this makes that patch smaller. I was hoping to remove the
hasOneUse() check because it shouldn't be necessary for common
codegen, but an AMDGPU test has a comment suggesting that the
extra check makes things better on one of those targets.

llvm-svn: 344320
2018-10-11 23:56:56 +00:00
Nirav Dave f1f2a2a31a [DAG] Fix Big Endian in Load-Store forwarding
Summary:
Correct offset calculation in load-store forwarding for big-endian
targets.

Reviewers: rnk, RKSimon, waltl

Subscribers: sdardis, nemanjai, hiraditya, jrtc27, atanasyan, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D53147

llvm-svn: 344272
2018-10-11 18:28:59 +00:00
Sanjay Patel 4875662e57 [DAGCombiner] move comment closer to the corresponding code; NFC
llvm-svn: 344255
2018-10-11 16:07:25 +00:00
Nirav Dave 07acc992dc [DAGCombine] Improve Load-Store Forwarding
Summary:
Extend analysis forwarding loads from preceeding stores to work with
extended loads and truncated stores to the same address so long as the
load is fully subsumed by the store.

Hexagon's swp-epilog-phis.ll and swp-memrefs-epilog1.ll test are
deleted as they've no longer seem to be relevant.

Reviewers: RKSimon, rnk, kparzysz, javed.absar

Subscribers: sdardis, nemanjai, hiraditya, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D49200

llvm-svn: 344142
2018-10-10 14:15:52 +00:00
Simon Pilgrim bc7b6251b6 [TargetLowering] SimplifyDemandedBits - rename demanded mask args. NFCI.
Help stop bugs like rL343935 by making the 'original' DemandedBits arg more obviously not the mask that is actually used.

llvm-svn: 344138
2018-10-10 13:00:49 +00:00
Simon Pilgrim 53a503f6ac [TargetLowering] SimplifyDemandedBits - pull out repeated getOperands. NFCI.
Part of a minor cleanup to make all the switch statements more consistent prior to improving vector support.

llvm-svn: 344136
2018-10-10 12:32:13 +00:00
Simon Pilgrim 5cb3a82892 [TargetLowering] Add root node back to work list after successful SimplifyDemandedBits/SimplifyDemandedVectorElts
Similar to what already happens in the DAGCombiner wrappers, this patch adds the root nodes back onto the worklist if the DCI wrappers' SimplifyDemandedBits/SimplifyDemandedVectorElts were successful.

Differential Revision: https://reviews.llvm.org/D53026

llvm-svn: 344132
2018-10-10 10:44:15 +00:00
Nemanja Ivanovic 72d4866e57 [DAGCombiner] Expand combining of FP logical ops to sign-setting FP ops
We already do the following combines:
(bitcast int (and (bitcast fp X to int), 0x7fff...) to fp) -> fabs X
(bitcast int (xor (bitcast fp X to int), 0x8000...) to fp) -> fneg X

When the target has "bit preserving fp logic". This patch just extends it
to also combine:
(bitcast int (or (bitcast fp X to int), 0x8000...) to fp) -> fneg (fabs X)

As some targets have fnabs and even those that don't can efficiently lower
both the fabs and the fneg.

Differential revision: https://reviews.llvm.org/D44548

llvm-svn: 344093
2018-10-09 23:20:11 +00:00
Simon Pilgrim 23f880317a [SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG and CONCAT_VECTORS support to SimplifyDemandedBits
Fix for AVX1 masked load/store regression on D52964

llvm-svn: 344043
2018-10-09 13:13:35 +00:00
Sanjay Patel b64c0d7b53 [DAGCombiner] simplify code for fmul with constant fold; NFCI
llvm-svn: 343997
2018-10-08 21:17:20 +00:00
Alex Bradbury f27c67af12 [SelectionDAGBuilder][NFC] Pass LHSTy to getShiftAmountTy rather than RHSTy
r126518 introduced a a type parameter to the getShiftAmountTy target hook. It 
produces the type of the shift (RHSTy), parameterised by the type of the value 
being shifted (LHSTy). SelectionDAGBuilder::visitShift passed RHSTy rather 
than LHSTy and this patch corrects this. The change is a no-op because in LLVM 
IR the LHS and RHS types for a shift must be equal anyway.

llvm-svn: 343955
2018-10-08 06:24:59 +00:00
Craig Topper 98dd9d6896 Revert r343948 "[LegalizeDAG] Make one of the ReplaceNode signatures take an ArrayRef instead a pointer to an array. Add assert on size of array. NFC"
The assert is failing some asan tests on the bots.

llvm-svn: 343950
2018-10-08 03:12:12 +00:00
Craig Topper c058a68784 [LegalizeDAG] Make one of the ReplaceNode signatures take an ArrayRef instead a pointer to an array. Add assert on size of array. NFC
llvm-svn: 343948
2018-10-08 02:02:08 +00:00
Craig Topper cd38de8b15 [LegalizeDAG] Move legalization of scatter and masked store from LegalizeVectorOps to LegalizeDAG.
This is where we legalize gather and masked load so this is consistent.

Since these ops are always on vectors I've chosen to go with LegalizeDAG since that's what we do for other vector only ops like BUILD_VECTOR, VECTOR_SHUFFLE, etc. The ScalarizeMaskedMemIntrinsic pass should take care of scalarizing these before SelectionDAG so hopefully we don't need to worry about illegally typed scalar ops being emitted in the legalizing. If we did we would need to do this in LegalizeVectorOps so we could get the second type legalization that runs between LegalizeVectorOps and LegalizeDAG.

llvm-svn: 343947
2018-10-08 00:04:55 +00:00
Sanjay Patel ecc8af61e7 [DAGCombiner] allow undef elts in vector fadd matching
llvm-svn: 343945
2018-10-07 16:30:42 +00:00
Sanjay Patel ef76e27985 [DAGCombiner] allow undefs when matching vector splats for fmul folds
llvm-svn: 343942
2018-10-07 16:05:37 +00:00
Sanjay Patel 0b74c840dd [DAGCombiner] allow undef elts in vector fabs/fneg matching
This change is proposed as a part of D44548, but we
need this independently to avoid regressions from improved
undef propagation in SimplifyDemandedVectorElts().

llvm-svn: 343940
2018-10-07 15:32:06 +00:00
Sanjay Patel 46a9dc2e3e [DAGCombiner] shorten code for bitcast+fabs fold; NFC
llvm-svn: 343939
2018-10-07 15:18:30 +00:00
Simon Pilgrim 3b04a4e322 [SelectionDAG] Respect multiple uses in SimplifyDemandedBits to SimplifyDemandedVectorElts simplification
rL343913 was using SimplifyDemandedBits's original demanded mask instead of the adjusted 'NewMask' that accounts for multiple uses of the op (those variable names really need improving....).

Annoyingly many of the test changes (back to pre-rL343913 state) are actually safe - but only because their multiple uses are all by PMULDQ/PMULUDQ.

Thanks to Jan Vesely (@jvesely) for bisecting the bug.

llvm-svn: 343935
2018-10-07 11:45:46 +00:00
Craig Topper e4d199e360 [LegalizeVectorOps] Make ExpandStrictFPOp return the result corresponding to the result number of the SDValue passed in.
It was always returning the chain which seems to be the result number of the SDValue in the lit tests we have. But I don't know if that's guaranteed.

llvm-svn: 343933
2018-10-07 07:16:44 +00:00
Simon Pilgrim 9c9c97bcf4 [SelectionDAG] Add SimplifyDemandedBits to SimplifyDemandedVectorElts simplification
This patch enables SimplifyDemandedBits to call SimplifyDemandedVectorElts in cases where the demanded bits mask covers entire elements of a bitcasted source vector.

There are a couple of cases here where simplification at a deeper level (such as through bitcasts) prevents further simplification - CommitTargetLoweringOpt only adds immediate uses/users back to the worklist when we might want to combine the original caller again to see what else it can simplify.

As well as that I had to disable handling of bool vector until SimplifyDemandedVectorElts better supports some of their opcodes (SETCC, shifts etc.).

Fixes PR39178

Differential Revision: https://reviews.llvm.org/D52935

llvm-svn: 343913
2018-10-06 10:20:04 +00:00
Sanjay Patel f6a160a102 [SelectionDAG] allow undefs when matching splat constants
And use that to transform fsub with zero constant operands.
The integer part isn't used yet, but it is proposed for use in
D44548, so adding both enhancements here makes that 
patch simpler.

llvm-svn: 343865
2018-10-05 17:42:19 +00:00
Craig Topper 7d2155e3f9 [X86][LegalizeVectorOps] Use MERGE_VALUES to return two results from LowerLoad. Remove special case code in LegalizeVectorOps that allowed us to only return one result.
Previously we replaced the chain use ourself and return the data result. LegalizeVectorOps then detected that we'd done this and assumed the chain had already been handled.

This commit instead returns a MERGE_VALUES node with two results joined from nodes. This allows LegalizeVectorOps to do all the replacements for us without any special casing. The MERGE_VALUES will be removed by DAG combine.

llvm-svn: 343817
2018-10-04 21:24:24 +00:00
Craig Topper 08ae6774eb [LegalizeIntegerTypes] Fix typo in comment. NFC
llvm-svn: 343750
2018-10-04 02:40:35 +00:00
Matthias Braun 004fe6bf83 DAGCombiner: StoreMerging: Fix bad index calculating when adjusting mismatching vector types
This fixes a case of bad index calculation when merging mismatching
vector types. This changes the existing code to just use the existing
extract_{subvector|element} and a bitcast (instead of bitcast first and
then newly created extract_xxx) so we don't need to adjust any indices
in the first place.

rdar://44584718

Differential Revision: https://reviews.llvm.org/D52681

llvm-svn: 343493
2018-10-01 16:25:50 +00:00
Simon Pilgrim 818cfc40ff [DAG] Don't perform SINT_TO_FP<->UINT_TO_FP custom conversion after legalization
The SINT_TO_FP<->UINT_TO_FP combines for non-negative integers should only occur for legal ops once LegalOperations = true

No test case to hand, noticed when investigating PR38226 + PR38970

llvm-svn: 343405
2018-09-30 12:46:42 +00:00
David Bolvansky 8e90bad63d [DAGCombiner] [NFC] Improve X div/rem 1 fold
Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52661

llvm-svn: 343349
2018-09-28 18:40:30 +00:00
Fangrui Song 0cac726a00 llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)
Summary: The convenience wrapper in STLExtras is available since rL342102.

Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb

Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D52573

llvm-svn: 343163
2018-09-27 02:13:45 +00:00
Simon Pilgrim b0189289bf [DAG] SelectionDAGLegalize::ExpandLegalINT_TO_FP - use getFPExtendOrRound helper. NFCI.
Handles SrcVT == DstVT as well.

llvm-svn: 343121
2018-09-26 16:24:07 +00:00
Simon Pilgrim e2437689a8 [DAG] ExpandLegalINT_TO_FP - pull out repeated getValueType() call. NFCI.
llvm-svn: 343101
2018-09-26 12:42:19 +00:00
David Green 353cb3d4e5 [CodeGen] Enable tail calls for functions with NonNull attributes.
Adding NonNull as attributes to returned pointers has the unfortunate side
effect of disabling tail calls. This patch ignores the NonNull attribute when
we decide whether to tail merge, in the same way that we ignore the NoAlias
attribute, as it has no affect on the call sequence.

Differential Revision: https://reviews.llvm.org/D52238

llvm-svn: 343091
2018-09-26 10:46:18 +00:00
Mikael Nilsson 9c8e35174e Run VerifyDAGDiverence in debug only
VerifyDAGDiverence costs compilation time, avoid running it in non-debug
builds.

Differential Revision: https://reviews.llvm.org/D52454

llvm-svn: 343086
2018-09-26 09:25:45 +00:00
Craig Topper b2a00acb24 [DAGCombiner] Remove unnecessary check for visitSDIVLike/visitUDIVLike returning a UDIVREM or SDIVREM node.
This shouldn't be possible and is a leftover from when we used to recursively call combine here.

llvm-svn: 343049
2018-09-25 23:52:07 +00:00
Heejin Ahn e41be38efd Unify landing pad information adding routines (NFC)
Summary:
We have `llvm::addLandingPadInfo` and `MachineFunction::addLandingPad`,
both of which add landing pad information to populate `LandingPadInfo`
but are called from different locations, which was confusing. This patch
unifies them with one `MachineFunction::addLandingPad` function, which
now has functionlities of both functions.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52428

llvm-svn: 343018
2018-09-25 19:56:44 +00:00
Sanjay Patel 10c11b867a [x86] avoid 256-bit andnp that requires insert/extract with AVX1 (PR37449)
This is the final (I hope!) problem pattern mentioned in PR37749:
https://bugs.llvm.org/show_bug.cgi?id=37749

We are trying to avoid an AVX1 sinkhole caused by having 256-bit bitwise logic ops but no other 256-bit integer ops. 
We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like 
extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches 
that can over-reach. Ie, splitting to 128-bit does not look like a win in most cases with >1 256-bit op.

The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test, 
we have this vector-type-legalized sequence:

        t29: v8i32 = concat_vectors t27, t28
      t30: v4i64 = bitcast t29
        t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, ...
      t31: v4i64 = bitcast t18
    t32: v4i64 = xor t30, t31
      t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, ...
    t34: v4i64 = bitcast t9
  t35: v4i64 = and t32, t34
t36: v8i32 = bitcast t35
      t37: v4i32 = extract_subvector t36, Constant:i64<0>
      t38: v4i32 = extract_subvector t36, Constant:i64<4>

Differential Revision: https://reviews.llvm.org/D52318

llvm-svn: 343008
2018-09-25 19:09:34 +00:00
Nirav Dave a2f514d672 [LegalizeDAG] Prune Predecessor check in ExpandExtractFromVectorThroughStack. NFCI.
llvm-svn: 342985
2018-09-25 15:29:57 +00:00
Nirav Dave f445a67be4 [DAGCombine] Improve Predecessor check in SimplifySelectOps. NFCI.
Reuse search space bookkeeping across multiple predecessor checks
qdone to avoid redundancy. This should cut search cost by ~4x.

llvm-svn: 342984
2018-09-25 15:29:30 +00:00
Nirav Dave 7373d5e646 [DAGCombine] Share predecessor bookkeeping in CombineToPostIndexedLoadStore. NFCI.
llvm-svn: 342983
2018-09-25 15:29:04 +00:00
Nirav Dave 46ab89a0d0 [DAGCombine] Don't fold dependent loads across SELECT_CC.
DAGCombine will try to fold two loads that feed a SELECT or SELECT_CC
after the select, resulting in a select of an address and a single
load after.

If either of the loads depend on the other, this is not legal as it
could introduce cycles. However, it only checked this if the opcode
was a SELECT, and not for a SELECT_CC.

Unfortunately, the only reproducer I have for this is for our
downstream target. I've tried getting it to trigger on an upstream one
but haven't been successful.

Patch thanks to Bevin Hansson.

llvm-svn: 342980
2018-09-25 14:43:05 +00:00
Sanjay Patel 2c901742ca [DAGCombiner] use UADDO to optimize saturated unsigned add
This is a preliminary step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

If we have an 'add' instruction that sets flags, we can use that to eliminate an
explicit compare instruction or some other instruction (cmn) that sets flags for 
use in the later select.

As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively 
reversing an IR icmp canonicalization that replaces a variable operand with a
constant:
https://rise4fun.com/Alive/V1Q

But we're not using 'uaddo' in those cases via DAG transforms. This happens in 
CGP after D8889 without checking target lowering to see if the op is supported. 
So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with 
"using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned 
saturated add and converts to uaddo without checking target capabilities.

This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we see only 
see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title 
(unlike x86 which sees improvements for all sizes because all sizes are 'custom'). 
But the AArch code (like x86) looks better when translated to 'uaddo' in all cases. 
So someone that is involved with AArch may want to set i8/i16 to 'custom' for UADDO, 
so this patch will fire on those tests.

Another possibility given the existing behavior: we could remove the legal-or-custom 
check altogether because we're assuming that a UADDO sequence is canonical/optimal 
before we ever reach here. But that seems like a bug to me. If the target doesn't 
have an add-with-flags op, then it's not likely that we'll get optimal DAG combining 
using a UADDO node. This is similar justification for why we don't canonicalize IR to 
the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first 
place.

Differential Revision: https://reviews.llvm.org/D51929

llvm-svn: 342886
2018-09-24 14:47:15 +00:00
Hans Wennborg 83d15dfe2d Remove debug printf leftover from r342397
llvm-svn: 342863
2018-09-24 08:18:47 +00:00
Craig Topper 5bef27e808 [DAGCombiner] Remove some dead code from ConstantFoldBITCASTofBUILD_VECTOR
This code handled SCALAR_TO_VECTOR being returned by the recursion, but the code that used to return SCALAR_TO_VECTOR was removed in 2015.

llvm-svn: 342856
2018-09-24 02:03:11 +00:00
Craig Topper b3b94a8e8b [DAGCombiner] Clarify a comment. NFC
This comment was misleading about why we were restricting to before legalize types. The reason given would only apply to before legalize ops. But there is a before legalize types reason that should also be listed.

llvm-svn: 342851
2018-09-23 21:17:56 +00:00
Craig Topper bec5967176 [LegalizeTypes] Fix bad indentation. NFC
llvm-svn: 342850
2018-09-23 21:17:55 +00:00
Sanjay Patel 0027946915 [DAGCombiner][x86] extend decompose of integer multiply into shift/add with negation
This is an alternative to https://reviews.llvm.org/D37896. We can't decompose 
multiplies generically without a target hook to tell us when it's profitable.

ARM and AArch64 may be able to remove some existing code that overlaps with
this transform.

This extends D52195 and may resolve PR34474: 
https://bugs.llvm.org/show_bug.cgi?id=34474
(still an open question about transforming legal vector multiplies, but we
could open another bug report for those)

llvm-svn: 342844
2018-09-23 18:41:38 +00:00
Craig Topper 81f67f7afb [DAGCombiner] Simplify some code in visitBITCAST. NFCI
llvm-svn: 342826
2018-09-22 23:12:34 +00:00
Craig Topper e79a588cac [DAGCombiner] Rewrite r331896 in a different way to address a FIXME. NFCI
llvm-svn: 342809
2018-09-22 18:03:14 +00:00
Sanjay Patel 8a1227ccc8 [SelectionDAG] replace duplicated peekThroughBitcast helper functions; NFCI
x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant.
Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code. 
No functional change intended.

I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the
helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be 
there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps.

Differential Revision: https://reviews.llvm.org/D52285

llvm-svn: 342669
2018-09-20 17:34:08 +00:00
Sanjay Patel fdc0de19cb [SelectionDAG] allow vector types with isBitwiseNot()
The test diff in not-and-simplify.ll is from a use in SimplifyDemandedBits,
and the test diff in add.ll is from a DAGCombiner transform.

llvm-svn: 342594
2018-09-19 21:48:30 +00:00
Michael Berg 894c39f770 Copy utilities updated and added for MI flags
Summary: This patch adds a GlobalIsel copy utility into MI for flags and updates the instruction emitter for the SDAG path.  Some tests show new behavior and I added one for GlobalIsel which mirrors an SDAG test for handling nsw/nuw.

Reviewers: spatel, wristow, arsenm

Reviewed By: arsenm

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D52006

llvm-svn: 342576
2018-09-19 18:52:08 +00:00
Sanjay Patel 4fd2e2a498 [DAGCombiner][x86] add transform/hook to decompose integer multiply into shift/add
This is an alternative to D37896. I don't see a way to decompose multiplies 
generically without a target hook to tell us when it's profitable. 

ARM and AArch64 may be able to remove some duplicate code that overlaps with 
this transform.

As a first step, we're only getting the most clear wins on the vector examples
requested in PR34474:
https://bugs.llvm.org/show_bug.cgi?id=34474

As noted in the code comment, it's likely that the x86 constraints are tighter
than necessary, but it may not always be a win to replace a pmullw/pmulld.

Differential Revision: https://reviews.llvm.org/D52195

llvm-svn: 342554
2018-09-19 15:57:40 +00:00
Matthias Braun 726e12cf0c ScheduleDAG: Cleanup dumping code; NFC
- Instead of having both `SUnit::dump(ScheduleDAG*)` and
  `ScheduleDAG::dumpNode(ScheduleDAG*)`, just keep the latter around.
- Add `ScheduleDAG::dump()` and avoid code duplication in several
  places. Implement it for different ScheduleDAG variants.
- Add `ScheduleDAG::dumpNodeName()` in favor of the `SUnit::print()`
  functions. They were only ever used for debug dumping and putting the
  function into ScheduleDAG is consistent with the `dumpNode()` change.

llvm-svn: 342520
2018-09-19 00:23:35 +00:00
Amara Emerson 91c2913522 Revert "Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index.""
Fixed the assertion failure.

llvm-svn: 342397
2018-09-17 14:40:13 +00:00
Sanjay Patel 3eaf500a6d [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x)
This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040.

Copying the motivational statement by @evandro from that patch:
"This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp, 
447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. 
Otherwise, no regressions on x86-64 or A64."

I'm proposing to add only the minimum support for a DAG node here. Since we don't have an 
LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I 
don't think we need to worry about DAG builder, legalization, a strict variant, etc. We 
should be able to expand as needed when adding more functionality/transforms. For reference, 
these are transform suggestions currently listed in SimplifyLibCalls.cpp:

//   * cbrt(expN(X))  -> expN(x/3)
//   * cbrt(sqrt(x))  -> pow(x,1/6)
//   * cbrt(cbrt(x))  -> pow(x,1/9)

Also, given that we bail out on long double for now, there should not be any logical 
differences between platforms (unless there's some platform out there that has pow()
but not cbrt()).

Differential Revision: https://reviews.llvm.org/D51753

llvm-svn: 342348
2018-09-16 16:50:26 +00:00
Reid Kleckner 4d1b75c6b7 Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index."
Causes 'isVector() && "Invalid vector type!"' assertion when building
Skia in Chrome.

llvm-svn: 342265
2018-09-14 19:39:40 +00:00
Adrian Prantl 16f58d1850 Fix debug info for SelectionDAG legalization of DAG nodes with two results.
This patch fixes the debug info handling for SelectionDAG legalization
of DAG nodes with two results. When an replaced SDNode has more than
one result, transferDbgValues was always copying the SDDbgValue from
the first result and attaching them to all members. In reality
SelectionDAG::ReplaceAllUsesWith() is given an array of SDNodes
(though the type signature doesn't make this obvious (cf. the call
site code in ReplaceNode()).

rdar://problem/44162227

Differential Revision: https://reviews.llvm.org/D52112

llvm-svn: 342264
2018-09-14 19:38:45 +00:00
Adrian Prantl 66945cf6e3 fix noasserts build
llvm-svn: 342247
2018-09-14 17:32:52 +00:00
Adrian Prantl 55b8756b8a SelectionDAG: Add compact SDDbgValue representation to -dag-dump-verbose output
llvm-svn: 342245
2018-09-14 17:08:02 +00:00
Adrian Prantl 86497ad2af fix typos
llvm-svn: 342241
2018-09-14 16:12:14 +00:00
Amara Emerson ef600cbd86 [DAGCombine] Fix crash when store merging created an extract_subvector with invalid index.
Differential Revision: https://reviews.llvm.org/D51831

llvm-svn: 342183
2018-09-13 21:28:58 +00:00
Matt Arsenault 842cda6312 DAG: Fix expansion of unaligned FP loads and stores
This was trying to scalarizing a scalar FP type,
resulting in an assert.

Fixes unaligned f64 stack stores for AMDGPU.

llvm-svn: 342132
2018-09-13 12:14:23 +00:00
Sanjay Patel 8a478b79dc [DAGCombiner] improve formatting for select+setcc code; NFC
llvm-svn: 342095
2018-09-12 23:03:50 +00:00
Craig Topper 26a3799858 [SelectionDAG] Remove some code from PromoteIntOp_MGATHER that handles UpdateNodeOperands returning an existing node instead of updating.
I suspect this became unecessary when the CSE of mgather was fixed in r338080. It may still be possible to hit this if we widen the element type of a gather outside of type legalization and the promote the mask of a separate gather node so they become the same. But that seems pretty unlikely.

llvm-svn: 342022
2018-09-12 05:25:41 +00:00
Matt Arsenault 57b5966dad DAG: Handle odd vector sizes in calling conv splitting
This already worked if only one register piece was used,
but didn't if a type was split into multiple, unequal
sized pieces.

Fixes not splitting 3i16/v3f16 into two registers for
AMDGPU.

This will also allow fixing the ABI for 16-bit vectors
in a future commit so that it's the same for all subtargets.

llvm-svn: 341801
2018-09-10 11:49:23 +00:00
Sanjay Patel 6ebf218e4c [SelectionDAG] enhance vector demanded elements to look at a vector select condition operand
This is the DAG equivalent of D51433.
If we know we're not using all vector lanes, use that knowledge to potentially simplify a vselect condition.

The reduction/horizontal tests show that we are eliminating AVX1 operations on the upper half of 256-bit 
vectors because we don't need those anyway.
I'm not sure what the pr34592 test is showing. That's run with -O0; is SimplifyDemandedVectorElts supposed 
to be running there?

Differential Revision: https://reviews.llvm.org/D51696

llvm-svn: 341762
2018-09-09 14:13:22 +00:00
Simon Pilgrim 96d6b9c2e2 [DAGCombiner] foldBitcastedFPLogic - Add basic vector support
Add support for bitcasts from float type to an integer type of the same element bitwidth.

There maybe cases where we need to support different widths (e.g. as SSE __m128i is treated as v2i64) - but I haven't seen cases of this in the wild yet.

llvm-svn: 341652
2018-09-07 12:13:45 +00:00
Sanjay Patel dbf52837fe [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))
This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. 
Here, we only do the transform when the target tells us that sqrt can be lowered with inline code.

This is the basic case. Some potential enhancements are in the TODO comments:

1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper).
2. If we have less fast-math-flags, generate code to avoid -0.0 and/or INF.
3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right).

Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with 
refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty 
of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses 
its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should 
also be covered by existing tests).

Differential Revision: https://reviews.llvm.org/D51630 

llvm-svn: 341481
2018-09-05 17:01:56 +00:00
Matt Arsenault a6f32f4015 DAG: Factor out helper function for odd vector sizes
llvm-svn: 341392
2018-09-04 18:47:43 +00:00
Scott Linder cab029f474 [CodeGen] Fix remaining zext() assertions in SelectionDAG
Fix remaining cases not committed in https://reviews.llvm.org/D49574

Differential Revision: https://reviews.llvm.org/D50659

llvm-svn: 341380
2018-09-04 16:33:34 +00:00
Matt Arsenault ca25b58957 DAG: Handle extract_vector_elt in isKnownNeverNaN
llvm-svn: 341317
2018-09-03 14:01:03 +00:00
Roman Lebedev d7a6244475 [DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle inverted pattern
Summary:
A follow-up for D49266 / rL337166 + D49497 / rL338044.

This is still the same pattern to check for the [lack of]
signed truncation, but in this case the constants and the predicate
are negated.

https://rise4fun.com/Alive/BDV
https://rise4fun.com/Alive/n7Z

Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma, dmgreen

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51532

llvm-svn: 341287
2018-09-02 13:56:22 +00:00
Craig Topper 6666861158 [DAGCombiner] Fix bad identation. NFC
llvm-svn: 341103
2018-08-30 19:35:40 +00:00
Nicolai Haehnle 35617ed4cb [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis
Summary:
This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433).

The purpose of this patch is to free up the name DivergenceAnalysis for the new generic
implementation. The generic implementation class will be shared by specialized
divergence analysis classes.

Patch by: Simon Moll

Reviewed By: nhaehnle

Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D50434

Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb
llvm-svn: 341071
2018-08-30 14:21:36 +00:00
Matt Arsenault 167601e629 DAG: Don't use ABI copies in some contexts
If an ABI-like value is used in a different block,
the type split used is not necessarily the same as
the call's ABI. The value is used through an intermediate
copy virtual registers from the other block. This
resulted in copies with inconsistent sizes later.

Fixes regressions since r338197 when AMDGPU started
splitting vector types for calls.

llvm-svn: 341018
2018-08-30 05:49:28 +00:00
Simon Pilgrim b49d5f3b53 [DAGCombiner] Add X / X -> 1 & X % X -> 0 folds
Adds more divrem folds to try and get in sync with InstructionSimplify

Differential Revision: https://reviews.llvm.org/D50636

llvm-svn: 340919
2018-08-29 11:30:16 +00:00
Craig Topper 9f42726cc7 [X86] Support v2i32 gather/scatter indices with -x86-experimental-vector-widening-legalization
Summary: This is split out from D41062 to cover the code in LegalVectorTypes.cpp

Reviewers: RKSimon, spatel, efriedma

Reviewed By: efriedma

Subscribers: sdardis, jvesely, nhaehnle, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D51337

llvm-svn: 340891
2018-08-29 02:12:49 +00:00
Nirav Dave 11e39fb6fb [DAGCombine] Rework MERGE_VALUES to inline in single pass. NFCI.
Avoid hyperlinear cost of inlining MERGE_VALUE node by constructing
temporary vector and doing a single replacement.

llvm-svn: 340853
2018-08-28 18:13:26 +00:00
Nirav Dave 113f2b9058 [DAG] Avoid recomputing Divergence checks. NFCI.
When making multiple updates to the same SDNode, recompute node
divergence only once after all changes have been made.

llvm-svn: 340852
2018-08-28 18:13:00 +00:00
Nirav Dave 0b8cb46e0b [DAG] Fix updateDivergence calculation
Check correct SDNode when deciding if we should update the divergence
property.

llvm-svn: 340851
2018-08-28 18:12:35 +00:00
Craig Topper c7506b28c1 [DAGCombiner][AMDGPU][Mips] Fold bitcast with volatile loads if the resulting load is legal for the target.
Summary:
I'm not sure if this patch is correct or if it needs more qualifying somehow. Bitcast shouldn't change the size of the load so it should be ok? We already do something similar for stores. We'll change the type of a volatile store if the resulting store is Legal or Custom. I'm not sure we should be allowing Custom there...

I was playing around with converting X86 atomic loads/stores(except seq_cst) into regular volatile loads and stores during lowering. This would allow some special RMW isel patterns in X86InstrCompiler.td to be removed. But there's some floating point patterns in there that didn't work because we don't fold (f64 (bitconvert (i64 volatile load))) or (f32 (bitconvert (i32 volatile load))).

Reviewers: efriedma, atanasyan, arsenm

Reviewed By: efriedma

Subscribers: jvesely, arsenm, sdardis, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, arichardson, jrtc27, atanasyan, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D50491

llvm-svn: 340797
2018-08-28 03:47:20 +00:00
Matt Arsenault cea7c6969d DAG: Check transformed type for forming fminnum/fmaxnum from vselect
Follow up to r340655 to fix vector types which are split.

llvm-svn: 340766
2018-08-27 18:11:31 +00:00
Sanjay Patel f645927875 [SelectionDAG] add helper query for binops; NFC
We will also use this in a planned enhancement for vector insertelement.

llvm-svn: 340741
2018-08-27 14:20:15 +00:00
Sanjay Patel 113cac3b15 [SelectionDAG][x86] turn insertelement into undef with variable index into splat
I noticed this along with the patterns in D51125, but when the index is variable, 
we don't convert insertelement into a build_vector.

For x86, that means these get expanded at legalization time into the loading/spilling 
code that we see in the tests. I think it's always better to avoid going to memory on 
these, and we get the optimal 'broadcast' if it's available.

I suspect other targets may want to look at enabling the hook. AArch64 and AMDGPU have 
regression tests that would be affected (although I did not check what would happen in 
those cases). In the most basic cases shown here, AArch64 would probably do much 
better with a splat.

Differential Revision: https://reviews.llvm.org/D51186

llvm-svn: 340705
2018-08-26 18:20:41 +00:00
Chandler Carruth 9ae926b973 [IR] Replace `isa<TerminatorInst>` with `isTerminator()`.
This is a bit awkward in a handful of places where we didn't even have
an instruction and now we have to see if we can build one. But on the
whole, this seems like a win and at worst a reasonable cost for removing
`TerminatorInst`.

All of this is part of the removal of `TerminatorInst` from the
`Instruction` type hierarchy.

llvm-svn: 340701
2018-08-26 09:51:22 +00:00
Craig Topper a11a3b3818 [SelectionDAG][X86] Reorder the operands the MaskedStoreSDNode to put the value first.
Summary:
Previously the value being stored is the last operand in SDNode. This causes the type legalizer to visit the mask operand before the value operand. The type legalizer was more complicated because of this since we want the type of the value to drive the decisions.

This patch moves the value to be the first operand so we visit it first during type legalization. It also simplifies the type legalization code accordingly.

X86 is currently the only in tree target that uses this SDNode. Not sure if there are any users out of tree.

Reviewers: RKSimon, delena, hfinkel, eli.friedman

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50402

llvm-svn: 340689
2018-08-25 17:48:17 +00:00
Matt Arsenault 5b9ef39bdd DAG: Allow matching fminnum/fmaxnum from vselect
llvm-svn: 340655
2018-08-24 21:24:18 +00:00
Craig Topper d8e91c3e8d [DAGCombiner][Mips] Don't combine bitcast+store after LegalOperations when the store is volatile, if the resulting store isn't Legal
Previously we allowed the store to be Custom. But without knowing for sure that the Custom handling won't split the store, we shouldn't convert a volatile store. We also probably shouldn't be creating a store the requires custom handling after LegalizeOps. This could lead to an infinite loop if the custom handling was to insert a bitcast. Though I guess isStoreBitCastBeneficial could be used to block such a loop.

The test changes here are due to the volatile part of this. The stores in the test are all volatile and i32 stores are marked custom, So we are no longer converting them

This is related to D50491 where I was trying to allow some bitcasting of volatile loads

Differential Revision: https://reviews.llvm.org/D50578

llvm-svn: 340626
2018-08-24 17:48:25 +00:00
Justin Bogner fbbd4366a6 [SDAG] Add versions of computeKnownBits that return a value
Having the KnownBits as an output parameter is kind of awkward to use
and a holdover from when it was two separate APInts. Instead, just
return a KnownBits object.

I'm leaving the existing interface in place for now, since updating
the callers all at once would be thousands of lines of diff.

llvm-svn: 340594
2018-08-24 02:42:24 +00:00
Sanjay Patel ed1b9695ee [SelectionDAG] unroll unsupported vector FP ops earlier to avoid libcalls on undef elements (PR38527)
This solves the motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=38527

If we are legalizing an FP vector op that maps to 1 of the LLVM intrinsics that mimic libm calls, 
but we're going to end up with scalar libcalls for that vector type anyway, then we should unroll 
the vector op into scalars before widening. This avoids libcalls because we've lost the knowledge 
that some of the scalar elements are undef.

Differential Revision: https://reviews.llvm.org/D50791

llvm-svn: 340469
2018-08-22 22:52:05 +00:00
Eli Friedman 96e3cd85bd [ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available.
The inline sequence is very long (about 70 bytes on Thumb1), so it's
not really a good idea to inline it, especially when optimizing for
size.

Differential Revision: https://reviews.llvm.org/D47917

llvm-svn: 340458
2018-08-22 21:47:14 +00:00
Heejin Ahn 9cd7f88a35 [WebAssembly] Don't make wasm cleanuppads into funclet entries
Summary:
Catchpads and cleanuppads are not funclet entries; they are only EH
scope entries. We already dont't set `isEHFuncletEntry` for catchpads.
This patch does the same thing for cleanuppads.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D50654

llvm-svn: 340330
2018-08-21 20:04:42 +00:00
Sam Parker 597811e7a7 [DAGCombiner] Reduce load widths of shifted masks
During combining, ReduceLoadWdith is used to combine AND nodes that
mask loads into narrow loads. This patch allows the mask to be a
shifted constant. This results in a narrow load which is then left
shifted to compensate for the new offset.

Differential Revision: https://reviews.llvm.org/D50432

llvm-svn: 340261
2018-08-21 10:26:59 +00:00
Simon Pilgrim 72b324de4d [TargetLowering] Add BuildSDiv support for division by one or negone.
This reduces most of the sdiv stages (the MULHS, shifts etc.) to just zero/identity values and use the numerator scale factor to multiply by +1/-1.

llvm-svn: 340260
2018-08-21 10:20:36 +00:00
Cameron McInally 94b9029be9 [FPEnv] Support constrained FREM intrinsic
Differential Revision: https://reviews.llvm.org/D50975

llvm-svn: 340201
2018-08-20 19:28:56 +00:00
Simon Pilgrim 6ac905926f [TargetLowering] Disable BuildSDiv division by one or negone.
Fuzz tests have detected an issue, currently working on a fix.

llvm-svn: 340195
2018-08-20 18:23:54 +00:00
Simon Pilgrim 1a00042270 [SelectionDAG] Reuse the Op's VT. NFCI.
llvm-svn: 340173
2018-08-20 13:44:03 +00:00
Simon Pilgrim 5b78c9d58d [SelectionDAG] Add partial sign-bit support to ComputeNumSignBits for BITCAST nodes
Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts.

Handle the case where the sign bit extends to only part of the small elements.

llvm-svn: 340169
2018-08-20 13:05:48 +00:00
Simon Pilgrim 5b936ec89e [SelectionDAG] Add basic demanded elements support to ComputeNumSignBits for BITCAST nodes
Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts.

The next step would be to support cases where the large elements aren't all sign bits, and determine the small element equivalent based on the demanded elements.

llvm-svn: 340143
2018-08-19 17:47:50 +00:00
Hsiangkai Wang 68c706ceb7 [DebugInfo] In FastISel, convert llvm.dbg.label to DBG_LABEL MI.
Convert llvm.dbg.label(!label_metadata) to DBG_LABEL !label_metadata.

Differential Revision: https://reviews.llvm.org/D50622

llvm-svn: 340122
2018-08-18 14:55:34 +00:00
Craig Topper cc5dbbf759 [DAGCombiner] Allow divide by constant optimization on opaque constants.
Summary:
I believe this restores the behavior we had before r339147.

Fixes PR38622.

Reviewers: RKSimon, chandlerc, spatel

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50936

llvm-svn: 340120
2018-08-18 05:52:42 +00:00
Matt Arsenault 25e51540e1 DAG: Fix isKnownNeverNaN for basic non-sNaN cases
fadd/fsub/fmul need to worry about infinities as well
as fdiv.

llvm-svn: 340085
2018-08-17 21:19:22 +00:00
Simon Pilgrim 03e57521c0 [DAGCombiner] extractShiftForRotate - fix out of range shift issue
Don't just check for negative shift amounts.

Fixes OSS Fuzz #9935
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9935

llvm-svn: 340015
2018-08-17 12:25:18 +00:00
Simon Pilgrim 5113b48798 [DAGCombine] Improve (sra (sra x, c1), c2) -> (sra x, (add c1, c2)) folding
Add support for cases where only some c1+c2 results exceed the max bitshift, clamping accordingly.

Differential Revision: https://reviews.llvm.org/D35722

llvm-svn: 340010
2018-08-17 10:52:49 +00:00
Chen Zheng e2d47dd1bb [MISC]Fix wrong usage of std::equal()
Differential Revision: https://reviews.llvm.org/D49958

llvm-svn: 340000
2018-08-17 07:51:01 +00:00
Craig Topper 883ff69c93 [DAGCombiner] Don't reassociate operations that have the vector reduction flag set.
When nodes are reassociated the vector-reduction flag gets lost.

The test case is here is what would happen if you had a sum of absolute differences loop that started with a non-zero but contant sum and that loop was unrolled. The vectorizer will generate a constant vector for the initial value. And DAGCombiner reassociate tries to move it down the addition tree erasing the vector-reduction flag. Interestingly this moves constants the opposite direction of the reassociate IR pass.

I've chosen to just punt on the reassociate, but I suppose we could maybe preserve the flag if both nodes have it set.

Differential Revision: https://reviews.llvm.org/D50827

llvm-svn: 339946
2018-08-16 21:54:05 +00:00
Chandler Carruth c73c0307fe [MI] Change the array of `MachineMemOperand` pointers to be
a generically extensible collection of extra info attached to
a `MachineInstr`.

The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.

Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.

I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).

Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.

This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.

The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.

Differential Revision: https://reviews.llvm.org/D50701

llvm-svn: 339940
2018-08-16 21:30:05 +00:00
Eli Friedman 73e8a784e6 [SelectionDAG] Improve the legalisation lowering of UMULO.
There is no way in the universe, that doing a full-width division in
software will be faster than doing overflowing multiplication in
software in the first place, especially given that this same full-width
multiplication needs to be done anyway.

This patch replaces the previous implementation with a direct lowering
into an overflowing multiplication algorithm based on half-width
operations.

Correctness of the algorithm was verified by exhaustively checking the
output of this algorithm for overflowing multiplication of 16 bit
integers against an obviously correct widening multiplication. Baring
any oversights introduced by porting the algorithm to DAG, confidence in
correctness of this algorithm is extremely high.

Following table shows the change in both t = runtime and s = space. The
change is expressed as a multiplier of original, so anything under 1 is
“better” and anything above 1 is worse.

+-------+-----------+-----------+-------------+-------------+
| Arch  | u64*u64 t | u64*u64 s | u128*u128 t | u128*u128 s |
+-------+-----------+-----------+-------------+-------------+
|   X64 |     -     |     -     |    ~0.5     |    ~0.64    |
|  i686 |   ~0.5    |   ~0.6666 |    ~0.05    |    ~0.9     |
| armv7 |     -     |   ~0.75   |      -      |    ~1.4     |
+-------+-----------+-----------+-------------+-------------+

Performance numbers have been collected by running overflowing
multiplication in a loop under `perf` on two x86_64 (one Intel Haswell,
other AMD Ryzen) based machines. Size numbers have been collected by
looking at the size of function containing an overflowing multiply in
a loop.

All in all, it can be seen that both performance and size has improved
except in the case of armv7 where code size has regressed for 128-bit
multiply. u128*u128 overflowing multiply on 32-bit platforms seem to
benefit from this change a lot, taking only 5% of the time compared to
original algorithm to calculate the same thing.

The final benefit of this change is that LLVM is now capable of lowering
the overflowing unsigned multiply for integers of any bit-width as long
as the target is capable of lowering regular multiplication for the same
bit-width. Previously, 128-bit overflowing multiply was the widest
possible.

Patch by Simonas Kazlauskas!

Differential Revision: https://reviews.llvm.org/D50310

llvm-svn: 339922
2018-08-16 18:39:39 +00:00
Simon Pilgrim 87d0039a45 [TargetLowering] Add support for non-uniform vectors to BuildSDIV
This patch refactors the existing TargetLowering::BuildSDIV base implementation to support non-uniform constant vector denominators.

This is the last patch necessary to close PR36545

Differential Revision: https://reviews.llvm.org/D50765

llvm-svn: 339908
2018-08-16 17:44:33 +00:00
Simon Pilgrim ede4905375 [TargetLowering] Refactor BuildSDIV in preparation for D50765. NFCI.
Pull out magic factor calculators into a helper function, use 0/+1/-1 multiplication factor to (optionally) add/sub the numerator.

llvm-svn: 339898
2018-08-16 16:54:06 +00:00
Matt Arsenault 0f2c1cf429 DAG: Use getObjectOffset helper
llvm-svn: 339813
2018-08-15 21:03:44 +00:00
Matt Arsenault 22f01268fe DAG: Try to custom lower when promoting float operands
For some reason this wasn't done for floats like
integers.

llvm-svn: 339811
2018-08-15 20:34:54 +00:00
Simon Pilgrim 4b2317ebfb [TargetLowering] Minor cleanup of TargetLowering::BuildSDIV. NFCI.
Pull out some types to match layout in TargetLowering::BuildUDIV. Early step towards adding non-uniform vector support.

llvm-svn: 339763
2018-08-15 11:11:05 +00:00
Simon Pilgrim a4ba43d3d3 [TargetLowering] Minor refactor to TargetLowering::BuildUDIV to merge scalar/vector magic value collection. NFCI.
Use the same ISD::matchUnaryPredicate pattern that was used in D50392.

llvm-svn: 339758
2018-08-15 10:11:13 +00:00
Simon Pilgrim e8a906ba47 [DagCombiner] Don't bother adding to the work list if TLI.BuildSDIVPow2 failed. NFCI.
Matches the code in BuildSDIV/BuildUDIV

llvm-svn: 339757
2018-08-15 10:02:54 +00:00
Simon Pilgrim a272fa9b0c [TargetLowering] Add support for non-uniform vectors to BuildExactSDIV
This patch refactors the existing BuildExactSDIV implementation to support non-uniform constant vector denominators.

Differential Revision: https://reviews.llvm.org/D50392

llvm-svn: 339756
2018-08-15 09:35:12 +00:00
Chandler Carruth 66654b72c9 [SDAG] Remove the reliance on MI's allocation strategy for
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.

Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.

This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.

This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.

As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]

Differential Revision: https://reviews.llvm.org/D50680

llvm-svn: 339740
2018-08-14 23:30:32 +00:00
Cameron McInally 00b0658aae [FPEnv] Scalarize StrictFP vector operations
Add a helper function to scalarize constrained FP operations as needed.

Differential Revision: https://reviews.llvm.org/D50720

llvm-svn: 339735
2018-08-14 22:13:11 +00:00
Eli Friedman 0d12e90bf5 [ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly.
Intentionally excluding nodes from the DAGCombine worklist is likely to
lead to weird optimizations and infinite loops, so it's generally a bad
idea.

To avoid the infinite loops, fix DAGCombine to use the
isDesirableToCommuteWithShift target hook before performing the
transforms in question, and implement the target hook in the ARM backend
disable the transforms in question.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a
reduced testcase for that bug. But we should have sufficient test
coverage for PerformSHLSimplify given that we're not playing weird
tricks with the worklist. I can try to bugpoint it if necessary,
though.)

Differential Revision: https://reviews.llvm.org/D50667

llvm-svn: 339734
2018-08-14 22:10:25 +00:00
Nirav Dave fbfe2ad9e0 [DAG] Avoid redundant chain transversal in store merge cycle check. NFCI.
Patch by Henric Karlsson.

llvm-svn: 339688
2018-08-14 16:20:43 +00:00
Scott Linder 35213793bc [CodeGen] Fix assert in SelectionDAG::computeKnownBits
Fix SelectionDAG::computeKnownBits asserting when handling EXTRACT_SUBVECTOR
when zero extending the demanded elements mask if it is already as long as the
source vector.

Differential Revision: https://reviews.llvm.org/D49574

llvm-svn: 339600
2018-08-13 18:44:21 +00:00
Simon Pilgrim 26e3d3f1c8 [DAGCombiner] simplifyDivRem - add comment describing divide by undef/zero combine. NFC.
llvm-svn: 339561
2018-08-13 13:12:25 +00:00
Craig Topper cacf12a149 [SelectionDAG] In PromoteFloatOp_BITCAST, insert a bitcast after the fp_to_fp16 in case the result type isn't a scalar integer.
This is another variation of PR38533. In this case, the result type of the bitcast is legal and 16-bits wide, but not a scalar integer. So we need to emit the convert to i16 and then bitcast it to the true result type. This new bitcast will be further type legalized if necessary.

llvm-svn: 339536
2018-08-13 06:53:49 +00:00
Craig Topper e42a159537 [SelectionDAG] In PromoteIntRes_BITCAST, when the input is TypePromoteFloat, make sure the output type is scalar. For vectors, use a store and load of temporary.
Previously if the result type was a vector, we emitted a FP_TO_FP16 with a vector result type which isn't valid.

This is basically the opposite case of the root cause of PR38533.

llvm-svn: 339535
2018-08-13 06:53:47 +00:00
Craig Topper 42e32117bb [SelectionDAG] In PromoteFloatRes_BITCAST, insert a bitcast before the fp16_to_fp in case the input type isn't an i16.
The bitcast can be further legalized as needed.

Fixes PR38533.

llvm-svn: 339533
2018-08-13 05:26:49 +00:00
Matt Arsenault 1201301b94 DAG: Check no-signed-zeros instead of unsafe-fp-math
Addresses fixme, although this should still be checking individual
operand flags.

llvm-svn: 339525
2018-08-12 19:09:12 +00:00
Craig Topper 60177f1aee [TargetLowering] Simplify one of the special cases in SimplifyDemandedBits for XOR. NFCI
We were checking for all bits being Known by checking Known.Zero|Known.One, but if all the bits are known then the value should be a Constant and we can just check for that instead.

llvm-svn: 339509
2018-08-12 06:52:03 +00:00
Craig Topper d112206004 [TargetLowering] Use APInt::isSubsetOf to simplify some code. NFC
llvm-svn: 339508
2018-08-12 05:34:15 +00:00
Sanjay Patel 15d1501aae [SelectionDAG] try harder to convert funnel shift to rotate
Similar to rL337966 - if the DAGCombiner's rotate matching was 
working as expected, I don't think we'd see any test diffs here.

AArch only goes right, and PPC only goes left. 
x86 has both, so no diffs there.

Differential Revision: https://reviews.llvm.org/D50091

llvm-svn: 339359
2018-08-09 17:26:22 +00:00
Michael Berg ca38254601 extend folding fsub/fadd to fneg for FMF
Summary: This change provides a common optimization path for both Unsafe and FMF driven optimization for this fsub fold adding reassociation, as it the flag that most closely represents the translation

Reviewers: spatel, wristow, arsenm

Reviewed By: spatel

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D50195

llvm-svn: 339357
2018-08-09 17:00:03 +00:00
Simon Pilgrim a9f95429d9 [TargetLowering] Add BuildSDIVPattern helper to BuildExactSDIV (NFCI).
As requested in D50392, pull the magic constant calculations out into a helper function.

llvm-svn: 339346
2018-08-09 13:56:04 +00:00
Sanjay Patel e47dc1a405 [DAGCombiner] loosen constraints for fsub+fadd fold
isNegatibleForFree() should not matter here (as the test diffs show)
because it's always a win to replace an fsub+fadd with fneg. The
problem in D50195 persists because either (1) we are doing these
folds in the wrong order or (2) we're missing another fold for fadd.

llvm-svn: 339299
2018-08-08 23:04:43 +00:00
Sanjay Patel e327266d45 [DAGCombiner] move fadd simplification ahead of other folds
I don't know if it's possible to expose this diff in a test,
but we should always try simplifications (no new nodes created)
before more complicated transforms for efficiency (similar to
what we do in IR).

llvm-svn: 339298
2018-08-08 22:46:30 +00:00
Simon Pilgrim 4d4220fa2a [DAG] DAGCombiner::visitSDIVLike - remove unnecessary isConstOrConstSplat call. NFCI.
The isConstOrConstSplat result is only used in a ISD::matchUnaryPredicate call which can perform the equivalent iteration just as quickly.

llvm-svn: 339262
2018-08-08 15:37:52 +00:00
Simon Pilgrim 164e8b0b5c [TargetLowering] BuildUDIV - Add support for divide by one (PR38477)
Provide a pass-through of the numerator for divide by one cases - this is the same approach we take in DAGCombiner::visitSDIVLike.

I investigated whether we could achieve this by magic MULHU/SRL values but nothing appeared to work as we don't have a way for MULHU(x,c) -> x

llvm-svn: 339254
2018-08-08 14:51:19 +00:00
Simon Pilgrim e4a4cf5a8b [TargetLowering] Remove APInt divisor argument from BuildExactSDIV (NFCI).
As requested in D50392, this is a minor refactor to BuildExactSDIV to stop taking the uniform constant APInt divisor and instead extract it locally.

I also cleanup the operands and valuetypes to better match BuildUDiv (and BuildSDIV in the near future).

llvm-svn: 339246
2018-08-08 13:59:44 +00:00
Ties Stuij 81f1fbdf5a test commit access
Summary: changing a few typos

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50445

llvm-svn: 339245
2018-08-08 13:51:13 +00:00
Simon Pilgrim a10cfcc1db [TargetLowering] BuildUDIV - Early out for divide by one (PR38477)
We're not handling the UDIV by one special case properly - for now just early out.

llvm-svn: 339229
2018-08-08 10:00:54 +00:00
Thomas Preud'homme 4107b31df2 Support inline asm with multiple 64bit output in 32bit GPR
Summary: Extend fix for PR34170 to support inline assembly with multiple output operands that do not naturally go in the register class it is constrained to (eg. double in a 32-bit GPR as in the PR).

Reviewers: bogner, t.p.northover, lattner, javed.absar, efriedma

Reviewed By: efriedma

Subscribers: efriedma, tra, eraman, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45437

llvm-svn: 339225
2018-08-08 09:35:26 +00:00
Craig Topper 49ed49fcb1 [SelectionDAG] When splitting scatter nodes during DAGCombine, create a serial chain dependency.
Scatter could have multiple identical indices. We need to maintain sequential order. We get this right in LegalizeVectorTypes, but not in this code.

Differential Revision: https://reviews.llvm.org/D50374

llvm-svn: 339157
2018-08-07 17:35:02 +00:00
Simon Pilgrim 1bfadb0499 [DAG] Allow non-uniform constant vectors to call BuildSDIV
This was missed in D50185.

NFC until we add actual non-uniform support to BuildSDIV (similar BuildUDIV support in D49248) - for now it just early outs.

llvm-svn: 339147
2018-08-07 14:50:39 +00:00
Simon Pilgrim 6943e39353 [TargetLowering] Use pre-computed Shift value type in BuildUDIV (NFCI)
This was missed in D49248

llvm-svn: 339146
2018-08-07 14:40:21 +00:00
Simon Pilgrim 7e18938793 [TargetLowering] Add support for non-uniform vectors to BuildUDIV
This patch refactors the existing TargetLowering::BuildUDIV base implementation to support non-uniform constant vector denominators.

It also includes a fold for MULHU by pow2 constants to SRL which can now more readily occur from BuildUDIV.

Differential Revision: https://reviews.llvm.org/D49248

llvm-svn: 339121
2018-08-07 09:51:34 +00:00
Craig Topper 9de1797c50 [SelectionDAG][X86] Rename MaskedLoadSDNode::getSrc0 to getPassThru.
Src0 doesn't really convey any meaning to what the operand is. Passthru matches what's used in the documentation for the intrinsic this comes from.

llvm-svn: 339101
2018-08-07 06:52:49 +00:00
Craig Topper 17989208a9 [SelectionDAG][X86] Rename getValue to getPassThru for gather SDNodes.
getValue is more meaningful name for scatter than it is for gather. Split them and use getPassThru for gather.

llvm-svn: 339096
2018-08-07 06:13:40 +00:00
Reid Kleckner 15e91c3235 [X86] Fix assertion in subreg extraction
This assert fires when attempting to extract a subregister from the
global PIC base register. This virtual register SD node is not in the
VRBaseMap, so we shouldn't call getVR to look it up there. If this is a
RegisterSDNode, we should be able to use the virtual register directly.

Fixes PR38385

llvm-svn: 339056
2018-08-06 21:16:16 +00:00
Hsiangkai Wang ef72e481ea [DebugInfo] Refactor DbgInfoIntrinsic class hierarchy.
In the past, DbgInfoIntrinsic has a strong assumption that these
intrinsics all have variables and expressions attached to them.
However, it is too strong to derive the class for other debug entities.
Now, it has problems for debug labels.

In order to make DbgInfoIntrinsic as a base class for 'debug info', I
create a class for 'variable debug info', DbgVariableIntrinsic.

DbgDeclareInst, DbgAddrIntrinsic, and DbgValueInst will be derived from it.

Differential Revision: https://reviews.llvm.org/D50220

llvm-svn: 338984
2018-08-06 03:59:47 +00:00
Craig Topper c4960582ec [SelectionDAG] Teach LegalizeVectorTypes to widen the mask input to a masked store.
The mask operand is visited before the data operand so we need to be able to widen it.

Fixes PR38436.

llvm-svn: 338915
2018-08-03 20:14:18 +00:00
Matt Arsenault c3dc8e65e2 DAG: Enhance isKnownNeverNaN
Add a parameter for testing specifically for
sNaNs - at least one instruction pattern on AMDGPU
needs to check specifically for this.

Also handle more cases, and add a target hook
for custom nodes, similar to the hooks for known
bits.

llvm-svn: 338910
2018-08-03 18:27:52 +00:00
Simon Pilgrim 94112ebc75 [TargetLowering] Generalise BuildSDIV function
First step towards a BuildSDIV equivalent to D49248 for non-uniform vector support - this just pushes the splat detection down into TargetLowering::BuildSDIV where its still used.

Differential Revision: https://reviews.llvm.org/D50185

llvm-svn: 338838
2018-08-03 10:00:54 +00:00
Matt Arsenault 1f3977a856 DAG: Fix vector widening fcanonicalize
llvm-svn: 338715
2018-08-02 13:43:53 +00:00
Lei Liu b9a7b7a84d Fix FCOPYSIGN expansion
In expansion of FCOPYSIGN, the shift node is missing when the two
operands of FCOPYSIGN are of the same size. We should always generate
shift node (if the required shift bit is not zero) to put the sign
bit into the right position, regardless of the size of underlying
types.

Differential Revision: https://reviews.llvm.org/D49973

llvm-svn: 338665
2018-08-02 01:54:12 +00:00
Michael Berg d3ce4c3d94 [NFC] small addendum to r334242, FMF propagation
llvm-svn: 338604
2018-08-01 18:06:49 +00:00
Sanjay Patel 8aac22e06a [SelectionDAG] fix bug in translating funnel shift with non-power-of-2 type
The bug is visible in the constant-folded x86 tests. We can't use the
negated shift amount when the type is not power-of-2:
https://rise4fun.com/Alive/US1r

...so in that case, use the regular lowering that includes a select
to guard against a shift-by-bitwidth. This path is improved by only
calculating the modulo shift amount once now.

Also, improve the rotate (with power-of-2 size) lowering to use
a negate rather than subtract from bitwidth. This improves the
codegen whether we have a rotate instruction or not (although
we can still see that we're not matching to a legal rotate in
all cases).

llvm-svn: 338592
2018-08-01 17:17:08 +00:00
Simon Pilgrim a3548c960e [SelectionDAG] Make binop reduction matcher available to all targets
There is nothing x86-specific about this code, so it'd be nice to make this available for other targets to use in the future (and get it out of X86ISelLowering!).

Differential Revision: https://reviews.llvm.org/D50083

llvm-svn: 338586
2018-08-01 16:52:28 +00:00
Cameron McInally 04ae85859d [FPEnv] Widen illegal width StrictFP vector operations as needed
Differential Revision: https://reviews.llvm.org/D49806

llvm-svn: 338562
2018-08-01 14:17:19 +00:00
Matt Arsenault dcec0888e2 DAG: Correct pointer type used for stack slot
Correct the address space for the inserted argument
stack slot.

AMDGPU seems to not do anything with this information,
so I don't think this was breaking anything.

llvm-svn: 338428
2018-07-31 19:51:20 +00:00
Matt Arsenault a5ed032118 DAG: Fix PromoteFloatResult for fcanonicalize
llvm-svn: 338382
2018-07-31 14:15:22 +00:00
Hsiangkai Wang 615540d0f2 Test commit.
llvm-svn: 338352
2018-07-31 06:09:29 +00:00
Craig Topper 2f60ef2c78 [DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to BuildSDIV/BuildUDIV/etc.
The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation.

llvm-svn: 338329
2018-07-30 23:22:00 +00:00
Sanjay Patel 9f807f44b1 [DAGCombiner] transform sub-of-shifted-signbit to add
This is exchanging a sub-of-1 with add-of-minus-1:
https://rise4fun.com/Alive/plKAH

This is another step towards improving select-of-constants codegen (see D48970).

x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral.
I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but
I think canonicalizing to 'add' is more likely to produce further transforms because we have more
folds for 'add'.

Differential Revision: https://reviews.llvm.org/D49924

llvm-svn: 338317
2018-07-30 22:21:37 +00:00
Craig Topper 42d312bb83 [TargetLowering] In BuildSDIV, add the MULHS/SMUL_LOHI to the Created vector.
BuildUDIV was already correct.

llvm-svn: 338304
2018-07-30 21:04:38 +00:00
Craig Topper a568a27dfa [DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to BuildSDIVPow2.
llvm-svn: 338303
2018-07-30 21:04:34 +00:00
Craig Topper b94d5f853b Revert r338222 "[DAGCombiner] Remove unnecessary calls to AddToWorklist."
Thinking about it more it might be possible for the later nodes to be folded in getNode in such a way that the other created nodes are left dead. This can cause use counts to be incorrect on nodes that aren't dead.

So its probably safer to leave this alone.

llvm-svn: 338298
2018-07-30 20:27:10 +00:00
Fangrui Song f78650a8de Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

llvm-svn: 338293
2018-07-30 19:41:25 +00:00
David Bolvansky 2fa7fb14ea [DAGCombiner] Bug 31275- Extract a shift from a constant mul or udiv if a rotate can be formed
Summary:
Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed.  This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op.

Patch by: sameconrad (Sam Conrad)

Reviewers: RKSimon, craig.topper, spatel, lebedev.ri, javed.absar

Reviewed By: lebedev.ri

Subscribers: efriedma, kparzysz, llvm-commits

Differential Revision: https://reviews.llvm.org/D47681

llvm-svn: 338270
2018-07-30 16:50:00 +00:00
Thomas Preud'homme 196149c943 Reapply "Fix crash on inline asm with 64bit matching input in 32bit GPR"
This reapplies commit r338206 reverted by r338214 since the bug that
r338206 uncovered has been fixed in r338268.

Add support for inline assembly with matching input operand that do not
naturally go in the register class it is constrained to (eg. double in a
32-bit GPR). Note that regular input is already handled by existing
code.

llvm-svn: 338269
2018-07-30 16:48:39 +00:00
Craig Topper e978d2ee4a [DAGCombiner] Remove unnecessary calls to AddToWorklist.
The DAGCombiner has a mechanism for ensuring all nodes have been visited at least once. Every time a node is visited, it makes sure its operands have been in the worklist at least once. This ensures that when multiple nodes are created by a combine, only the last node needs to be returned. The earlier nodes can all be found Through this operand check. These means we don't need to explicitly add nodes to the worklist when a combine creates multiple nodes.

I've removed the most obvious cases here. There are probably more than can be removed.

llvm-svn: 338222
2018-07-29 18:39:26 +00:00
Sanjay Patel 7312206f2f revert r338206 because the test does not pass
Example of bot failure:
http://lab.llvm.org:8011/builders/clang-cmake-armv8-quick/builds/5107/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Ainline-asm-operand-implicit-cast.ll

llvm-svn: 338214
2018-07-29 14:30:49 +00:00
Thomas Preud'homme 74ffd14e15 Fix crash on inline asm with 64bit matching input in 32bit GPR
Add support for inline assembly with matching input operand that do not
naturally go in the register class it is constrained to (eg. double in a
32-bit GPR). Note that regular input is already handled by existing
code.

llvm-svn: 338206
2018-07-28 21:33:39 +00:00
Craig Topper 9db3573d3a [SelectionDAG] Pass std::vector by reference instead of by pointer to BuildSDIV/BuildUDIV.
This removes the need for an assert to ensure the pointer isn't null.

Years ago we had ifs the checked the pointer was non-null before very access to the vector. These checks were removed and replaced with a single assert. But a reference seems more suitable here.

llvm-svn: 338205
2018-07-28 19:44:20 +00:00
Matt Arsenault 81920b0a25 DAG: Add calling convention argument to calling convention funcs
This seems like a pretty glaring omission, and AMDGPU
wants to treat kernels differently from other calling
conventions.

llvm-svn: 338194
2018-07-28 13:25:19 +00:00
Craig Topper 50b1d4303d [DAGCombiner] Teach DAG combiner that A-(B-C) can be folded to A+(C-B)
This can be useful since addition is commutable, and subtraction is not.

This matches a transform that is also done by InstCombine.

llvm-svn: 338181
2018-07-28 00:27:25 +00:00
Sanjay Patel c7abb416dc [DAGCombiner] fold 'not' with signbit math
This is a follow-up suggested in D48970. 

Alive proofs:
https://rise4fun.com/Alive/sII

We can eliminate an instruction in the usual select-of-constants 
to bit hack transform by adjusting the add/sub with constant.
This is always a win. 

There are more transforms that are likely wins, but they may need 
target hooks in case some targets do not benefit. 

This is another step towards making up for canonicalizing to 
select-of-constants in rL331486.

llvm-svn: 338132
2018-07-27 16:42:55 +00:00
Matt Arsenault 611dff423c DAG: Remove unnecessary .str()
llvm-svn: 338112
2018-07-27 09:04:41 +00:00
Craig Topper 1a40a06549 [SelectionDAGBuilder] Add masked loads to PendingLoads rather than calling DAG.setRoot.
Masked loads are calling DAG.getRoot rather than calling SelectionDAGBuilder::getRoot, which means the PendingLoads weren't emptied to update the root and create any needed TokenFactor. So it would be incorrect to call setRoot for the masked load.

This patch instead adds the masked load to PendingLoads so that the root doesn't get update until a store or scatter or something happens.. Alternatively, we could call SelectionDAGBuilder::getRoot before it, but that would create unnecessary serialization.

llvm-svn: 338085
2018-07-26 23:22:11 +00:00
Craig Topper 8da280f50b [SelectionDAG] Add MLOAD/MSTORE/MGATHER/MSCATTER to AddNodeIDCustom to properly calculate their folding set ID to allow them to be CSEd.
llvm-svn: 338080
2018-07-26 22:40:24 +00:00
Craig Topper 8b5a2f7aac [DAGCombiner] Remove some calls to AddToWorklist that should be unnecessary.
The DAGCombiner has a system for ensuring all nodes are visited. It doesn't require an AddToWorkList for every node that is created by a combine.

llvm-svn: 338079
2018-07-26 22:40:22 +00:00
Vedant Kumar b572f64212 [DebugInfo] LowerDbgDeclare: Add derefs when handling CallInst users
LowerDbgDeclare inserts a dbg.value before each use of an address
described by a dbg.declare. When inserting a dbg.value before a CallInst
use, however, it fails to append DW_OP_deref to the DIExpression.

The DW_OP_deref is needed to reflect the fact that a dbg.value describes
a source variable directly (as opposed to a dbg.declare, which relies on
pointer indirection).

This patch adds in the DW_OP_deref where needed. This results in the
correct values being shown during a debug session for a program compiled
with ASan and optimizations (see https://reviews.llvm.org/D49520). Note
that ConvertDebugDeclareToDebugValue is already correct -- no changes
there were needed.

One complication is that SelectionDAG is unable to distinguish between
direct and indirect frame-index (FRAMEIX) SDDbgValues. This patch also
fixes this long-standing issue in order to not regress integration tests
relying on the incorrect assumption that all frame-index SDDbgValues are
indirect. This is a necessary fix: the newly-added DW_OP_derefs cannot
be lowered properly otherwise. Basically the fix prevents a direct
SDDbgValue with DIExpression(DW_OP_deref) from being dereferenced twice
by a debugger. There were a handful of tests relying on this incorrect
"FRAMEIX => indirect" assumption which actually had incorrect
DW_AT_locations: these are all fixed up in this patch.

Testing:

- check-llvm, and an end-to-end test using lldb to debug an optimized
  program.
- Existing unit tests for DIExpression::appendToStack fully cover the
  new DIExpression::append utility.
- check-debuginfo (the debug info integration tests)

Differential Revision: https://reviews.llvm.org/D49454

llvm-svn: 338069
2018-07-26 20:56:53 +00:00
Roman Lebedev 41ba5c1455 [DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle ule,ugt CondCodes.
Summary:
A follow-up for D49266 / rL337166.

At least one of these cases is more canonical,
so we really do have to handle it.
https://godbolt.org/g/pkzP3X
https://rise4fun.com/Alive/pQyhZZ

We won't get to these cases with I1 being -1,
as that will be constant-folded to true or false.

I'm also not sure we actually hit the 'ule' case,
but i think the worst think that could happen is that being dead code.

Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49497

llvm-svn: 338044
2018-07-26 17:34:28 +00:00
Sanjay Patel 215dcbf4db [SelectionDAG] try to convert funnel shift directly to rotate if legal
If the DAGCombiner's rotate matching was working as expected, 
I don't think we'd see any test diffs here. 

This sidesteps the issue of custom lowering for rotates raised in PR38243:
https://bugs.llvm.org/show_bug.cgi?id=38243
...by only dealing with legal operations.

llvm-svn: 337966
2018-07-25 21:38:30 +00:00
Ulrich Weigand 5f75371c5d Fix corruption of result number in LegalizeVectorOps.cpp
When VectorLegalizer::LegalizeOp creates a new SDValue after iterating
over its arguments, we need to refer to the same result number of the
new node that the original value used.

Reviewed by: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D49805

llvm-svn: 337939
2018-07-25 17:08:13 +00:00
Thomas Preud'homme 768d6ce4a3 Fix PR34170: Crash on inline asm with 64bit output in 32bit GPR
Add support for inline assembly with output operand that do not
naturally go in the register class it is constrained to (eg. double in a
32-bit GPR as in the PR).

llvm-svn: 337903
2018-07-25 11:11:12 +00:00
Vedant Kumar 22bd6f99fa [SelectionDAG] Reduce DanglingDebugInfo memory traffic, NFC
This avoids approx. 2 x 10^5 DenseMap insertions in both non-debug and
debug -O2 builds of the sqlite3 amalgamation.

llvm-svn: 337751
2018-07-23 21:59:04 +00:00
Nirav Dave eac2ca4e28 [Legalize] Elide MERGE_VALUES created by scalarizeVectorLoad.
scalarizeVectorLoad creates MERGE_VALUES nodes which are immediately
decomposed in expandLoad. Elide the node in these cases.

llvm-svn: 337708
2018-07-23 16:43:42 +00:00
Cameron McInally 2c9bcffc92 [FPEnv] Legalize double width StrictFP vector operations
Differential Revision: https://reviews.llvm.org/D48809

llvm-svn: 337698
2018-07-23 14:40:17 +00:00
Craig Topper b8b21d2fca [SelectionDAGBuilder] Use APInt::isZero instead of comparing APInt::getZExtValue to 0 in a place where we can't be sure contents of the APInt fit in a uint64_t.
This is used on an extract vector element index which is most cases is going to be an i32 or i64 and the element will be a valid element number. But it is possible to construct IR with a larger type and large out of range value.

llvm-svn: 337652
2018-07-22 05:16:50 +00:00
Craig Topper bf61113e4f [SelectionDAGBuilder] Restrict vector reduction check to types with a power of 2 number of elements.
The check for the shuffles usages probably isn't correct for non power of 2 vectors.

llvm-svn: 337651
2018-07-22 05:16:49 +00:00
Nirav Dave 25802ac9fd [DAG] Avoid Node Update assertion due to AND simplification
Check for construction-time folding for incomplete AND nodes in
BackwardsPropagateMask.

Fixes PR38185.

Reviewers: RKSimon, samparker

Reviewed By: samparker

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D49444

llvm-svn: 337563
2018-07-20 15:27:24 +00:00
Nirav Dave 5a4e11ad9c [DAG] Fix Memory ordering check in ReduceLoadOpStore.
When merging through a TokenFactor we need to check that the
load may be ordered such that no other aliasing memory operations may
happen. It is not sufficient to just check that the load is a member
of the chain token factor as it there may be a indirect chain. Require
the load's chain has only one use.

This fixes PR37826.

Reviewers: spatel, davide, efriedma, craig.topper, RKSimon

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D49388

llvm-svn: 337560
2018-07-20 15:20:50 +00:00
Craig Topper d8734450a2 [DAGCombiner] Fold X - (-Y *Z) -> X + (Y * Z)
llvm-svn: 337518
2018-07-20 01:40:03 +00:00
Stephen Canon 8995c5f0f6 Skip out of SimplifyDemandedBits for BITCAST of f16 to i16
Mirrors the existing exit path for f128, avoiding a crash later on.

Differential Revision: https://reviews.llvm.org/D49524

llvm-svn: 337506
2018-07-19 22:46:42 +00:00
Craig Topper c12c5d421f [DAGCombiner] Teach DAGCombiner that A-(-B) is A+B.
We already knew A+(-B) is A-B in visitAdd. This does the opposite for visitSub.

llvm-svn: 337502
2018-07-19 22:24:43 +00:00
Nirav Dave a747d3ca60 [ScheduleDAG] Fix unfolding of SUnits to already existent nodes.
Summary:
If unfolding an SUnit results in both load or the operation using it which
already exist in the DAG, abort the unfold if they are already scheduled.
If not, make sure we don't add duplicate dependencies.

This fixes PR37916.

Reviewers: davide, eli.friedman, fhahn, bogner

Subscribers: MatzeB, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48666

llvm-svn: 337409
2018-07-18 18:01:03 +00:00
Simon Pilgrim e4d12bb2d6 [DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT
If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops.

Differential Revision: https://reviews.llvm.org/D49262

llvm-svn: 337258
2018-07-17 09:45:35 +00:00
Sanjay Patel c71adc8040 [Intrinsics] define funnel shift IR intrinsics + DAG builder support
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html

We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions, 
and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation 
by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift" 
operation which may also be a single hardware instruction.

Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working 
on that (much larger patch), but then I concluded that we don't need it. At least as a first 
step, we have all of the backend support necessary to match these ops...because it was required. 
And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are 
likely all that we'll ever need.

There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes
(along with improving the oversized shift documentation). Again, I don't think that's strictly 
necessary (as the test results here prove). That can be an efficiency improvement as a small 
follow-up patch.

So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support. 

Differential Revision: https://reviews.llvm.org/D49242

llvm-svn: 337221
2018-07-16 22:59:31 +00:00
Fangrui Song cb0bab86b3 [CodeGen] Fix inconsistent declaration parameter name
llvm-svn: 337200
2018-07-16 18:51:40 +00:00
Roman Lebedev de506632aa [X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern
Summary:

[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

But the IR-optimal patter does not lower efficiently, so we want to undo it..

This handles the simple pattern.
There is a second pattern with predicate and constants inverted.

NOTE: we do not check uses here. we always do the transform.

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D49266

llvm-svn: 337166
2018-07-16 12:44:10 +00:00
Daniel Cederman c3d8002c2e Avoid losing Hi part when expanding VAARG nodes on big endian machines
Summary:
If the high part of the load is not used the offset to the next element
will not be set correctly.

For example, on Sparc V8, the following code will read val2 from offset 4
instead of 8.

```
int val = __builtin_va_arg(va, long long);
int val2 = __builtin_va_arg(va, int);
```

Reviewers: jyknight

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D48595

llvm-svn: 337161
2018-07-16 12:14:17 +00:00
Sanjay Patel 79a423cfa2 [DAGCombiner] fix typo in comment; NFC
llvm-svn: 337132
2018-07-15 17:09:35 +00:00
Sanjay Patel a41c886c55 [DAGCombiner] extend(ifpositive(X)) -> shift-right (not X)
This is almost the same as an existing IR canonicalization in instcombine, 
so I'm assuming this is a good early generic DAG combine too.

The motivation comes from reduced bit-hacking for select-of-constants in IR 
after rL331486. We want to restore that functionality in the DAG as noted in
the commit comments for that change and the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html

The PPC and AArch tests show that those targets are already doing something 
similar. x86 will be neutral in the minimal case and generally better when 
this pattern is extended with other ops as shown in the signbit-shift.ll tests.

Note the asymmetry: we don't include the (extend (ifneg X)) transform because 
it already exists in SimplifySelectCC(), and that is verified in the later 
unchanged tests in the signbit-shift.ll files. Without the 'not' op, the 
general transform to use a shift is always a win because that's a single 
instruction.

Alive proofs:
https://rise4fun.com/Alive/ysli

Name: if pos, get -1
  %c = icmp sgt i16 %x, -1
  %r = sext i1 %c to i16
  =>
  %n = xor i16 %x, -1
  %r = ashr i16 %n, 15

Name: if pos, get 1
  %c = icmp sgt i16 %x, -1
  %r = zext i1 %c to i16
  =>
  %n = xor i16 %x, -1
  %r = lshr i16 %n, 15

Differential Revision: https://reviews.llvm.org/D48970

llvm-svn: 337130
2018-07-15 16:27:07 +00:00
Matthias Braun 90ad6835dd CodeGen: Remove pipeline dependencies on StackProtector; NFC
This re-applies r336929 with a fix to accomodate for the Mips target
scheduling multiple SelectionDAG instances into the pass pipeline.

PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.

PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.

This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.

Patch by Stephen Crane <sjc@immunant.com>

Differential Revision: https://reviews.llvm.org/D49256

llvm-svn: 336964
2018-07-13 00:08:38 +00:00
Matthias Braun f03f32d478 Revert "(HEAD -> master, origin/master, arcpatch-D37582) CodeGen: Remove pipeline dependencies on StackProtector; NFC"
This was triggering pass scheduling failures.

This reverts commit r336929.

llvm-svn: 336934
2018-07-12 19:27:01 +00:00
Matthias Braun 9436570cbd CodeGen: Remove pipeline dependencies on StackProtector; NFC
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.

PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.

This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.

Patch by Stephen Crane <sjc@immunant.com>

Differential Revision: https://reviews.llvm.org/D49256

llvm-svn: 336929
2018-07-12 18:33:32 +00:00
Eli Friedman 0319c28459 [CodeGen] Emit more precise AssertZext/AssertSext nodes.
This is marginally helpful for removing redundant extensions, and the
code is easier to read, so it seems like an all-around win. In the new
test i8-phi-ext.ll, we used to emit an AssertSext i8; now we emit an
AssertZext i2, which allows the extension of the return value to be
eliminated.

Differential Revision: https://reviews.llvm.org/D49004

llvm-svn: 336868
2018-07-11 23:26:35 +00:00
Diogo N. Sampaio b0d85ef975 [NFC][InstCombine] Converts isLegalNarrowLoad into isLegalNarrowLdSt
Reuse this function as to test correctness and profitability of
reducing width of either load or store operations.

Reviewsers: samparker

Differential Revision: https://reviews.llvm.org/D48624

llvm-svn: 336800
2018-07-11 12:59:42 +00:00
Simon Pilgrim 075b04a55f [SelectionDAG] Add constant buildvector support to isKnownNeverZero
This allows us to use SelectionDAG::isKnownNeverZero in DAGCombiner::visitREM (visitSDIVLike/visitUDIVLike handle the checking for constants).

llvm-svn: 336779
2018-07-11 09:56:41 +00:00
Simon Pilgrim df9d59771b [DAGCombiner] Support non-uniform X%C -> X-(X/C)*C folds
First stage in PR38057 - support non-uniform constant vectors in the combine to reuse the division-by-constant logic.

We can definitely do better for srem pow2 remainders (and avoid that extra multiply....) but this at least helps keep everything on the vector unit.

Differential Revision: https://reviews.llvm.org/D48975

llvm-svn: 336774
2018-07-11 09:22:42 +00:00
Simon Pilgrim 97cf111689 [DAGCombiner] Add (urem X, -1) -> select(X == -1, 0, x) fold
llvm-svn: 336773
2018-07-11 09:14:37 +00:00
Simon Pilgrim 4cb4609392 [DAGCombiner] Add special case fast paths for udiv x,1 and udiv x,-1
udiv x,-1 was going down the (slow) BuildUDIV route resulting in unnecessary shifts.

llvm-svn: 336701
2018-07-10 16:33:07 +00:00
Simon Pilgrim 641097d561 [DAGCombiner] visitREM - call visitSDIVLike/visitUDIVLike directly to avoid recursive combining.
As suggested by @efriedma on D48975 use the visitSDIVLike/visitUDIVLike functions introduced at rL336656.

llvm-svn: 336664
2018-07-10 13:18:16 +00:00
Simon Pilgrim ce5c19b623 [DAGCombiner] Split SDIV/UDIV optimization expansions from the rest of the combines. NFCI.
As suggested by @efriedma on D48975, this patch separates the BuildDiv/Pow2 style optimizations from the rest of the visitSDIV/visitUDIV to make it easier to reuse the combines and will allow us to avoid some rather nasty node recursive combining in visitREM.

llvm-svn: 336656
2018-07-10 11:38:00 +00:00
Roman Lebedev 5ccae1750b [X86][TLI] DAGCombine: Unfold variable bit-clearing mask to two shifts.
Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.

As discussed later, that was worse at least for the code size,
and potentially for the performance, too.

https://rise4fun.com/Alive/Zmpl

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: spatel

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D48768

llvm-svn: 336585
2018-07-09 19:06:42 +00:00
Craig Topper e3b0c7e5bd [SelectionDAG] Add VT consistency checks to the creation of ISD::FMA.
This is similar to what is done for binops. I don't know if this would have helped us catch the bug fixed in r336566 earlier or not, but I figured it couldn't hurt.

llvm-svn: 336576
2018-07-09 18:23:55 +00:00
Simon Pilgrim 23f9eddabe [SelectionDAG] Split float and integer isKnownNeverZero tests
Splits off isKnownNeverZeroFloat to handle +/- 0 float cases.

This will make it easier to be more aggressive with the integer isKnownNeverZero tests (similar to ValueTracking), use computeKnownBits etc.

Differential Revision: https://reviews.llvm.org/D48969

llvm-svn: 336492
2018-07-07 18:17:14 +00:00
Simon Pilgrim 0a36daa5a2 Use const APInt& to avoid extra copy. NFCI.
As discussed on D48825.

llvm-svn: 336491
2018-07-07 17:33:48 +00:00
Simon Pilgrim c1d1944053 [DAGCombiner] Add EXTRACT_SUBVECTOR to SimplifyDemandedVectorElts
As discussed on PR37989, this patch adds EXTRACT_SUBVECTOR handling to TargetLowering::SimplifyDemandedVectorElts and calls it from DAGCombiner::visitEXTRACT_SUBVECTOR.

Differential Revision: https://reviews.llvm.org/D48825

llvm-svn: 336490
2018-07-07 17:30:06 +00:00
Nico Weber 038dbf3c24 Revert 336426 (and follow-ups 428, 440), it very likely caused PR38084.
llvm-svn: 336453
2018-07-06 17:37:24 +00:00
Diogo N. Sampaio 17be994942 Added missing semicolon
llvm-svn: 336428
2018-07-06 10:09:04 +00:00
Diogo N. Sampaio 742bf1a255 [SelectionDAG] https://reviews.llvm.org/D48278
D48278

Allow to reduce redundant shift masks.
For example:
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB

can be reduced to:
x1 = x & 0xAB00
x2 = x1 >> 8
It only allows folding when the masks and shift values are constants.

llvm-svn: 336426
2018-07-06 09:42:25 +00:00
Diogo N. Sampaio 734cfd11fe Testing commit permision
llvm-svn: 336384
2018-07-05 18:49:32 +00:00
Simon Pilgrim 74cc4cfa94 [DAGCombiner] visitSDIV - Permit MIN_SIGNED_VALUE in pow2 vector codegen
Now that D45806 has landed, we can re-enable support for MIN_SIGNED_VALUE in the sdiv by pow2-constant code

llvm-svn: 336198
2018-07-03 14:11:32 +00:00
Piotr Padlewski 5b3db45e8f Implement strip.invariant.group
Summary:
This patch introduce new intrinsic -
strip.invariant.group that was described in the
RFC: Devirtualization v2

Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar

Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D47103

Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
2018-07-02 04:49:30 +00:00
Simon Pilgrim fae337704e [DAGCombiner] Handle correctly non-splat power of 2 -1 divisor (PR37119)
The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1, -1 is the only power of two value for which the sdiv expansion recipe breaks.

Thanks to @zvi for the original patch.

Differential Revision: https://reviews.llvm.org/D45806

llvm-svn: 336048
2018-06-30 12:22:55 +00:00
Simon Pilgrim 9c70d48cb2 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV (REAPPLIED)
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.

llvm-svn: 335886
2018-06-28 17:33:41 +00:00
Matthias Braun da5e7e11d1 SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O)
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.

Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.

rdar://41530228

DifferentialRevision: https://reviews.llvm.org/D48674
llvm-svn: 335877
2018-06-28 17:00:45 +00:00
Haojian Wu 2103990e63 Revert "[DAGCombiner] Ensure we use the correct CC result type in visitSDIV"
This reverts commit r335821.

This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce.

llvm-svn: 335871
2018-06-28 16:25:57 +00:00
Simon Pilgrim abebe4c746 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

llvm-svn: 335821
2018-06-28 09:54:28 +00:00
Simon Pilgrim 49cb65bb7b [DAGCombiner] Remove unused variable. NFCI.
Noticed in D45806 review.

llvm-svn: 335817
2018-06-28 09:29:08 +00:00
Nirav Dave 7c57ae57a8 [DAGCombine] Disable TokenFactor simplifications when optnone.
llvm-svn: 335773
2018-06-27 19:41:25 +00:00