Commit Graph

3324 Commits

Author SHA1 Message Date
Sanjay Patel 8cefc37be5 [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support
This moves the X86 specific transform from rL364407
into DAGCombiner to generically handle 'little to big' cases
(for example: extract_subvector(v2i64 bitcast(v16i8))). This
allows us to remove both the x86 implementation and the aarch64
bitcast(extract_subvector(bitcast())) combine.

Earlier patches that dealt with regressions initially exposed
by this patch:
rG5e5e99c041e4
rG0b38af89e2c0

Patch by: @RKSimon (Simon Pilgrim)

Differential Revision: https://reviews.llvm.org/D63815
2019-12-23 10:11:45 -05:00
Carl Ritson 2791667d2e [DAGCombiner] Check term use before applying aggressive FSUB optimisations
Summary:
Without this check unnecessary FMA instructions are generated when the FSUB terms are reused.
This also has the side-effect that the same value is computed to different levels of precision, which can create undesirable effects if the results are used together in subsequent computation.

Reviewers: arsenm, nhaehnle, foad, tpr, dstuttard, spatel

Reviewed By: arsenm

Subscribers: jvesely, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71656
2019-12-23 09:37:58 +09:00
Amaury Séchet ff6567cc77 [DAGCombiner] Add node back in the worklist in topological order in CommitTargetLoweringOpt
Summary:
Right now, DAGCombiner process the nodes in an iplementation defined order. This tends to be fragile as optimisation may or may not kick in depending on the traversal order.

This is part of a larger effort to get the DAGCombiner to process its node in topological order.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70921
2019-12-17 18:26:16 +01:00
Alex Richardson 11448eeb72 [NFC] Use SelectionDAG::getMemBasePlusOffset() instead of getNode(ISD::ADD)
Summary:
To find potential opportunities to use getMemBasePlusOffset() I looked at
all ISD::ADD uses found with the regex getNode\(ISD::ADD,.+,.+Ptr
in lib/CodeGen/SelectionDAG. If this patch is accepted I will convert
the files in the individual backends too.

The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR).
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.

We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.

Reviewers: spatel

Reviewed By: spatel

Subscribers: merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71207
2019-12-13 21:40:03 +00:00
Sanjay Patel 2f0c7fd2db [DAGCombiner] fold shift-trunc-shift to shift-mask-trunc (2nd try)
The initial attempt (rG89633320) botched the logic by reversing
the source/dest types. Added x86 tests for additional coverage.
The vector tests show a potential improvement (fold vector load
instead of broadcasting), but that's a known/existing problem.

This fold is done in IR by instcombine, and we have a special
form of it already here in DAGCombiner, but we want the more
general transform too:
https://rise4fun.com/Alive/3jZm

Name: general
Pre: (C1 + zext(C2) < 64)
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%a = and i64 %s2, zext((1 << (16 - C2)) - 1)
%r = trunc %a to i16

Name: special
Pre: C1 == 48
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%r = trunc %s2 to i16

...because D58017 exposes a regression without this fold.
2019-12-13 14:03:54 -05:00
Alex Richardson be15dfa88f [NFC] Use EVT instead of bool for getSetCCInverse()
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).

In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.

Committing this change will significantly reduce our merge conflicts
for each upstream merge.

Reviewers: spatel, bogner

Reviewed By: bogner

Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70917
2019-12-13 12:22:03 +00:00
Sanjay Patel 9432937190 Revert "[DAGCombiner] fold shift-trunc-shift to shift-mask-trunc"
This reverts commit 8963332c33.
There was a logic bug typo in this code, but it wasn't visible in the asm for the tests.
2019-12-12 16:24:40 -05:00
Sanjay Patel 8963332c33 [DAGCombiner] fold shift-trunc-shift to shift-mask-trunc
This fold is done in IR by instcombine, and we have a special
form of it already here in DAGCombiner, but we want the more
general transform too:
https://rise4fun.com/Alive/3jZm

Name: general
Pre: (C1 + zext(C2) < 64)
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%a = and i64 %s2, zext((1 << (16 - C2)) - 1)
%r = trunc %a to i16

Name: special
Pre: C1 == 48
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%r = trunc %s2 to i16

...because D58017 exposes a regression without this fold.
2019-12-12 15:44:13 -05:00
Sanjay Patel b39009bf1d [DAGCombiner] improve readability
This is not quite NFC because I changed the SDLoc to use the more
standard 'N' (the starting node for the fold).

This transform is a special-case of a more general fold that we
do in IR, but it seems like the general fold is needed here too
to avoid a potential regression seen in D58017.

https://rise4fun.com/Alive/3jZm
2019-12-12 13:16:50 -05:00
Amaury Séchet c594d14d40 [DAGCombine] Factor oplist operations. NFC 2019-12-02 19:12:03 +01:00
Amaury Séchet d8d5106225 [SelectionDAG] Reduce assumptions made about levels. NFC 2019-12-02 17:43:13 +01:00
Amaury Séchet ca818f4550 [DAGCombiner] Peek through vector concats when trying to combine shuffles.
Summary: This combine showed up as needed when exploring the regression when processing the DAG in topological order.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68195
2019-11-28 23:57:29 +01:00
David Green b5315ae8ff [Codegen][ARM] Add addressing modes from masked loads and stores
MVE has a basic symmetry between it's normal loads/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.

To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.

This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
   Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
   legality of masked operations as well as normal ones. This array is
   fairly small, so doubling the size still won't make it very large.
   Offset masked loads can then be controlled with
   setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
   CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
   the same way.
- The ARM backend is then adjusted to make use of these indexed masked
   loads/stores.
- The X86 backend is adjusted to hopefully be no functional changes.

Differential Revision: https://reviews.llvm.org/D70176
2019-11-26 16:21:01 +00:00
Sanjay Patel 214683f3b2 [DAGCombiner] avoid crash on out-of-bounds insert index (PR44139)
We already have this simplification at node-creation-time, but
the test from:
https://bugs.llvm.org/show_bug.cgi?id=44139
...shows that we can combine our way to an assert/crash too.
2019-11-25 16:24:06 -05:00
Clement Courbet cb15ba84fe Reland "[DAGCombiner] Allow zextended load combines."
Check that the generated type is simple.
2019-11-22 14:47:18 +01:00
Clement Courbet 88e205525c Revert "[DAGCombiner] Allow zextended load combines."
Breaks some bots.
2019-11-22 09:01:08 +01:00
Clement Courbet 036790f988 [DAGCombiner] Allow zextended load combines.
Summary: or(zext(load8(base)), zext(load8(base+1)) -> zext(load16 base)

Reviewers: apilipenko, RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70487
2019-11-22 08:40:19 +01:00
Hiroshi Yamauchi 52e377497d [PGO][PGSO] DAG.shouldOptForSize part.
Summary:
(Split of off D67120)

SelectionDAG::shouldOptForSize changes for profile guided size optimization.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70095
2019-11-21 14:16:00 -08:00
Clement Courbet 252567377c [DAGCombine][NFC] Use ArrayRef and correctly size SmallVectors.
In preparation for D70487.
2019-11-21 08:53:37 +01:00
David Zarzycki 257acbf6ae
[SelectionDAG] Combine U{ADD,SUB}O diamonds into {ADD,SUB}CARRY
Summary:
Convert (uaddo (uaddo x, y), carryIn) into addcarry x, y, carryIn if-and-only-if the carry flags of the first two uaddo are merged via OR or XOR.

Work remaining: match ADD, etc.

Reviewers: craig.topper, RKSimon, spatel, niravd, jonpa, uweigand, deadalnix, nikic, lebedev.ri, dmgreen, chfast

Reviewed By: lebedev.ri

Subscribers: chfast, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70079
2019-11-20 16:25:42 +02:00
Matt Arsenault 7fe9435dc8 Work on cleaning up denormal mode handling
Cleanup handling of the denormal-fp-math attribute. Consolidate places
checking the allowed names in one place.

This is in preparation for introducing FP type specific variants of
the denormal-fp-mode attribute. AMDGPU will switch to using this in
place of the current hacky use of subtarget features for the denormal
mode.

Introduce a new header for dealing with FP modes. The constrained
intrinsic classes define related enums that should also be moved into
this header for uses in other contexts.

The verifier could use a check to make sure the denorm-fp-mode
attribute is sane, but there currently isn't one.

Currently, DAGCombiner incorrectly asssumes non-IEEE behavior by
default in the one current user. Clang must be taught to start
emitting this attribute by default to avoid regressions when this is
switched to assume ieee behavior if the attribute isn't present.
2019-11-19 22:01:14 +05:30
Matt Arsenault b696b9dba7 DAG: Add function context to isFMAFasterThanFMulAndFAdd
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.

AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
2019-11-19 19:25:26 +05:30
Graham Hunter 3f08ad611a [SVE][CodeGen] Scalable vector MVT size queries
* Implements scalable size queries for MVTs, split out from D53137.

* Contains a fix for FindMemType to avoid using scalable vector type
  to contain non-scalable types.

* Explicit casts for several places where implicit integer sign
  changes or promotion from 32 to 64 bits caused problems.

* CodeGenDAGPatterns will treat scalable and non-scalable vector types
  as different.

Reviewers: greened, cameron.mcinally, sdesmalen, rovka

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D66871
2019-11-18 12:30:59 +00:00
Paweł Bylica 1c247dd028
[DAGCombiner] Drop redundant DAG method param. NFC 2019-11-14 14:02:53 +01:00
Paweł Bylica 9b89bda517
[DAGCombiner] Use TLI field already available. NFC 2019-11-14 14:02:52 +01:00
joanlluch d384ad6b63 [TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (4)
Summary:
Replaces
```
unsigned getShiftAmountThreshold(EVT VT)
```
by

```
bool shouldAvoidTransformToShift(EVT VT, unsigned amount)
```
thus giving more flexibility for targets to decide whether particular shift amounts must be considered expensive or not.

Updates the MSP430 target with a custom implementation.

This continues  D69116, D69120, D69326 and updates them, so all of them must be committed before this.

Existing tests apply, a few more have been added.

Reviewers: asl, spatel

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70042
2019-11-13 09:23:08 +01:00
Philip Reames db036ee0a4 [X86/Atomics] Correct a few transforms for new atomic lowering
This is a partial fix for the issues described in commit message of 027aa27 (the revert of G24609).  Unfortunately, I can't provide test coverage for it on it's own as the only (known) wrong example is still wrong, but due to a separate issue.

These fixes are cases where when performing unrelated DAG combines, we were dropping the atomicity flags entirely.
2019-11-05 13:20:08 -08:00
Thomas Preud'homme 646896a442 Fix PR40644: miscompile indexed FP constant store
Summary:
Functions replaceStoreOfFPConstant() and OptimizeFloatStore() both
replace store of float by a store of an integer unconditionally. However
this generates wrong code when the store that is replaced is an indexed
or truncating store. This commit solves this issue by adding an early
return in these functions when the store being considered is not a
normal store.

Bug was only observed on out of tree targets, hence the lack of testcase
in this commit.

Reviewers: efriedma

Subscribers: hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68420
2019-11-05 11:07:52 +00:00
Sanjay Patel 113181e9bd [DAGCombine][MSP430] use shift amount threshold in DAGCombine (2/2)
Continuation of:
D69116

Contributes to a fix for PR43559:
https://bugs.llvm.org/show_bug.cgi?id=43559

See also D69099 and D69116

Use the TLI hook in DAGCombine.cpp to guard against creating
shift nodes that are not optimal for a target.

Patch by: @joanlluch (Joan LLuch)

Differential Revision: https://reviews.llvm.org/D69120
2019-11-04 13:41:41 -05:00
Matt Arsenault 6221767055 DAG: Add DAG argument to isFPExtFoldable
For AMDGPU this is dependent on the FP mode, which should eventually
not be a property of the subtarget.
2019-10-31 22:32:45 -07:00
Matt Arsenault 1725f28841 DAG: Add new control for ISD::FMAD formation
For AMDGPU this depends on whether denormals are enabled in the
default FP mode for the function. Currently this is treated as a
subtarget feature, so FMAD is selectively legal based on that. I want
to move this out of the subtarget features so this can be controlled
with a denormal mode attribute. Additionally, this will allow folding
based on a future ftz fast math flag.
2019-10-31 07:51:38 -07:00
Sanjay Patel 1ebd4a2e3a [DAGCombiner] widen any_ext of popcount based on target support
This enhances D69127 (rGe6c145e0548e3b3de6eab27e44e1504387cf6b53)
to handle the looser "any_extend" cast in addition to zext.

This is a prerequisite step for canonicalizing in the other direction
(narrow the popcount) in IR - PR43688:
https://bugs.llvm.org/show_bug.cgi?id=43688
2019-10-28 10:07:12 -04:00
Kerry McLaughlin da720a38b9 [AArch64][SVE] Implement masked load intrinsics
Summary:
Adds support for codegen of masked loads, with non-extending,
zero-extending and sign-extending variants.

Reviewers: huntergr, rovka, greened, dmgreen

Reviewed By: dmgreen

Subscribers: dmgreen, samparker, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68877
2019-10-28 10:06:14 +00:00
Sanjay Patel 85a2146c15 [SDAG] fold insert_vector_elt with undef index
Similar to:
rG4c47617627fb

This makes the DAG behavior consistent with IR's insertelement.

https://bugs.llvm.org/show_bug.cgi?id=42689

I've tried to maintain test intent for AArch64 and WebAssembly
by replacing undef index operands with something else.
2019-10-27 15:28:43 -04:00
Sanjay Patel e6c145e054 [DAGCombiner] widen zext of popcount based on target support
zext (ctpop X) --> ctpop (zext X)

This is a prerequisite step for canonicalizing in the other direction (narrow the popcount) in IR - PR43688:
https://bugs.llvm.org/show_bug.cgi?id=43688

I'm not sure if any other targets are affected, but I found a missing fold for PPC, so added tests based on that.
The reason we widen all the way to 64-bit in these tests is because the initial DAG looks something like this:

  t5: i8 = ctpop t4
  t6: i32 = zero_extend t5  <-- created based on IR, but unused node?
    t7: i64 = zero_extend t5

Differential Revision: https://reviews.llvm.org/D69127
2019-10-25 14:10:51 -04:00
Simon Pilgrim a18818207a Fix cppcheck shadow variable warning. NFCI. 2019-10-24 22:14:36 +01:00
Graham Hunter 84da2596f9 [AArch64][SVE] Add SPLAT_VECTOR ISD Node
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.

Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.

At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.

I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.

Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy

Reviewed By: jmolloy

Differential Revision: https://reviews.llvm.org/D47775

llvm-svn: 375222
2019-10-18 11:48:35 +00:00
Sam Parker 39af8a3a3b [DAGCombine][ARM] Enable extending masked loads
Add generic DAG combine for extending masked loads.

Allow us to generate sext/zext masked loads which can access v4i8,
v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.

Differential Revision: https://reviews.llvm.org/D68337

llvm-svn: 375085
2019-10-17 07:55:55 +00:00
David Zarzycki 59390efef2 [X86] Make memcmp() use PTEST if possible and also enable AVX1
llvm-svn: 374922
2019-10-15 17:40:12 +00:00
Sanjay Patel d545c9056e [DAGCombiner] fold select-of-constants based on sign-bit test
Examples:
  i32 X > -1 ? C1 : -1 --> (X >>s 31) | C1
  i8 X < 0 ? C1 : 0 --> (X >>s 7) & C1

This is a small generalization of a fold requested in PR43650:
https://bugs.llvm.org/show_bug.cgi?id=43650

The sign-bit of the condition operand can be used as a mask for the true operand:
https://rise4fun.com/Alive/paT

Note that we already handle some of the patterns (isNegative + scalar) because
there's an over-specialized, yet over-reaching fold for that in foldSelectCCToShiftAnd().
It doesn't use any TLI hooks, so I can't easily rip out that code even though we're
duplicating part of it here. This fold is guarded by TLI.convertSelectOfConstantsToMath(),
so it should not cause problems for targets that prefer select over shift.

Also worth noting: I thought we could generalize this further to include the case where
the true operand of the select is not constant, but Alive says that may allow poison to
pass through where it does not in the original select form of the code.

Differential Revision: https://reviews.llvm.org/D68949

llvm-svn: 374902
2019-10-15 15:23:57 +00:00
Sanjay Patel 3b581ac80f [DAGCombiner] fold vselect-of-constants to shift
The diffs suggest that we are missing some more basic
analysis/transforms, but this keeps the vector path in
sync with the scalar (rL374397). This is again a
preliminary step for introducing the reverse transform
in IR as proposed in D63382.

llvm-svn: 374555
2019-10-11 14:17:56 +00:00
Sanjay Patel 7b904ce724 [DAGCombiner] fold select-of-constants to shift
This reverses the scalar canonicalization proposed in D63382.

Pre: isPowerOf2(C1)
%r = select i1 %cond, i32 C1, i32 0
=>
%z = zext i1 %cond to i32
%r = shl i32 %z, log2(C1)

https://rise4fun.com/Alive/Z50

x86 already tries to fold this pattern, but it isn't done
uniformly, so we still see a diff. AArch64 probably should
enable the TLI hook to benefit too, but that's a follow-on.

llvm-svn: 374397
2019-10-10 17:52:02 +00:00
Sanjay Patel 7f0e7c0b1c [DAGCombiner] reduce code duplication; NFC
llvm-svn: 374370
2019-10-10 15:38:29 +00:00
Amaury Sechet aaf0507896 [DAGCombine] Match more patterns for half word bswap
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68250

llvm-svn: 374340
2019-10-10 13:20:10 +00:00
Philip Reames 931120846e Conservatively add volatility and atomic checks in a few places
As background, starting in D66309, I'm working on support unordered atomics analogous to volatile flags on normal LoadSDNode/StoreSDNodes for X86.

As part of that, I spent some time going through usages of LoadSDNode and StoreSDNode looking for cases where we might have missed a volatility check or need an atomic check. I couldn't find any cases that clearly miscompile - i.e. no test cases - but a couple of pieces in code loop suspicious though I can't figure out how to exercise them.

This patch adds defensive checks and asserts in the places my manual audit found. If anyone has any ideas on how to either a) disprove any of the checks, or b) hit the bug they might be fixing, I welcome suggestions.

Differential Revision: https://reviews.llvm.org/D68419

llvm-svn: 374261
2019-10-09 23:43:33 +00:00
Simon Pilgrim b4ba3cbda0 [X86][AVX] Access a scalar float/double as a free extract from a broadcast load (PR43217)
If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element.

This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted.

Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper.

Fixes PR43217

Differential Revision: https://reviews.llvm.org/D68544

llvm-svn: 373871
2019-10-06 21:11:45 +00:00
Sanjay Patel f643fabb52 Revert [DAGCombine] Match more patterns for half word bswap
This reverts r373850 (git commit 25ba49824d)

This patch appears to cause multiple codegen regression test failures - http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/10680

llvm-svn: 373853
2019-10-06 15:27:34 +00:00
Amaury Sechet 25ba49824d [DAGCombine] Match more patterns for half word bswap
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68250

llvm-svn: 373850
2019-10-06 14:14:55 +00:00
Sanjay Patel 288079aafd [DAGCombiner] add operation legality checks before creating shift ops (PR43542)
As discussed on llvm-dev and:
https://bugs.llvm.org/show_bug.cgi?id=43542
...we have transforms that assume shift operations are legal and transforms to
use them are profitable, but that may not hold for simple targets.

In this case, the MSP430 target custom lowers shifts by repeating (many)
simpler/fixed ops. That can be avoided by keeping this code as setcc/select.

Differential Revision: https://reviews.llvm.org/D68397

llvm-svn: 373666
2019-10-03 21:34:04 +00:00
Simon Pilgrim 3c912c4abe [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks.

The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it work with target specific combines and opcodes (e.g. X86's FMA variants).

Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations has to duplicate some checks (recursion depth etc.).

Partial reversion of rL372756 - I've identified the infinite loop issue inside the X86 override but haven't fixed it yet so I've only (re)committed the common TargetLowering refactoring part of the patch.

Differential Revision: https://reviews.llvm.org/D67557

llvm-svn: 373343
2019-10-01 15:32:04 +00:00
Amaury Sechet e6f98c0073 [DAGCombiner] Clang format MatchRotate. NFC
llvm-svn: 373269
2019-09-30 21:41:52 +00:00
Amaury Sechet 496c0564f1 [DAGCombiner] Update MatchRotate so that it returns an SDValue. NFC
llvm-svn: 373260
2019-09-30 20:47:23 +00:00
Thomas Raoux 3c8c667235 [TargetLowering] Make allowsMemoryAccess methode virtual.
Rename old function to explicitly show that it cares only about alignment.
The new allowsMemoryAccess call the function related to alignment by default
and can be overridden by target to inform whether the memory access is legal or
not.

Differential Revision: https://reviews.llvm.org/D67121

llvm-svn: 372935
2019-09-26 00:16:01 +00:00
Sanjay Patel 831a7e7068 [DAGCombiner] add one-use restriction to vector transform with cheap extract
We might be able to do better on the example in the test,
but in general, we should not scalarize a splatted vector
binop if there are other uses of the binop. Otherwise, we
can end up with code as we had - a scalar op that is
redundant with a vector op.

llvm-svn: 372886
2019-09-25 15:08:33 +00:00
Ilya Biryukov 60e5e0b667 Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
Reason: this caused severe compile time regressions in JAX.
See email thread  of original revision on llvm-commits for details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html

llvm-svn: 372756
2019-09-24 13:48:02 +00:00
Simon Pilgrim af6043557d [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks and includes a demonstration X86 implementation.

The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it work with target specific combines and opcodes (e.g. X86's FMA variants).

Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations has to duplicate some checks (recursion depth etc.).

I've only begun to replace X86's FNEG handling here, handling FMADDSUB/FMSUBADD negation and some low impact codegen changes (some FMA negatation propagation). We can build on this in future patches.

Differential Revision: https://reviews.llvm.org/D67557

llvm-svn: 372333
2019-09-19 15:02:47 +00:00
Amaury Sechet 9e94ef42ba [DAGCombiner] Add node to the worklist in topological order in scalarizeExtractedVectorLoad
Summary: As per title.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66661

llvm-svn: 372327
2019-09-19 14:22:11 +00:00
Simon Pilgrim c65dd89804 [DAG] Add SelectionDAG::MaxRecursionDepth constant
As commented on D67557 we have a lot of uses of depth checks all using magic numbers.

This patch adds the SelectionDAG::MaxRecursionDepth constant and moves over some general cases to use this explicitly.

Differential Revision: https://reviews.llvm.org/D67711

llvm-svn: 372315
2019-09-19 12:58:43 +00:00
Roman Lebedev c00f318224 [DAGCombine][ARM][X86] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold
Summary:
`DAGCombiner::visitADDLikeCommutative()` already has a sibling fold:
`(add X, Carry) -> (addcarry X, 0, Carry)`

This fold, as suggested by @efriedma, helps recover from //some//
of the regressions of D62266

Reviewers: efriedma, deadalnix

Subscribers: javed.absar, kristof.beyls, llvm-commits, efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62392

llvm-svn: 372259
2019-09-18 20:48:27 +00:00
Philip Reames 079e210463 [SDAG] Update generic code to conservatively check for isAtomic in addition to isVolatile
This is the first sweep of generic code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. That will come later.  See D66309 for context.

Differential Revision: https://reviews.llvm.org/D66318

llvm-svn: 371786
2019-09-12 22:49:17 +00:00
Craig Topper efe6724b9f [DAGCombiner][X86] Pass the CmpOpVT to reduceSelectOfFPConstantLoads so X86 can exclude fp128 compares.
The X86 decision assumes the compare will produce a result in an XMM
register, but that can't happen for an fp128 compare since those
go to a libcall the returns an i32. Pass the VT so X86 can check
the type.

llvm-svn: 371775
2019-09-12 21:30:18 +00:00
Simon Pilgrim da59a6bf7d [DAGCombine] visitFDIV - Use isCheaperToUseNegatedFPOps helper for (fdiv (fneg X), (fneg Y)) -> (fdiv X, Y). NFCI.
Minor cleanup to use equivalent helper code.

llvm-svn: 371724
2019-09-12 11:03:09 +00:00
Qiu Chaofan b7fb5d0f6f [DAGCombiner] Improve division estimation of floating points.
Current implementation of estimating divisions loses precision since it
estimates reciprocal first and does multiplication.  This patch is to re-order
arithmetic operations in the last iteration in DAGCombiner to improve the
accuracy.

Reviewed By: Sanjay Patel, Jinsong Ji

Differential Revision: https://reviews.llvm.org/D66050

llvm-svn: 371713
2019-09-12 07:51:24 +00:00
Craig Topper 5ebd0a6e88 [SelectionDAG] Remove ISD::FP_ROUND_INREG
I don't think anything in tree creates this node. So all of this
code appears to be dead.

Code coverage agrees
http://lab.llvm.org:8080/coverage/coverage-reports/llvm/coverage/Users/buildslave/jenkins/workspace/clang-stage2-coverage-R/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp.html

Differential Revision: https://reviews.llvm.org/D67312

llvm-svn: 371431
2019-09-09 17:54:44 +00:00
Craig Topper dac34f52d3 [DAGCombiner][X86][ARM] Teach visitMULO to fold multiplies with 0 to 0 and no carry.
I modified the ARM test to use two inputs instead of 0 so the
test hopefully still tests what was intended.

llvm-svn: 371344
2019-09-08 19:24:39 +00:00
Bjorn Pettersson 5e331e4ce8 [Intrinsic] Add the llvm.umul.fix.sat intrinsic
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.

This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.

Patch by: leonardchan, bjope

Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel

Reviewed By: leonardchan

Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57836

llvm-svn: 371308
2019-09-07 12:16:14 +00:00
Sanjay Patel 4e54cf3e0e [DAGCombiner] try to form test+set out of shift+mask patterns
The motivating bugs are:
https://bugs.llvm.org/show_bug.cgi?id=41340
https://bugs.llvm.org/show_bug.cgi?id=42697

As discussed there, we could view this as a failure of IR canonicalization,
but then we would need to implement a backend fixup with target overrides
to get this right in all cases. Instead, we can just view this as a codegen
opportunity. It's not even clear for x86 exactly when we should favor
test+set; some CPUs have better theoretical throughput for the ALU ops than
bt/test.

This patch is made more complicated than I expected because there's an early
DAGCombine for 'and' that can change types of the intermediate ops via
trunc+anyext.

Differential Revision: https://reviews.llvm.org/D66687

llvm-svn: 370668
2019-09-02 14:52:09 +00:00
Sanjay Patel c882208367 [DAGCombiner] improve throughput of shift+logic+shift
The motivating case for this is a long way from here:
https://bugs.llvm.org/show_bug.cgi?id=43146
...but I think this is where we have to start.

We need to canonicalize/optimize sequences of shift and logic to ease
pattern matching for things like bswap and improve perf in general.
But without the artificial limit of '!LegalTypes' (early combining),
there are a lot of test diffs, and not all are good.

In the minimal tests added for this proposal, x86 should have better
throughput in all cases. AArch64 is neutral for scalar tests because
it can fold shifts into bitwise logic ops.

There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns:
https://rise4fun.com/Alive/VlI
https://rise4fun.com/Alive/n1m
https://rise4fun.com/Alive/1Vn

Differential Revision: https://reviews.llvm.org/D67021

llvm-svn: 370617
2019-09-01 18:38:15 +00:00
Sanjay Patel 9e57b49392 [DAGCombiner] clean up code in visitShiftByConstant()
This is not quite NFC because the SDLoc propagation is changed,
but there are no regression test diffs from that.

llvm-svn: 370587
2019-08-31 15:08:58 +00:00
Amaury Sechet 82825ab882 [DAGCombiner] Match (add X, X) as (shl X, 1) when detecting rotate.
Summary: The combiner transforms (shl X, 1) into (add X, X).

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66882

llvm-svn: 370578
2019-08-31 11:40:02 +00:00
James Molloy e62c509cd4 [DAGCombiner] Don't create illegal narrow stores
Narrowing stores when the target doesn't support the narrow version
forces the target to expand into a load-modify-store sequence, which
is highly suboptimal. The information narrowing throws away (legality
of the inverse transform) is hard to re-analyze. If the target doesn't
support a store of the narrow type, don't narrow even in pre-legalize
mode.

No test as this is DAGCombiner and depends on target bits.

llvm-svn: 370576
2019-08-31 10:46:16 +00:00
Simon Pilgrim 3be7081aa1 [DAGCombine] ReduceLoadWidth - remove duplicate SDLoc. NFCI.
SDLoc(N0) and SDLoc(cast<LoadSDNode>(N0)) should be equivalent.

llvm-svn: 370498
2019-08-30 18:19:02 +00:00
Simon Pilgrim ab8cb1a3c5 [DAGCombine] visitVSELECT - remove equivalent getValueType() call. NFCI.
llvm-svn: 370489
2019-08-30 17:21:20 +00:00
Simon Pilgrim c2fed1dc8a [DAGCombine] visitVSELECT - remove duplicate getOperand calls. NFCI.
llvm-svn: 370478
2019-08-30 15:17:37 +00:00
Simon Pilgrim 3367669668 [DAGCombine] visitVSELECT - use getShiftAmountTy for shift amounts.
llvm-svn: 370471
2019-08-30 13:30:37 +00:00
Simon Pilgrim 8e1989e79a [DAGCombine] visitMULHS - use getScalarValueSizeInBits() to make safe for vector types.
This is hidden behind a (scalar-only) isOneConstant(N1) check at the moment, but once we get around to adding vector support we need to ensure we're dealing with the scalar bitwidth, not the total.

llvm-svn: 370468
2019-08-30 12:22:06 +00:00
Simon Pilgrim 7cbf823f93 [DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node is all zeros
Return a proper zero vector, just in case some elements are undef.

Noticed by inspection after dealing with a similar issue in PR43159.

llvm-svn: 370460
2019-08-30 10:42:14 +00:00
Simon Pilgrim ea67741899 [DAGCombine] Fix shadow variable warnings. NFCI.
llvm-svn: 370365
2019-08-29 14:34:07 +00:00
Simon Pilgrim 6c2fc64edc Fix signed/unsigned comparison warning. NFCI.
llvm-svn: 370333
2019-08-29 11:18:53 +00:00
Amaury Sechet 8365e42010 [DAGCombiner] (insert_vector_elt (vector_shuffle X, Y), (extract_vector_elt X, N), IdxC) -> (vector_shuffle X, Y)
Summary: This is beneficial when the shuffle is only used once and end up being generated in a few places when some node is combined into a shuffle.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66718

llvm-svn: 370326
2019-08-29 10:35:51 +00:00
Simon Pilgrim 14e07d7f4b [DAGCombine] Fix cppcheck shadow variable warning. NFCI.
We already have an outer Ops variable.

llvm-svn: 370197
2019-08-28 12:48:41 +00:00
Amaury Sechet 4f4387dd12 [TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles
Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66804

llvm-svn: 370190
2019-08-28 12:00:06 +00:00
Simon Pilgrim c5b38e2869 [DAGCombine] Remove LoadedSlice::Cost default 'ForCodeSize' constructor arguments. NFCI.
These were always being passed in and it allowed me to add the explicit tag to stop a cppcheck warning about 1 argument constructors.

llvm-svn: 370189
2019-08-28 11:50:36 +00:00
Sanjay Patel b516f1afdd [DAGCombiner] cancel fnegs from multiplied operands of FMA
(-X) * (-Y) + Z --> X * Y + Z

This is a missing optimization that shows up as a potential regression in D66050,
so we should solve it first. We appear to be partly missing this fold in IR as well.

We do handle the simpler case already:
(-X) * (-Y) --> X * Y

And it might be beneficial to make the constraint less conservative (eg, if both
operands are cheap, but not necessarily cheaper), but that causes infinite looping
for the existing fmul transform.

Differential Revision: https://reviews.llvm.org/D66755

llvm-svn: 370071
2019-08-27 15:17:46 +00:00
Amaury Sechet f28dee2cff [DAGCombiner] Add node to the worklist in topological order in parallelizeChainedStores
Summary: As per title.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66659

llvm-svn: 370056
2019-08-27 13:27:57 +00:00
Amaury Sechet a1e5ef3fd4 [DAGCombiner] Add node to the worklist in topological order after relegalization.
Summary: As per title.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66702

llvm-svn: 370040
2019-08-27 11:06:09 +00:00
Richard Trieu 58e67b8aa3 Revert r369927 - [DAGCombiner] Remove a bunch of redundant AddToWorklist calls.
This change causes instrumented builds of Clang to have a fatal error in the
backend.  https://reviews.llvm.org/D66537 has the details.

llvm-svn: 370006
2019-08-27 02:04:11 +00:00
Craig Topper 846429de74 [DAGCombiner][X86] Teach SimplifyVBinOp to fold VBinOp (concat X, undef/constant), (concat Y, undef/constant) -> concat (VBinOp X, Y), VecC
This improves the combine I included in D66504 to handle constants in the upper operands of the concat. If we can constant fold them away we can pull the concat after the bin op. This helps with chains of madd reductions on X86 from loop unrolling. The loop madd reduction pattern creates pmaddwd with half the width of the add that follows it using zeroes to fill the upper bits. If we have two of these added together we can pull the zeroes through the accumulating add and then shrink it.

Differential Revision: https://reviews.llvm.org/D66680

llvm-svn: 369937
2019-08-26 17:59:11 +00:00
Amaury Sechet b7075e40f3 [DAGCombiner] Remove a bunch of redundant AddToWorklist calls.
Summary:
This comes as a first step toward processing the DAG nodes in topological orders. Doing so ensure that arguments of a node are combined before the node itself is combined, which exposes ore opportunities for optimization and/or reduce the amount of patterns a node has to match for.

DAGCombiner adding nodes to the worklist is various places causes the nodes to be in a different order from what is expected. In addition, this is reduant because these nodes end up being added to the worklist anyways due to the machinery at line 1621.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66537

llvm-svn: 369927
2019-08-26 17:02:12 +00:00
Craig Topper b8b90ac1c5 [X86][DAGCombiner] Teach narrowShuffle to use concat_vectors instead of inserting into undef
Summary:
Concat_vectors is more canonical during early DAG combine. For example, its what's used by SelectionDAGBuilder when converting IR shuffles into SelectionDAG shuffles when element counts between inputs and mask don't match. We also have combines in DAGCombiner than can pull concat_vectors through a shuffle. See partitionShuffleOfConcats. So it seems like concat_vectors is a better operation to use here. I had to teach DAGCombiner's SimplifyVBinOp to also handle concat_vectors with undef. I haven't checked yet if we can remove the INSERT_SUBVECTOR version in there or not.

I didn't want to mess with the other caller of getShuffleHalfVectors that's used during shuffle lowering where insert_subvector probably is what we want to produce so I've enabled this via a boolean passed to the function.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66504

llvm-svn: 369872
2019-08-25 17:59:49 +00:00
Nikita Popov aa71c977ba [SDAG] Fold umul_lohi with 0 or 1 multiplicand
These can turn up during multiplication legalization. In principle
these should also apply to smul_lohi, but I wasn't able to figure
out how to produce those with the necessary operands.

Differential Revision: https://reviews.llvm.org/D66380

llvm-svn: 369864
2019-08-25 08:04:22 +00:00
Simon Pilgrim 04906ef1f2 [DAGCombine] GetNegatedExpression - add FMA\FMAD support
If the accumulator and either of the multiply operands are negatable then we can we negate the entire expression.

Differential Revision: https://reviews.llvm.org/D63141

llvm-svn: 369746
2019-08-23 10:49:46 +00:00
Amaury Sechet 95cf66de7c [DAGCombiner] Remove explicit call to AddToWorklist in sqrt and reciprocal computations
Summary: These nodes end up being processed regardless due to DAGCombiner ensuring arguments are processed. This changes the order in which nodes are processed, which fixes an issue on PowerPC.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri, mcberg2017, stefanp, hfinkel

Subscribers: nemanjai, MaskRay, jsji, steven.zhang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66548

llvm-svn: 369662
2019-08-22 15:35:45 +00:00
Amaury Sechet c0f190a048 [DAGCombiner] Remove mostly redundant calls to AddToWorklist
Summary:
These calls change the order in which some nodes are processed and so have an effect on codegen.

The change in fixup-bw-copy.ll is due to (and (load anyext)) gets transformed into (load zext) while previously the and was removed by SimplifyDemandedBits, so the (load anyext) remained.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66543

llvm-svn: 369561
2019-08-21 18:51:08 +00:00
Amaury Sechet 045f33aec9 [DAGCombiner] Various nits. NFC
llvm-svn: 369520
2019-08-21 12:01:37 +00:00
Craig Topper ba375263e8 [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef)
I also had to add a new combine to X86's combineExtractSubvector to prevent a regression.

This helps our vXi1 code see the full concat operation and allow it optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation.

I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that.

Differential Revision: https://reviews.llvm.org/D66456

llvm-svn: 369459
2019-08-20 22:12:50 +00:00
Bjorn Pettersson 9dddd26e31 [DAGCombiner] Add simple folds for SMULFIX/UMULFIX/SMULFIXSAT
Summary:
Add the following DAGCombiner folds for mulfix being
one of SMULFIX/UMULFIX/SMULFIXSAT:
  (mulfix x, undef, scale) -> 0
  (mulfix x, 0, scale) -> 0

Also added canonicalization of constants to RHS.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66052

llvm-svn: 369103
2019-08-16 13:16:48 +00:00
Simon Pilgrim d4df81f463 Remove SmallBitVector.h include. NFCI.
SmallBitVector/BitVector types aren't used at all in the cpp file.

llvm-svn: 369008
2019-08-15 14:40:37 +00:00
Simon Pilgrim ed804dad1e [DAGCombine] MergeConsecutiveStores - fix cppcheck/MSVC extension warning. NFCI.
Set the StartIdx type to size_t so that it matches the StoreNodes SmallVector size() and index types.

Silences the MSVC analyzer warning that unsigned increment might overflow before exceeding size_t on 64-bit targets - this isn't likely to happen but it means we use consistent types and reduces the warning "noise" a little.

llvm-svn: 368998
2019-08-15 13:07:14 +00:00
Sanjay Patel 26b2c11451 [DAGCombiner] exclude x*2.0 from normal negation profitability rules
This is the codegen part of fixing:
https://bugs.llvm.org/show_bug.cgi?id=32939

Even with the optimal/canonical IR that is ideally created by D65954,
we would reverse that transform in DAGCombiner and end up with the same
asm on AArch64 or x86.

I see 2 options for trying to correct this:

  1. Limit isNegatibleForFree() by special-casing the fmul pattern (this patch).
  2. Avoid creating (fmul X, 2.0) in the 1st place by adding a special-case
     transform to SelectionDAG::getNode() and/or SelectionDAGBuilder::visitFMul()
     that matches the transform done by DAGCombiner.

This seems like the less intrusive patch, but if there's some other reason to
prefer 1 option over the other, we can change to the other option.

Differential Revision: https://reviews.llvm.org/D66016

llvm-svn: 368490
2019-08-09 21:37:32 +00:00
Sanjay Patel 0b4ae34c2f [DAGCombiner] remove redundant fold for X*1.0; NFC
This is handled at node creation time (similar to X/1.0)
after:
rL357029
(no fast-math-flags needed)

llvm-svn: 368443
2019-08-09 14:30:59 +00:00
Craig Topper 9158e54270 [SelectionDAG][X86] Move setcc mask splitting for mload/mstore/mgather/mscatter from DAGCombiner to the type legalizer.
We may be able to look to how VSELECT is handled to further
improve this, but this appears to be neutral or an improvement
on the test cases we have.

llvm-svn: 368344
2019-08-08 21:14:08 +00:00
Cullen Rhodes ced419f4d7 [SelectionDAG] Extend base addressing modes supported by MGATHER/MSCATTER
Summary:
Before this patch MGATHER/MSCATTER is capable of representing all
common addressing modes, but only when illegal types are used.
This patch adds an IndexType property so more representations
are available when using legal types only.

Original modes:
 vector of bases
 base + vector of signed scaled offsets

New modes:
 base + vector of signed unscaled offsets
 base + vector of unsigned scaled offsets
 base + vector of unsigned unscaled offsets

The current behaviour of addressing modes for gather/scatter remains
unchanged.

Patch by Paul Walker.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D65636

llvm-svn: 368008
2019-08-06 09:46:13 +00:00
Sanjay Patel eaf13044bd [DAGCombiner][x86] prevent infinite loop from truncate/extend transforms
The test case is based on the example from the post-commit thread for:
https://reviews.llvm.org/rGc9171bd0a955

This replaces the x86-specific simple-type check from:
rL367766
with a check in the DAGCombiner. Adding the check isn't
strictly necessary after the fix from:
rL367768
...but it seems likely that we're heading for trouble if
we are creating weird types in this transform.

I combined the earlier legality check into the initial
clause to simplify the code.

So we should only try the trunc/sext transform at the
earliest combine stage, but we limit the transform to
simple types anyway because the TLI hook is probably
too lax about what it considers a free truncate.

llvm-svn: 367834
2019-08-05 11:27:07 +00:00
Craig Topper 2edeb8a11a [DAGCombiner] Prevent the combine added in r367710 from creating illegal types after type legalization.
This is further fix for PR42880.

Sanjay already disabled the X86 TLI hook for non-simple types,
but we should really call isTypeLegal here if we're after type
legalization.

llvm-svn: 367768
2019-08-03 23:09:13 +00:00
Sanjay Patel 68264558f9 [DAGCombiner] try to convert opposing shifts to casts
This reverses a questionable IR canonicalization when a truncate
is free:

sra (add (shl X, N1C), AddC), N1C -->
sext (add (trunc X to (width - N1C)), AddC')

https://rise4fun.com/Alive/slRC

More details in PR42644:
https://bugs.llvm.org/show_bug.cgi?id=42644

I limited this to pre-legalization for code simplicity because that
should be enough to reverse the IR patterns. I don't have any
evidence (no regression test diffs) that we need to try this later.

Differential Revision: https://reviews.llvm.org/D65607

llvm-svn: 367710
2019-08-02 19:33:46 +00:00
Craig Topper a9ed5436bd [X86] In decomposeMulByConstant, legalize the VT before querying whether the multiply is legal
If a type is larger than a legal type and needs to be split, we would previously allow the multiply to be decomposed even if the split multiply is legal. Since the shift + add/sub code would also need to be split, its not any better to decompose it.

This patch figures out what type the mul will eventually be legalized to and then uses that type for the query. I tried just returning false illegal types and letting them get handled after type legalization, but then we can't recognize and i64 constant splat on 32-bit targets since will be destroyed by type legalization. We could special case vectors of i64 to avoid that...

Differential Revision: https://reviews.llvm.org/D65533

llvm-svn: 367601
2019-08-01 18:49:07 +00:00
Michael Berg 005d705d43 Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize NoSignedZerosFPMath options control
Summary: Honoring no signed zeroes is also available as a user control through clang separately regardless of fastmath or UnsafeFPMath context, DAG guards should reflect this context.

Reviewers: spatel, arsenm, hfinkel, wristow, craig.topper

Reviewed By: spatel

Subscribers: rampitec, foad, nhaehnle, wuzish, nemanjai, jvesely, wdng, javed.absar, MaskRay, jsji

Differential Revision: https://reviews.llvm.org/D65170

llvm-svn: 367486
2019-07-31 21:57:28 +00:00
Wei Mi f49c107f06 [DAGCombine] Limit the number of times for the same store and root nodes
to bail out in store merging dependence check.

We run into a case where dependence check in store merging bail out many times
for the same store and root nodes in a huge basicblock. That increases compile
time by almost 100x. The patch add a map to track how many times the bailing
out happen for the same store and root, and if it is over a limit, stop
considering the store with the same root as a merging candidate.

Differential Revision: https://reviews.llvm.org/D65174

llvm-svn: 367472
2019-07-31 19:59:24 +00:00
Wei Mi 888efda280 [DAGCombiner] Add an option to control whether or not to enable store merging.
Add an option to control whether or not to enable store merging in dag combiner
so we can workaround some bugs more easily.

Differential Revision: https://reviews.llvm.org/D65482

llvm-svn: 367365
2019-07-30 23:14:56 +00:00
Simon Pilgrim f8a7e9de06 [DAGCombine] narrowInsertExtractVectorBinOp - early out for binops that change value type. NFCI.
This is implicit in the value type checks in getSubVectorSrc - this just makes it upfront and obvious.

llvm-svn: 367220
2019-07-29 11:34:45 +00:00
Simon Pilgrim 76f2f04d9d [DAGCombine] narrowInsertExtractVectorBinOp - early out for illegal op. NFCI.
If the subvector binop is illegal then early-out and avoid the subvector searches.

llvm-svn: 367181
2019-07-27 19:42:58 +00:00
Craig Topper a658cb0b12 [DAGCombiner] Make ShrinkLoadReplaceStoreWithStore return an SDValue instead of an SDNode*. NFCI
The function was calling getNode() on an SDValue to return and the
caller turned the result back into a SDValue. So just return the
original SDValue to avoid this.

llvm-svn: 366779
2019-07-23 05:13:39 +00:00
Craig Topper f5247244f2 [DAGCombiner] Use SDNode::isOperandOf to simplify some code. NFCI
llvm-svn: 366778
2019-07-23 05:13:35 +00:00
Simon Pilgrim 8b525e357f [DAGCombine] Pull getSubVectorSrc helper out of narrowInsertExtractVectorBinOp. NFCI.
NFC step towards reusing this in other EXTRACT_SUBVECTOR combines.

llvm-svn: 366435
2019-07-18 13:45:53 +00:00
Amaury Sechet f34a69c2e2 [DAGCombiner] fold (addcarry (xor a, -1), b, c) -> (subcarry b, a, !c) and flip carry.
Summary:
As per title. DAGCombiner only mathes the special case where b = 0, this patches extends the pattern to match any value of b.

Depends on D57302

Reviewers: hfinkel, RKSimon, craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59208

llvm-svn: 366214
2019-07-16 15:17:00 +00:00
Simon Pilgrim 701e2c0d71 [DAGCombine] narrowExtractedVectorBinOp - wrap subvector extraction in helper. NFCI.
First step towards supporting 'free' subvector extractions other than concat_vectors.

llvm-svn: 365896
2019-07-12 13:00:35 +00:00
Simon Pilgrim d0307f93a7 [DAGCombine] narrowInsertExtractVectorBinOp - add CONCAT_VECTORS support
We already split extract_subvector(binop(insert_subvector(v,x),insert_subvector(w,y))) -> binop(x,y).

This patch adds support for extract_subvector(binop(concat_vectors(),concat_vectors())) cases as well.

In particular this means we don't have to wait for X86 lowering to convert concat_vectors to insert_subvector chains, which helps avoid some cases where demandedelts/combine calls occur too late to split large vector ops.

The fast-isel-store.ll load folding regression is annoying but I don't think is that critical.

Differential Revision: https://reviews.llvm.org/D63653

llvm-svn: 365785
2019-07-11 14:45:03 +00:00
Michael Berg f4572249d7 Move three folds for FADD, FSUB and FMUL in the DAG combiner away from Unsafe to more aligned checks that reflect context
Summary: Unsafe does not map well alone for each of these three cases as it is missing NoNan context when accessed directly with clang.  I have migrated the fold guards to reflect the expectations of handing nan and zero contexts directly (NoNan, NSZ) and some tests with it.  Unsafe does include NSZ, however there is already precedent for using the target option directly to reflect that context. 

Reviewers: spatel, wristow, hfinkel, craig.topper, arsenm

Reviewed By: arsenm

Subscribers: michele.scandale, wdng, javed.absar

Differential Revision: https://reviews.llvm.org/D64450

llvm-svn: 365679
2019-07-10 18:23:26 +00:00
Simon Pilgrim 94c84aca5d [DAGCombine] visitINSERT_SUBVECTOR - use uint64_t subvector index. NFCI.
Keep the uint64_t type from getZExtValue() to stop truncation/extension overflow warnings in MSVC in subvector index math.

llvm-svn: 365621
2019-07-10 12:21:35 +00:00
Simon Pilgrim bb1167a3a1 Fix const/non-const lambda return type warning. NFCI.
llvm-svn: 365613
2019-07-10 10:45:09 +00:00
Craig Topper 84a1f07363 [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it
Basically the problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores. But all vector loads and stores of the same width
are the same cost on X86.

This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.

Differential Revision: https://reviews.llvm.org/D64295

llvm-svn: 365549
2019-07-09 19:55:28 +00:00
Simon Pilgrim 57603cbde8 [DAGCombine] LoadedSlice - keep getOffsetFromBase() uint64_t offset. NFCI.
Keep the uint64_t type from getOffsetFromBase() to stop truncation/extension overflow warnings in MSVC in alignment math.

llvm-svn: 365504
2019-07-09 15:28:57 +00:00
Simon Pilgrim 9c68aa33e3 [DAGCombine] convertBuildVecZextToZext - remove duplicate getOpcode() call. NFCI.
llvm-svn: 365269
2019-07-06 18:32:15 +00:00
Craig Topper e9aed963ce [DAGCombiner] Don't combine (addcarry (uaddo X, Y), 0, Carry) -> (addcarry X, Y, Carry) if the Carry comes from the uaddo.
Summary:
The uaddo won't be removed and the addcarry will still be
dependent on the uaddo. So we'll just increase the use count
of X and Y and potentially require a COPY.

Reviewers: spatel, RKSimon, deadalnix

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64190

llvm-svn: 365149
2019-07-04 18:18:46 +00:00
Amaury Sechet 57dfacb32d Use getAllOnesConstants instead of -1 in DAGCombiner. NFC
llvm-svn: 365054
2019-07-03 16:34:36 +00:00
Amaury Sechet bddb8c3597 [DAGCombine] More diamong carry pattern optimization.
Summary:
This diff improve the capability of DAGCOmbine to generate linear carries propagation in presence of a diamond pattern. It is now able to match a large variety of different patterns rather than some hardcoded one.

Arguably, the codegen in test cases is not better, but this is to be expected. The goal of this transformation is more about canonicalisation than actual optimisation.

Reviewers: hfinkel, RKSimon, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57302

llvm-svn: 365051
2019-07-03 16:15:59 +00:00
Roman Lebedev c4b83a6054 [Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457)
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.

Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars,
but only X86 prefer that same pattern for vectors.

Here i'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars.

Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel

Reviewed By: efriedma

Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64090

llvm-svn: 365010
2019-07-03 09:41:35 +00:00
Zi Xuan Wu 7ae536a1ce [DAGCombiner] Exploiting more about the transformation of TransformFPLoadStorePair function
For a given floating point load / store pair, if the load value isn't used by any other operations, 
then consider transforming the pair to integer load / store operations if the target deems the transformation profitable.

And we can exploiting much more when there are other operation nodes with chain operand between the load/store pair 
so long as we keep the chain ordering original. We only replace the register used to load/store from float to integer.

I only add testcase in ARM because the TLI.isDesirableToTransformToIntegerOp hook is only enabled in ARM target.

Differential Revision: https://reviews.llvm.org/D60601

llvm-svn: 364883
2019-07-02 02:54:52 +00:00
Simon Pilgrim a6319e5f83 [DAGCombine] visitEXTRACT_SUBVECTOR - add TODO for extract_subvector(bitcast()) support
We support 'big to little' (e.g. extract_subvector(v16i8 bitcast(v2i64))) but not 'little to big' cases  (e.g. extract_subvector(v2i64 bitcast(v16i8)))

llvm-svn: 364405
2019-06-26 11:17:38 +00:00
QingShan Zhang e0e7d4c366 Teach the DAGCombine to fold this pattern(c1 and c2 is constant).
// fold (sext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)
// fold (zext (select cond, c1, c2)) -> (select cond, zext c1, zext c2)
// fold (aext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)
Sign extend the operands if it is any_extend, to keep the signess of the operands that, the other combine rule would apply. The any_extend is handled as zero extend for constants. i.e.

t1: i8 = select t0, Constant:i8<-1>, Constant:i8<0>
t2: i64 = any_extend t1
 -->
t3: i64 = select t0, Constant:i64<-1>, Constant:i64<0>
 -->
t4: i64 = sign_extend_inreg t3

Differential Revision: https://reviews.llvm.org/D63318

llvm-svn: 364382
2019-06-26 05:12:53 +00:00
Simon Pilgrim 9762b26032 [DAGCombine] combineRepeatedFPDivisors - recognize -1.0 / X as a reciprocal
Fixes issue identified by @nemanjai (Nemanja Ivanovic) in D62963 / rL363040 - infinite loop due to GetNegatedExpression fighting combineRepeatedFPDivisors resulting in fneg(fdiv(x,splat)) -> fneg(fmul(x,1.0/splat)) -> fmul(x,-1.0/splat) -> fmul(x,(-1.0 * 1.0)/splat) ......

llvm-svn: 364326
2019-06-25 16:00:16 +00:00
Simon Pilgrim 69144a925e [DAGCombine] visitMUL - allow shift by zero in MulByConstant.
This can occur under certain circumstances when undefs are created later on in the constant multipliers (e.g. in this case due to SimplifyDemandedVectorElts). Its better to let the shift by zero to occur and perform any cleanup afterward.

Fixes OSS Fuzz #15429

llvm-svn: 364179
2019-06-24 12:47:17 +00:00
Craig Topper 6ddc7912b0 [SelectionDAG] Remove the code that attempts to calculate the alignment for the second half of a split masked load/store.
The code divides the alignment by 2 if the original alignment is
equal to the original VT size. But this wouldn't be correct
if the alignment was larger than the VT size.

The memory operand object already takes care of calling MinAlign
on the base alignment and the memory pointer offset. So we don't
need any special code at all.

llvm-svn: 364151
2019-06-23 07:00:46 +00:00
Simon Pilgrim 0da13ed1f6 [DAGCombine] narrowExtractedVectorBinOp - pull out repeated getOpcode(). NFCI.
llvm-svn: 364076
2019-06-21 16:44:51 +00:00
Simon Pilgrim ca9933c22d [DAGCombine] narrowInsertExtractVectorBinOp - reuse "extract from insert" detection code.
Move the "extract from insert detection code" into a lambda helper function.

llvm-svn: 364059
2019-06-21 14:46:21 +00:00
Simon Pilgrim 801c0f12b0 [DAGCombiner] Use getAPIntValue() instead of getZExtValue() where possible.
Better handling of out-of-i64-range values due to large integer types or from fuzz tests.

llvm-svn: 363955
2019-06-20 17:36:23 +00:00
Jordan Rupprecht 02508decf4 [DAGCombiner][NFC] Remove unused var
llvm-svn: 363954
2019-06-20 17:30:01 +00:00
Simon Pilgrim 1d8093249f [DAGCombiner] Support (shl (zext (srl x, C)), C) -> (zext (shl (srl x, C), C)) non-uniform folds.
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. 

llvm-svn: 363929
2019-06-20 14:42:27 +00:00
Simon Pilgrim 98a0ac5c0f [DAGCombine] Add TODOs for some combines that should support non-uniform vectors
We tend to only test for scalar/scalar consts when really we could support non-uniform vectors using ISD::matchUnaryPredicate/matchBinaryPredicate etc.

llvm-svn: 363924
2019-06-20 12:48:49 +00:00
Simon Pilgrim a487628270 [DAGCombine] Reduce scope of ShAmtVal variable. NFCI.
Fixes cppcheck warning.

Use the more capable getAPIntVal() instead of getZExtValue() as well since I'm here.

llvm-svn: 363921
2019-06-20 10:56:37 +00:00
Simon Pilgrim 046d49a8dc [DAGCombine] Use ConstantSDNode::getAPIntValue() instead of getZExtValue().
Use getAPIntValue() in a few more places. Most of the time getZExtValue() is fine, but occasionally there's fuzzed code or someone decides to create i65536 or something.....

llvm-svn: 363887
2019-06-19 22:14:24 +00:00
Simon Pilgrim 9eed5d2f78 [DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) non-uniform folds.
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. 

llvm-svn: 363793
2019-06-19 12:41:37 +00:00
Simon Pilgrim 8c49366c9b [DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> 0 non-uniform folds.
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. 

This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths.

llvm-svn: 363792
2019-06-19 12:25:29 +00:00
Simon Pilgrim bb6b856183 [DAGCombiner] visitSHL - pull out repeated shift amount VT. NFCI.
llvm-svn: 363789
2019-06-19 11:31:26 +00:00
Simon Pilgrim d954a53633 [DAGCombine] Fix (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) comment. NFCI.
We pre-extend, not post.

llvm-svn: 363787
2019-06-19 11:17:48 +00:00
Luis Marques 2e46312ffd [DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting
Some GEPs were not being split, presumably because that split would just be 
undone by the DAGCombiner. Not performing those splits can prevent important 
optimizations, such as preventing the element indices / member offsets from 
being (partially) folded into load/store instruction immediates. This patch:

- Makes the splits also occur in the cases where the base address and the GEP 
  are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.

Differential Revision: https://reviews.llvm.org/D60294

llvm-svn: 363544
2019-06-17 10:54:12 +00:00
Michael Berg ad6bb86b2d adding more fmf propagation for selects plus updated tests
llvm-svn: 363484
2019-06-15 04:53:51 +00:00
Fangrui Song 968b5f84af Revert "adding more fmf propagation for selects plus tests"
This reverts rL363474. -debug-only=isel was added to some tests that
don't specify `REQUIRES: asserts`. This causes failures on
-DLLVM_ENABLE_ASSERTIONS=off builds.

I chose to revert instead of fixing the tests because I'm not sure
whether we should add `REQUIRES: asserts` to more tests.

llvm-svn: 363482
2019-06-15 03:51:08 +00:00
Michael Berg 69394bedc5 adding more fmf propagation for selects plus tests
llvm-svn: 363474
2019-06-14 23:30:52 +00:00
Simon Pilgrim 4e0648a541 [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123)
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.

This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.

If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.

Differential Revision: https://reviews.llvm.org/D63075

llvm-svn: 363179
2019-06-12 17:14:03 +00:00
Simon Pilgrim 266f43964e [TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI.
As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075.

llvm-svn: 363048
2019-06-11 11:00:23 +00:00
Simon Pilgrim 287e78c82b [DAGCombine] GetNegatedExpression - constant float vector support (PR42105)
Add support for negation of constant build vectors.

Differential Revision: https://reviews.llvm.org/D62963

llvm-svn: 363040
2019-06-11 09:44:33 +00:00
QingShan Zhang ab846da7e8 [DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c

static void store64(u64 x, unsigned char* y)
{
    for(int i = 0; i != 8; ++i)
        y[i] = (x >> ((7-i) * 8)) & 255;
}

static u64 load64(const unsigned char* y)
{
    u64 res = 0;
    for(int i = 0; i != 8; ++i)
        res |= (u64)(y[i]) << ((7-i) * 8);
    return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.

Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.

Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;

>
*((i32)p) = val;

i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;

>
*((i32)p) = BSWAP(val);

Differential Revision: https://reviews.llvm.org/D62897

llvm-svn: 362921
2019-06-10 05:40:21 +00:00
Simon Pilgrim 5f337149fa Use for-range loop. NFCI.
llvm-svn: 362897
2019-06-09 09:07:30 +00:00
Simon Pilgrim 6bae6d5a5d [DAGCombine] visitAND - merge (zext_inreg ((s)extload x)) -> (zextload x) combines. NFCI.
Same codegen, only differ by the oneuse limit for the sextload case.

llvm-svn: 362880
2019-06-08 17:02:00 +00:00
Simon Pilgrim f0240ee76d [DAGCombine] visitAND - fix local shadow variable warnings. NFCI.
llvm-svn: 362825
2019-06-07 18:36:43 +00:00
Simon Pilgrim 4c9db2045a [DAGCombine] Use APInt::extractBits in "sub-splat" constant mask detection. NFCI.
llvm-svn: 362820
2019-06-07 18:07:06 +00:00
Simon Pilgrim 842c7792aa [DAGCombine] MergeConsecutiveStores - improve non-temporal load\store handling (PR42123)
This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores:

1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another.

2 - The merged load\store node must retain the non-temporal flag.

Differential Revision: https://reviews.llvm.org/D62910

llvm-svn: 362723
2019-06-06 17:04:13 +00:00
Simon Pilgrim da993d08c8 [DAGCombine] Cleanup isNegatibleForFree/GetNegatedExpression. NFCI.
Prep work for PR42105 - clang-format, use auto for cast and merge nested if()s

llvm-svn: 362695
2019-06-06 10:21:18 +00:00
Simon Pilgrim 77d6adc491 Fix shadow local variable warning. NFCI.
llvm-svn: 362622
2019-06-05 17:26:29 +00:00
Nemanja Ivanovic aed7227b71 Revert r362472 as it is breaking PPC build bots
The patch https://reviews.llvm.org/rL362472 broke PPC LNT buildbots.
Reverting it to bring the bots back to green.

llvm-svn: 362539
2019-06-04 18:48:43 +00:00
Craig Topper 09a4415803 [DAGCombiner][X86] Fold (not (neg X)) -> (add X, -1)
This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove where beneficial.

Fixes PR42118

Differential Revision: https://reviews.llvm.org/D62828

llvm-svn: 362533
2019-06-04 17:44:18 +00:00
Sanjay Patel 1e63dd0b44 [SelectionDAG][x86] limit post-legalization store merging by type
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.

Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.

llvm-svn: 362507
2019-06-04 15:15:59 +00:00
Roman Lebedev 3dce0326fe [DAGCombine][X86][AArch64][MIPS][LANAI] (C - x) - y -> C - (x + y) fold (PR41952)
Summary:
This *might* be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet.

As far as i can tell, there are no regressions here (ignoring x86-32),
all changes are either good or neutral.

This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`)
`@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/vMd3

Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma

Reviewed By: RKSimon

Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62774

llvm-svn: 362488
2019-06-04 11:06:21 +00:00
Roman Lebedev be6ce7b3f2 [DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold
Summary:
All changes except ARM look **great**.
https://rise4fun.com/Alive/R2M

The regression `test/CodeGen/ARM/addsubcarry-promotion.ll`
is recovered fully by D62392 + D62450.

Reviewers: RKSimon, craig.topper, spatel, rogfer01, efriedma

Reviewed By: efriedma

Subscribers: dmgreen, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62266

llvm-svn: 362487
2019-06-04 11:06:08 +00:00
Simon Pilgrim 3018d505a3 [SelectionDAG] Add fpto[us]i(undef) --> undef constant fold
Follow up to D62807.

Differential Revision: https://reviews.llvm.org/D62811

llvm-svn: 362483
2019-06-04 10:04:55 +00:00
QingShan Zhang 11de0e71b0 [DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c

static void store64(u64 x, unsigned char* y)
{
    for(int i = 0; i != 8; ++i)
        y[i] = (x >> ((7-i) * 8)) & 255;
}

static u64 load64(const unsigned char* y)
{
    u64 res = 0;
    for(int i = 0; i != 8; ++i)
        res |= (u64)(y[i]) << ((7-i) * 8);
    return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.

Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.

Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;

>
*((i32)p) = val;

i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;

>
*((i32)p) = BSWAP(val);

Differential Revision: https://reviews.llvm.org/D61843

llvm-svn: 362472
2019-06-04 08:53:53 +00:00
Michael Berg 0b7f98da65 Propagate fmf for setcc/select folds
Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG.

Reviewers: qcolombet, spatel

Reviewed By: qcolombet

Subscribers: nemanjai, jsji

Differential Revision: https://reviews.llvm.org/D62552

llvm-svn: 362439
2019-06-03 19:12:15 +00:00
Simon Pilgrim cb7e4e8193 [SelectionDAG] Add [us]itofp(undef) --> 0 constant fold (PR39205)
We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction

Differential Revision: https://reviews.llvm.org/D62807

llvm-svn: 362397
2019-06-03 13:02:07 +00:00
Florian Hahn e71963c850 Recommit r360171: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.
If we hit the limit, we do expand the outstanding tokenfactors.
Otherwise, we might drop nodes with users in the unexpanded
tokenfactors. This fixes the crashes reported by Jordan Rupprecht.

Reviewers: niravd, spatel, craig.topper, rupprecht

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D62633

llvm-svn: 362350
2019-06-03 01:30:19 +00:00
Craig Topper 50b35caf30 [DAGCombiner][X86] Fold away masked store and scatter with all zeroes mask.
Similar to what was done for masked load and gather.

llvm-svn: 362342
2019-06-02 22:52:38 +00:00
Craig Topper a7bc31ebc6 [DAGCombiner] Replace masked loads with a zero mask with the passthru value
Similar to what was recently done for gathers in r362015.

llvm-svn: 362337
2019-06-02 18:58:46 +00:00
Simon Pilgrim 7a869e7036 [DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> bitcast(insert_subvector(x,y),c2)
Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize.

Differential Revision: https://reviews.llvm.org/D59188

llvm-svn: 362324
2019-06-02 14:42:11 +00:00
Craig Topper f58ef87bb7 [DAGCombiner] Replace two unchecked dyn_casts with casts.
The results of the dyn_casts were immediately dereferenced on the next line
so they had better not be null.

I don't think there's any way for these dyn_casts to fail, so use a cast
of adding null check.

llvm-svn: 362315
2019-06-02 03:31:01 +00:00
Roman Lebedev 46511d75b5 [DAGCombine] Limit 'hoist add/sub binop w/ constant op' to non-opaque consts
I don't have a test case for these, but there is a test case for D62266
where, even after all the constant-folding patches, we still end up
with endless combine loop. Which makes sense, since we don't constant
fold for opaque constants.

llvm-svn: 362156
2019-05-30 21:10:37 +00:00
Roman Lebedev a4e3b50e26 [DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold. Try 2
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.

No surprising test changes.

https://rise4fun.com/Alive/pbT

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62257

llvm-svn: 362146
2019-05-30 20:37:49 +00:00
Roman Lebedev 57aa36ff91 [DAGCombine] (x - C) - y -> (x - y) - C fold. Try 3
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 362145
2019-05-30 20:37:39 +00:00
Roman Lebedev 63b4741534 [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 3
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 362144
2019-05-30 20:37:29 +00:00
Roman Lebedev 05ad5fd213 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 3
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 362143
2019-05-30 20:37:18 +00:00
Roman Lebedev 1d9ec7a81b [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 3
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 362142
2019-05-30 20:36:54 +00:00
Roman Lebedev 7eb8b5b5dd [DAGCombine] ((c1-A)-c2) -> ((c1-c2)-A) constant-fold
Summary: https://rise4fun.com/Alive/B0A

Reviewers: t.p.northover, RKSimon, spatel, craig.topper

Reviewed By: RKSimon

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62691

llvm-svn: 362135
2019-05-30 19:27:51 +00:00
Roman Lebedev 691b5e2ecc [DAGCombine] (A-C1)-C2 -> A-(C1+C2) constant-fold
Summary: https://rise4fun.com/Alive/Mb1M

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62689

llvm-svn: 362134
2019-05-30 19:27:42 +00:00
Roman Lebedev 0a3dbbcdfb [DAGCombine] (A+C1)-C2 -> A+(C1-C2) constant-fold
Summary:
Direct sibling of D62662, the root cause of the endless combine loop in D62257

https://rise4fun.com/Alive/d3W

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62664

llvm-svn: 362133
2019-05-30 19:27:32 +00:00
Roman Lebedev 9ff3159b4a [DAGCombine] Use FoldConstantArithmetic() to perform C2-(A+C1) -> (C2-C1)-A fold
Summary:
No tests change, and i'm not sure how to test this, but it's better safe than sorry.

Reviewers: spatel, RKSimon, craig.topper, t.p.northover

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62663

llvm-svn: 362132
2019-05-30 19:27:26 +00:00
Roman Lebedev cc9a9cf237 [DAGCombine] ((A-c1)+c2) -> (A+(c2-c1)) constant-fold
Summary:
This was the root cause of the endless combine loop in D62257

https://rise4fun.com/Alive/d3W

Reviewers: RKSimon, spatel, craig.topper, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62662

llvm-svn: 362131
2019-05-30 19:27:19 +00:00
Roman Lebedev ef95679741 [DAGCombine] Use FoldConstantArithmetic() to perform ((c1-A)+c2) -> (c1+c2)-A fold
Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry.

Reviewers: spatel, RKSimon, craig.topper, t.p.northover

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62661

llvm-svn: 362130
2019-05-30 19:27:10 +00:00
Roman Lebedev 019d270e43 [DAGCombine] Revert of recommit of "binop-with-const hoisting" patches
I was looking into an endless combine loop the uncommitted follow-up patch
was causing, and it appears even these patches can exibit such an
endless loop. The root cause is that we try to hoist one binop (add/sub) with
constant operand, and if we get two such binops both of which are
eligible for this hoisting, we get stuck.

Some cases may highlight missing constant-folds.

Reverts r361871,r361872,r361873,r361874.

llvm-svn: 362109
2019-05-30 16:07:11 +00:00
Benjamin Kramer 107f8d9873 [DAGCombiner] Replace gathers with a zero mask with the passthru value
These can be created by the legalizer when splitting a larger gather.

See https://llvm.org/PR42055 for a motivating example.

Differential Revision: https://reviews.llvm.org/D62613

llvm-svn: 362015
2019-05-29 19:24:19 +00:00
Roman Lebedev dfc34f0211 [DAGCombine] (x - C) - y -> (x - y) - C fold. Try 2
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

This is a recommit, originally committed in rL361856, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 361874
2019-05-28 20:40:10 +00:00
Roman Lebedev d485c6bc9f [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 2
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

This is a recommit, originally committed in rL361855, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 361873
2019-05-28 20:40:03 +00:00
Roman Lebedev 96c9986199 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 2
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

This is a recommit, originally committed in rL361853, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 361872
2019-05-28 20:39:55 +00:00
Roman Lebedev 2feb7e56e2 [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 2
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 361871
2019-05-28 20:39:39 +00:00
Roman Lebedev 272d70c366 Revert DAGCombine "hoist binop with const" folds
Appear to introduce test-suite compile-time hang.

http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/22825

This reverts r361852,r361853,r361854,r361855,r361856

llvm-svn: 361865
2019-05-28 19:04:21 +00:00
Roman Lebedev 7669665432 [DAGCombine] (x - C) - y -> (x - y) - C fold
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 361856
2019-05-28 17:54:21 +00:00
Roman Lebedev 8c9b3e4e4a [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 361855
2019-05-28 17:54:13 +00:00
Roman Lebedev 6a24c9b9ab [DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.

No surprising test changes.

https://rise4fun.com/Alive/pbT

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62257

llvm-svn: 361854
2019-05-28 17:54:04 +00:00
Roman Lebedev 1499f65ac1 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 361853
2019-05-28 17:53:54 +00:00
Roman Lebedev 19f51ec04a [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 361852
2019-05-28 17:53:43 +00:00
Alexander Timofeev ba447bae74 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
             the correct register classes to the cross block values beforehand. For the divergent targets
             same value type requires different register classes dependent on the value divergence.

    Reviewers: rampitec, nhaehnle

    Differential Revision: https://reviews.llvm.org/D59990

    This commit was reverted because of the build failure.
    The reason was mlformed patch.
    Build failure fixed.

llvm-svn: 361741
2019-05-26 20:33:26 +00:00
Peter Collingbourne 3b93737446 Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio

llvm-svn: 361688
2019-05-25 01:52:38 +00:00
Alexander Timofeev dffedea014 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
         the correct register classes to the cross block values beforehand. For the divergent targets
         same value type requires different register classes dependent on the value divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

llvm-svn: 361644
2019-05-24 15:32:18 +00:00
Simon Pilgrim 95b8d9bbf8 [SelectionDAG] computeKnownBits - support constant pool values from target
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.

computeKnownBits then uses this function to improve codegen, notably vector code after legalization.

A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.

This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.

Differential Revision: https://reviews.llvm.org/D61887

llvm-svn: 361620
2019-05-24 10:03:11 +00:00
Sanjay Patel 7d6c0bce50 [DAGCombiner] make folds of binops safe for opcodes that produce >1 value
This is no-functional-change-intended currently because the definition
of isBinOp() only includes opcodes that produce 1 value. But if we
share that implementation with isCommutativeBinOp() as proposed in
D62191, then we need to make sure that the callers bail out for
opcodes that they are not prepared to handle correctly.

llvm-svn: 361547
2019-05-23 20:17:25 +00:00
Sanjay Patel 78c3f58122 [DAGCombiner] prevent unsafe reassociation of FP ops
There are no FP callers of DAGCombiner::reassociateOps() currently,
but we can add a fast-math check to make sure this API is not being
misused.

This was noted as a potential risk (and that risk might increase) with:
D62191

llvm-svn: 361268
2019-05-21 14:47:38 +00:00
Craig Topper 203bfdd0f0 [DAGCombiner] Refactor code in visitShiftByConstant slightly to make it more readable. NFC
This changes the isShift variable to include the constant operand
check that was previously in the if statement.

While there fix an 80 column violation and an unnecessary use of
getNode. Also fix variable name capitalization.

llvm-svn: 361168
2019-05-20 16:26:55 +00:00
Roman Lebedev 64c756b991 [DAGCombiner] visitShiftByConstant(): drop bogus signbit check
Summary:
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
   https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
   https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
   https://rise4fun.com/Alive/GOH
3. For `ISD::XOR`, there is no restriction on constants:
   https://rise4fun.com/Alive/ml6

So, why is it there then?

This changes the testcase that was touched by @spatel in rL347478,
but i'm not sure that test tests anything particular?

Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin

Reviewed By: spatel

Subscribers: javed.absar, llvm-commits, spatel

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61918

llvm-svn: 361044
2019-05-17 15:52:58 +00:00
Clement Courbet d9d0665d1c [[DAGCombiner][NFC] Add a comment.
As suggested in D61846.

llvm-svn: 360755
2019-05-15 08:21:18 +00:00
Sanjay Patel 99d6420a82 [SDAG] fix unused variable warning and unneeded indirection; NFC
llvm-svn: 360640
2019-05-14 00:57:31 +00:00
Sanjay Patel 3a13d970aa [SDAG, x86] allow targets to override test for binop opcodes
This follows the pattern of the existing isCommutativeBinOp().

x86 shows improvements from vector narrowing for the min/max opcodes.

llvm-svn: 360639
2019-05-14 00:39:40 +00:00
Sanjay Patel 05dafb1c97 [DAGCombiner] narrow vector binop with inserts/extract
We catch most of these patterns (on x86 at least) by matching
a concat vectors opcode early in combining, but the pattern may
emerge later using insert subvector instead.

The AVX1 diffs for add/sub overflow show another missed narrowing
pattern. That one may be falling though the cracks because of
combine ordering and multiple uses.

llvm-svn: 360585
2019-05-13 14:31:14 +00:00
Clement Courbet 9afc4764dd [DAGCombiner] Fix invalid alias analysis.
Summary:
When we know for sure whether two addresses do or do not alias, we
should immediately return from DAGCombiner::isAlias().

I think this comes from a bad copy/paste, Sorry for not catching that during the
code review.

Fixes PR41855.

Reviewers: niravd, gchatelet, EricWF

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61846

llvm-svn: 360566
2019-05-13 09:07:37 +00:00
Sanjay Patel a09e686821 [DAGCombiner] try to move bitcast after extract_subvector
I noticed that we were failing to narrow an x86 ymm math op in a case similar
to the 'madd' test diff. That is because a bitcast is sitting between the math
and the extract subvector and thwarting our pattern matching for narrowing:

       t56: v8i32 = add t59, t58
      t68: v4i64 = bitcast t56
    t73: v2i64 = extract_subvector t68, Constant:i64<2>
  t96: v4i32 = bitcast t73

There are a few wins and neutral diffs in the other tests.

Differential Revision: https://reviews.llvm.org/D61806

llvm-svn: 360541
2019-05-12 14:43:20 +00:00
Jordan Rupprecht 16c7fbd112 Revert [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
This reverts r360171 (git commit a9d6c32eaf). A repro showing the asan/msan failures is forthcoming.

llvm-svn: 360481
2019-05-10 23:20:02 +00:00
Sanjay Patel b37ddeafc0 [DAGCombiner] reduce code duplication; NFC
llvm-svn: 360462
2019-05-10 20:02:30 +00:00
Cameron McInally 156eb28289 [CodeGen] Add comment about FSUB <-> FNEG xforms
Differential Revision: https://reviews.llvm.org/D61741

llvm-svn: 360366
2019-05-09 19:28:52 +00:00
Florian Hahn be10bc71f9 [DAGCombiner] Limit number of nodes explored as store candidates.
To find the candidates to merge stores we iterate over all nodes in a chain
for each store, which leads to quadratic compile times for large basic blocks
with a large number of stores.

Reviewers: niravd, spatel, craig.topper

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D61511

llvm-svn: 360357
2019-05-09 17:05:52 +00:00
QingShan Zhang e065af6a42 [NFC] Add a static function to do the endian check
Add a new function to do the endian check, as I will commit another patch later, which will also need the endian check. 

Differential Revision: https://reviews.llvm.org/D61236

llvm-svn: 360226
2019-05-08 07:21:37 +00:00
Florian Hahn a9d6c32eaf [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
When simplifying TokenFactors, we potentially iterate over all
operands of a large number of TokenFactors. This causes quadratic
compile times in some cases and the large token factors cause additional
scalability problems elsewhere.

This patch adds some limits to the number of nodes explored for the
cases mentioned above.

Reviewers: niravd, spatel, craig.topper

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D61397

llvm-svn: 360171
2019-05-07 16:47:27 +00:00
Philip Reames 2f53d79bff Fix pr33010, a 2 year old crashing regression
The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>.  The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this.  Instead, simply prevent it's creation.

Arguably, we shouldn't be using *Target*FrameIndices in StatepointLowering at all, but that's a much deeper change.  

llvm-svn: 360090
2019-05-06 22:09:31 +00:00
Nikita Popov cfe786a195 [SDAG][AArch64] Boolean and/or reduce to umax/min reduce (PR41635)
This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635
by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is
legal but former is not) for zero-or-all-ones boolean reductions (which
are detected based on sign bits).

Differential Revision: https://reviews.llvm.org/D61398

llvm-svn: 360054
2019-05-06 16:17:17 +00:00
Simon Pilgrim 5d3b100750 [DAGCombine] Remove repeated variables. NFCI.
llvm-svn: 359915
2019-05-03 18:20:28 +00:00
Sanjay Patel 1972826178 [DAGCombiner] try repeated fdiv divisor transform before building estimate (2nd try)
The original patch was committed at rL359398 and reverted at rL359695 because of
infinite looping.

This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop.

Original commit message:

This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149

llvm-svn: 359793
2019-05-02 15:02:08 +00:00
Sanjay Patel 64d5751254 Revert "[DAGCombiner] try repeated fdiv divisor transform before building estimate"
This reverts commit fb9a5307a9 (rL359398)
because it can cause an infinite loop due to opposing combines.

llvm-svn: 359695
2019-05-01 16:06:21 +00:00
Zi Xuan Wu 49d60fdc2e [DAGCombiner] Do not generate ISD::ADDE node if adde is not legal for the target when combine ISD::TRUNC node
Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry), 
if adde is not legal for the target. Even it's at type-legalize phase. 
Because adde is special and will not be legalized at operation-legalize phase later.

This fixes: PR40922
https://bugs.llvm.org/show_bug.cgi?id=40922

Differential Revision: https://reviews.llvm.org//D60854

llvm-svn: 359532
2019-04-30 03:01:14 +00:00
Bjorn Pettersson 820994572c [DAG] Refactor DAGCombiner::ReassociateOps
Summary:
Extract the logic for doing reassociations
from DAGCombiner::reassociateOps into a helper
function DAGCombiner::reassociateOpsCommutative,
and use that helper to trigger reassociation
on the original operand order, or the commuted
operand order.

Codegen is not identical since the operand order will
be different when doing the reassociations for the
commuted case. That causes some unfortunate churn in
some test cases. Apart from that this should be NFC.

Reviewers: spatel, craig.topper, tstellar

Reviewed By: spatel

Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61199

llvm-svn: 359476
2019-04-29 17:50:10 +00:00
Sanjay Patel fb9a5307a9 [DAGCombiner] try repeated fdiv divisor transform before building estimate
This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149

llvm-svn: 359398
2019-04-28 12:23:43 +00:00
Simon Pilgrim ef54b1dddf [DAGCombine] Cleanup visitEXTRACT_SUBVECTOR. NFCI.
Use ArrayRef::slice, reduce some rather awkward long lines for legibility and run clang-format.

llvm-svn: 359326
2019-04-26 17:49:02 +00:00
Simon Pilgrim 5d6ef94c36 [X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 targets (PR40758)
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs).

Differential Revision: https://reviews.llvm.org/D61068

llvm-svn: 359293
2019-04-26 10:49:13 +00:00
Sanjay Patel 6f41bf948b [DAGCombiner] scale repeated FP divisor by splat factor
If we have a vector FP division with a splatted divisor, use the existing transform
that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then
potentially be converted into a scalar FP division by existing combines (rL358984)
as seen in the tests here.

That can be a potentially big perf difference if scalar fdiv has better timing
(including avoiding possible frequency throttling for vector ops).

Differential Revision: https://reviews.llvm.org/D61028

llvm-svn: 359147
2019-04-24 22:28:58 +00:00
Sanjay Patel 06ff5eae5b [DAGCombiner] generalize binop-of-splats scalarization
If we only match build vectors, we can miss some patterns
that use shuffles as seen in the affected tests.

Note that the underlying calls within getSplatSourceVector()
have the potential for compile-time explosion because of
exponential recursion looking through binop opcodes, but
currently the list of supported opcodes is very limited.
Both of those problems should be addressed in follow-up
patches.

llvm-svn: 358984
2019-04-23 13:16:41 +00:00
Bjorn Pettersson f97b29be88 [DAGCombiner] Combine OR as ADD when no common bits are set
Summary:
The DAGCombiner is rewriting (canonicalizing) an ISD::ADD
with no common bits set in the operands as an ISD::OR node.

This could sometimes result in "missing out" on some
combines that normally are performed for ADD. To be more
specific this could happen if we already have rewritten an
ADD into OR, and later (after legalizations or combines)
we expose patterns that could have been optimized if we
had seen the OR as an ADD (e.g. reassociations based on ADD).

To make the DAG combiner less sensitive to if ADD or OR is
used for these "no common bits set" ADD/OR operations we
now apply most of the ADD combines also to an OR operation,
when value tracking indicates that the operands have no
common bits set.

Reviewers: spatel, RKSimon, craig.topper, kparzysz

Reviewed By: spatel

Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59758

llvm-svn: 358965
2019-04-23 10:01:08 +00:00
Sanjay Patel 9bc6c77220 [DAGCombiner] make variable name less ambiguous; NFC
llvm-svn: 358886
2019-04-22 13:42:50 +00:00
Sanjay Patel d6989daae9 [DAGCombiner] prepare shuffle-of-splat to handle more patterns; NFC
llvm-svn: 358884
2019-04-22 13:36:07 +00:00
Simon Pilgrim e7fe6dd5ed [DAGCombine] Add SimplifyDemandedBits helper that handles demanded elts mask as well
The other SimplifyDemandedBits helpers become wrappers to this new demanded elts variant.

llvm-svn: 358585
2019-04-17 15:45:44 +00:00
Simon Pilgrim e5573f4f4e [TargetLowering] Rename preferShiftsToClearExtremeBits and shouldFoldShiftPairToMask (PR41359)
As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious.

shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask

preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair

llvm-svn: 358526
2019-04-16 20:57:28 +00:00
Luis Marques eda370d4c8 [DAGCombiner] Add missing flag to addressing mode check
The checks in `canFoldInAddressingMode` tested for addressing modes that have a
base register but didn't set the `HasBaseReg` flag to true (it's false by
default). This patch fixes that. Although the omission of the flag was
technically incorrect it had no known observable impact, so no tests were
changed by this patch.

Differential Revision:  https://reviews.llvm.org/D60314

llvm-svn: 358502
2019-04-16 15:09:18 +00:00
Sanjay Patel 5e4ad39af7 [DAGCombiner] narrow shuffle of concatenated vectors
// shuffle (concat X, undef), (concat Y, undef), Mask -->
// concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)

The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements.

The x86 changes look neutral or better. There's one test with an
extra instruction, but that could be reversed for a subtarget with
the right attributes. But by default, we want to avoid the 256-bit
op when possible (in my motivating benchmark, a handful of ymm ops
sprinkled into a sequence of xmm ops are triggering frequency
throttling on Haswell resulting in significantly worse perf).

Differential Revision: https://reviews.llvm.org/D60545

llvm-svn: 358291
2019-04-12 16:31:56 +00:00
Sanjay Patel fd314eca8f [DAGCombiner] refactor narrowing of extracted vector binop; NFC
There's a TODO comment about handling patterns with insert_subvector,
and we do want to match that.

llvm-svn: 358187
2019-04-11 15:59:47 +00:00
Sanjay Patel c0f4a35e68 [DAGCombiner][x86] scalarize inserted vector FP ops
// bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) -->
// build_vec ...undef, (bo x, y), undef...

The lifetime of the nodes in these examples is different for variables versus constants,
but they are all build vectors briefly, so I'm proposing to catch them in this form to
handle all of the leading examples in the motivating test file.

Before we have build vectors, we might have insert_vector_element. After that, we might
have scalar_to_vector and constant pool loads.

It's going to take more work to ensure that FP vector operands are getting simplified
with undef elements, so this transform can apply more widely. In a non-loose FP environment,
we are likely simplifying FP elements to NaN values rather than undefs.

We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors
yet.

Differential Revision: https://reviews.llvm.org/D60514

llvm-svn: 358172
2019-04-11 14:21:57 +00:00
Craig Topper 61e77b11d1 [DAGCombiner][X86][SystemZ] Canonicalize SSUBO with immediate RHS to SADDO by negating the immediate.
This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate.

Differential Revision: https://reviews.llvm.org/D60020

llvm-svn: 358027
2019-04-09 18:33:56 +00:00
Sanjay Patel 50a8652785 [DAGCombiner][x86] scalarize splatted vector FP ops
There are a variety of vector patterns that may be profitably reduced to a
scalar op when scalar ops are performed using a subset (typically, the
first lane) of the vector register file.

For x86, this is true for float/double ops and element 0 because
insert/extract is just a sub-register rename.

Other targets should likely enable the hook in a similar way.

Differential Revision: https://reviews.llvm.org/D60150

llvm-svn: 357760
2019-04-05 13:32:17 +00:00
Evandro Menezes 85bd3978ae [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00
Simon Pilgrim 8d248dbd77 [DAGCombiner] Rename variables Demanded -> DemandedBits/DemandedElts. NFCI.
Use consistent variable names down the SimplifyDemanded* call stack so debugging isn't such a annoyance.

llvm-svn: 357602
2019-04-03 16:00:59 +00:00
Sanjay Patel 00dae6b22d [DAGCombiner] loosen restrictions for moving shuffles after vector binop
There are 3 changes to make this correspond to the same transform in instcombine:
1. Remove the legality check - we can't create anything less legal than we started with.
2. Ease the use restriction, so we only bail out if both operands have >1 use.
3. Ease the use restriction for binops with a repeated operand (eg, mul x, x).

As discussed in D60150, there's a scalarization opportunity that will be made
easier by allowing this transform more generally.

llvm-svn: 357580
2019-04-03 13:42:06 +00:00
Simon Pilgrim 02599de2e1 [DAGCombine] Don't use getZExtValue() until we know the constant is in range.
Noticed during prep for a patch for PR40758.

llvm-svn: 357571
2019-04-03 11:00:55 +00:00
Hans Wennborg 94b867dc7c Revert r357256 "[DAGCombine] Improve Lifetime node chains."
As it caused a pathological compile-time regressionin V8, see PR41352.

> Improve both start and end lifetime nodes chain dependencies.
>
> Reviewers: courbet
>
> Reviewed By: courbet
>
> Subscribers: hiraditya, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D59795

This also reverts the follow-up r357309:

> [DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop.
>
> Avoid EXPENSIVE_CHECK failure. NFCI.

llvm-svn: 357563
2019-04-03 07:41:58 +00:00
Sanjay Patel 7cb7daabbb [DAGCombiner] reduce code duplication; NFC
llvm-svn: 357498
2019-04-02 17:20:54 +00:00
Nirav Dave 54f7118de5 [DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop.
Avoid EXPENSIVE_CHECK failure. NFCI.

llvm-svn: 357309
2019-03-29 20:26:23 +00:00
Nirav Dave 7e84cacdbd [DAG] Avoid redundancy in StoreMerge TokenFactor generation.
Avoid generating redundant TokenFactor when all merged stores have
the same chain.

llvm-svn: 357299
2019-03-29 18:50:22 +00:00
Nirav Dave fe59e14031 [DAGCombine] Prune unnused nodes.
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.

Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight

Reviewed By: jyknight

Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58070

llvm-svn: 357283
2019-03-29 17:35:56 +00:00
Nirav Dave 610036c506 [DAG] Set up infrastructure to avoid smart constructor-based dangling nodes
Summary:
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations without fully pruning unused result values. This results
in nodes that are never added to the worklist and therefore can not be
pruned.

Add a node inserter for the combiner to make sure such nodes have the
chance of being pruned. This allows a number of additional peephole
optimizations.

Reviewers: efriedma, RKSimon, craig.topper, jyknight

Reviewed By: jyknight

Subscribers: msearles, jyknight, sdardis, nemanjai, javed.absar, hiraditya, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58068

llvm-svn: 357279
2019-03-29 17:26:40 +00:00
Sanjay Patel 12685d0f7c [DAGCombiner] simplify shuffle of shuffle
After investigating the examples from D59777 targeting an SSE4.1 machine,
it looks like a very different problem due to how we map illegal types (256-bit in these cases).

We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand.
We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that
generality means it is limited to patterns with a one-use constraint, and the examples here have
2 uses. We don't need any uses or legality limitations for a simplification (no new value is
created).

It looks like we miss this pattern in IR too.

In one of the zext examples here, we have shuffle masks like this:

Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7>
Shuf = vector_shuffle<4,u,6,7,u,u,u,u>

...so that's moving the high half of the 1st vector into the low half. But the high half of the
1st vector is already identical to the low half.

Differential Revision: https://reviews.llvm.org/D59961

llvm-svn: 357258
2019-03-29 14:20:38 +00:00
Nirav Dave 9259de217e [DAGCombine] Improve Lifetime node chains.
Improve both start and end lifetime nodes chain dependencies.

Reviewers: courbet

Reviewed By: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59795

llvm-svn: 357256
2019-03-29 14:09:47 +00:00
Sanjay Patel 665a385035 [DAGCombiner] fold sext into decrement
This is a sibling to rL357178 that I noticed we'd hit if we chose
an alternate transform in D59818.

  %z = zext i8 %x to i32
  %dec = add i32 %z, -1
  %r = sext i32 %dec to i64
  =>
  %z2 = zext i8 %x to i64
  %r = add i64 %z2, -1

https://rise4fun.com/Alive/kPP

The x86 vector diffs show a slight regression, so there's a chance
that we should limit this and the previous transform to scalars.

But given that we allowed vectors before, I'm matching that behavior
here. We should change both transforms together if that's the right
thing to do.

llvm-svn: 357254
2019-03-29 13:49:08 +00:00
Sanjay Patel ffa8d3def7 [DAGCombiner] fold sext into negation
As noted in D59818:
  %z = zext i8 %x to i32
  %neg = sub i32 0, %z
  %r = sext i32 %neg to i64
  =>
  %z2 = zext i8 %x to i64
  %r = sub i64 0, %z2

https://rise4fun.com/Alive/KzSR

llvm-svn: 357178
2019-03-28 15:46:02 +00:00
Simon Pilgrim 38a0616c1d [DAGCombiner] Fold truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y))
If scalar truncates are free, attempt to pre-truncate build_vectors source operands.

Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering.

Differential Revision: https://reviews.llvm.org/D59654

llvm-svn: 357161
2019-03-28 11:34:21 +00:00
Nirav Dave 6b741a8038 [DAGCombiner] Teach TokenFactor pruning to peek through lifetime nodes
Summary: Lifetime nodes were inhibiting TokenFactor simplification inhibiting chain-based optimizations.

Reviewers: courbet, jyknight

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59897

llvm-svn: 357121
2019-03-27 20:37:08 +00:00
Nirav Dave c6dfaa0e83 Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes."
This patch appears to trigger very large compile time increases in
halide builds.

llvm-svn: 357116
2019-03-27 19:54:41 +00:00
Nirav Dave b5630a2ab1 [DAGCombiner] Unify Lifetime and memory Op aliasing.
Rework BaseIndexOffset and isAlias to fully work with lifetime nodes
and fold in lifetime alias analysis.

This is mostly NFC.

Reviewers: courbet

Reviewed By: courbet

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59794

llvm-svn: 357070
2019-03-27 14:14:46 +00:00
Nirav Dave 96a264e053 [DAGCombine] Refactor GatherAllAliases. NFCI.
llvm-svn: 357069
2019-03-27 14:14:35 +00:00
Jonas Paulsson 38342a5185 [DAGCombiner] Don't allow addcarry if the carry producer is illegal.
getAsCarry() checks that the input argument is a carry-producing node before
allowing a transformation to addcarry. This patch adds a check to make sure
that the carry-producing node is legal. If it is not, it may not remain in a
form that is manageable by the target backend. The test case caused a
compilation failure during instruction selection for this reason on SystemZ.

Patch by Ulrich Weigand.

Review: Sanjay Patel
https://reviews.llvm.org/D59822

llvm-svn: 357052
2019-03-27 08:41:46 +00:00
Nirav Dave a28c514581 [DAG] Avoid smart constructor-based dangling nodes.
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations or not fully pruning unused result values. This can
result in nodes that are never added to the worklist and therefore can
not be pruned.

Add a node inserter as the current node deleter to make sure such
nodes have the chance of being pruned.

Many minor changes, mostly positive.

llvm-svn: 356996
2019-03-26 15:08:14 +00:00
Simon Pilgrim e24441aab0 [TargetLowering] Add SimplifyDemandedBits support for ISD::INSERT_VECTOR_ELT
This helps us relax the extension of a lot of scalar elements before they are inserted into a vector.

Its exposes an issue in DAGCombiner::convertBuildVecZextToZext as some/all the zero-extensions may be relaxed to ANY_EXTEND, so we need to handle that case to avoid a couple of AVX2 VPMOVZX test regressions.

Once this is in it should be easier to fix a number of remaining failures to fold loads into VBROADCAST nodes.

Differential Revision: https://reviews.llvm.org/D59484

llvm-svn: 356989
2019-03-26 12:32:01 +00:00
Florian Hahn 71033f2987 [DAGCombiner] Use getTokenFactor in a few more cases.
SDNodes can only have 64k operands and for some inputs (e.g. large
number of stores), we can reach this limit when creating TokenFactor
nodes. This patch is a follow up to D56740 and updates a few more places
that potentially can create TokenFactors with too many operands.

Reviewers: efriedma, craig.topper, aemerson, RKSimon

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D59156

llvm-svn: 356668
2019-03-21 14:32:09 +00:00
Simon Pilgrim da4992bf8d [DAGCombine] SimplifySelectCC - call FoldSetCC with the setcc result type
We were calling FoldSetCC with the compare operand type instead of the result type.

Found by OSS-Fuzz #13838 (https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13838)

llvm-svn: 356667
2019-03-21 14:07:18 +00:00
Simon Pilgrim 51f65171e9 Remove out of date comment. NFCI.
DAGCombiner::convertBuildVecZextToZext just requires the extractions to be sequential, they don't have to start from 0'th index.

llvm-svn: 356552
2019-03-20 12:24:15 +00:00
Justin Bogner b353d6887e [DAGCombine] Fix a miscompile when reducing BUILD_VECTORs to a shuffle
In r311255 we added a case where we split vectors whose elements are
all derived from the same input vector so that we could shuffle it
more efficiently. In doing so, createBuildVecShuffle was taught to
adjust for the fact that all indices would be based off of the first
vector when this happens, but it's possible for the code that checked
that to fire incorrectly if we happen to have a BUILD_VECTOR of
extracts from subvectors and don't hit this new optimization.

Instead of trying to detect if we've split the vector by checking if
we have extracts from the same base vector, we can just pass that
information into createBuildVecShuffle, avoiding the miscompile.

Differential Revision: https://reviews.llvm.org/D59507

llvm-svn: 356476
2019-03-19 16:52:00 +00:00
Simon Pilgrim a56f2822d0 [SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect
These changes are related to PR37743 and include:

    SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.

    Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.

    Add promoting the integer ABS node in the LegalizeIntegerType.

    Expand-based legalization of integer result for the ABS nodes.

    Expand-based legalization of ABS vector operations.

    Add some integer abs testcases for different typesizes for Thumb arch

    Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to:
        tmp = (SRA, Hi, 31)
        Lo = (UADDO tmp, Lo)
        Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
        Lo = (XOR tmp, Lo)

    The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern:
        (ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).

    Change integer abs testcases for codegen with the ABS node support for AArch64.
        Indicate that the ABS is legal for the i64 type when the NEON is supported.
        Change the integer abs testcases to show changing of codegen.

    Add combine and legalization of ABS nodes for Thumb arch.

    Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.

For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743

Patch by: @ikulagin (Ivan Kulagin)

Differential Revision: https://reviews.llvm.org/D49837

llvm-svn: 356468
2019-03-19 16:24:55 +00:00
Adhemerval Zanella 664c1ef528 [TargetLowering] Add code size information on isFPImmLegal. NFC
This allows better code size for aarch64 floating point materialization
in a future patch.

Reviewers: evandro

Differential Revision: https://reviews.llvm.org/D58690

llvm-svn: 356389
2019-03-18 18:40:07 +00:00
Nirav Dave 55c921f4bf [DAG] Cleanup unused node in SimplifySelectCC.
Delete temporarily constructed node uses for analysis after it's use,
holding onto original input nodes. Ideally this would be rewritten
without making nodes, but this appears relatively complex.

Reviewers: spatel, RKSimon, craig.topper

Subscribers: jdoerfert, hiraditya, deadalnix, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57921

llvm-svn: 356382
2019-03-18 17:02:38 +00:00
Nikita Popov 9a4453592b [DAGCombine] Fold (x & ~y) | y patterns
Fold (x & ~y) | y and it's four commuted variants to x | y. This pattern
can in particular appear when a vselect c, x, -1 is expanded to
(x & ~c) | (-1 & c) and combined to (x & ~c) | c.

This change has some overlap with D59066, which avoids creating a
vselect of this form in the first place during uaddsat expansion.

Differential Revision: https://reviews.llvm.org/D59174

llvm-svn: 356333
2019-03-17 15:45:38 +00:00
Simon Pilgrim 3b0a6c69ee [DAGCombine] combineShuffleOfScalars - handle non-zero SCALAR_TO_VECTOR indices (PR41097)
rL356292 reduces the size of scalar_to_vector if we know the upper bits are undef - which means that shuffles may find they are suddenly referencing scalar_to_vector elements other than zero - so make sure we handle this as undef.

llvm-svn: 356327
2019-03-16 17:36:26 +00:00
Nirav Dave ee5183c796 [DAGCombiner] Fix Comment. NFC.
llvm-svn: 356069
2019-03-13 17:44:40 +00:00
Nirav Dave d6351340bb [DAGCombiner] If a TokenFactor would be merged into its user, consider the user later.
Summary:
A number of optimizations are inhibited by single-use TokenFactors not
being merged into the TokenFactor using it. This makes we consider if
we can do the merge immediately.

Most tests changes here are due to the change in visitation causing
minor reorderings and associated reassociation of paired memory
operations.

CodeGen tests with non-reordering changes:

  X86/aligned-variadic.ll -- memory-based add folded into stored leaq
  value.

  X86/constant-combiners.ll -- Optimizes out overlap between stores.

  X86/pr40631_deadstore_elision -- folds constant byte store into
  preceding quad word constant store.

Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet

Reviewed By: courbet

Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59260

llvm-svn: 356068
2019-03-13 17:07:09 +00:00
Clement Courbet 3bb5d0bb9b Re-land r354244 "[DAGCombiner] Eliminate dead stores to stack."
Always check candidates for hasOtherUses(), not only stores.

llvm-svn: 356050
2019-03-13 13:56:23 +00:00
Simon Pilgrim 9f0a5ca843 [DAGCombine] Pull out repeated demanded bitmask generation. NFCI.
llvm-svn: 355932
2019-03-12 15:58:28 +00:00
Nikita Popov aa7cfa75f9 [SDAG][AArch64] Legalize VECREDUCE
Fixes https://bugs.llvm.org/show_bug.cgi?id=36796.

Implement basic legalizations (PromoteIntRes, PromoteIntOp,
ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes.
There are more legalizations missing (esp float legalizations),
but there's no way to test them right now, so I'm not adding them.

This also includes a few more changes to make this work somewhat
reasonably:

 * Add support for expanding VECREDUCE in SDAG. Usually
   experimental.vector.reduce is expanded prior to codegen, but if the
   target does have native vector reduce, it may of course still be
   necessary to expand due to legalization issues. This uses a shuffle
   reduction if possible, followed by a naive scalar reduction.
 * Allow the result type of integer VECREDUCE to be larger than the
   vector element type. For example we need to be able to reduce a v8i8
   into an (nominally) i32 result type on AArch64.
 * Use the vector operand type rather than the scalar result type to
   determine the action, so we can control exactly which vector types are
   supported. Also change the legalize vector op code to handle
   operations that only have vector operands, but no vector results, as
   is the case for VECREDUCE.
 * Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE),
   explicitly specify for which vector types the reductions are supported.

This does not handle anything related to VECREDUCE_STRICT_*.

Differential Revision: https://reviews.llvm.org/D58015

llvm-svn: 355860
2019-03-11 20:22:13 +00:00
Amaury Sechet a135fd5562 Remove redundant extractBooleanFlip argument. NFC
llvm-svn: 355794
2019-03-11 00:37:01 +00:00
Amaury Sechet b62642a115 Refactor isBooleanFlip into extractBooleanFlip so that users do not depend on the patern matched. NFC
llvm-svn: 355769
2019-03-09 02:51:52 +00:00
Amaury Sechet 782ac933b5 [DAGCombiner] fold (add (add (xor a, -1), b), 1) -> (sub b, a)
Summary: This pattern is sometime created after legalization.

Reviewers: efriedma, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58874

llvm-svn: 355716
2019-03-08 19:39:32 +00:00
Simon Pilgrim 04e8439f72 [DAGCombine] Merge visitSMULO+visitUMULO into visitMULO. NFCI.
llvm-svn: 355690
2019-03-08 11:41:18 +00:00
Simon Pilgrim c71d6d157f [DAGCombine] Merge visitSADDO+visitUADDO into visitADDO. NFCI.
llvm-svn: 355689
2019-03-08 11:30:33 +00:00
Simon Pilgrim 2c2e76a9e2 [DAGCombine] Merge visitSSUBO+visitUSUBO into visitSUBO. NFCI.
llvm-svn: 355688
2019-03-08 11:16:55 +00:00
Simon Pilgrim 9d6347cfc1 [DAGCombine] Improve select (not Cond), N1, N2 -> select Cond, N2, N1 fold
Move the x86 combine from D58974 into the DAGCombine VSELECT code and update the SELECT version to use the isBooleanFlip helper as well.

Requested by @spatel on D59006

llvm-svn: 355533
2019-03-06 18:52:52 +00:00
Simon Pilgrim cdf95f8f07 [DAGCombiner] Enable UADDO/USUBO vector combine support
Differential Revision: https://reviews.llvm.org/D58965

llvm-svn: 355517
2019-03-06 16:11:03 +00:00
Simon Pilgrim 1bdc2d1874 [DAGCombiner] Add SADDO/SSUBO combine support
Basic constant handling folds, for both scalars and vectors

Differential Revision: https://reviews.llvm.org/D58967

llvm-svn: 355506
2019-03-06 14:22:21 +00:00
Simon Pilgrim 642f53d292 [DAGCombiner] Enable SMULO/UMULO vector combine support (PR40442)
Differential Revision: https://reviews.llvm.org/D58968

llvm-svn: 355495
2019-03-06 11:04:21 +00:00
Craig Topper 509a8a3cf1 [DAGCombiner][X86][SystemZ][AArch64] Combine some cases of (bitcast (build_vector constants)) between legalize types and legalize dag.
This patch enables combining integer bitcasts of integer build vectors when the new scalar type is legal. I've avoided floating point because the implementation bitcasts float to int along the way and we would need to check the intermediate types for legality

Differential Revision: https://reviews.llvm.org/D58884

llvm-svn: 355324
2019-03-04 19:12:16 +00:00
Nirav Dave 582d46328c [DAG] Fix constant store folding to handle non-byte sizes.
Avoid crashes from zero-byte values due to sub-byte store sizes.

Reviewers: uabelho, courbet, rnk

Reviewed By: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58626

llvm-svn: 354884
2019-02-26 15:02:32 +00:00
Andrea Di Biagio 4a1e59a6e0 Fix a sign compare warning breaking the -Werror build.
The warning was introduced at r354793.

llvm-svn: 354810
2019-02-25 19:33:58 +00:00
Simon Pilgrim 28441ac75f [DAGCombine] Add undef shuffle elt support to partitionShuffleOfConcats
Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold

Differential Revision: https://reviews.llvm.org/D58585

llvm-svn: 354793
2019-02-25 16:02:01 +00:00
Jordan Rupprecht 6387fa2715 [NFC] Fix typos: preceeding -> preceding
llvm-svn: 354715
2019-02-23 01:28:32 +00:00
Nirav Dave 46f939c118 Disable big-endian constant store merges from rL354676.
llvm-svn: 354677
2019-02-22 16:20:34 +00:00
Nirav Dave 44037d7a63 [DAGCombine] Fold overlapping constant stores
Fold a smaller constant store into larger constant stores immediately
preceeding it.

Reviewers: rnk, courbet

Subscribers: javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58468

llvm-svn: 354676
2019-02-22 16:00:19 +00:00
Sanjay Patel ba5ee817e9 [DAGCombiner] prevent infinite looping by truncating 'and' (PR40793)
This fold can occur during legalization, so it can fight with promotion
to the larger type. It apparently takes a special sequence and subtarget
to avoid more basic simplifications that would hide the problem.

But there's a bigger question raised here: why does distributeTruncateThroughAnd()
even exist? It duplicates functionality from a more minimal pattern that we
already have. But getting rid of this function requires some preliminary steps.

https://bugs.llvm.org/show_bug.cgi?id=40793

llvm-svn: 354594
2019-02-21 16:01:48 +00:00
Nirav Dave 48cf37b55c [DAGCombine] Generalize Dead Store to overlapping stores.
Summary:
Remove stores that are immediately overwritten by larger
stores.

Reviewers: courbet, rnk

Reviewed By: rnk

Subscribers: javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58467

llvm-svn: 354518
2019-02-20 21:07:50 +00:00
Clement Courbet 62b3b91ab2 Re-land the refactoring part of r354244 "[DAGCombiner] Eliminate dead stores to stack."
This is an NFC.

llvm-svn: 354476
2019-02-20 15:45:58 +00:00
Clement Courbet 292291fb90 Revert r354244 "[DAGCombiner] Eliminate dead stores to stack."
Breaks some bots.

llvm-svn: 354245
2019-02-18 08:24:29 +00:00
Clement Courbet 57f34dbd3e [DAGCombiner] Eliminate dead stores to stack.
Summary:
A store to an object whose lifetime is about to end can be removed.

See PR40550 for motivation.

Reviewers: niravd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57541

llvm-svn: 354244
2019-02-18 07:59:01 +00:00
Sanjay Patel 86fac11d5a [DAGCombiner] convert logic-of-setcc into bit magic (PR40611)
If we're comparing some value for equality against 2 constants
and those constants have an absolute difference of just 1 bit,
then we can offset and mask off that 1 bit and reduce to a single
compare against zero:
         and/or (setcc X, C0, ne), (setcc X, C1, ne/eq) -->
         setcc ((add X, -C1), ~(C0 - C1)), 0, ne/eq

https://rise4fun.com/Alive/XslKj

This transform is disabled by default using a TLI hook
("convertSetCCLogicToBitwiseLogic()").

That should be overridden for AArch64, MIPS, Sparc and possibly
others based on the asm shown in:
https://bugs.llvm.org/show_bug.cgi?id=40611

llvm-svn: 353859
2019-02-12 17:07:47 +00:00
Benjamin Kramer 711950c116 Move some classes into anonymous namespaces. NFC.
llvm-svn: 353710
2019-02-11 15:16:21 +00:00
Simon Pilgrim c5744d4d69 [DAG] Add optional AllowUndefs to isNullOrNullSplat
No change in default behaviour (AllowUndefs = false)

llvm-svn: 353646
2019-02-10 17:42:15 +00:00
Simon Pilgrim 5a82a788a2 [DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts
Now that we have SimplifyDemandedBits support for funnel shifts (rL353539), we need to simplify funnel shifts back to bitshifts in cases where either argument has been folded to undef/zero.

Differential Revision: https://reviews.llvm.org/D58009

llvm-svn: 353645
2019-02-10 17:04:00 +00:00
Nemanja Ivanovic 92a8c36735 [DAGCombine] Optimize pow(X, 0.75) to sqrt(X) * sqrt(sqrt(X))
The sqrt case is faster and we already do this for the case where
the exponent is 0.25. This adds the 0.75 case which is also not
sensitive to signed zeros.

Patch by Whitney Tsang (Whitney)

Differential revision: https://reviews.llvm.org/D57434

llvm-svn: 353557
2019-02-08 19:50:58 +00:00
Simon Pilgrim 478bb90779 [TargetLowering] Add SimplifyDemandedBits funnel shift support
llvm-svn: 353539
2019-02-08 17:19:01 +00:00
Nirav Dave 97011ccce0 Revert r353416 "[DAG] Cleanup unused nodes on failed store-to-load forward combine."
This cleanup causes out-of-tree crashes.

llvm-svn: 353527
2019-02-08 15:21:13 +00:00
Simon Pilgrim fe3ac70b18 [DAGCombiner] (add (umax X, C), -C) --> (usubsat X, C) (PR40111)
Move the (add (umax X, C), -C) --> (usubsat X, C) X86 combine into generic DAGCombiner

First of a number of saturated arithmetic folds that can be moved out of X86-specific code for PR40111.

Differential Revision: https://reviews.llvm.org/D57754

llvm-svn: 353457
2019-02-07 20:14:43 +00:00
Nirav Dave 9332fc2e19 Revert "[DAG] Cleanup of unused node in SimplifySelectCC."
Causes ASAN use-after-poison errors.

llvm-svn: 353442
2019-02-07 18:31:05 +00:00
Sanjay Patel 2d4b186844 [DAGCombiner] fold add/sub with bool operand based on target's boolean contents
I noticed that we are missing this canonicalization in IR:
rL352515
...and then realized that we don't get this right in SDAG either,
so this has to be fixed first regardless of what we choose to do in IR.

The existing fold was limited to scalars and using the wrong predicate
to guard the transform. We have a boolean contents TLI query that can
be used to decide which direction to fold.

This may eventually lead back to the problems/question in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but it makes no difference to that yet.

Differential Revision: https://reviews.llvm.org/D57401

llvm-svn: 353433
2019-02-07 17:43:34 +00:00
Nirav Dave 24e60819f6 [DAG] Cleanup of unused node in SimplifySelectCC.
llvm-svn: 353428
2019-02-07 17:13:55 +00:00
Nirav Dave 4b12236f7d [DAG] Cleanup unused node on failed SELECT Combine.
llvm-svn: 353426
2019-02-07 16:57:50 +00:00
Nirav Dave 724b81087d [DAG] Cleanup unused nodes on failed store-to-load forward combine.
llvm-svn: 353416
2019-02-07 15:38:14 +00:00
Nirav Dave b3506bf985 [DAG] Immediately cleanup unused nodes from extend-based combines.
llvm-svn: 353338
2019-02-06 20:12:03 +00:00
Clement Courbet 5a6712b633 [DAGCombine][NFC] GatherAllAliases should take a LSBaseSDNode.
GatherAllAliases only makes sense for LSBaseSDNode. Enforce it with
static typing instead of runtime cast.

llvm-svn: 353291
2019-02-06 12:36:17 +00:00
Craig Topper d4e37afe45 [DAGCombiner] Discard pointer info when combining extract_vector_elt of a vector load when the index isn't constant
Summary:
If the index isn't constant, this transform inserts a multiply and an add on the index to calculating the base pointer for a scalar load. But we still create a memory operand with an offset of 0 and the size of the scalar access. But the access is really to an unknown offset within the original access size.

This can cause the machine scheduler to incorrectly calculate dependencies between this load and other accesses. In the case we saw, there was a 32 byte vector store that was split into two 16 byte stores, one with offset 0 and one with offset 16. The size of the memory operand for both was 16. The scheduler correctly detected the alias with the offset 0 store, but not the offset 16 store.

This patch discards the pointer info so we don't incorrectly detect aliasing. I wasn't sure if we could keep using the original offset and size without risking some other transform on the load changing the size.

I tried to reduce a test case, but there's still a lot of memory operations needed to get the scheduler to do the bad reordering. So it looked pretty fragile to maintain.

Reviewers: efriedma

Reviewed By: efriedma

Subscribers: arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57616

llvm-svn: 353124
2019-02-05 00:22:23 +00:00
Simon Pilgrim a536b89fe0 [DAGCombine] Add ADD(SUB,SUB) combines
Noticed while investigating PR40483, and fixes the basic test case from the bug - but not a more general case.

We're pretty weak at dealing with ADD/SUB combines compared to the SimplifyAssociativeOrCommutative/SimplifyUsingDistributiveLaws abilities that InstCombine can manage.

llvm-svn: 353044
2019-02-04 13:44:49 +00:00
Simon Pilgrim bd42f97946 [SDAG] Add SDNode/SDValue getConstantOperandAPInt helper. NFCI.
We already have the getConstantOperandVal helper which returns a uint64_t, but along comes the fuzzer and inserts a i128 -1 constant or something and the whole thing asserts.......

I've updated a few obvious cases, and tried to make use of the const reference where possible, but there's more to do. A number of existing oss-fuzz tickets should be fixed if we start using APInt and perform value clamping where necessary.

llvm-svn: 352961
2019-02-02 17:35:06 +00:00
Guozhi Wei 0bed9e0453 [DAGCombine] Avoid CombineZExtLogicopShiftLoad if there is free ZEXT
This patch fixes pr39098.

For the attached test case, CombineZExtLogicopShiftLoad can optimize it to

t25: i64 = Constant<1099511627775>
t35: i64 = Constant<0>
  t0: ch = EntryToken
t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
    t58: i64 = srl t57, Constant:i8<1>
  t60: i64 = and t58, Constant:i64<524287>
t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64

But later visitANDLike transforms it to

t25: i64 = Constant<1099511627775>
t35: i64 = Constant<0>
  t0: ch = EntryToken
t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
      t61: i32 = truncate t57
    t63: i32 = srl t61, Constant:i8<1>
  t64: i32 = and t63, Constant:i32<524287>
t65: i64 = zero_extend t64
    t58: i64 = srl t57, Constant:i8<1>
  t60: i64 = and t58, Constant:i64<524287>
t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64

And it triggers CombineZExtLogicopShiftLoad again, causes a dead loop.

Both forms should generate same instructions, CombineZExtLogicopShiftLoad generated IR looks cleaner. But it looks more difficult to prevent visitANDLike to do the transform, so I prevent CombineZExtLogicopShiftLoad to do the transform if the ZExt is free.

Differential Revision: https://reviews.llvm.org/D57491

llvm-svn: 352792
2019-01-31 20:46:42 +00:00
Nirav Dave 4061b44057 [DAG] Aggressively cleanup dangling node in CombineZExtLogicopShiftLoad.
While dangling nodes will eventually be pruned when they are
considered, leaving them disables combines requiring single-use.

Reviewers: Carrot, spatel, craig.topper, RKSimon, efriedma

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D57520

llvm-svn: 352784
2019-01-31 19:35:14 +00:00
Sanjay Patel 9ab23101a8 [DAGCombiner] sub X, 0/1 --> add X, 0/-1
This extends the existing transform for:
add X, 0/1 --> sub X, 0/-1
...to allow the sibling subtraction fold.

This pattern could regress with the proposed change in D57401.

llvm-svn: 352680
2019-01-30 22:41:35 +00:00
Sanjay Patel a61d586f74 [DAGCombiner] fold extract_subvector of extract_subvector
This is the sibling fold for insert-of-insert that was added with D56604.

Now that we have x86 shuffle narrowing (D57156), this change shows improvements for 
lots of AVX512 reduction code (not sure that we would ever expect extract-of-extract otherwise).

There's a small regression in some of the partial-permute tests (extracting followed by splat).
That is tracked by PR40500:
https://bugs.llvm.org/show_bug.cgi?id=40500

Differential Revision: https://reviews.llvm.org/D57336

llvm-svn: 352528
2019-01-29 19:13:39 +00:00
Michael Berg 685d5f675e [NFC] TLI query with default(on) behavior wrt DAG combines for fmin/fmax target control
llvm-svn: 352396
2019-01-28 18:03:08 +00:00
Sam Parker 9a2a89d58f [DAGCombine] Enable more pre-indexed stores
The current check in CombineToPreIndexedLoadStore is too
conversative, preventing a pre-indexed store when the base pointer
is a predecessor of the value being stored. Instead, we should check
the pointer operand of the store.

Differential Revision: https://reviews.llvm.org/D56719

llvm-svn: 351933
2019-01-23 09:11:49 +00:00
Sanjay Patel effee52c59 [DAGCombiner] narrow vector binop with 2 insert subvector operands
vecbo (insertsubv undef, X, Z), (insertsubv undef, Y, Z) --> insertsubv VecC, (vecbo X, Y), Z

This is another step in generic vector narrowing. It's also a step towards more horizontal op 
formation specifically for x86 (although we still failed to match those in the affected tests).

The scalarization cases are also not optimal (we should be scalarizing those), but it's still 
an improvement to use a narrower vector op when we know part of the result must be constant 
because both inputs are undef in some vector lanes.

I think a similar match but checking for a constant operand might help some of the cases in 
D51553.

Differential Revision: https://reviews.llvm.org/D56875

llvm-svn: 351825
2019-01-22 14:24:13 +00:00
Sanjay Patel e713c47d49 [DAGCombiner] fix crash when converting build vector to shuffle
The regression test is reduced from the example shown in D56281.
This does raise a question as noted in the test file: do we want
to handle this pattern? I don't have a motivating example for
that on x86 yet, but it seems like we could have that pattern 
there too, so we could avoid the back-and-forth using a shuffle.

llvm-svn: 351753
2019-01-21 17:30:14 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Florian Hahn dc4e154720 [SelectionDAG] Split very large token factors for chained stores to 64k chunks.
Similar to D55073. Without this change, the DAG combiner crashes on code
with more than 64k of stores in a single basic block that form parallelizable
chains.

No test case, as it would be very IR file.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D56740

llvm-svn: 351571
2019-01-18 18:37:38 +00:00
Sam Parker dd8cd6d26b [DAGCombine] Fix ReduceLoadWidth for shifted offsets
ReduceLoadWidth can trigger using a shifted mask is used and this
requires that the function return a shl node to correct for the
offset. However, the way that this was implemented meant that the
returned result could be an existing node, which would be incorrect.
This fixes the method of inserting the new node and replacing uses.

Differential Revision: https://reviews.llvm.org/D50432

llvm-svn: 351310
2019-01-16 08:40:12 +00:00
Sanjay Patel fad5bdaf95 [DAGCombiner] reduce buildvec of zexted extracted element to shuffle
The motivating case for this is shown in the first regression test. We are 
transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'.

That's a special-case for a more general pattern as shown here. In all tests, 
we're avoiding the vector-scalar-vector moves in favor of vector ops.

We aren't producing optimal shuffle code in some cases though, so the patch is
limited to reduce regressions.

Differential Revision: https://reviews.llvm.org/D56281

llvm-svn: 351198
2019-01-15 16:11:05 +00:00
Simon Pilgrim a1bd4a6ba4 [DAGCombiner] Add (sub_sat x, x) -> 0 combine
llvm-svn: 351073
2019-01-14 15:43:34 +00:00
Simon Pilgrim fa1f518748 [DAGCombiner] Enable sub saturation constant folding
llvm-svn: 351072
2019-01-14 15:28:53 +00:00
Simon Pilgrim 7fc6882374 [DAGCombiner] Add add/sub saturation undef handling
Match ConstantFolding.cpp:
(add_sat x, undef) -> -1
(sub_sat x, undef) -> 0

llvm-svn: 351070
2019-01-14 14:16:24 +00:00
Simon Pilgrim cfa5f06dde [DAGCombiner] Enable add saturation constant folding
llvm-svn: 351060
2019-01-14 12:34:31 +00:00
Simon Pilgrim 67610926fc [DAGCombiner] Add add saturation constant folding tests.
Exposes an issue with sadd_sat for computeOverflowKind, so I've disabled it for now.

llvm-svn: 351057
2019-01-14 12:12:42 +00:00
Simon Pilgrim 56ba1db933 [DAGCombiner] If add_sat(x,y) can't overflow -> add(x,y)
NOTE: We need more powerful signed overflow detection in computeOverflowKind
llvm-svn: 351026
2019-01-13 22:08:26 +00:00
Simon Pilgrim 888fa8680c Fix unused variable warning. NFCI.
llvm-svn: 351025
2019-01-13 21:53:12 +00:00
Simon Pilgrim 897d4c6fe9 [DAGCombiner] Some very basic add/sub saturation combines.
Handle combines with zero and constant canonicalization for adds.

llvm-svn: 351024
2019-01-13 21:50:24 +00:00
Sanjay Patel 625d5aef62 [DAGCombiner] fold insert_subvector of insert_subvector
This pattern:

    t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
  t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>

...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.

In the affected tests here, it looks like it just makes RA wiggle. But we might 
as well squash this to prevent it interfering with other pattern-matching.

Differential Revision:
https://reviews.llvm.org/D56604

llvm-svn: 351008
2019-01-12 15:12:28 +00:00
Sanjay Patel 9b368f39a9 [DAGCombiner] simplify code; NFC
llvm-svn: 350844
2019-01-10 16:47:42 +00:00
Sanjay Patel 9633d76a40 [DAGCombiner][x86] scalarize binop followed by extractelement
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:

// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)

We want to have this in the DAG too because as we can see in some of the test diffs (reductions), 
the pattern may not be visible in IR.

Given that this is already an IR canonicalization, any backend that would prefer a vector op over 
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.

Differential Revision: https://reviews.llvm.org/D55722

llvm-svn: 350354
2019-01-03 21:31:16 +00:00
Craig Topper 8dd7bd2cd7 [DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists.
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.

Improves the test case from PR38217. There may be additional opportunities after this.

Differential Revision: https://reviews.llvm.org/D56145

llvm-svn: 350239
2019-01-02 18:19:07 +00:00
Craig Topper c562fae02b [DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them.
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.

The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.

Differential Revision: https://reviews.llvm.org/D56156

llvm-svn: 350235
2019-01-02 17:58:27 +00:00
Craig Topper 802c4979ae [DAGCombiner] Add missing one use check on the shuffle in the bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform.
Found while trying out some other changes so I don't really have a test case.

llvm-svn: 350172
2018-12-31 05:40:46 +00:00
Sanjay Patel 93f1074677 [DAGCombiner] limit shuffle to extend transform (PR40146)
It's dangerous to knowingly create an illegal vector type
no matter what stage of combining we're in.

This prevents the missed folding/scalarization seen in:
https://bugs.llvm.org/show_bug.cgi?id=40146

llvm-svn: 350034
2018-12-23 20:48:31 +00:00
Sanjay Patel 9933574ac3 [DAGCombiner] allow hoisting vector bitwise logic ahead of extends
llvm-svn: 350032
2018-12-23 19:58:16 +00:00
Sanjay Patel 4b537aaf6d [DAGCombiner] allow narrowing of add followed by truncate
trunc (add X, C ) --> add (trunc X), C'

If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type.
This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine).

This change used to show regressions for x86, but those are gone after D55494. 
This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) 
that does almost the same thing.

Differential Revision: https://reviews.llvm.org/D55866

llvm-svn: 350006
2018-12-22 17:10:31 +00:00
Sanjay Patel 47a6129e26 [DAGCombiner] simplify code leading to scalarizeExtractedVectorLoad; NFC
llvm-svn: 349958
2018-12-21 21:26:30 +00:00
Simon Pilgrim 911dce2f30 [SelectionDAG] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349907
2018-12-21 14:56:18 +00:00
Eli Friedman b1bbd5dca3 [ARM] Complete the Thumb1 shift+and->shift+shift transforms.
This saves materializing the immediate.  The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.

I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.

Differential Revision: https://reviews.llvm.org/D55630

llvm-svn: 349857
2018-12-20 23:39:54 +00:00
Craig Topper bd788ce5db [DAGCombiner] Fix a place that was creating a SIGN_EXTEND with an extra operand.
llvm-svn: 349726
2018-12-20 05:28:06 +00:00
Simon Pilgrim 2ae3a91656 [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 2 of 2)
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.

This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.

I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases.

Differential Revision: https://reviews.llvm.org/D55822

llvm-svn: 349629
2018-12-19 14:09:38 +00:00
Simon Pilgrim 6c95bea072 [TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero.

SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.

Thanks to @dmgreen for catching this.

Differential Revision: https://reviews.llvm.org/D55883

llvm-svn: 349625
2018-12-19 13:37:59 +00:00
Sanjay Patel f24900b934 [DAGCombiner] allow hoisting vector bitwise logic ahead of truncates
The transform performs a bitwise logic op in a wider type followed by
truncate when both inputs are truncated from the same source type:
logic_op (truncate x), (truncate y) --> truncate (logic_op x, y)

There are a bunch of other checks that should prevent doing this when 
it might be harmful.

We already do this transform for scalars in this spot. The vector 
limitation was shared with a check for the case when the operands are 
extended. I'm not sure if that limit is needed either, but that would 
be a separate patch.

Differential Revision: https://reviews.llvm.org/D55448

llvm-svn: 349303
2018-12-16 14:57:04 +00:00
Simon Pilgrim 0ef977b83d [SelectionDAG] Add FSHL/FSHR support to computeKnownBits
Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the targets shift amount type).

llvm-svn: 349298
2018-12-16 13:33:37 +00:00
Craig Topper 257ce3871e [DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc
Summary:
If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes those causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node.

To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55459

llvm-svn: 349137
2018-12-14 08:28:24 +00:00
Sanjay Patel 093ab45d4c [DAGCombiner] clean up visitEXTRACT_VECTOR_ELT
This isn't quite NFC, but I don't know how to expose
any outward diffs from these changes. Mostly, this
was confusing because it used 'VT' to refer to the
operand type rather the usual type of the input node.

There's also a large block at the end that is dedicated 
solely to matching loads, but that wasn't obvious. This
could probably be split up into separate functions to
make it easier to see. 

It's still not clear to me when we make certain transforms 
because the legality and constant conditions are 
intertwined in a way that might be improved.

llvm-svn: 349095
2018-12-14 00:09:08 +00:00
Sanjay Patel 791ae69afe [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract; 2nd try
This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from
number of uses to an opcode test for DELETED_NODE based on existing similar code.

Differential Revision: https://reviews.llvm.org/D55655

llvm-svn: 349058
2018-12-13 17:05:01 +00:00
Sanjay Patel c56f5728ee revert rL349051: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract
This causes an address sanitizer bot failure:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/27187/steps/check-llvm%20asan/logs/stdio

llvm-svn: 349056
2018-12-13 16:32:44 +00:00
Sanjay Patel a7b115b392 [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract
Differential Revision: https://reviews.llvm.org/D55655

llvm-svn: 349051
2018-12-13 15:44:26 +00:00
Simon Pilgrim ab973a45b9 [DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner
Remove common code from custom lowering (code is still safe if somehow a zero value gets used).

llvm-svn: 349028
2018-12-13 12:23:32 +00:00
Simon Pilgrim c73a955370 [DAGCombiner] Remove unnecessary recursive DAGCombiner::visitINSERT_SUBVECTOR call.
As discussed on D55511, this caused an issue if the inner node deletes a node that the outer node depends upon. As it doesn't affect any lit-tests and I've only been able to expose this with the D55511 change I'm committing this now.

llvm-svn: 348781
2018-12-10 18:18:50 +00:00
Francis Visoiu Mistrih 753efe3584 [DAGCombiner] Use the result value type in visitCONCAT_VECTORS
This triggers an assert when combining concat_vectors of a bitcast of
merge_values.

With asserts disabled, it fails to select:
fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8
  0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1
    0x7ff19d000b50: f64 = Register %1
In function: d

Differential Revision: https://reviews.llvm.org/D55507

llvm-svn: 348759
2018-12-10 14:31:34 +00:00
Sanjay Patel e767bf4468 [DAGCombiner] re-enable truncation of binops
This is effectively re-committing the changes from:
rL347917 (D54640)
rL348195 (D55126)
...which were effectively reverted here:
rL348604
...because the code had a bug that could induce infinite looping
or eventual out-of-memory compilation.

The bug was that this code did not guard against transforming
opaque constants. More details are in the post-commit mailing
list thread for r347917. A reduced test for that is included
in the x86 bool-math.ll file. (I wasn't able to reduce a PPC
backend test for this, but it was almost the same pattern.)

Original commit message for r347917:

The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.

Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc
sequences that don't get folded in IR.

As the TODO comments suggest, there will be regressions if we extend this (for x86,
we mostly seem to be missing LEA opportunities, but there are likely vector folds
missing too). I think those should be considered existing bugs because this is the
same transform that we do as an IR canonicalization in instcombine. We just need
more tests to make those visible independent of this patch.

llvm-svn: 348706
2018-12-08 16:07:38 +00:00
Sanjay Patel bc47ff86fe [DAGCombiner] split trunc from extend in hoistLogicOpWithSameOpcodeHands; NFC
This duplicates several shared checks, but we need to split
this up to fix underlying bugs in smaller steps.

llvm-svn: 348627
2018-12-07 18:51:08 +00:00
Sanjay Patel 3af4ae9735 [DAGCombiner] disable truncation of binops by default
As discussed in the post-commit thread of r347917, this
transform is fighting with an existing transform causing
an infinite loop or out-of-memory, so this is effectively 
reverting r347917 and its follow-up r348195 while we
investigate the bug.

llvm-svn: 348604
2018-12-07 15:47:52 +00:00
Sanjay Patel bb796cd61c [DAGCombiner] remove explicit calls to AddToWorkList; NFCI
As noted in the post-commit thread for rL347917:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608936.html
...we don't need to repeat these calls because the combiner does it automatically.

llvm-svn: 348597
2018-12-07 15:00:56 +00:00
Sanjay Patel c6441c8547 [DAGCombiner] use root SDLoc for all nodes created by logic fold
If this is not a valid way to assign an SDLoc, then we get this
wrong all over SDAG.

I don't know enough about the SDAG to explain this. IIUC, theoretically,
debug info is not supposed to affect codegen. But here it has clearly
affected 3 different targets, and the x86 change is an actual improvement.

llvm-svn: 348552
2018-12-07 00:01:57 +00:00
Sanjay Patel 86cb679851 [DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCI
We shouldn't care about the debug location for a node that
we're creating, but attaching the root of the pattern should
be the best effort. (If this is not true, then we are doing
it wrong all over the SDAG).

This is no-functional-change-intended, and there are no
regression test diffs...and that's what I expected. But
there's a similar line above this diff, where those
assumptions apparently do not hold.

llvm-svn: 348550
2018-12-06 23:53:58 +00:00
Sanjay Patel 276cef343c [DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFC
This code can still misbehave.

llvm-svn: 348547
2018-12-06 23:39:28 +00:00
Sanjay Patel 70af85b0ac [DAGCombiner] don't group bswap with casts in logic hoisting fold
This was probably organized as it was because bswap is a unary op.
But that's where the similarity to the other opcodes ends. We should
not limit this transform to scalars, and we should not try it if
either input has other uses. This is another step towards trying to
clean this whole function up to prevent it from causing infinite loops
and memory explosions. 

Earlier commits in this series:
rL348501
rL348508
rL348518

llvm-svn: 348534
2018-12-06 22:10:44 +00:00
Sanjay Patel 03a3ef2a0c [DAGCombiner] reduce indent; NFC
Unlike some of the folds in hoistLogicOpWithSameOpcodeHands()
above this shuffle transform, this has the expected hasOneUse()
checks in place.

llvm-svn: 348523
2018-12-06 20:02:47 +00:00
Andrea Di Biagio 52a2bac583 [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.
This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes:

concat_vectors( bitcast (scalar_to_vector %A), UNDEF)
    --> bitcast (scalar_to_vector %A)

This patch only partially addresses PR39257. In particular, it is enough to fix
one of the two problematic cases mentioned in PR39257. However, it is not enough
to fix the original test case posted by Craig; that particular case would
probably require a more complicated approach (and knowledge about used bits).

Before this patch, we used to generate the following code for function PR39257
(-mtriple=x86_64 , -mattr=+avx):

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vxorps  %xmm1, %xmm1, %xmm1
vblendps        $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3]
vmovaps %ymm0, (%rsi)
vzeroupper
retq

Now we generate this:

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vmovaps %ymm0, (%rsi)
vzeroupper
retq

As a side note: that VZEROUPPER is completely redundant...

I guess the vzeroupper insertion pass doesn't realize that the definition of
%xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on
%-mcpu=btver2, we don't get that vzeroupper because pass vzeroupper insertion
%pass is disabled.

Differential Revision: https://reviews.llvm.org/D55274

llvm-svn: 348522
2018-12-06 19:55:38 +00:00
Sanjay Patel bfc7ffa40f [DAGCombiner] don't hoist logic op if operands have other uses, part 2
The PPC test with 2 extra uses seems clearly better by avoiding this transform. 
With 1 extra use, we also prevent an extra register move (although that might
be an RA problem). The general rule should be to only make a change here if
it is always profitable. The x86 diffs are all neutral.

llvm-svn: 348518
2018-12-06 19:18:56 +00:00
Sanjay Patel c3717cd0d5 [DAGCombiner] don't hoist logic op if operands have other uses
The AVX512 diffs are neutral, but the bswap test shows a clear overreach in 
hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can 
increase the instruction count.

This could also fight with transforms trying to go in the opposite direction 
and possibly blow up/infinite loop. This might be enough to solve the bug 
noted here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html

I did not add the hasOneUse() checks to all opcodes because I see a perf 
regression for at least one opcode. We may decide that's irrelevant in the
face of potential compiler crashing, but I'll see if I can salvage that first.

llvm-svn: 348508
2018-12-06 18:16:32 +00:00
Sanjay Patel e9bf78fa23 [DAGCombiner] refactor function that hoists bitwise logic; NFCI
Added FIXME and TODO comments for lack of safety checks.
This function is a suspect in out-of-memory errors as discussed in
the follow-up thread to r347917:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html

llvm-svn: 348501
2018-12-06 17:08:03 +00:00
Simon Pilgrim 105a366254 DAGCombiner::visitINSERT_VECTOR_ELT - pull out repeated VT.getVectorNumElements(). NFCI.
llvm-svn: 348494
2018-12-06 15:39:25 +00:00
Sanjay Patel 33a448f935 [DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893)
Because we're potentially peeking through a bitcast in this transform,
we need to use overall bitwidths rather than number of elements to
determine when it's safe to proceed.

Should fix:
https://bugs.llvm.org/show_bug.cgi?id=39893

llvm-svn: 348383
2018-12-05 17:10:30 +00:00
Simon Pilgrim 180639afe5 [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467)
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.

Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.

Differential Revision: https://reviews.llvm.org/D54698

llvm-svn: 348353
2018-12-05 11:12:12 +00:00
Simon Pilgrim 666261cdc8 [TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes
Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes.

The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch.

llvm-svn: 348246
2018-12-04 10:41:06 +00:00
Sanjay Patel d24f63477d [DAGCombiner] narrow truncated vector binops when legal
This is the smallest vector enhancement I could find to D54640.
Here, we're allowing narrowing to only legal vector ops because we'll see
regressions without that. All of the test diffs are wins from what I can tell.
With AVX/AVX512, we can shrink ymm/zmm ops to xmm.

x86 vector multiplies are the problem case that we're avoiding due to the
patchwork ISA, and it's not clear to me if we can dance around those
regressions using TLI hooks or if we need preliminary patches to plug those
holes.

Differential Revision: https://reviews.llvm.org/D55126

llvm-svn: 348195
2018-12-03 21:57:35 +00:00
Craig Topper e35b01f8ea [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8.
Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalizdd to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but Under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction.

The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54836

llvm-svn: 348158
2018-12-03 18:26:24 +00:00
Sanjay Patel 2daceedf92 [DAGCombiner] guard against an oversized shift crash
This change prevents the crash noted in the post-commit comments 
for rL347478 :
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html

We can't guarantee that an oversized shift amount is folded away, 
so we have to check for it.

Note that I committed an incomplete fix for that crash with:
rL347502

But as discussed here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html
...we have to try harder.

So I'm not sure how to expose the bug now (and apparently no fuzzers have found 
a way yet either).

On the plus side, we have discovered that we're missing real optimizations by 
not simplifying nodes sooner, so the earlier fix still has value, and there's 
likely more value in extending that so we can simplify more opcodes and simplify 
when doing RAUW and/or putting nodes on the combiner worklist.

Differential Revision: https://reviews.llvm.org/D54954

llvm-svn: 348089
2018-12-02 13:33:56 +00:00
Sanjay Patel 8d27144251 [DAGCombiner] narrow truncated binops
The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.

Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc 
sequences that don't get folded in IR.

As the TODO comments suggest, there will be regressions if we extend this (for x86, 
we mostly seem to be missing LEA opportunities, but there are likely vector folds 
missing too). I think those should be considered existing bugs because this is the 
same transform that we do as an IR canonicalization in instcombine. We just need 
more tests to make those visible independent of this patch.

Differential Revision: https://reviews.llvm.org/D54640

llvm-svn: 347917
2018-11-29 20:58:26 +00:00
Sanjay Patel 7336e7c67a [x86] limit transform for select-of-fp-constants
This should likely be adjusted to limit this transform
further, but these diffs should be clear wins.

If we have blendv/conditional move, then we should assume 
those are cheap ops. The loads become independent of the
compare, so those can be speculated before we need to use 
the values in the blend/mov.

llvm-svn: 347526
2018-11-25 17:27:02 +00:00
Sanjay Patel 04435677d0 [SelectionDAG] move constant or splat functions to common location
rL347502 moved the null sibling, so we should group all of these
together. I'm not sure why these aren't methods of the SDValue
class itself, but that's another patch if that's possible.

llvm-svn: 347523
2018-11-25 16:09:32 +00:00
Sanjay Patel 7e119c0400 [DAG] consolidate shift simplifications
...and use them to avoid creating obviously undef values as
discussed in the post-commit thread for r347478.

The diffs in vector div/rem show that we were missing real
optimizations by creating bogus shift nodes.

llvm-svn: 347502
2018-11-23 20:05:12 +00:00
Sanjay Patel 3e80019275 [DAGCombiner] form 'not' ops ahead of shifts (PR39657)
We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'),
but that would not matter without this patch because DAGCombiner was 
reversing that transform. I think we need this transform in the backend 
regardless of what happens in IR to catch cases where the shift-xor 
is formed late from GEP or other ops.

https://rise4fun.com/Alive/NC1

  Name: shl
  Pre: (-1 << C2) == C1
  %shl = shl i8 %x, C2
  %r = xor i8 %shl, C1
  =>
  %not = xor i8 %x, -1
  %r = shl i8 %not, C2
  
  Name: shr
  Pre: (-1 u>> C2) == C1
  %sh = lshr i8 %x, C2
  %r = xor i8 %sh, C1
  =>
  %not = xor i8 %x, -1
  %r = lshr i8 %not, C2

https://bugs.llvm.org/show_bug.cgi?id=39657

llvm-svn: 347478
2018-11-22 19:24:10 +00:00
Sanjay Patel 20935e0ab5 [DAGCombiner] refactor select-of-FP-constants transform
This transform needs to be limited. 

We are converting to a constant pool load very early, and we 
are turning loads that are independent of the select condition 
(and therefore speculatable) into a dependent non-speculatable 
load.

We may also be transferring a condition code from an FP register
to integer to create that dependent load.

llvm-svn: 347424
2018-11-21 20:54:47 +00:00
Sanjay Patel 1c74747478 [DAGCombiner] reduce code duplication; NFC
llvm-svn: 347410
2018-11-21 20:00:32 +00:00
Sanjay Patel 357053f289 [DAGCombiner] look through bitcasts when trying to narrow vector binops
This is another step in vector narrowing - a follow-up to D53784
(and hoping to eventually squash potential regressions seen in
D51553).

The x86 test diffs are wins, but the AArch64 diff is probably not.
That problem already exists independent of this patch (see PR39722), but it
went unnoticed in the previous patch because there were no regression tests
that showed the possibility.

The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling
concerns with using wider vector ops, an extra extract to reduce vector
width is the right trade-off at this level of codegen.

Differential Revision: https://reviews.llvm.org/D54392

llvm-svn: 347356
2018-11-20 22:26:35 +00:00
Simon Pilgrim 3735105961 [DAGCombine] Add calls to SimplifyDemandedVectorElts from visitINSERT_SUBVECTOR (PR37989)
This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices.

llvm-svn: 347313
2018-11-20 15:23:50 +00:00
Sanjay Patel a36c444471 [DAGCombiner] reduce code duplication in visitXOR; NFC
llvm-svn: 347278
2018-11-20 00:51:45 +00:00
Simon Pilgrim 740122fb8c [DAGCombine] SimplifyNodeWithTwoResults - ensure same legalization for LO/HI operands (PR21207)
Consistently use (!LegalOperations || isOperationLegalOrCustom) for all node pairs.

Differential Revision: https://reviews.llvm.org/D53478

llvm-svn: 347255
2018-11-19 19:37:59 +00:00
Sanjay Patel b25adf5edb [SelectionDAG] simplify vector select with undef operand(s)
llvm-svn: 347227
2018-11-19 17:06:05 +00:00
Sanjay Patel c036d844be [SelectionDAG] add simplifySelect() to reduce code duplication; NFC
This should be extended to handle FP and vectors in follow-up patches.

llvm-svn: 347210
2018-11-19 14:35:22 +00:00
Sanjay Patel 8c0cd77bff [DAG] add undef simplifications for select nodes
Sadly, this duplicates (twice) the logic from InstSimplify. There
might be some way to at least share the DAG versions of the code, 
but copying the folds seems to be the standard method to ensure 
that we don't miss these folds. 

Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no 
way to ensure that we do these kinds of simplifications unless the 
code is repeated at node creation time and during combines.

There were other tests that would become worthless with this
improvement that I changed as pre-commits:
rL347161
rL347164
rL347165
rL347166
rL347167

I'm not sure how to salvage the remaining tests (diffs in this patch).
So the x86 tests verify that the new code is working as intended.
The AMDGPU test is actually similar to my motivating case: we have
some undef value that has survived to machine IR in an x86 test, and 
then it gets folded in some weird way, or we crash if we don't transfer
the undef flag. But we would have been better off never getting to that
point by doing these simplifications.

This will lead back to PR32023 someday...
https://bugs.llvm.org/show_bug.cgi?id=32023

llvm-svn: 347170
2018-11-18 17:36:23 +00:00
Stanislav Mekhanoshin 0ff7c8309d DAG combiner: fold (select, C, X, undef) -> X
Differential Revision: https://reviews.llvm.org/D54646

llvm-svn: 347110
2018-11-16 23:13:38 +00:00
Sam Parker ab99cfab21 [DAGCombine] Fix non-deterministic debug output
PR37970 reported non-deterministic debug output, this was caused by
iterating through a set and not a a vector.

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37970

Differential Revision: https://reviews.llvm.org/D54570

llvm-svn: 347037
2018-11-16 08:35:19 +00:00
Craig Topper 0b33b468a1 [DAGCombiner] Enable tryToFoldExtendOfConstant to run after legalize vector ops
It should be ok to create a new build_vector after legal operations so long as it doesn't cause an infinite loop in DAG combiner.

Unfortunately, X86's custom constant folding in combineVSZext is hiding any test changes from this. But I'm trying to get to a point where that X86 specific code isn't necessary at all.

Differential Revision: https://reviews.llvm.org/D54285

llvm-svn: 346728
2018-11-13 01:59:32 +00:00
Nirav Dave a395e2df56 [DAGCombiner] Fix load-store forwarding of indexed loads.
Summary:
Handle extra output from index loads in cases where we wish to
forward a load value directly from a preceeding store.

Fixes PR39571.

Reviewers: peter.smith, rengolin

Subscribers: javed.absar, hiraditya, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54265

llvm-svn: 346654
2018-11-12 14:05:40 +00:00
Craig Topper d23cdbbeb2 [DAGCombiner] Make tryToFoldExtendOfConstant return an SDValue instead of an SDNode*. NFC
Removes the need to call getNode internally and to recreate an SDValue after the call.

llvm-svn: 346600
2018-11-10 23:46:03 +00:00
Sanjay Patel 0a515595a7 [x86] allow vector load narrowing with multi-use values
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.

I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.

For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.

Differential Revision: https://reviews.llvm.org/D54073

llvm-svn: 346595
2018-11-10 20:05:31 +00:00
Craig Topper 9a7e19b8f2 [DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars
It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input.

I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on a undef input to a shuffle. We don't care how many times undef is used.

Differential Revision: https://reviews.llvm.org/D54283

llvm-svn: 346530
2018-11-09 18:04:34 +00:00
Alexandros Lamprineas e15c982f6d [SelectionDAG] swap select_cc operands to enable folding
The DAGCombiner tries to SimplifySelectCC as follows:

  select_cc(x, y, 16, 0, cc) -> shl(zext(set_cc(x, y, cc)), 4)

It can't cope with the situation of reordered operands:

  select_cc(x, y, 0, 16, cc)

In that case we just need to swap the operands and invert the Condition Code:

  select_cc(x, y, 16, 0, ~cc)

Differential Revision: https://reviews.llvm.org/D53236

llvm-svn: 346484
2018-11-09 11:09:40 +00:00
Nirav Dave 6ce9f72f76 [DAGCombine] Improve alias analysis for chain of independent stores.
FindBetterNeighborChains simulateanously improves the chain
dependencies of a chain of related stores avoiding the generation of
extra token factors. For chains longer than the GatherAllAliasDepths,
stores further down in the chain will necessarily fail, a potentially
significant waste and preventing otherwise trivial parallelization.

This patch directly parallelize the chains of stores before improving
each store. This generally improves DAG-level parallelism.

Reviewers: courbet, spatel, RKSimon, bogner, efriedma, craig.topper, rnk

Subscribers: sdardis, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53552

llvm-svn: 346432
2018-11-08 19:14:20 +00:00
Craig Topper 8f2f2a76b9 [DAGCombiner] Use tryFoldToZero to simplify some code and make it work correctly between LegalTypes and LegalOperations.
The original code avoided creating a zero vector after type legalization, but if we're after type legalization the type we have is legal. The real hazard we need to avoid is creating a build vector after op legalization. tryFoldToZero takes care of checking for this.

llvm-svn: 346119
2018-11-05 05:53:06 +00:00
Craig Topper 8d64abddd1 [DAGCombiner] Remove an unused argument from tryFoldToZero. NFC
llvm-svn: 346118
2018-11-05 05:53:03 +00:00
Craig Topper 3292ea03d3 [DAGCombiner] Remove 'else' after return. NFC
This makes this code consistent with the nearly identical code in visitZERO_EXTEND.

llvm-svn: 346090
2018-11-04 06:56:32 +00:00
Craig Topper 1ba86188cf [SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG nodes. Move asserts into getNode.
These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers.

The rest of the patch is just changing all callers to use getNode directly.

llvm-svn: 346087
2018-11-04 02:10:18 +00:00
Craig Topper 60c202a494 [X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1
We already have custom lowering for the AVX case in LegalizeVectorOps. So its better to keep the regular extend op around as long as possible.

I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why its included here.

I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector.

For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size.

Differential Revision: https://reviews.llvm.org/D54024

llvm-svn: 346043
2018-11-02 21:09:49 +00:00
Simon Pilgrim cdcbeb4997 [DAGCombiner] Remove reduceBuildVecConvertToConvertBuildVec and rely on the vectorizers instead (PR35732)
reduceBuildVecConvertToConvertBuildVec vectorizes int2float in the DAGCombiner, which means that even if the LV/SLP has decided to keep scalar code using the cost models, this will override this.

While there are cases where vectorization is necessary in the DAG (mainly due to legalization artefacts), I don't think this is the case here, we should assume that the vectorizers know what they are doing.

Differential Revision: https://reviews.llvm.org/D53712

llvm-svn: 345964
2018-11-02 11:06:18 +00:00
Craig Topper e2483020f2 [DAGCombiner] Make the isTruncateOf call from visitZERO_EXTEND work for vectors. Remove FIXME.
I'm having trouble creating a test case for the ISD::TRUNCATE part of this that shows any codegen differences. But I was able to test the setcc path which is what the test changes here cover.

llvm-svn: 345908
2018-11-01 23:21:45 +00:00
Sanjay Patel c5fe3ce2ec [DAGCombiner] make sure we have a whole-number extract before trying to narrow a vector op (PR39511)
The test causes a crash because we were trying to extract v4f32 to v3f32, and the
narrowing factor was then 4/3 = 1 producing a bogus narrow type.

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=39511

llvm-svn: 345842
2018-11-01 15:41:12 +00:00
David Bolvansky d0080c3a5f [DAGCombiner] Fold 0 div/rem X to 0
Reviewers: RKSimon, spatel, javed.absar, craig.topper, t.p.northover

Reviewed By: RKSimon

Subscribers: craig.topper, llvm-commits

Differential Revision: https://reviews.llvm.org/D52504

llvm-svn: 345721
2018-10-31 14:18:57 +00:00
Bjorn Pettersson fe09a20f09 [DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoad
Summary:
Normalize the offset for endianess before checking
if the store cover the load in ForwardStoreValueToDirectLoad.

Without this we missed out on some optimizations for big
endian targets. If for example having a 4 bytes store followed
by a 1 byte load, loading the least significant byte from the
store, the STCoversLD check would fail (see @test4 in
test/CodeGen/AArch64/load-store-forwarding.ll).

This patch also fixes a problem seen in an out-of-tree target.
The target has i40 as a legal type, it is big endian,
and the StoreSize for i40 is 48 bits. So when normalizing
the offset for endianess we need to take the StoreSize into
account (assuming that padding added when storing into
a larger StoreSize always is added at the most significant
end).

Reviewers: niravd

Reviewed By: niravd

Subscribers: javed.absar, kristof.beyls, llvm-commits, uabelho

Differential Revision: https://reviews.llvm.org/D53776

llvm-svn: 345636
2018-10-30 20:16:39 +00:00
Sanjay Patel 8b207defea [DAGCombiner] narrow vector binops when extraction is cheap
Narrowing vector binops came up in the demanded bits discussion in D52912.

I don't think we're going to be able to do this transform in IR as a canonicalization 
because of the risk of creating unsupported widths for vector ops, but we already have 
a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is 
currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression 
test diffs).

This is artificially limited to not look through bitcasts because there are so many 
test diffs already, but that's marked with a TODO and is a small follow-up.

Differential Revision: https://reviews.llvm.org/D53784

llvm-svn: 345602
2018-10-30 14:14:34 +00:00
David Bolvansky dfdbb038e8 [DAGCombiner] Improve X div/rem Y fold if single bit element type
Summary: Tests by @spatel, thanks

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: sdardis, atanasyan, llvm-commits, spatel

Differential Revision: https://reviews.llvm.org/D52668

llvm-svn: 345575
2018-10-30 09:07:22 +00:00
Craig Topper c4b785ae1e [DAGCombiner] Better constant vector support for FCOPYSIGN.
Enable constant folding when both operands are vectors of constants.

Turn into FNEG/FABS when the RHS is a splat constant vector.

llvm-svn: 345469
2018-10-28 01:32:49 +00:00
Sanjay Patel 0eddd4730f [DAGCombiner] rearrange code in narrowExtractedVectorBinOp(); NFC
We can extend this code to handle many more cases 
if an extract is cheap, so prepping for that change.

llvm-svn: 345430
2018-10-26 21:32:04 +00:00
Thomas Lively 30f1d69115 [NFC] Rename minnan and maxnan to minimum and maximum
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.

Reviewers: arsenm, aheejin, dschuff, javed.absar

Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53112

llvm-svn: 345218
2018-10-24 22:49:55 +00:00
Thomas Lively 43bc46207a [SelectionDAG] DAG combiner for fminnan and fmaxnan
Summary: Depends on D52765.

Reviewers: aheejin, dschuff

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52768

llvm-svn: 345210
2018-10-24 22:18:54 +00:00
Tim Northover 05fe8f918b [DAG] check more operands for cycles when merging stores.
Until now, we've only checked whether merging stores would cause a cycle via
the value argument, but the address and indexed offset arguments are also
capable of creating cycles in some situations.

The addresses are all base+offset with notionally the same base, but the base
SDNode may still be different (e.g. via an indexed load in one case, and an
ISD::ADD elsewhere). This allows cycles to creep in if one of these sources
depends on another.

The indexed offset is usually undef (representing a non-indexed store), but on
some architectures (e.g. 32-bit ARM-mode ARM) it can be an arbitrary value,
again allowing dependency cycles to creep in.

llvm-svn: 345200
2018-10-24 21:36:34 +00:00
Matthias Braun 4f82406c46 SelectionDAG: Reuse bigger sized constants in memset expansion.
When implementing memset's today we often see this pattern:
$x0 = MOV 0xXYXYXYXYXYXYXYXY
store $x0, ...
$w1 = MOV 0xXYXYXYXY
store $w1, ...

We first create a 64bit constant in a 64bit register with all bytes the
same and then create a 32bit constant with all bytes the same in a 32bit
register. In many targets we could just access the lower byte of the
64bit register instead.

- Ideally this would be handled by the ConstantHoist pass but it runs
  too early when memset isn't expanded yet.
- The memset expansion code already had this optimization implemented,
  however SelectionDAG constantfolding would constantfold the
  "trunc(bigconstnat)" pattern to "smallconstant".
- This patch makes the memset expansion mark the constant as Opaque and
  stop DAGCombiner from constant folding in this situation. (Similar to
  how ConstantHoisting marks things as Opaque to avoid folding
  ADD/SUB/etc.)

Differential Revision: https://reviews.llvm.org/D53181

llvm-svn: 345102
2018-10-23 23:19:23 +00:00
Craig Topper c8e183f9ee Recommit r344877 "[X86] Stop promoting integer loads to vXi64"
I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile.

Original commit message:

Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem

I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo

I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D53306

llvm-svn: 344965
2018-10-22 22:14:05 +00:00
Matt Arsenault 687ec75d10 DAG: Change behavior of fminnum/fmaxnum nodes
Introduce new versions that follow the IEEE semantics
to help with legalization that may need quieted inputs.

There are some regressions from inserting unnecessary
canonicalizes when these are matched from fast math
fcmp + select which should be fixed in a future commit.

llvm-svn: 344914
2018-10-22 16:27:27 +00:00
Sanjay Patel e439cc2745 [DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016)
This is a late backend subset of the IR transform added with:
D52439

We can confirm that the conversion to a 'trunc' is correct by running:
$ opt -instcombine -data-layout="e"
(assuming the IR transforms are correct; change "e" to "E" for big-endian)

As discussed in PR39016:
https://bugs.llvm.org/show_bug.cgi?id=39016
...the pattern may emerge during legalization, so that's we are waiting for an 
insertelement to become a scalar_to_vector in the pattern matching here.

The DAG allows for fun variations that are not possible in IR. Result types for 
extracts and scalar_to_vector don't necessarily match input types, so that means 
we have to be a bit more careful in the transform (see code comments).

The tests show that we don't handle cases that require a shift (as we did in the
IR version). I've left that as a potential follow-up because I'm not sure if 
that's a real concern at this late stage.

Differential Revision: https://reviews.llvm.org/D53201

llvm-svn: 344872
2018-10-21 20:13:29 +00:00
Sanjay Patel 8bd74785f0 [DAGCombiner] allow undef elts in vector fmul matching
llvm-svn: 344534
2018-10-15 16:54:07 +00:00
Sanjay Patel 89e2197c33 [DAGCombiner] refactor folds for fadd (fmul X, -2.0), Y; NFCI
The transform doesn't work if the vector constant has undef elements.

llvm-svn: 344532
2018-10-15 16:47:01 +00:00
Sanjay Patel 9e7e0fd828 [DAGCombiner] allow undef elts in vector fma matching
llvm-svn: 344528
2018-10-15 15:56:39 +00:00
Sanjay Patel 4e970ff022 [DAGCombiner] allow undef elts in vector fma matching
llvm-svn: 344525
2018-10-15 15:38:38 +00:00
Sanjay Patel 56b6660d2e [DAGCombiner] rearrange extract_element+bitcast fold; NFC
I want to add another pattern here that includes scalar_to_vector,
so this makes that patch smaller. I was hoping to remove the
hasOneUse() check because it shouldn't be necessary for common
codegen, but an AMDGPU test has a comment suggesting that the
extra check makes things better on one of those targets.

llvm-svn: 344320
2018-10-11 23:56:56 +00:00
Nirav Dave f1f2a2a31a [DAG] Fix Big Endian in Load-Store forwarding
Summary:
Correct offset calculation in load-store forwarding for big-endian
targets.

Reviewers: rnk, RKSimon, waltl

Subscribers: sdardis, nemanjai, hiraditya, jrtc27, atanasyan, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D53147

llvm-svn: 344272
2018-10-11 18:28:59 +00:00
Sanjay Patel 4875662e57 [DAGCombiner] move comment closer to the corresponding code; NFC
llvm-svn: 344255
2018-10-11 16:07:25 +00:00
Nirav Dave 07acc992dc [DAGCombine] Improve Load-Store Forwarding
Summary:
Extend analysis forwarding loads from preceeding stores to work with
extended loads and truncated stores to the same address so long as the
load is fully subsumed by the store.

Hexagon's swp-epilog-phis.ll and swp-memrefs-epilog1.ll test are
deleted as they've no longer seem to be relevant.

Reviewers: RKSimon, rnk, kparzysz, javed.absar

Subscribers: sdardis, nemanjai, hiraditya, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D49200

llvm-svn: 344142
2018-10-10 14:15:52 +00:00
Nemanja Ivanovic 72d4866e57 [DAGCombiner] Expand combining of FP logical ops to sign-setting FP ops
We already do the following combines:
(bitcast int (and (bitcast fp X to int), 0x7fff...) to fp) -> fabs X
(bitcast int (xor (bitcast fp X to int), 0x8000...) to fp) -> fneg X

When the target has "bit preserving fp logic". This patch just extends it
to also combine:
(bitcast int (or (bitcast fp X to int), 0x8000...) to fp) -> fneg (fabs X)

As some targets have fnabs and even those that don't can efficiently lower
both the fabs and the fneg.

Differential revision: https://reviews.llvm.org/D44548

llvm-svn: 344093
2018-10-09 23:20:11 +00:00
Sanjay Patel b64c0d7b53 [DAGCombiner] simplify code for fmul with constant fold; NFCI
llvm-svn: 343997
2018-10-08 21:17:20 +00:00
Sanjay Patel ecc8af61e7 [DAGCombiner] allow undef elts in vector fadd matching
llvm-svn: 343945
2018-10-07 16:30:42 +00:00
Sanjay Patel ef76e27985 [DAGCombiner] allow undefs when matching vector splats for fmul folds
llvm-svn: 343942
2018-10-07 16:05:37 +00:00
Sanjay Patel 0b74c840dd [DAGCombiner] allow undef elts in vector fabs/fneg matching
This change is proposed as a part of D44548, but we
need this independently to avoid regressions from improved
undef propagation in SimplifyDemandedVectorElts().

llvm-svn: 343940
2018-10-07 15:32:06 +00:00
Sanjay Patel 46a9dc2e3e [DAGCombiner] shorten code for bitcast+fabs fold; NFC
llvm-svn: 343939
2018-10-07 15:18:30 +00:00
Sanjay Patel f6a160a102 [SelectionDAG] allow undefs when matching splat constants
And use that to transform fsub with zero constant operands.
The integer part isn't used yet, but it is proposed for use in
D44548, so adding both enhancements here makes that 
patch simpler.

llvm-svn: 343865
2018-10-05 17:42:19 +00:00
Matthias Braun 004fe6bf83 DAGCombiner: StoreMerging: Fix bad index calculating when adjusting mismatching vector types
This fixes a case of bad index calculation when merging mismatching
vector types. This changes the existing code to just use the existing
extract_{subvector|element} and a bitcast (instead of bitcast first and
then newly created extract_xxx) so we don't need to adjust any indices
in the first place.

rdar://44584718

Differential Revision: https://reviews.llvm.org/D52681

llvm-svn: 343493
2018-10-01 16:25:50 +00:00
Simon Pilgrim 818cfc40ff [DAG] Don't perform SINT_TO_FP<->UINT_TO_FP custom conversion after legalization
The SINT_TO_FP<->UINT_TO_FP combines for non-negative integers should only occur for legal ops once LegalOperations = true

No test case to hand, noticed when investigating PR38226 + PR38970

llvm-svn: 343405
2018-09-30 12:46:42 +00:00
David Bolvansky 8e90bad63d [DAGCombiner] [NFC] Improve X div/rem 1 fold
Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52661

llvm-svn: 343349
2018-09-28 18:40:30 +00:00
Fangrui Song 0cac726a00 llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)
Summary: The convenience wrapper in STLExtras is available since rL342102.

Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb

Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D52573

llvm-svn: 343163
2018-09-27 02:13:45 +00:00
Craig Topper b2a00acb24 [DAGCombiner] Remove unnecessary check for visitSDIVLike/visitUDIVLike returning a UDIVREM or SDIVREM node.
This shouldn't be possible and is a leftover from when we used to recursively call combine here.

llvm-svn: 343049
2018-09-25 23:52:07 +00:00
Nirav Dave f445a67be4 [DAGCombine] Improve Predecessor check in SimplifySelectOps. NFCI.
Reuse search space bookkeeping across multiple predecessor checks
qdone to avoid redundancy. This should cut search cost by ~4x.

llvm-svn: 342984
2018-09-25 15:29:30 +00:00
Nirav Dave 7373d5e646 [DAGCombine] Share predecessor bookkeeping in CombineToPostIndexedLoadStore. NFCI.
llvm-svn: 342983
2018-09-25 15:29:04 +00:00
Nirav Dave 46ab89a0d0 [DAGCombine] Don't fold dependent loads across SELECT_CC.
DAGCombine will try to fold two loads that feed a SELECT or SELECT_CC
after the select, resulting in a select of an address and a single
load after.

If either of the loads depend on the other, this is not legal as it
could introduce cycles. However, it only checked this if the opcode
was a SELECT, and not for a SELECT_CC.

Unfortunately, the only reproducer I have for this is for our
downstream target. I've tried getting it to trigger on an upstream one
but haven't been successful.

Patch thanks to Bevin Hansson.

llvm-svn: 342980
2018-09-25 14:43:05 +00:00
Sanjay Patel 2c901742ca [DAGCombiner] use UADDO to optimize saturated unsigned add
This is a preliminary step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

If we have an 'add' instruction that sets flags, we can use that to eliminate an
explicit compare instruction or some other instruction (cmn) that sets flags for 
use in the later select.

As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively 
reversing an IR icmp canonicalization that replaces a variable operand with a
constant:
https://rise4fun.com/Alive/V1Q

But we're not using 'uaddo' in those cases via DAG transforms. This happens in 
CGP after D8889 without checking target lowering to see if the op is supported. 
So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with 
"using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned 
saturated add and converts to uaddo without checking target capabilities.

This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we see only 
see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title 
(unlike x86 which sees improvements for all sizes because all sizes are 'custom'). 
But the AArch code (like x86) looks better when translated to 'uaddo' in all cases. 
So someone that is involved with AArch may want to set i8/i16 to 'custom' for UADDO, 
so this patch will fire on those tests.

Another possibility given the existing behavior: we could remove the legal-or-custom 
check altogether because we're assuming that a UADDO sequence is canonical/optimal 
before we ever reach here. But that seems like a bug to me. If the target doesn't 
have an add-with-flags op, then it's not likely that we'll get optimal DAG combining 
using a UADDO node. This is similar justification for why we don't canonicalize IR to 
the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first 
place.

Differential Revision: https://reviews.llvm.org/D51929

llvm-svn: 342886
2018-09-24 14:47:15 +00:00
Hans Wennborg 83d15dfe2d Remove debug printf leftover from r342397
llvm-svn: 342863
2018-09-24 08:18:47 +00:00
Craig Topper 5bef27e808 [DAGCombiner] Remove some dead code from ConstantFoldBITCASTofBUILD_VECTOR
This code handled SCALAR_TO_VECTOR being returned by the recursion, but the code that used to return SCALAR_TO_VECTOR was removed in 2015.

llvm-svn: 342856
2018-09-24 02:03:11 +00:00
Craig Topper b3b94a8e8b [DAGCombiner] Clarify a comment. NFC
This comment was misleading about why we were restricting to before legalize types. The reason given would only apply to before legalize ops. But there is a before legalize types reason that should also be listed.

llvm-svn: 342851
2018-09-23 21:17:56 +00:00
Sanjay Patel 0027946915 [DAGCombiner][x86] extend decompose of integer multiply into shift/add with negation
This is an alternative to https://reviews.llvm.org/D37896. We can't decompose 
multiplies generically without a target hook to tell us when it's profitable.

ARM and AArch64 may be able to remove some existing code that overlaps with
this transform.

This extends D52195 and may resolve PR34474: 
https://bugs.llvm.org/show_bug.cgi?id=34474
(still an open question about transforming legal vector multiplies, but we
could open another bug report for those)

llvm-svn: 342844
2018-09-23 18:41:38 +00:00
Craig Topper 81f67f7afb [DAGCombiner] Simplify some code in visitBITCAST. NFCI
llvm-svn: 342826
2018-09-22 23:12:34 +00:00
Craig Topper e79a588cac [DAGCombiner] Rewrite r331896 in a different way to address a FIXME. NFCI
llvm-svn: 342809
2018-09-22 18:03:14 +00:00
Sanjay Patel 8a1227ccc8 [SelectionDAG] replace duplicated peekThroughBitcast helper functions; NFCI
x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant.
Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code. 
No functional change intended.

I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the
helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be 
there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps.

Differential Revision: https://reviews.llvm.org/D52285

llvm-svn: 342669
2018-09-20 17:34:08 +00:00
Sanjay Patel fdc0de19cb [SelectionDAG] allow vector types with isBitwiseNot()
The test diff in not-and-simplify.ll is from a use in SimplifyDemandedBits,
and the test diff in add.ll is from a DAGCombiner transform.

llvm-svn: 342594
2018-09-19 21:48:30 +00:00
Sanjay Patel 4fd2e2a498 [DAGCombiner][x86] add transform/hook to decompose integer multiply into shift/add
This is an alternative to D37896. I don't see a way to decompose multiplies 
generically without a target hook to tell us when it's profitable. 

ARM and AArch64 may be able to remove some duplicate code that overlaps with 
this transform.

As a first step, we're only getting the most clear wins on the vector examples
requested in PR34474:
https://bugs.llvm.org/show_bug.cgi?id=34474

As noted in the code comment, it's likely that the x86 constraints are tighter
than necessary, but it may not always be a win to replace a pmullw/pmulld.

Differential Revision: https://reviews.llvm.org/D52195

llvm-svn: 342554
2018-09-19 15:57:40 +00:00
Amara Emerson 91c2913522 Revert "Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index.""
Fixed the assertion failure.

llvm-svn: 342397
2018-09-17 14:40:13 +00:00
Sanjay Patel 3eaf500a6d [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x)
This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040.

Copying the motivational statement by @evandro from that patch:
"This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp, 
447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. 
Otherwise, no regressions on x86-64 or A64."

I'm proposing to add only the minimum support for a DAG node here. Since we don't have an 
LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I 
don't think we need to worry about DAG builder, legalization, a strict variant, etc. We 
should be able to expand as needed when adding more functionality/transforms. For reference, 
these are transform suggestions currently listed in SimplifyLibCalls.cpp:

//   * cbrt(expN(X))  -> expN(x/3)
//   * cbrt(sqrt(x))  -> pow(x,1/6)
//   * cbrt(cbrt(x))  -> pow(x,1/9)

Also, given that we bail out on long double for now, there should not be any logical 
differences between platforms (unless there's some platform out there that has pow()
but not cbrt()).

Differential Revision: https://reviews.llvm.org/D51753

llvm-svn: 342348
2018-09-16 16:50:26 +00:00
Reid Kleckner 4d1b75c6b7 Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index."
Causes 'isVector() && "Invalid vector type!"' assertion when building
Skia in Chrome.

llvm-svn: 342265
2018-09-14 19:39:40 +00:00
Amara Emerson ef600cbd86 [DAGCombine] Fix crash when store merging created an extract_subvector with invalid index.
Differential Revision: https://reviews.llvm.org/D51831

llvm-svn: 342183
2018-09-13 21:28:58 +00:00
Sanjay Patel 8a478b79dc [DAGCombiner] improve formatting for select+setcc code; NFC
llvm-svn: 342095
2018-09-12 23:03:50 +00:00
Simon Pilgrim 96d6b9c2e2 [DAGCombiner] foldBitcastedFPLogic - Add basic vector support
Add support for bitcasts from float type to an integer type of the same element bitwidth.

There maybe cases where we need to support different widths (e.g. as SSE __m128i is treated as v2i64) - but I haven't seen cases of this in the wild yet.

llvm-svn: 341652
2018-09-07 12:13:45 +00:00
Sanjay Patel dbf52837fe [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))
This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. 
Here, we only do the transform when the target tells us that sqrt can be lowered with inline code.

This is the basic case. Some potential enhancements are in the TODO comments:

1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper).
2. If we have less fast-math-flags, generate code to avoid -0.0 and/or INF.
3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right).

Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with 
refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty 
of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses 
its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should 
also be covered by existing tests).

Differential Revision: https://reviews.llvm.org/D51630 

llvm-svn: 341481
2018-09-05 17:01:56 +00:00
Craig Topper 6666861158 [DAGCombiner] Fix bad identation. NFC
llvm-svn: 341103
2018-08-30 19:35:40 +00:00
Simon Pilgrim b49d5f3b53 [DAGCombiner] Add X / X -> 1 & X % X -> 0 folds
Adds more divrem folds to try and get in sync with InstructionSimplify

Differential Revision: https://reviews.llvm.org/D50636

llvm-svn: 340919
2018-08-29 11:30:16 +00:00
Nirav Dave 11e39fb6fb [DAGCombine] Rework MERGE_VALUES to inline in single pass. NFCI.
Avoid hyperlinear cost of inlining MERGE_VALUE node by constructing
temporary vector and doing a single replacement.

llvm-svn: 340853
2018-08-28 18:13:26 +00:00
Craig Topper c7506b28c1 [DAGCombiner][AMDGPU][Mips] Fold bitcast with volatile loads if the resulting load is legal for the target.
Summary:
I'm not sure if this patch is correct or if it needs more qualifying somehow. Bitcast shouldn't change the size of the load so it should be ok? We already do something similar for stores. We'll change the type of a volatile store if the resulting store is Legal or Custom. I'm not sure we should be allowing Custom there...

I was playing around with converting X86 atomic loads/stores(except seq_cst) into regular volatile loads and stores during lowering. This would allow some special RMW isel patterns in X86InstrCompiler.td to be removed. But there's some floating point patterns in there that didn't work because we don't fold (f64 (bitconvert (i64 volatile load))) or (f32 (bitconvert (i32 volatile load))).

Reviewers: efriedma, atanasyan, arsenm

Reviewed By: efriedma

Subscribers: jvesely, arsenm, sdardis, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, arichardson, jrtc27, atanasyan, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D50491

llvm-svn: 340797
2018-08-28 03:47:20 +00:00
Matt Arsenault cea7c6969d DAG: Check transformed type for forming fminnum/fmaxnum from vselect
Follow up to r340655 to fix vector types which are split.

llvm-svn: 340766
2018-08-27 18:11:31 +00:00
Sanjay Patel f645927875 [SelectionDAG] add helper query for binops; NFC
We will also use this in a planned enhancement for vector insertelement.

llvm-svn: 340741
2018-08-27 14:20:15 +00:00
Sanjay Patel 113cac3b15 [SelectionDAG][x86] turn insertelement into undef with variable index into splat
I noticed this along with the patterns in D51125, but when the index is variable, 
we don't convert insertelement into a build_vector.

For x86, that means these get expanded at legalization time into the loading/spilling 
code that we see in the tests. I think it's always better to avoid going to memory on 
these, and we get the optimal 'broadcast' if it's available.

I suspect other targets may want to look at enabling the hook. AArch64 and AMDGPU have 
regression tests that would be affected (although I did not check what would happen in 
those cases). In the most basic cases shown here, AArch64 would probably do much 
better with a splat.

Differential Revision: https://reviews.llvm.org/D51186

llvm-svn: 340705
2018-08-26 18:20:41 +00:00
Matt Arsenault 5b9ef39bdd DAG: Allow matching fminnum/fmaxnum from vselect
llvm-svn: 340655
2018-08-24 21:24:18 +00:00
Craig Topper d8e91c3e8d [DAGCombiner][Mips] Don't combine bitcast+store after LegalOperations when the store is volatile, if the resulting store isn't Legal
Previously we allowed the store to be Custom. But without knowing for sure that the Custom handling won't split the store, we shouldn't convert a volatile store. We also probably shouldn't be creating a store the requires custom handling after LegalizeOps. This could lead to an infinite loop if the custom handling was to insert a bitcast. Though I guess isStoreBitCastBeneficial could be used to block such a loop.

The test changes here are due to the volatile part of this. The stores in the test are all volatile and i32 stores are marked custom, So we are no longer converting them

This is related to D50491 where I was trying to allow some bitcasting of volatile loads

Differential Revision: https://reviews.llvm.org/D50578

llvm-svn: 340626
2018-08-24 17:48:25 +00:00
Sam Parker 597811e7a7 [DAGCombiner] Reduce load widths of shifted masks
During combining, ReduceLoadWdith is used to combine AND nodes that
mask loads into narrow loads. This patch allows the mask to be a
shifted constant. This results in a narrow load which is then left
shifted to compensate for the new offset.

Differential Revision: https://reviews.llvm.org/D50432

llvm-svn: 340261
2018-08-21 10:26:59 +00:00
Craig Topper cc5dbbf759 [DAGCombiner] Allow divide by constant optimization on opaque constants.
Summary:
I believe this restores the behavior we had before r339147.

Fixes PR38622.

Reviewers: RKSimon, chandlerc, spatel

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50936

llvm-svn: 340120
2018-08-18 05:52:42 +00:00
Simon Pilgrim 03e57521c0 [DAGCombiner] extractShiftForRotate - fix out of range shift issue
Don't just check for negative shift amounts.

Fixes OSS Fuzz #9935
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9935

llvm-svn: 340015
2018-08-17 12:25:18 +00:00
Simon Pilgrim 5113b48798 [DAGCombine] Improve (sra (sra x, c1), c2) -> (sra x, (add c1, c2)) folding
Add support for cases where only some c1+c2 results exceed the max bitshift, clamping accordingly.

Differential Revision: https://reviews.llvm.org/D35722

llvm-svn: 340010
2018-08-17 10:52:49 +00:00
Craig Topper 883ff69c93 [DAGCombiner] Don't reassociate operations that have the vector reduction flag set.
When nodes are reassociated the vector-reduction flag gets lost.

The test case is here is what would happen if you had a sum of absolute differences loop that started with a non-zero but contant sum and that loop was unrolled. The vectorizer will generate a constant vector for the initial value. And DAGCombiner reassociate tries to move it down the addition tree erasing the vector-reduction flag. Interestingly this moves constants the opposite direction of the reassociate IR pass.

I've chosen to just punt on the reassociate, but I suppose we could maybe preserve the flag if both nodes have it set.

Differential Revision: https://reviews.llvm.org/D50827

llvm-svn: 339946
2018-08-16 21:54:05 +00:00
Simon Pilgrim e8a906ba47 [DagCombiner] Don't bother adding to the work list if TLI.BuildSDIVPow2 failed. NFCI.
Matches the code in BuildSDIV/BuildUDIV

llvm-svn: 339757
2018-08-15 10:02:54 +00:00
Eli Friedman 0d12e90bf5 [ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly.
Intentionally excluding nodes from the DAGCombine worklist is likely to
lead to weird optimizations and infinite loops, so it's generally a bad
idea.

To avoid the infinite loops, fix DAGCombine to use the
isDesirableToCommuteWithShift target hook before performing the
transforms in question, and implement the target hook in the ARM backend
disable the transforms in question.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a
reduced testcase for that bug. But we should have sufficient test
coverage for PerformSHLSimplify given that we're not playing weird
tricks with the worklist. I can try to bugpoint it if necessary,
though.)

Differential Revision: https://reviews.llvm.org/D50667

llvm-svn: 339734
2018-08-14 22:10:25 +00:00
Nirav Dave fbfe2ad9e0 [DAG] Avoid redundant chain transversal in store merge cycle check. NFCI.
Patch by Henric Karlsson.

llvm-svn: 339688
2018-08-14 16:20:43 +00:00
Simon Pilgrim 26e3d3f1c8 [DAGCombiner] simplifyDivRem - add comment describing divide by undef/zero combine. NFC.
llvm-svn: 339561
2018-08-13 13:12:25 +00:00
Matt Arsenault 1201301b94 DAG: Check no-signed-zeros instead of unsafe-fp-math
Addresses fixme, although this should still be checking individual
operand flags.

llvm-svn: 339525
2018-08-12 19:09:12 +00:00
Michael Berg ca38254601 extend folding fsub/fadd to fneg for FMF
Summary: This change provides a common optimization path for both Unsafe and FMF driven optimization for this fsub fold adding reassociation, as it the flag that most closely represents the translation

Reviewers: spatel, wristow, arsenm

Reviewed By: spatel

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D50195

llvm-svn: 339357
2018-08-09 17:00:03 +00:00
Sanjay Patel e47dc1a405 [DAGCombiner] loosen constraints for fsub+fadd fold
isNegatibleForFree() should not matter here (as the test diffs show)
because it's always a win to replace an fsub+fadd with fneg. The
problem in D50195 persists because either (1) we are doing these
folds in the wrong order or (2) we're missing another fold for fadd.

llvm-svn: 339299
2018-08-08 23:04:43 +00:00
Sanjay Patel e327266d45 [DAGCombiner] move fadd simplification ahead of other folds
I don't know if it's possible to expose this diff in a test,
but we should always try simplifications (no new nodes created)
before more complicated transforms for efficiency (similar to
what we do in IR).

llvm-svn: 339298
2018-08-08 22:46:30 +00:00
Simon Pilgrim 4d4220fa2a [DAG] DAGCombiner::visitSDIVLike - remove unnecessary isConstOrConstSplat call. NFCI.
The isConstOrConstSplat result is only used in a ISD::matchUnaryPredicate call which can perform the equivalent iteration just as quickly.

llvm-svn: 339262
2018-08-08 15:37:52 +00:00
Craig Topper 49ed49fcb1 [SelectionDAG] When splitting scatter nodes during DAGCombine, create a serial chain dependency.
Scatter could have multiple identical indices. We need to maintain sequential order. We get this right in LegalizeVectorTypes, but not in this code.

Differential Revision: https://reviews.llvm.org/D50374

llvm-svn: 339157
2018-08-07 17:35:02 +00:00
Simon Pilgrim 1bfadb0499 [DAG] Allow non-uniform constant vectors to call BuildSDIV
This was missed in D50185.

NFC until we add actual non-uniform support to BuildSDIV (similar BuildUDIV support in D49248) - for now it just early outs.

llvm-svn: 339147
2018-08-07 14:50:39 +00:00
Simon Pilgrim 7e18938793 [TargetLowering] Add support for non-uniform vectors to BuildUDIV
This patch refactors the existing TargetLowering::BuildUDIV base implementation to support non-uniform constant vector denominators.

It also includes a fold for MULHU by pow2 constants to SRL which can now more readily occur from BuildUDIV.

Differential Revision: https://reviews.llvm.org/D49248

llvm-svn: 339121
2018-08-07 09:51:34 +00:00
Craig Topper 9de1797c50 [SelectionDAG][X86] Rename MaskedLoadSDNode::getSrc0 to getPassThru.
Src0 doesn't really convey any meaning to what the operand is. Passthru matches what's used in the documentation for the intrinsic this comes from.

llvm-svn: 339101
2018-08-07 06:52:49 +00:00
Craig Topper 17989208a9 [SelectionDAG][X86] Rename getValue to getPassThru for gather SDNodes.
getValue is more meaningful name for scatter than it is for gather. Split them and use getPassThru for gather.

llvm-svn: 339096
2018-08-07 06:13:40 +00:00
Simon Pilgrim 94112ebc75 [TargetLowering] Generalise BuildSDIV function
First step towards a BuildSDIV equivalent to D49248 for non-uniform vector support - this just pushes the splat detection down into TargetLowering::BuildSDIV where its still used.

Differential Revision: https://reviews.llvm.org/D50185

llvm-svn: 338838
2018-08-03 10:00:54 +00:00
Craig Topper 2f60ef2c78 [DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to BuildSDIV/BuildUDIV/etc.
The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation.

llvm-svn: 338329
2018-07-30 23:22:00 +00:00
Sanjay Patel 9f807f44b1 [DAGCombiner] transform sub-of-shifted-signbit to add
This is exchanging a sub-of-1 with add-of-minus-1:
https://rise4fun.com/Alive/plKAH

This is another step towards improving select-of-constants codegen (see D48970).

x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral.
I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but
I think canonicalizing to 'add' is more likely to produce further transforms because we have more
folds for 'add'.

Differential Revision: https://reviews.llvm.org/D49924

llvm-svn: 338317
2018-07-30 22:21:37 +00:00
Craig Topper a568a27dfa [DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to BuildSDIVPow2.
llvm-svn: 338303
2018-07-30 21:04:34 +00:00
Craig Topper b94d5f853b Revert r338222 "[DAGCombiner] Remove unnecessary calls to AddToWorklist."
Thinking about it more it might be possible for the later nodes to be folded in getNode in such a way that the other created nodes are left dead. This can cause use counts to be incorrect on nodes that aren't dead.

So its probably safer to leave this alone.

llvm-svn: 338298
2018-07-30 20:27:10 +00:00
Fangrui Song f78650a8de Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

llvm-svn: 338293
2018-07-30 19:41:25 +00:00
David Bolvansky 2fa7fb14ea [DAGCombiner] Bug 31275- Extract a shift from a constant mul or udiv if a rotate can be formed
Summary:
Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed.  This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op.

Patch by: sameconrad (Sam Conrad)

Reviewers: RKSimon, craig.topper, spatel, lebedev.ri, javed.absar

Reviewed By: lebedev.ri

Subscribers: efriedma, kparzysz, llvm-commits

Differential Revision: https://reviews.llvm.org/D47681

llvm-svn: 338270
2018-07-30 16:50:00 +00:00
Craig Topper e978d2ee4a [DAGCombiner] Remove unnecessary calls to AddToWorklist.
The DAGCombiner has a mechanism for ensuring all nodes have been visited at least once. Every time a node is visited, it makes sure its operands have been in the worklist at least once. This ensures that when multiple nodes are created by a combine, only the last node needs to be returned. The earlier nodes can all be found Through this operand check. These means we don't need to explicitly add nodes to the worklist when a combine creates multiple nodes.

I've removed the most obvious cases here. There are probably more than can be removed.

llvm-svn: 338222
2018-07-29 18:39:26 +00:00
Craig Topper 9db3573d3a [SelectionDAG] Pass std::vector by reference instead of by pointer to BuildSDIV/BuildUDIV.
This removes the need for an assert to ensure the pointer isn't null.

Years ago we had ifs the checked the pointer was non-null before very access to the vector. These checks were removed and replaced with a single assert. But a reference seems more suitable here.

llvm-svn: 338205
2018-07-28 19:44:20 +00:00
Craig Topper 50b1d4303d [DAGCombiner] Teach DAG combiner that A-(B-C) can be folded to A+(C-B)
This can be useful since addition is commutable, and subtraction is not.

This matches a transform that is also done by InstCombine.

llvm-svn: 338181
2018-07-28 00:27:25 +00:00
Sanjay Patel c7abb416dc [DAGCombiner] fold 'not' with signbit math
This is a follow-up suggested in D48970. 

Alive proofs:
https://rise4fun.com/Alive/sII

We can eliminate an instruction in the usual select-of-constants 
to bit hack transform by adjusting the add/sub with constant.
This is always a win. 

There are more transforms that are likely wins, but they may need 
target hooks in case some targets do not benefit. 

This is another step towards making up for canonicalizing to 
select-of-constants in rL331486.

llvm-svn: 338132
2018-07-27 16:42:55 +00:00
Craig Topper 8b5a2f7aac [DAGCombiner] Remove some calls to AddToWorklist that should be unnecessary.
The DAGCombiner has a system for ensuring all nodes are visited. It doesn't require an AddToWorkList for every node that is created by a combine.

llvm-svn: 338079
2018-07-26 22:40:22 +00:00
Nirav Dave 25802ac9fd [DAG] Avoid Node Update assertion due to AND simplification
Check for construction-time folding for incomplete AND nodes in
BackwardsPropagateMask.

Fixes PR38185.

Reviewers: RKSimon, samparker

Reviewed By: samparker

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D49444

llvm-svn: 337563
2018-07-20 15:27:24 +00:00
Nirav Dave 5a4e11ad9c [DAG] Fix Memory ordering check in ReduceLoadOpStore.
When merging through a TokenFactor we need to check that the
load may be ordered such that no other aliasing memory operations may
happen. It is not sufficient to just check that the load is a member
of the chain token factor as it there may be a indirect chain. Require
the load's chain has only one use.

This fixes PR37826.

Reviewers: spatel, davide, efriedma, craig.topper, RKSimon

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D49388

llvm-svn: 337560
2018-07-20 15:20:50 +00:00
Craig Topper d8734450a2 [DAGCombiner] Fold X - (-Y *Z) -> X + (Y * Z)
llvm-svn: 337518
2018-07-20 01:40:03 +00:00
Craig Topper c12c5d421f [DAGCombiner] Teach DAGCombiner that A-(-B) is A+B.
We already knew A+(-B) is A-B in visitAdd. This does the opposite for visitSub.

llvm-svn: 337502
2018-07-19 22:24:43 +00:00
Simon Pilgrim e4d12bb2d6 [DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT
If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops.

Differential Revision: https://reviews.llvm.org/D49262

llvm-svn: 337258
2018-07-17 09:45:35 +00:00
Fangrui Song cb0bab86b3 [CodeGen] Fix inconsistent declaration parameter name
llvm-svn: 337200
2018-07-16 18:51:40 +00:00
Sanjay Patel 79a423cfa2 [DAGCombiner] fix typo in comment; NFC
llvm-svn: 337132
2018-07-15 17:09:35 +00:00
Sanjay Patel a41c886c55 [DAGCombiner] extend(ifpositive(X)) -> shift-right (not X)
This is almost the same as an existing IR canonicalization in instcombine, 
so I'm assuming this is a good early generic DAG combine too.

The motivation comes from reduced bit-hacking for select-of-constants in IR 
after rL331486. We want to restore that functionality in the DAG as noted in
the commit comments for that change and the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html

The PPC and AArch tests show that those targets are already doing something 
similar. x86 will be neutral in the minimal case and generally better when 
this pattern is extended with other ops as shown in the signbit-shift.ll tests.

Note the asymmetry: we don't include the (extend (ifneg X)) transform because 
it already exists in SimplifySelectCC(), and that is verified in the later 
unchanged tests in the signbit-shift.ll files. Without the 'not' op, the 
general transform to use a shift is always a win because that's a single 
instruction.

Alive proofs:
https://rise4fun.com/Alive/ysli

Name: if pos, get -1
  %c = icmp sgt i16 %x, -1
  %r = sext i1 %c to i16
  =>
  %n = xor i16 %x, -1
  %r = ashr i16 %n, 15

Name: if pos, get 1
  %c = icmp sgt i16 %x, -1
  %r = zext i1 %c to i16
  =>
  %n = xor i16 %x, -1
  %r = lshr i16 %n, 15

Differential Revision: https://reviews.llvm.org/D48970

llvm-svn: 337130
2018-07-15 16:27:07 +00:00
Diogo N. Sampaio b0d85ef975 [NFC][InstCombine] Converts isLegalNarrowLoad into isLegalNarrowLdSt
Reuse this function as to test correctness and profitability of
reducing width of either load or store operations.

Reviewsers: samparker

Differential Revision: https://reviews.llvm.org/D48624

llvm-svn: 336800
2018-07-11 12:59:42 +00:00
Simon Pilgrim 075b04a55f [SelectionDAG] Add constant buildvector support to isKnownNeverZero
This allows us to use SelectionDAG::isKnownNeverZero in DAGCombiner::visitREM (visitSDIVLike/visitUDIVLike handle the checking for constants).

llvm-svn: 336779
2018-07-11 09:56:41 +00:00
Simon Pilgrim df9d59771b [DAGCombiner] Support non-uniform X%C -> X-(X/C)*C folds
First stage in PR38057 - support non-uniform constant vectors in the combine to reuse the division-by-constant logic.

We can definitely do better for srem pow2 remainders (and avoid that extra multiply....) but this at least helps keep everything on the vector unit.

Differential Revision: https://reviews.llvm.org/D48975

llvm-svn: 336774
2018-07-11 09:22:42 +00:00
Simon Pilgrim 97cf111689 [DAGCombiner] Add (urem X, -1) -> select(X == -1, 0, x) fold
llvm-svn: 336773
2018-07-11 09:14:37 +00:00
Simon Pilgrim 4cb4609392 [DAGCombiner] Add special case fast paths for udiv x,1 and udiv x,-1
udiv x,-1 was going down the (slow) BuildUDIV route resulting in unnecessary shifts.

llvm-svn: 336701
2018-07-10 16:33:07 +00:00
Simon Pilgrim 641097d561 [DAGCombiner] visitREM - call visitSDIVLike/visitUDIVLike directly to avoid recursive combining.
As suggested by @efriedma on D48975 use the visitSDIVLike/visitUDIVLike functions introduced at rL336656.

llvm-svn: 336664
2018-07-10 13:18:16 +00:00
Simon Pilgrim ce5c19b623 [DAGCombiner] Split SDIV/UDIV optimization expansions from the rest of the combines. NFCI.
As suggested by @efriedma on D48975, this patch separates the BuildDiv/Pow2 style optimizations from the rest of the visitSDIV/visitUDIV to make it easier to reuse the combines and will allow us to avoid some rather nasty node recursive combining in visitREM.

llvm-svn: 336656
2018-07-10 11:38:00 +00:00
Roman Lebedev 5ccae1750b [X86][TLI] DAGCombine: Unfold variable bit-clearing mask to two shifts.
Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.

As discussed later, that was worse at least for the code size,
and potentially for the performance, too.

https://rise4fun.com/Alive/Zmpl

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: spatel

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D48768

llvm-svn: 336585
2018-07-09 19:06:42 +00:00
Simon Pilgrim c1d1944053 [DAGCombiner] Add EXTRACT_SUBVECTOR to SimplifyDemandedVectorElts
As discussed on PR37989, this patch adds EXTRACT_SUBVECTOR handling to TargetLowering::SimplifyDemandedVectorElts and calls it from DAGCombiner::visitEXTRACT_SUBVECTOR.

Differential Revision: https://reviews.llvm.org/D48825

llvm-svn: 336490
2018-07-07 17:30:06 +00:00
Nico Weber 038dbf3c24 Revert 336426 (and follow-ups 428, 440), it very likely caused PR38084.
llvm-svn: 336453
2018-07-06 17:37:24 +00:00
Diogo N. Sampaio 17be994942 Added missing semicolon
llvm-svn: 336428
2018-07-06 10:09:04 +00:00
Diogo N. Sampaio 742bf1a255 [SelectionDAG] https://reviews.llvm.org/D48278
D48278

Allow to reduce redundant shift masks.
For example:
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB

can be reduced to:
x1 = x & 0xAB00
x2 = x1 >> 8
It only allows folding when the masks and shift values are constants.

llvm-svn: 336426
2018-07-06 09:42:25 +00:00
Diogo N. Sampaio 734cfd11fe Testing commit permision
llvm-svn: 336384
2018-07-05 18:49:32 +00:00
Simon Pilgrim 74cc4cfa94 [DAGCombiner] visitSDIV - Permit MIN_SIGNED_VALUE in pow2 vector codegen
Now that D45806 has landed, we can re-enable support for MIN_SIGNED_VALUE in the sdiv by pow2-constant code

llvm-svn: 336198
2018-07-03 14:11:32 +00:00
Simon Pilgrim fae337704e [DAGCombiner] Handle correctly non-splat power of 2 -1 divisor (PR37119)
The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1, -1 is the only power of two value for which the sdiv expansion recipe breaks.

Thanks to @zvi for the original patch.

Differential Revision: https://reviews.llvm.org/D45806

llvm-svn: 336048
2018-06-30 12:22:55 +00:00
Simon Pilgrim 9c70d48cb2 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV (REAPPLIED)
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.

llvm-svn: 335886
2018-06-28 17:33:41 +00:00
Haojian Wu 2103990e63 Revert "[DAGCombiner] Ensure we use the correct CC result type in visitSDIV"
This reverts commit r335821.

This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce.

llvm-svn: 335871
2018-06-28 16:25:57 +00:00
Simon Pilgrim abebe4c746 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

llvm-svn: 335821
2018-06-28 09:54:28 +00:00
Simon Pilgrim 49cb65bb7b [DAGCombiner] Remove unused variable. NFCI.
Noticed in D45806 review.

llvm-svn: 335817
2018-06-28 09:29:08 +00:00
Nirav Dave 7c57ae57a8 [DAGCombine] Disable TokenFactor simplifications when optnone.
llvm-svn: 335773
2018-06-27 19:41:25 +00:00
Sanjay Patel d052de856d [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc 
can produce -0.0 where the original code does not:

#include <stdio.h>
  
int main(int argc) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}

$ clang -O0 -mavx fp.c ; ./a.out 
0.000000
$ clang -O1 -mavx fp.c ; ./a.out 
-0.000000

Ideally, we'd use IR/node flags to predicate the transform, but the IR parser 
doesn't currently allow fast-math-flags on the cast instructions. So for now, 
just use the function attribute that corresponds to clang's "-fno-signed-zeros" 
option.

Differential Revision: https://reviews.llvm.org/D48085

llvm-svn: 335761
2018-06-27 18:16:40 +00:00
Simon Pilgrim d3e583a52d [DAGCombiner] visitSDIV - add special case handling for (sdiv X, 1) -> X in pow2 expansion
For divisor = 1, perform a select of X - reduces scalarisation of simple SDIVs

llvm-svn: 335727
2018-06-27 12:45:31 +00:00
Simon Pilgrim e835f662fa [DAGCombiner] visitSDIV - simplify pow2 handling. NFCI.
Use the builtin constant folding of getNode() etc. instead of doing it manually.

llvm-svn: 335720
2018-06-27 10:51:55 +00:00
Simon Pilgrim dfbcc66adc [DAGCombiner] Fold SDIV(%X, MIN_SIGNED) -> SELECT(%X == MIN_SIGNED, 1, 0)
Fixes PR37569.

llvm-svn: 335719
2018-06-27 10:21:06 +00:00
Simon Pilgrim 0a566bc0ae [DAGCombiner] Don't accept signbit sdiv divisors in sdiv-by-pow2 vector expansion (PR37569)
llvm-svn: 335717
2018-06-27 09:41:22 +00:00
Sanjay Patel fb9c440ba5 [DAGCombiner] use isBitwiseNot to simplify code; NFC
llvm-svn: 335652
2018-06-26 19:46:56 +00:00
Simon Pilgrim 7f55af37f4 [DAGCombiner] Don't accept -1 sdiv divisors in sdiv-by-pow2 vector expansion (PR37119)
Temporary fix until I've managed to get D45806 updated - both +1 and -1 special cases need to be properly supported.

llvm-svn: 335637
2018-06-26 17:46:51 +00:00
Simon Pilgrim 133b1cdf08 [DAGCombiner] Pull out VT bitwidth in visitSDIV. NFCI.
llvm-svn: 335617
2018-06-26 15:39:16 +00:00
Simon Pilgrim 5b6b500687 Fix -Wparentheses gcc warning. NFCI.
llvm-svn: 335451
2018-06-25 11:19:05 +00:00
Sanjay Patel 962ee178fa [DAGCombiner] eliminate setcc bool math when input is low-bit of some value
This patch has the same motivating example as D48466:
define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) {
    %c.0282 = and i32 %c.0282.in, 268435455
    %a16 = lshr i64 32508, %x
    %a17 = and i64 %a16, 1
    %tobool = icmp eq i64 %a17, 0
    %. = select i1 %tobool, i32 1, i32 2
    %.286 = select i1 %tobool, i32 27, i32 26
    %shr97 = lshr i32 %c.0282, %.
    %shl98 = shl i32 %c.0282.in, %.286
    %or99 = or i32 %shr97, %shl98
    %shr100 = lshr i32 %d.0280, %.
    %shl101 = shl i32 %d.0280, %.286
    %or102 = or i32 %shr100, %shl101
    store i32 %or99, i32* %ptr0
    store i32 %or102, i32* %ptr1
    ret void
}

...but I'm trying to kill the setcc bool math sooner rather than later.

By matching a larger pattern that includes both the low-bit mask and the trailing add/sub, 
we can create a universally good fold because we always eliminate the condition code 
intermediate value.

Here are Alive proofs for these (currently instcombine folds the 'add' variants, but 
misses the 'sub' patterns):
https://rise4fun.com/Alive/Gsyp

Name: sub of zext cmp mask
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %z = zext i1 %c to i32
  %r = sub i32 C1, %z
  =>
  %optional_cast = zext i8 %a to i32
  %r = add i32 %optional_cast, C1-1

Name: add of zext cmp mask
  %a = and i32 %x, 1
  %c = icmp eq i32 %a, 0
  %z = zext i1 %c to i8
  %r = add i8 %z, C1
  =>
  %optional_cast = trunc i32 %a to i8
  %r = sub i8 C1+1, %optional_cast

All of the tests look like improvements or neutral to me. But it is possible that x86 
test+set+bitop is better than what we now show here. I suspect we could do better by 
adding another fold for the 'sub' variants.

We start with select-of-constant in IR in the larger motivating test, so that's why I 
included tests with selects. Proofs for those variants:
https://rise4fun.com/Alive/Bx1

Name: true const is bigger
Pre: C2 == (C1 + 1)
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %r = select i1 %c, i64 C2, i64 C1
  =>
  %z = zext i8 %a to i64
  %r = sub i64 C2, %z

Name: false const is bigger
Pre: C2 == (C1 + 1)
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %r = select i1 %c, i64 C1, i64 C2
  =>
  %z = zext i8 %a to i64
  %r = add i64 C1, %z

Differential Revision: https://reviews.llvm.org/D48466

llvm-svn: 335433
2018-06-24 14:37:30 +00:00
Stanislav Mekhanoshin 22ee191c3e DAG combine "and|or (select c, -1, 0), x" -> "select c, x, 0|-1"
Allowed folding for "and/or" binops with non-constant operand if
arguments of select are 0/-1 values.

Normally this code with "and" opcode does not get to a DAG combiner
and simplified yet in the InstCombine. However AMDGPU produces it
during lowering and InstCombine has no chance to optimize it out.

In turn the same pattern with "or" opcode can reach DAG.

Differential Revision: https://reviews.llvm.org/D48301

llvm-svn: 335250
2018-06-21 16:02:05 +00:00
David Green a465188500 [DAGCombine] Fix alignment for offset loads/stores
The alignment parameter to getExtLoad is treated as a base alignment,
not the alignment of the load (base + offset). When we infer a better
alignment for a Ptr we need to ensure that it applies to the base to
prevent the alignment on the load from being wrong.

This fixes a bug where the alignment could then be used to incorrectly
prove noalias between a load and a store, leading to a miscompile.

Differential Revision: https://reviews.llvm.org/D48029

llvm-svn: 335210
2018-06-21 08:30:07 +00:00
Stanislav Mekhanoshin 20279dc025 Allow binop C1, (select cc, CF, CT) -> select folding
Previously this folding was done only if select is a first operand.
However, for non-commutative operations constant may go before
select.

Differential Revision: https://reviews.llvm.org/D48223

llvm-svn: 335167
2018-06-20 20:24:20 +00:00
Nirav Dave cd558887d3 [DAG] Fix and-mask folding when narrowing loads.
Summary:
Check that and masks are strictly smaller than implicit mask from
narrowed load.

Fixes PR37820.

Reviewers: samparker, RKSimon, nemanjai

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48335

llvm-svn: 335137
2018-06-20 15:36:29 +00:00
Craig Topper ddd88a559f [DAGCombiner] Add some comments to some true/false arguments to make it obvious what they are. NFC
llvm-svn: 335095
2018-06-20 04:32:07 +00:00
Michael Berg 7b993d762f Utilize new SDNode flag functionality to expand current support for fadd
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.

Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar

Reviewed By: spatel

Subscribers: wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D47909

llvm-svn: 334996
2018-06-18 23:44:59 +00:00
Michael Berg 932ba20af8 refactor of visitFADD for AllowNewConst cases
Summary: Refactoring for all constant cases which require AllowNewConst and some staging for future fmf usage.

Reviewers: spatel, hfinkel, wristow

Reviewed By: spatel

Subscribers: nhaehnle

Differential Revision: https://reviews.llvm.org/D48289

llvm-svn: 334984
2018-06-18 21:12:21 +00:00
Michael Berg 8e570c3390 Utilize new SDNode flag functionality to expand current support for fma
Summary: This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions.

Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar, rampitec, nhaehnle, nemanjai

Reviewed By: rampitec, nhaehnle

Subscribers: tpr, nemanjai, wdng

Differential Revision: https://reviews.llvm.org/D47918

llvm-svn: 334876
2018-06-16 00:03:06 +00:00
Michael Berg 02d1c6c0cf Utilize new SDNode flag functionality to expand current support for fdiv
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.

Reviewers: spatel, hfinkel, wristow, arsenm

Reviewed By: spatel

Subscribers: wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D47954

llvm-svn: 334862
2018-06-15 20:44:55 +00:00
Matt Arsenault df2f4ef29d DAG: Fix creating concat_vectors with illegal type
Test passes as is, but fails with future patch to make v4i16/v4f16
legal.

llvm-svn: 334823
2018-06-15 12:09:15 +00:00
Michael Berg 0c20447a02 easing the constraint for isNegatibleForFree and GetNegatedExpression
Summary:
Here we relax the old constraint which utilized unsafe with the TargetOption flag HonorSignDependentRoundingFPMathOption, with the assertion that unsafe is no longer needed or never was required for correctness on FDIV/FMUL.  



Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar

Reviewed By: spatel

Subscribers: efriedma, wdng, tpr

Differential Revision: https://reviews.llvm.org/D48057

llvm-svn: 334769
2018-06-14 20:54:13 +00:00
Michael Berg 4663ceb63f updating isNegatibleForFree and GetNegatedExpression with fmf for fadd
Summary:  A FMF constraint is added to FADD with unsafe still available as the fallback

Reviewers: spatel, wristow, arsenm, hfinkel

Reviewed By: spatel

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D48180

llvm-svn: 334753
2018-06-14 18:48:31 +00:00
Sanjay Patel 7d4929611c [DAGCombiner] remove hasOneUse() check from fadd constants transform
We're constant folding here, so we shouldn't check uses. This matches
the IR optimizer behavior.

The x86 test shows the expected win. The AArch64 test shows something
else. This only seems to happen if the "generic" AArch64 CPU model is 
used by MachineCombiner, so I'll file a bug report to follow-up.

llvm-svn: 334608
2018-06-13 15:22:48 +00:00
Krzysztof Parzyszek 82d284c1d2 [DAGCombiner] Recognize more patterns for ABS
Differential Revision: https://reviews.llvm.org/D47831

llvm-svn: 334553
2018-06-12 21:51:49 +00:00
Michael Berg 5d49f66570 Utilize new SDNode flag functionality to expand current support for fmul
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fmul.

Reviewers: spatel, hfinkel, wristow, arsenm

Reviewed By: spatel

Subscribers: nhaehnle, wdng

Differential Revision: https://reviews.llvm.org/D47911

llvm-svn: 334514
2018-06-12 16:13:11 +00:00
Krzysztof Parzyszek 3d671248ab [SelectionDAG] Provide default expansion for rotates
Implement default legalization of rotates: either in terms of the rotation
in the opposite direction (if legal), or in terms of shifts and ors.

Implement generating of rotate instructions for Hexagon. Hexagon only
supports rotates by an immediate value, so implement custom lowering of
ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion.

Differential Revision: https://reviews.llvm.org/D47725

llvm-svn: 334497
2018-06-12 12:49:36 +00:00
Matt Arsenault 5615fa0a87 DAG: Fix extract_subvector combine for a single element
This would fail before because 1x vectors aren't legal,
so instead just use the scalar type.

Avoids regressions in a future AMDGPU commit to add
v4i16/v4f16 as legal types.

Test update is just the one test that this triggers
on in tree now. It wasn't checking anything before.
The result is completely  changed since the selects
are eliminated. Not sure if it's considered better
or not.

llvm-svn: 334440
2018-06-11 21:27:41 +00:00
Sanjay Patel 3e5c70cc1d [DAGCombiner] match vector compare and select sizes with extload operand (PR37427)
This patch started off much more general and ambitious, but it's been a nightmare 
seeing all the ways x86 vector codegen can go wrong.

So the code is still structured to allow extending easily, but it's currently 
limited in several ways:

1. Only handle cases with an extending load.
2. Only handle cases with a zero constant compare.
3. Ignore setcc with vector bitmask (SetCCWidth != 1) - so AVX512 should be unaffected.

The motivating case from PR37427:
https://bugs.llvm.org/show_bug.cgi?id=37427
...is the 1st test, and that shows the expected win - we eliminated the unnecessary 
intermediate cast.

There's a clear regression in the last test (sgt_zero_fp_select) because we longer 
recognize a 'SHRUNKBLEND' opportunity. I think that general problem is also present 
in sgt_zero, so I'll try to fix that in a follow-up. We need to match a sign-bit 
setcc from a sign-extended operand and remove it.

Differential Revision: https://reviews.llvm.org/D47330

llvm-svn: 334378
2018-06-10 23:09:50 +00:00
Craig Topper 61998289f9 Use SmallPtrSet instead of SmallSet in places where we iterate over the set.
SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't. So if it wasn't for the forwarding, this wouldn't work.

These places were found by hiding the begin/end methods in the SmallSet forwarding

llvm-svn: 334343
2018-06-09 05:04:20 +00:00
Sanjay Patel 498564e6fb [DAGCombiner] clean up comments; NFC
llvm-svn: 334312
2018-06-08 18:00:46 +00:00
Michael Berg bf90d1f263 Utilize new SDNode flag functionality to expand current support for fsub
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fsub.

Reviewers: spatel, hfinkel, wristow, arsenm

Reviewed By: spatel

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D47910

llvm-svn: 334306
2018-06-08 17:39:50 +00:00
Sam Parker 16f963ba0d [DAGCombine] Fix for PR37667
While trying to propagate AND masks back to loads, we currently allow
one non-load node to be included as a leaf in chain. This fix now
limits that node to produce only a single data value.

Differential Revision: https://reviews.llvm.org/D47878

llvm-svn: 334268
2018-06-08 07:49:04 +00:00
Michael Berg 77b5be7ec6 propagate fast math flags via IR on fma and sub expressions
Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode.

Reviewers: spatel, arsenm, hfinkel, javed.absar

Reviewed By: spatel

Subscribers: nemanjai, wdng

Differential Revision: https://reviews.llvm.org/D47388

llvm-svn: 334242
2018-06-07 22:49:09 +00:00
Matt Arsenault e8eb567e17 DAG: Avoid bitcast/ext/build_vector combine
This avoids regressions in a future AMDGPU change
to make v4i16/v4f16 legal. For these types, build_vector
is implemented as bitcasted operations on v2i32. This
combine was creating v4i16s out of what would have been
already been a v2i32 build_vector, creating a mess
of nodes that never get cleaned up.

I'm not sure this is the right condition to check.
I initially tried just checking for the legality of the
new build_vector. This works for my case, but breaks dozens
of x86 tests. A Mips test seems to show some improvement
or at least a neutral change. I don't want to think
about how long it would take to analyze the set of
different x86 vector operations impacted.

Test included in future commit.

llvm-svn: 334218
2018-06-07 19:42:27 +00:00
Michael Berg cc1c4b6912 guard fsqrt with fmf sub flags
Summary:
This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.
It contains only context for fsqrt.


Reviewers: spatel, hfinkel, arsenm

Reviewed By: spatel

Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai

Differential Revision: https://reviews.llvm.org/D47749

llvm-svn: 334113
2018-06-06 18:47:55 +00:00
Reid Kleckner adcaddb6da Fix -Wcovered-switch-default warning and clang-format it
llvm-svn: 333967
2018-06-04 23:47:29 +00:00
Amaury Sechet da661e9236 [DAGcombine] Teach the combiner about -a = ~a + 1
Summary: This include variant for add, uaddo and addcarry. usubo and subcarry require the carry to be flipped to preserve semantic, but we chose to do the transform anyway in that case as to push the transform down the carry chain.

Reviewers: efriedma, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46505

llvm-svn: 333943
2018-06-04 19:23:22 +00:00
Amaury Sechet 93a7d2aa3c Get rid of SETCCE
Summary: It has been deprecated in favor of SETCCCARRY for a year now and isn't used by any in tree backend.

Reviewers: efriedma, craig.topper, dblaikie, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47685

llvm-svn: 333939
2018-06-04 18:36:22 +00:00
Krzysztof Parzyszek 623eb54361 [SelectionDAG] Add missing closing parentheses in comments, NFC
llvm-svn: 333907
2018-06-04 14:54:53 +00:00
Nirav Dave fc9a700f94 [DAG] Avoid checking for consecutive stores in store merge. NFCI.
llvm-svn: 333766
2018-06-01 15:05:55 +00:00
Nirav Dave 39ece11ae5 [DAG] Simplify Expression. NFC.
llvm-svn: 333765
2018-06-01 15:05:30 +00:00
Nirav Dave 0fc27acaa2 [DAG] Remove untriggerable check. NFCI.
Candidate check precludes this check.

llvm-svn: 333764
2018-06-01 15:05:05 +00:00
Nirav Dave a74921a696 [DAG] Prune store merge legal store check to stop invalid size. NFCI.
Do not consider store sizes large than the maximum legal store size.

llvm-svn: 333763
2018-06-01 15:04:40 +00:00
Krzysztof Parzyszek 0b6187c1a9 [SelectionDAG] Expand UADDO/USUBO into ADD/SUBCARRY if legal for target
Additionally, implement handling of ADD/SUBCARRY on Hexagon, utilizing
the UADDO/USUBO expansion.

Differential Revision: https://reviews.llvm.org/D47559

llvm-svn: 333751
2018-06-01 14:00:32 +00:00
Roman Lebedev 9f65d16d5d [DAGCombiner] isAllOnesConstantOrAllOnesSplatConstant(): look through bitcasts
Summary:
As pointed out in D46528, we errneously transform cases like `xor X, -1`,
even though we use said function.
It's because the `-1` is actually a bitcast there.
So i think we can just look through it in the function.

Differential Revision: https://reviews.llvm.org/D47156

llvm-svn: 332905
2018-05-21 21:41:10 +00:00
Roman Lebedev 7772de25d0 [DAGCombine][X86][AArch64] Masked merge unfolding: vector edition.
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.

This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`), and we need to make sure that they are generated.

Differential Revision: https://reviews.llvm.org/D46528

llvm-svn: 332904
2018-05-21 21:41:02 +00:00
Craig Topper 25444c852a [DAGCombiner] Use computeKnownBits to match rotate patterns that have had their amount masking modified by simplifyDemandedBits
SimplifyDemandedBits can remove bits from the masks for the shift amounts we need to see to detect rotates.

This patch uses zeroes from computeKnownBits to fill in some of these mask bits to make the match work.

As currently written this calls computeKnownBits even when the mask hasn't been simplified because it made the code simpler. If we're worried about compile time performance we can improve this.

I know we're talking about making a rotate intrinsic, but hopefully we can go ahead and do this change and just make sure the rotate intrinsic also handles it.

Differential Revision: https://reviews.llvm.org/D47116

llvm-svn: 332895
2018-05-21 21:09:18 +00:00
Nirav Dave 11fd14c1ac [DAG] Prune cycle check in store merge.
As part of merging stores we check that fusing the nodes does not
cause a cycle due to one candidate store being indirectly dependent on
another store (this may happen via chained memory copies). This is
done by searching if a store is a predecessor to another store's
value.

Prune the search at the candidate search's root node which is a
predecessor to all candidate stores. This reduces the
size of the subgraph searched in large basic blocks.

Reviewers: jyknight

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D46955

llvm-svn: 332490
2018-05-16 16:48:20 +00:00
Nirav Dave d9d86cb738 [DAG] Defer merge store cycle checking to just before merge. NFCI.
llvm-svn: 332489
2018-05-16 16:47:54 +00:00
Nirav Dave a87de7d846 [DAGCombine] Move load checks on store of loads into candidate
search. NFCI.

Migrate single-use and non-volatility, non-indexed requirements on
stores of immediate store values to candidate collection pass from
later stage.

llvm-svn: 332392
2018-05-15 20:31:53 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Vedant Kumar 99d5c072f0 [DAGCombiner] Set the right SDLoc on extended SETCC uses (7/N)
ExtendSetCCUses updates SETCC nodes which use a load (OriginalLoad) to
reflect a simplification to the load (ExtLoad).

Based on my reading, ExtendSetCCUses may create new nodes to extend a
constant attached to a SETCC. It also creates fresh SETCC nodes which
refer to any updated operands.

ISTM that the location applied to the new constant and SETCC nodes
should be the same as the location of the ExtLoad.

This was suggested by Adrian in https://reviews.llvm.org/D45995.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46216

llvm-svn: 332119
2018-05-11 18:40:10 +00:00
Vedant Kumar fd340a4047 [DAGCombiner] Set the right SDLoc on a newly-created sextload (6/N)
This teaches tryToFoldExtOfLoad to set the right location on a
newly-created extload. With that in place, the logic for performing a
certain ([s|z]ext (load ...)) combine becomes identical for sexts and
zexts, and we can get rid of one copy of the logic.

The test case churn is due to dependencies on IROrders inherited from
the wrong SDLoc.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46158

llvm-svn: 332118
2018-05-11 18:40:08 +00:00
Vedant Kumar f0e5f7c45e [DAGCombiner] Factor out duplicated logic for an extload combine, NFC (5/N)
Part of the logic for combining (zext (load ...)) and (sext (load ...))
is duplicated. This creates problems because bugs in one version have to
be fixed again in the other version.

To address this, as a first step, I've extracted the duplicate logic
into a helper. I'll fix the debug location bug in the helper and
eliminate the copy of its logic in a followup.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46157

llvm-svn: 332117
2018-05-11 18:40:02 +00:00
Nirav Dave a5ad417589 [DAG] Avoid using deleted node in rebuildSetCC
Summary:
The combine in rebuildSetCC may be combined to another
node leaving our references stale. Keep a handle on
it to avoid stale references.

Fixes PR36602.

Reviewers: dbabokin, RKSimon, eli.friedman, davide

Subscribers: hiraditya, uabelho, JesperAntonsson, qcolombet, llvm-commits

Differential Revision: https://reviews.llvm.org/D46404

llvm-svn: 331985
2018-05-10 14:28:54 +00:00
Craig Topper 176ec8506f [DAGCombiner] In visitBITCAST when trying to constant fold the bitcast, only call getBitcast if its an fp->int or int->fp conversion even when before legalize ops.
Previously if !LegalOperations we would blindly call getBitcast and hope that getNode would constant fold it. But if the conversion is between a vector and a scalar, getNode has no simplification.

This means we would just get back the original N. We would then return that N which would make the caller of visitBITCAST think that we used CombineTo and did our own worklist management. This prevents target specific optimizations from being called for vector/scalar bitcasts until after legal operations.

llvm-svn: 331896
2018-05-09 17:14:27 +00:00
Amara Emerson 4e66142f14 [DAGCombine] Change store merge candidates check cut off to 1024.
The previous value of 8192 resulted in severe compile time hits in
some pathological cases.

rdar://39781410

Differential Revision: https://reviews.llvm.org/D46581

llvm-svn: 331888
2018-05-09 15:53:06 +00:00
Roman Lebedev 9bd6067db6 [DAGCombiner] Masked merge: enhance handling of 'andn' with immediates
Summary:
Split off from D46031.

The previous patch, D46493, completely disabled unfolding in case of immediates.
But we can do better:
{F6120274} {F6120277}

https://rise4fun.com/Alive/xJS

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D46494

llvm-svn: 331685
2018-05-07 21:52:22 +00:00
Roman Lebedev cc42d08b1d [DagCombiner] Not all 'andn''s work with immediates.
Summary:
Split off from D46031.

In masked merge case, this degrades IPC by decreasing instruction count.
{F6108777}
The next patch should be able to recover and improve this.

This also affects the transform @spatel have added in D27489 / rL289738,
and the test coverage for X86 was missing.
But after i have added it, and looked at the changes in MCA, i'm somewhat confused.
{F6093591} {F6093592} {F6093593}
I'd say this regression is an improvement, since `IPC` increased in that case?

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: andreadb, llvm-commits, spatel

Differential Revision: https://reviews.llvm.org/D46493

llvm-svn: 331684
2018-05-07 21:52:11 +00:00
Roman Lebedev cb1af9134a [NFC][DAGCombine] unfoldMaskedMerge(): rename two variables
The current names can be confused with the A and B sides
of the canonical masked merge pattern.

llvm-svn: 331609
2018-05-06 20:02:22 +00:00
Roman Lebedev a3b0b59f54 [DAGCombiner] Masked merge: don't touch "not" xor's.
Summary:
Split off form D46031.

It seems we don't want to transform the pattern if the `xor`'s are actually `not`'s.
In vector case, this breaks `andnpd` / `vandnps` patterns.

That being said, we may want to re-visit this `not` handling, maybe in D46073.

Reviewers: spatel, craig.topper, javed.absar

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46492

llvm-svn: 331595
2018-05-05 15:45:40 +00:00
Roman Lebedev 49ada82fa7 [NFC][DagCombiner] unfoldMaskedMerge(): improve readability.
llvm-svn: 331588
2018-05-05 10:39:54 +00:00
Craig Topper 781aa181ab Fix a bunch of places where operator-> was used directly on the return from dyn_cast.
Inspired by r331508, I did a grep and found these.

Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa.

llvm-svn: 331577
2018-05-05 01:57:00 +00:00
Michael Berg 7acc81b744 Fast Math Flag mapping into SDNode
Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage.

Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar

Reviewed By: spatel

Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng

Differential Revision: https://reviews.llvm.org/D45710

llvm-svn: 331547
2018-05-04 18:48:20 +00:00
Vedant Kumar e23173b677 [DAGCombiner] Fix SDLoc in a (zext (zextload x)) combine (4/N)
The logic for this combine is almost identical to the logic for a
(sext (sextload x)) combine.

This commit factors out the logic so it can be shared by both combines,
and corrects the SDLoc assigned in the zext version of the combine.

Prior to this patch, for the given test case, we would apply the
location associated with the udiv instruction to instructions which
perform the load.

Part of: llvm.org/PR37262

llvm-svn: 331303
2018-05-01 19:51:15 +00:00
Vedant Kumar d7117ed0f9 [DAGCombiner] Fix SDLoc in a (sext (sextload x)) combine (3/N)
Prior to this patch, for the given test case, we would apply the
location associated with the sdiv instruction to instructions which
perform the load.

Part of: llvm.org/PR37262.

Differential Revision: https://reviews.llvm.org/D46222

llvm-svn: 331302
2018-05-01 19:51:15 +00:00
Vedant Kumar cc7b2a55c2 [DAGCombiner] Change the SDLoc on split extloads (2/N)
In DAGCombiner, we try to simplify this pattern:

  ([s|z]ext (load ...))

Conceptually, a new extload which is created while splitting the load
should have the same debug location as the load.

Making this change affects the IROrder of the new load, causing some
test case churn.

In practice, the new location is never different from the location of
the [s|z]ext, at least not during check-llvm or a stage2 build.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46156

llvm-svn: 331301
2018-05-01 19:29:15 +00:00
Vedant Kumar ee4bfcaa5a [DAGCombiner] Set the right SDLoc on a newly-created zextload (1/N)
Setting the right SDLoc on a newly-created zextload fixes a line table
bug which resulted in non-linear stepping behavior.

Several backend tests contained CHECK lines which relied on the IROrder
inherited from the wrong SDLoc. This patch breaks that dependence where
feasbile and regenerates test cases where not.

In some cases, changing a node's IROrder may alter register allocation
and spill behavior. This can affect performance. I have chosen not to
prevent this by applying a "known good" IROrder to SDLocs, as this may
hide a more general bug in the scheduler, or cause regressions on other
test inputs.

rdar://33755881, Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D45995

llvm-svn: 331300
2018-05-01 19:26:15 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Sanjay Patel 1babf5ff32 [DAGCombiner] rename function attribute for disabling ftrunc transform
This is the matching name change for the Clang patch at:
D46236
rL331209

Differential Revision: https://reviews.llvm.org/D46237 

llvm-svn: 331210
2018-04-30 18:20:33 +00:00
Heejin Ahn d20d0648ed [DAGCombiner] Fix a case of 1 in non-splat vector pow2 divisor
Summary:
D42479 (rL329525) enabled SDIV combine for pow2 non-splat vector
dividers. But when there is a 1 in a vector, the instruction sequence to
be generated involves shifting a value by the number of its bit widths,
which is undefined
(c64f4dbfe3/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L6000-L6006)).

Especially, in architectures that do not support vector instructions,
each of element in a vector will be computed separately using scalar
operations, and then the resulting value will be undef for '1' values
in a vector.

(All 1's vector is fine; only vectors mixed with 1 and others will be
affected.)

Reviewers: RKSimon, jgravelle-google

Subscribers: jfb, dschuff, sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D46161

llvm-svn: 331092
2018-04-27 22:23:11 +00:00
Sanjay Patel 5a90285bd9 [DAGCombiner] limit ftrunc optimizations with function attribute
As noted, the attribute name is subject to change once we have
the clang side implemented, but it's clear that we need some
kind of attribute-based predication here based on the discussion
for:
rL330437

llvm-svn: 330951
2018-04-26 16:04:44 +00:00
Sanjay Patel a5da086386 [DAGCombiner] refactor FP->int->FP folds; NFC
As discussed in the post-review comments for rL330437,
we need to guard this fold to allow existing code to
keep working with the undefined behavior that they've
come to rely on.

That would mean duplicating more code than we already 
have, so let's fix that first. 

llvm-svn: 330947
2018-04-26 15:20:18 +00:00
Craig Topper f3cefad255 [DAGCombiner][X86] When promoting loads don't use ZEXTLOAD even its legal
We were previously prefering ZEXTLOAD over EXTLOAD if it is legal. This triggers during X86's promotion of i16->i32. Not sure about other targets.

Using ZEXTLOAD can prevent folding it to SEXTLOAD later if we were to promote a sign extended operand like we would need for SRA. However, X86 doesn't currently promote i16 SRA. I was looking into doing that which is how I found this issue.

This is also blocking our ability to fold 4 byte aligned EXTLOADs with "loadi32". This is what caused most of the test changes here.

Differential Revision: https://reviews.llvm.org/D45585#inline-402825

llvm-svn: 330781
2018-04-24 22:35:27 +00:00
Roman Lebedev 95c6eaf530 [DAGCombiner] Unfold scalar masked merge if profitable
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`).
So we need to make sure that they are still generated.

If the mask is constant, we do nothing. InstCombine should have unfolded it.
Else, i use `hasAndNot()` TLI hook.

For now, only handle scalars.

https://rise4fun.com/Alive/bO6

----

I *really* don't like the code i wrote in `DAGCombiner::unfoldMaskedMerge()`.
It is super fragile. Is there something like IR Pattern Matchers for this?

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: andreadb, courbet, kristof.beyls, javed.absar, rengolin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D45733

llvm-svn: 330646
2018-04-23 20:38:49 +00:00
Sanjay Patel 3d453ad711 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
This was originally committed at rL328921 and reverted at rL329920 to
investigate failures in Chrome. This time I've added to the ReleaseNotes
to warn users of the potential of exposing UB and let me repeat that
here for more exposure:

  Optimization of floating-point casts is improved. This may cause surprising
  results for code that is relying on undefined behavior. Code sanitizers can
  be used to detect affected patterns such as this:

    int main() {
      float x = 4294967296.0f;
      x = (float)((int)x);
      printf("junk in the ftrunc: %f\n", x);
      return 0;
    }

    $ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out
    ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of 
                   representable values of type 'int'
    junk in the ftrunc: 0.000000


Original commit message:

fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909

llvm-svn: 330437
2018-04-20 15:07:55 +00:00
Gerolf Hoflehner 5b4a67af1b [DAGCombiner] Fix for oss-fuzz bug
llvm-svn: 330178
2018-04-17 07:22:34 +00:00
Sanjay Patel 6c3af659a2 [DAGCombiner, PowerPC] allow X - (fpext(-Y) --> X + fpext(Y) with multiple uses
This is a transform that I limited in instcombine in rL329821 because it was 
creating more instructions in IR when the cast has multiple uses.

But if the cast is free, then we can do the transform regardless of other
uses because it improves the potential throughput of the calculation by
removing a dependency on the fneg.

Differential Revision: https://reviews.llvm.org/D45598

llvm-svn: 330098
2018-04-15 16:43:48 +00:00
Sanjay Patel a54e7d1a6d [DAGCombiner] simplify code; NFC
llvm-svn: 329964
2018-04-12 22:14:58 +00:00
Sanjay Patel 5ace2b765a revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
This change is exposing UB in source code - as was warned/predicted. :)
See D44909 for discussion. Reverting while we figure out how to fix things.

llvm-svn: 329920
2018-04-12 15:27:01 +00:00
Sam Parker 1f4f4d9a08 [DAGCombine] Improve ReduceLoad for SRL
Recommitting r329283, third time lucky...

If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.

Differential Revision: https://reviews.llvm.org/D41350

llvm-svn: 329551
2018-04-09 08:16:11 +00:00
Zvi Rackover 7a53f169f1 DAGCombiner: Combine SDIV with non-splat vector pow2 divisor
Summary:
Extend existing SDIV combine for pow2 constant divider to handle
non-splat vectors of pow2 constants.

Reviewers: RKSimon, craig.topper, spatel, hfinkel, efriedma

Reviewed By: RKSimon

Subscribers: magabari, llvm-commits

Differential Revision: https://reviews.llvm.org/D42479

llvm-svn: 329525
2018-04-08 11:35:20 +00:00
Guozhi Wei 0eb86c8efc [DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))
In our real world application, we found the following optimization is missed in DAGCombiner

(zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst))

If the user of original zext is an add, it may enable further lea optimization on x86.

This patch add a new function CombineZExtLogicopShiftLoad to do this optimization.

Differential Revision: https://reviews.llvm.org/D44402

llvm-svn: 329516
2018-04-07 23:36:10 +00:00
Craig Topper 5b95eae1c3 [DAGCombiner] Add a combine to turn a build vector of zero extends of extract vector elts into a vector zero extend and possibly an extract subvector.
llvm-svn: 329509
2018-04-07 19:09:50 +00:00
Mandeep Singh Grang e92f0cfe34 [CodeGen] Change std::sort to llvm::sort in response to r327219
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.

To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.

Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.

Reviewers: bogner, rnk, MatzeB, RKSimon

Reviewed By: rnk

Subscribers: JDevlieghere, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45133

llvm-svn: 329435
2018-04-06 18:08:42 +00:00
Sam Parker 0e7deb8104 [DAGCombine] Revert r329160
Again, broke the big endian stage 2 builders.

llvm-svn: 329283
2018-04-05 13:46:17 +00:00
Sam Parker 7ec722d603 [DAGCombine] Improve ReduceLoadWidth for SRL
Recommitting rL321259. Previosuly this caused an issue with PPCBE but
I didn't receieve a reproducer and didn't have the time to follow up.
If the issue appears again, please provide a reproducer so I can fix
it.

Original commit message:

If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.

Differential Revision: https://reviews.llvm.org/D41350

llvm-svn: 329160
2018-04-04 09:26:56 +00:00
Sanjay Patel 6124cae8f7 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, 
so replace a pair of casts with the equivalent node. We don't have to account for 
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909

llvm-svn: 328921
2018-03-31 17:55:44 +00:00
Sanjay Patel e09b7dcf3d [SelectionDAG] Removing FABS folding from DAGCombiner
The code has bugs dealing with -0.0.

Since D44550 introduced FABS pattern folding in InstCombine, 
this patch removes the now-redundant code that causes 
https://bugs.llvm.org/show_bug.cgi?id=36600.

Patch by Mikhail Dvoretckii!

Differential Revision: https://reviews.llvm.org/D44683

llvm-svn: 328872
2018-03-30 15:42:52 +00:00
Craig Topper 2fa1436206 [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.

The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.

Differential Revision: https://reviews.llvm.org/D45017

llvm-svn: 328806
2018-03-29 17:21:10 +00:00
David Blaikie 36a0f226b1 Fix layering by moving ValueTypes.h from CodeGen to IR
ValueTypes.h is implemented in IR already.

llvm-svn: 328397
2018-03-23 23:58:31 +00:00
David Blaikie 13e77db2df Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

llvm-svn: 328395
2018-03-23 23:58:25 +00:00
Martin Storsjo db75aa96d3 Revert "[DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))"
This reverts commit r328252. This change broke building a number
of projects when targeting ARM and AArch64, see PR36873.

llvm-svn: 328297
2018-03-23 08:36:47 +00:00
Guozhi Wei 17ff975eb1 [DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))
In our real world application, we found the following optimization is missed in DAGCombiner

(zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst))

If the user of original zext is an add, it may enable further lea optimization on x86.

This patch add a new function CombineZExtLogicopShiftLoad to do this optimization.

Differential Revision: https://reviews.llvm.org/D44402

llvm-svn: 328252
2018-03-22 21:47:25 +00:00
Craig Topper 956fec2a4a [DAGCombiner] Fix type in comment. NFC
llvm-svn: 327916
2018-03-19 22:25:26 +00:00
Craig Topper b36cb20ef9 [X86] Teach X86TargetLowering::targetShrinkDemandedConstant to set non-demanded bits if it helps created an and mask that can be matched as a zero extend.
I had to modify the bswap recognition to allow unshrunk masks to make this work.

Fixes PR36689.

Differential Revision: https://reviews.llvm.org/D44442

llvm-svn: 327530
2018-03-14 16:55:15 +00:00
Craig Topper 4aeec51986 [DAGCombiner] Allow visitEXTRACT_SUBVECTOR to combine with BUILD_VECTORS between LegalizeVectorOps and LegalizeDAG.
BUILD_VECTORs aren't themselves legalized until LegalizeDAG so we should still be able to create an "illegal" one before that. This helps combine with BUILD_VECTORS that are introduced during LegalizeVectorOps due to unrolling.

llvm-svn: 327446
2018-03-13 20:36:28 +00:00
Simon Pilgrim 9855b39380 [DAGCombine] visitREM - Don't assume that one divrem isn't driving another
Under some circumstances the divrems won't have been combined together before getting to this code.

So replace the assertion with a if() guard to not expand to X-((X/C)*C) to give the other combine chance to happen.

Reduced from OSS-Fuzz #6883
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6883

llvm-svn: 327424
2018-03-13 17:17:15 +00:00
Nirav Dave 775f07d121 Make early exit hasPredecessorHelper return true. NFCI.
All uses conservatively assume in early exit case that it will be a
predecessor. Changing default removes checking code in all uses.

llvm-svn: 327169
2018-03-09 20:56:51 +00:00
Craig Topper 4196dd12a2 [DAGCombiner] Add a peekThroughBitcast to MergeStoresOfConstantsOrVecElts to fix a crash if we are storing a bitcast of a constant.
Loading a constant into a k-register in AVX512 requires a bitcast from a scalar constant. In the test case here we have a k-register store that gets split into multiple parts of KNL. MergeConsecutiveStores sees each of these pieces as a consecutive store and looks through the bitcast to find the underly scalar constant. But when we went to create the combined store we didn't look through the same bitcast.

llvm-svn: 326677
2018-03-04 18:51:46 +00:00
Craig Topper e7ca6f5456 [DAGCombiner] When combining zero_extend of a truncate, only mask before extending for vectors.
Masking first, prevents the extend from being combine with loads. Its also interfering with some vXi1 extraction code.

Differential Revision: https://reviews.llvm.org/D42679

llvm-svn: 326500
2018-03-01 22:32:25 +00:00
Amaury Sechet 893a6b89ff [DAGCOmbine] Ensure that (brcond (setcc ...)) is handled in a canonical manner.
Summary:
There are transformation that change setcc into other constructs, and transform that try to reconstruct a setcc from the brcond condition. Depending on what order these transform are done, the end result differs.

Most of the time, it is preferable to get a setcc as a brcond argument (and this is why brcond try to recreate the setcc in the first place) so we ensure this is done every time by also doing it at the setcc level when the only user is a brcond.

Reviewers: spatel, hfinkel, niravd, craig.topper

Subscribers: nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D41235

llvm-svn: 325892
2018-02-23 11:50:42 +00:00
Simon Pilgrim be72fe1fda [SelectionDAG] Move matchUnaryPredicate/matchBinaryPredicate into SelectionDAGNodes.h
This allows us to improve vector constant matching in more DAG code (backends, TargetLowering etc.).

Differential Revision: https://reviews.llvm.org/D43466

llvm-svn: 325815
2018-02-22 18:45:13 +00:00
Craig Topper 1d104b996a [DAGCombiner] Add two calls to isVector before making calls to getVectorElementType/getVectorNumElements to avoid an assert.
We looked through a BITCAST, but the bitcast might be a from a scalar type rather than a vector.

I don't have a test case. I stumbled onto it while prototyping another change that isn't ready yet.

llvm-svn: 325750
2018-02-22 07:05:27 +00:00
Craig Topper 35801fa5ce [SelectionDAG] Add LegalTypes flag to getShiftAmountTy. Use it to unify and simplify DAGCombiner and simplifySetCC code and fix a bug.
DAGCombiner and SimplifySetCC both use getPointerTy for shift amounts pre-legalization. DAGCombiner uses a single helper function to hide this. SimplifySetCC does it in multiple places.

This patch adds a defaulted parameter to getShiftAmountTy that can make it return getPointerTy for scalar types. Use this parameter to simplify the SimplifySetCC and DAGCombiner.

Additionally, there were two places in SimplifySetCC that were creating shifts using the target's preferred shift amount pre-legalization. If the target uses a narrow type and the type is illegal, this can cause SimplfiySetCC to create a shift with an amount that can't represent all possible shift values for the type. To fix this we should use pointer type there too.

Alternatively we could make getScalarShiftAmountTy for each target return a safe value for large types as proposed in D43445. And maybe we should still do that, but fixing the SimplifySetCC code keeps other targets from tripping over this in the future.

Fixes PR36250.

Differential Revision: https://reviews.llvm.org/D43449

llvm-svn: 325602
2018-02-20 17:41:05 +00:00
Simon Pilgrim d6beac3b76 [DAGCombiner] Remove simplifyShuffleMask - now handled more generally by SimplifyDemandedVectorElts.
llvm-svn: 325429
2018-02-17 12:36:56 +00:00
Craig Topper dac3c1f5c8 [DAGCombiner] Call ExtendUsesToFormExtLoad in (zext (and (load)))->(and (zextload)) even when the and does not have multiple uses
Same for the sign extend case.

Currently we check for multiple uses on the binop. Then we call ExtendUsesToFormExtLoad to capture SetCCs that use the load. So we only end up finding any setccs when the and has additional uses and the load is used by a setcc. I don't think the and having multiple uses is relevant here. I think we should only be checking for the load having multiple uses.

This changes an NVPTX test because we now find that the load has a second use by a truncate, but ExtendUsesToFormExtLoad only looks at setccs it can extend. All other operations just check isTruncateFree. Maybe we should allow widening of an existing truncate even if its not free?

Differential Revision: https://reviews.llvm.org/D43063

llvm-svn: 325289
2018-02-15 20:20:32 +00:00
Simon Pilgrim 80663ee986 [SelectionDAG] Add initial implementation of TargetLowering::SimplifyDemandedVectorElts
This is mainly a move of simplifyShuffleOperands from DAGCombiner::visitVECTOR_SHUFFLE to create a more general purpose TargetLowering::SimplifyDemandedVectorElts implementation.

Further features can be moved/added in future patches.

Differential Revision: https://reviews.llvm.org/D42896

llvm-svn: 325232
2018-02-15 12:14:15 +00:00
Alexander Ivchenko 7e5d525bd5 [SelectionDAG][X86] Fix incorrect offset generated for VMASKMOV
When creating high MachineMemOperand for MSTORE/MLOAD we supply
it with the original PointerInfo, while the pointer itself had been incremented.
The patch adds the proper offset to the PointerInfo.

llvm-svn: 325135
2018-02-14 15:55:24 +00:00
Craig Topper f73ff612ca [DAGCombiner] Add one use check to fold (not (and x, y)) -> (or (not x), (not y))
Summary:
If the and has an additional use we shouldn't invert it. That creates an additional instruction.

While there add a one use check to the transform above that looked similar.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43225

llvm-svn: 325019
2018-02-13 16:25:27 +00:00
Simon Pilgrim 0be5567a89 [X86][SSE] Enable SMIN/SMAX/UMIN/UMAX custom lowering for all legal types
This allows us to recognise more saturation patterns and also simplify some MINMAX codegen that was failing to combine CMPGE comparisons to a legal CMPGT.

Differential Revision: https://reviews.llvm.org/D43014

llvm-svn: 324837
2018-02-11 10:52:37 +00:00
Craig Topper 36f913ee80 [SelectionDAG] Remove TargetLowering::getConstTrueVal. Use SelectionDAG::getBoolConstant in the one place it was used.
SelectionDAG::getBoolConstant was recently introduced. At the time I didn't know getConstTrueVal existed, but I think getBoolConstant is better as it will use the source VT to make sure it can properly detect floating point if it is configured differently.

llvm-svn: 324832
2018-02-11 04:58:58 +00:00
Vedant Kumar 7fd9a58d8c Revert "WIP: [DAGCombiner] Assert that debug info is preserved"
This reverts commit r324648. It was committed accidentally.

llvm-svn: 324650
2018-02-08 20:27:35 +00:00
Vedant Kumar 28323ff5a3 WIP: [DAGCombiner] Assert that debug info is preserved
llvm-svn: 324648
2018-02-08 20:27:09 +00:00
Craig Topper c19aed963e [DAGCombiner] Fix a couple mistakes from r324311 by really passing the original load to ExtendSetCCUses.
We're passing the binary op that uses the load instead of the load.

Noticed by inspection. Not sure how to test this because this just prevents the introduction of an extend that will later be truncated and will probably be combined out.

llvm-svn: 324568
2018-02-08 06:27:18 +00:00