Commit Graph

3351 Commits

Author SHA1 Message Date
Shivam Gupta 0489e89119 [DAGCombiner] Avoid combining adjacent stores at -O0 to improve debug experience
When the source has a series of assignments, users reasonably want to
have the debugger step through each one individually. Turn off the combine
for adjacent stores so we get this behavior at -O0.

Similar to D7181.

Reviewed By: spatel, xgupta

Differential Revision: https://reviews.llvm.org/D115808
2021-12-23 10:48:28 +05:30
Shivam Gupta eb66f0662a Revert "[DAGCombiner] Avoid combining adjacent stores at -O0 to improve debug experience"
This reverts commit 731bde1ed3.
2021-12-20 21:43:40 +05:30
Shivam Gupta 731bde1ed3 [DAGCombiner] Avoid combining adjacent stores at -O0 to improve debug experience
When the source has a series of assignments, users reasonably want to
have the debugger step through each one individually. Turn off the combine
for adjacent stores so we get this behavior at -O0.

Similar to D7181.

Differential Revision: https://reviews.llvm.org/D115808
2021-12-19 20:58:49 +05:30
Simon Pilgrim efec3a26b4 [DAG] visitADDSAT/visitSUBSAT - merge scalar/vector canonicalization and constant folding.
Match order of most of the other integer opcode combines
2021-12-19 13:19:40 +00:00
Simon Pilgrim c1340b9e78 [DAG] Improve FMINNUM/FMAXNUM/FMINIMUM/FMAXIMUM constant folding
Merge the node combines into a common DAGCombiner::visitFMinMax (like we do for IMINMAX).

Move the constant folding into SelectionDAG::foldConstantFPMath.

This allows us to fold the vecreduce-propagate-sd-flags.ll test as it reduces constants - so I've refactored it to take variables instead.

Differential Revision: https://reviews.llvm.org/D115952
2021-12-19 11:45:51 +00:00
Sanjay Patel 79932211f9 [SDAG] remove FP-to-int cast attribute check in fold to FTRUNC
We were using a function attribute to indicate a non-standard FP mode,
but now we can use intrinsics for that job as shown in the new tests.
Presumably the x86 asm could be improved for that IR with intrinsics,
but I have not worked out exactly how to do that. Note that the
transform to FTRUNC still requires a hacky check for "nsz" (because
FMF are not applied to FP casts).

This is a cleanup based on the clang change in D115804 / 8c7f2a4f87 .
This is effectively a revert of 5a90285bd9 + D46237 .

Differential Revision: https://reviews.llvm.org/D115885
2021-12-17 16:01:37 -05:00
Kazu Hirata 90bd4873d6 [CodeGen] Fix an unused variable warning
This patch fixes:

  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:22617:11: error:
  unused variable 'Ops' [-Werror,-Wunused-variable]
2021-12-17 09:43:42 -08:00
Simon Pilgrim 35c7b1aeae [DAG] SimplifyVBinOp - remove FoldConstantArithmetic call.
Constant folding (scalar/vector) is now consistently handled before the SimplifyVBinOp calls.
2021-12-17 17:22:23 +00:00
Simon Pilgrim f602723bfa [DAG] Constant fold + canonicalize fp binops before SimplifyVBinOp call
Replace custom constant scalar/splat folding with FoldConstantArithmetic call and canonicalize commutative constant ops to the RHS before the SimplifyVBinOp call
2021-12-17 17:02:54 +00:00
David Truby 5c9684704d [DAG][sve] Lowering for VLS masked truncating stores
This extends the custom lowering for truncating stores on
fixed length vectors in SVE to support masked truncating stores.
It also adds a DAG combine for truncates followed by masked
stores.

Reviewed By: peterwaller-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D108115
2021-12-17 15:04:45 +00:00
Simon Pilgrim 42f00106b7 [DAG] Constant fold + canonicalize integer binops before SimplifyVBinOp call
SimplifyVBinOp still has a FoldConstantArithmetic call, which now it isn't vector specific we should be able to remove (once fp binops are tidied up); but we can at least clean up the integer opcodes to perform the basic constant/undef handling in common code first.
2021-12-17 12:02:27 +00:00
Roman Lebedev c1a36ba002
[DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask')
In most test changes this allows us to drop some broadcasts/shuffles.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D104156
2021-12-13 20:03:44 +03:00
David Green 5d7efd4758 [SDAG] Refine MMO size when converting masked load/store to normal load/store
After D113888 / 32b6c17b29 the MMO size of a masked loads/store is
unknown. When we are converting back to a standard load/store because
the mask is known all ones, we can refine that to the correct size from
the size of the vector being loaded/stored.

Differential Revision: https://reviews.llvm.org/D114582
2021-12-08 10:13:25 +00:00
David Green 57ff805a6d [DAG] Create fptoui.sat from clamped fptosi
As an extension to D111976, this converts clamp fptosi, clamped between
0 and (2^n)-1 to a fptoui.sat. This can greatly help on targets with
conversions that naturally saturate, such as Arm.

X86 disables the transform as some of the test cases increases in size.
A fptoui.sat necessitates a fp clamp without native support, so there is
little use in converting if the instruction is just going to be
expanded.

Differential Revision: https://reviews.llvm.org/D112428
2021-12-05 09:25:52 +00:00
Simon Pilgrim 19d34f6e95 [X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT
combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULUW if the source operands are SEXT/ZEXT instructions for a 'free' truncation.

But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation.

This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result.

Differential Revision: https://reviews.llvm.org/D113371
2021-12-01 16:37:49 +00:00
Bradley Smith 0eb1efb92c [DAGCombiner] When combining REM ensure optimized div nodes are unique
The REM DAG combine uses the visitDivLike functions to try and get an
optimized DIV node to provide better codegen, however in some cases this
visitDivLike call ends up in the BuildSDIVPow2 target hook, which in
turn sometimes will return the same node passed in to indicate not to
change it. The REM DAG combine does not anticipate this and creates a
cycle in the DAG because of it.

Fix this by ensuring any such optimized div node returned is distinct
from the node being combined.

Differential Revision: https://reviews.llvm.org/D114716
2021-12-01 11:24:26 +00:00
Simon Pilgrim 9981dd142f [DAG] Apply clang-format to visitMSTORE + visitMLOAD. NFC.
Reduce diff in D114582
2021-12-01 11:23:47 +00:00
David Green 9e8a71caf0 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 15:29:14 +00:00
Hans Wennborg a87782c34d Revert "[DAG] Create fptosi.sat from clamped fptosi"
It causes builds to fail with this assert:

llvm/include/llvm/ADT/APInt.h:990:
bool llvm::APInt::operator==(const llvm::APInt &) const:
Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.

See comment on the code review.

> This adds a fold in DAGCombine to create fptosi_sat from sequences for
> smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
> the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
> it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
> ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
> to be handled similarly.
>
> A shouldConvertFpToSat method was added to control when converting may
> be profitable. The original fptosi will have a less strict semantics
> than the fptosisat, with less values that need to produce defined
> behaviour.
>
> This especially helps on ARM/AArch64 where the vcvt instructions
> naturally saturate the result.
>
> Differential Revision: https://reviews.llvm.org/D111976

This reverts commit 52ff3b0093.
2021-11-30 15:36:56 +01:00
David Green 52ff3b0093 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 11:05:32 +00:00
Bradley Smith 6180806632 [AArch64][SVE] Mark fixed-type FP extending/truncating loads/stores as custom
This allows the generic DAG combine to fold fp_extend/fp_trunc into
loads/stores which we can then lower into a integer extending
load/truncating store plus an FP_EXTEND/FP_ROUND.

The nuance here is that fixed-type FP_EXTEND/FP_ROUND require unpacked
types hence lowering them introduces an unpack/zip. By allowing these
nodes to be combined with loads/store we make it much easier to have
this unpack/zip combined into the load/store by our custom lowering.

Differential Revision: https://reviews.llvm.org/D114580
2021-11-29 11:56:07 +00:00
Simon Pilgrim 812e64ef0c [DAG] MatchRotate - support rotate-by-constant of illegal types
Patch to fix some of the regressions in D77804.

By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443.

Differential Revision: https://reviews.llvm.org/D113192
2021-11-19 11:12:04 +00:00
Craig Topper 233def40f7 [DAGCombiner] Prevent unfoldMaskedMerge from creating an AND with two inverted inputs.
It's possible that the mask is already a NOT. At least if InstCombine
hasn't canonicalized the input. In that case we will form an ANDN with
X instead of with Y. So we don't need to worry about Y being a constant.

We might need to check that X isn't a constant instead, but we don't
have a test case for that yet.

This fixes a size regression found when trying to enable this combine
for RISCV in D113937.

Differential Revision: https://reviews.llvm.org/D113948
2021-11-15 17:15:51 -08:00
Simon Pilgrim 7bac1985f4 [DAG] SimplifyVBinOp - add SDLoc() argument
Pass in SDLoc instead of (repeated) local creations in SimplifyVBinOp and scalarizeBinOpOfSplats
2021-11-15 10:43:56 +00:00
Simon Pilgrim 8658d20724 [DAG] SimplifyVBinOp - pull out repeated getValueType() call. NFC. 2021-11-15 10:43:55 +00:00
Sanjay Patel 254c5246e9 [DAGCombiner] match inverted/swapped patterns for vselect of mask of signbit
This was noted as a follow-up to D113212 / D113426:
4fc1fc4005
7e30404c3b
11522cfcad

https://alive2.llvm.org/ce/z/e4o96b

The canonicalization rules for these IR patterns are complicated,
and we were not matching the expected forms in 2 out of the 3
cases. We can make codegen more robust by matching the swapped
forms (and that will also work if these patterns are created late).
2021-11-14 09:35:26 -05:00
Kazu Hirata 99d5cbbd7e [CodeGen] Use SDNode::uses (NFC) 2021-11-12 07:33:29 -08:00
Simon Pilgrim 010b09b0c5 [DAG] reassociateOpsCommutative - test getNode result directly. NFC
Matches the clean code style we use directly above
2021-11-11 18:45:50 +00:00
Sanjay Patel 11522cfcad [DAGCombiner] add fold for vselect based on mask of signbit, part 3
(Cond0 s> -1) ? N1 : 0 --> ~(Cond0 s>> BW-1) & N1

https://alive2.llvm.org/ce/z/mGCBrd

This was suggested as a potential enhancement in D113212 (also 7e30404c3b ).
There's an improvement for AArch that could be generalized ( X > -1 --> X >= 0 ).
For x86, we have a counter-acting fold for most cases that turns the shift+not
back into a setcc, so that needs a work-around to get more cases to use "pandn":
D113603

Note that this pattern (and a previous one) are not currently canonical forms
in IR:
https://alive2.llvm.org/ce/z/e4o96b

Adding swapped variants is left as a TODO item here, but is planned as
a near-term follow-up patch.

Differential Revision: https://reviews.llvm.org/D113426
2021-11-11 10:27:37 -05:00
Simon Pilgrim 82b74363a9 [DAG] reassociateOpsCommutative - peek through bitcasts to find constants
Now that FoldConstantArithmetic can fold bitcasted constants, we should peek through bitcasts of binop operands to try and find foldable constants
2021-11-11 12:00:22 +00:00
Simon Pilgrim 381d14775e [DAG] reassociateOpsCommutative - pull out repeated getOperand() calls. NFC. 2021-11-10 15:19:13 +00:00
Simon Pilgrim f059b04f7b [DAG] Add SelectionDAG::ComputeMinSignedBits helper
As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper.

I've included some usage, its not exhaustive, just the more obvious cases where the intention is obvious.

Differential Revision: https://reviews.llvm.org/D113396
2021-11-08 14:12:45 +00:00
Simon Pilgrim f60d3ec0c7 [DAG] Add BuildVectorSDNode::getConstantRawBits helper
We have several places where we need to extract the raw bits data from a BUILD_VECTOR node, so consolidate this to a single helper function that handles Undefs and Integer/FP constants, including implicit truncation.

This should make it easier to extend D113202 to handle more constant folding of bitcasted constant data.

Differential Revision: https://reviews.llvm.org/D113351
2021-11-08 12:07:38 +00:00
Simon Pilgrim 0ff1edeeec [DAG] SimplifyVBinOp - replace FoldConstantVectorArithmetic with FoldConstantArithmetic
Currently FoldConstantArithmetic only handles binops, so replacing other uses of FoldConstantVectorArithmetic (in particular for SETCC nodes), still require more work.
2021-11-07 12:11:46 +00:00
Sanjay Patel 39c4c7d391 [DAGCombiner] remove vselect fold that was accidentally added
This diff snuck into the unrelated:
025a2f73a3

It's a suggested follow-up for D113212, but I need to add test
coverage first.
2021-11-06 09:34:30 -04:00
Sanjay Patel 025a2f73a3 [InstCombine] add tests for umax with sub; NFC 2021-11-06 08:32:52 -04:00
Sanjay Patel 7e30404c3b [DAGCombiner] add fold for vselect based on mask of signbit, part 2
This is the 'or' sibling for the fold added with:
D113212

https://alive2.llvm.org/ce/z/tgnp7K

Note that neither of these transforms is poison-safe,
but it does not seem to matter at this level. We have
had the scalar version of D113212 for a long time, so
this is just making optimizer behavior consistent.

We do not have the scalar version of *this* fold,
however, so that is another follow-up.
2021-11-05 15:02:12 -04:00
Simon Pilgrim 9e6506299a [DAG] FoldConstantVectorArithmetic - remove SDNodeFlags argument
Another minor step towards merging FoldConstantVectorArithmetic into FoldConstantArithmetic.

We don't use SDNodeFlags in any constant folding inside DAG, so passing the Flags argument is a waste of time - an alternative would be to wire up FoldConstantArithmetic to take SDNodeFlags just-in-case we someday start using it, but we don't have any way to test it and I'd prefer to avoid dead code.

Differential Revision: https://reviews.llvm.org/D113276
2021-11-05 14:36:17 +00:00
Sanjay Patel 4fc1fc4005 [DAGCombiner] add fold for vselect based on mask of signbit
(X s< 0) ? Y : 0 --> (X s>> BW-1) & Y

We canonicalize to the icmp+select form in IR, and we already have this fold
for scalar select in SDAG, so I think it's an oversight that we don't have
the fold for vectors. It seems neutral for AArch64 and saves some instructions
on x86.

Whether we should also have the sibling folds for the inverse condition or
all-ones true value may depend on target-specific factors such as whether
there's an "and-not" instruction.

Differential Revision: https://reviews.llvm.org/D113212
2021-11-05 10:06:16 -04:00
jacquesguan a39eadcf16 [DAGCombiner] Teach combineShiftToMULH to handle constant and const splat vector.
Fold (srl (mul (zext i32:$a to i64), i64:c), 32) -> (mulhu $a, $b),
if c can truncate to i32 without loss.

Reviewed By: frasercrmck, craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D108129
2021-11-02 12:04:23 +00:00
Simon Pilgrim 37e17f278f [DAG] MatchRotate - remove (redundant) legal type check.
Rely on the hasOperation() instead - as commented on D77804, the mid-term intention is to recognise rotate/funnel-by-constant pre-legalization to help avoid SimplifyDemandedBits regressions.
2021-11-02 11:24:50 +00:00
Abinav Puthan Purayil db8d7b6e2d [DAGCombine][NFC] s/it's/its in the comment of hasNoInfs(). 2021-10-29 07:36:38 +05:30
Sanjay Patel 6e46b66e2a [DAGCombiner] make matching bit-hack form of usubsat more flexible
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

As suggested in D112085, we can substitute 'xor' with 'add'
in this pattern, and it is logically equivalent:
https://alive2.llvm.org/ce/z/eJtWWC

We canonicalize to 'xor' in IR, but SDAG does not do that
(and it probably should not - https://llvm.org/PR52267 ), so
it is possible to see either pattern in codegen. Note that
'sub' is a another potential pattern, but that is
canonicalized to 'add' in DAGCombiner, so we don't need to
worry about that variation.

Differential Revision: https://reviews.llvm.org/D112377
2021-10-25 09:01:52 -04:00
Simon Pilgrim a5f56342b0 [DAG] narrowExtractedVectorLoad - EXTRACT_SUBVECTOR indices are always constant
EXTRACT_SUBVECTOR indices are always constant, we don't need to check for ConstantSDNode, we should just use getConstantOperandVal which will assert for the constant.
2021-10-22 18:32:14 +01:00
Craig Topper 04c184bba7 [TargetLowering] Simplify the interface of expandABS. NFC
Instead of returning a bool to indicate success and a separate
SDValue, return the SDValue and have the callers check if it is
null.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112331
2021-10-22 10:22:23 -07:00
Sanjay Patel d2198771e9 [DAGCombiner] fold bit-hack form of usubsat
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

I haven't found a generalization of this identity:
https://alive2.llvm.org/ce/z/_sriEQ

Note: I was actually looking at the first form of the pattern in that link,
but that's part of a long chain of potential missed transforms in codegen
and IR....that I hope ends here!

The predicates for when this is profitable are a bit tricky. This version of
the patch excludes multi-use but includes custom lowering (as opposed to
legal only).

On x86 for example, we have custom lowering for some vector types, and that
uses umax and sub. So to enable that fold, we need add use checks to avoid
regressions. Even with legal-only lowering, we could see code with extra
reg move instructions for extra uses, so that constraint would have to be
eased very carefully to avoid penalties.

Differential Revision: https://reviews.llvm.org/D112085
2021-10-21 09:47:19 -04:00
Arthur Eubanks 6ea7437ca5 [SelectionDAG] Bail out of mergeTruncStores when not optimizing
With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores.

Fixes PR51827.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D111596
2021-10-20 16:58:22 -07:00
Simon Pilgrim 71e39e3f18 [ADT] Add APInt::isNegatedPowerOf2() helper
Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.....

Differential Revision: https://reviews.llvm.org/D111998
2021-10-19 14:38:21 +01:00
Sanjay Patel 2a3cc4d461 [Analysis] add utility function for unary shuffle mask creation
This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )

Differential Revision: https://reviews.llvm.org/D111891
2021-10-18 09:00:39 -04:00
Mingming Liu cfd155c41b [SelectionDAG] Fix typo in option help
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D111867
2021-10-15 11:27:40 -07:00
Roman Lebedev 684cbae89a
[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places 2021-10-11 23:36:06 +03:00
Wang, Pengfei c236883b6b [X86] Optimize fdiv with reciprocal instructions for half type
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110557
2021-10-08 09:41:13 +08:00
David Sherwood 37edb7d3e2 [SVE] Fix incorrect DAG combines when extracting fixed-width from scalable vectors
We were previously silently generating incorrect code when extracting a
fixed-width vector from a scalable vector. This is worse than crashing,
since the user will have no indication that this is currently unsupported
behaviour. I have fixed the code to only perform DAG combines when safe
to do so, i.e. the input and output vectors are both fixed-width or
both scalable.

Test added here:

  CodeGen/AArch64/sve-extract-scalable-vector.ll

Differential revision: https://reviews.llvm.org/D110624
2021-10-06 09:27:44 +01:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
Simon Pilgrim df672f66b6 [DAG] scalarizeExtractedVectorLoad - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit extracted loads to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

I've also cleaned up the alignment calculation code - if we have a constant extraction index then the alignment can be based on an offset from the original vector load alignment, but for non-constant indices we should assume the worst (single element alignment only).

Differential Revision: https://reviews.llvm.org/D110486
2021-10-01 21:07:34 +01:00
Fraser Cormack e2b46e336b [DAGCombiner][VP] Fold zero-length or false-masked VP ops
This patch adds a generic DAGCombine for vector-predicated (VP) nodes.
Those for which we can determine that no vector element is active can be
replaced by either undef or, for reductions, the start value.

This is tested rather trivially at the IR level, where it's possible
that we want to teach instcombine to perform this optimization.

However, we can also see the zero-evl case arise during SelectionDAG
legalization, when wide VP operations can be split into two and the
upper operation emerges as trivially false.

It's possible that we could perform this optimization "proactively"
(both on legal vectors and before splitting) and reduce the width of an
operation and insert it into a larger undef vector:

```
v8i32 vp_add x, y, mask, 4
->
v8i32 insert_subvector (v8i32 undef), (v4i32 vp_add xsub, ysub, mask, 4), i32 0
```

This is somewhat analogous to similar vector narrow/widening
optimizations, but it's unclear at this point whether that's beneficial
to do this for VP ops for any/all targets.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D109148
2021-09-27 11:30:09 +01:00
Simon Pilgrim 18c8ed5416 [DAG] ReduceLoadOpStoreWidth - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit store narrowing to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory access by checking allowsMisalignedMemoryAccesses as a fallback.
2021-09-25 18:35:57 +01:00
Simon Pilgrim 6bd5b1b1ce [DAG] combineShiftToMULH - move getValueType() inside assert. NFCI.
Avoids an unnecessary (void).
2021-09-25 11:56:35 +01:00
Bjorn Pettersson c3ae8ecb52 [DAGCombiner] Rename isAlias as mayAlias. NFC
Differential Revision: https://reviews.llvm.org/D110062
2021-09-23 09:54:42 +02:00
Michael Liao 5fb3ae525f [SelectionDAG] Re-calculate scoped AA metadata when merging stores.
Reviewed By: jeroen.dobbelaere

Differential Revision: https://reviews.llvm.org/D102821
2021-09-21 11:41:17 -04:00
Matt Arsenault 54d755a034 DAG: Fix incorrect folding of fmul -1 to fneg
The fmul is a canonicalizing operation, and fneg is not so this would
break denormals that need flushing and also would not quiet signaling
nans. Fold to fsub instead, which is also canonicalizing.
2021-09-14 21:25:02 -04:00
David Truby 915e9e76bf [llvm][sve] Lowering for VLS masked extending loads
This extends the custom lowering for extending loads on
fixed length vectors in SVE to support masked extending loads.

The existing tests for correct behaviour of masked extending loads
exhibit bad code generation due to the legalistaion of i1 vectors.
They have been left as-is and new tests have been added that do not
exhibit this behaviour.

Differential Revision: https://reviews.llvm.org/D108200
2021-09-13 11:13:25 +01:00
Craig Topper 9af8f1b18e [SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode.
Soft deprecrate isNullValue/isAllOnesValue and update in tree
callers. This matches the changes to the APInt interface from
D109483.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D109535
2021-09-09 13:28:30 -07:00
Chris Lattner 735f46715d [APInt] Normalize naming on keep constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is missing in some of the other things.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Sanjay Patel e1e4bf174b [DAGCombine] Prevent the transform of combine for multi-use operand
The test is based on a miscompile example in:
https://llvm.org/PR51321

Differential Revision: https://reviews.llvm.org/D107692
2021-09-06 15:30:32 -04:00
David Green 1b83aaaefa [DAG] Remove oneuse check in select_cc setgt X, -1, C, ~C fold
This appears to produce better code, even if the condition may need to
be replicated.
2021-09-05 16:18:31 +01:00
David Green 8523fb96a6 [DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C
Given a select_cc producing a constant and a invertion of the constant
for a comparison more than zero, we can produce an xor with ashr
instead, which produces smaller code. The ashr either sets all bits or
clear all bits depending on if the value is negative. This is then xor'd
with the constant to optionally negate the value.
https://alive2.llvm.org/ce/z/DTFaBZ

This includes a OneUseCheck on the Cmp, which seems to make thinks a
little worse and will be removed in a followup.

Differential Revision: https://reviews.llvm.org/D109149
2021-09-05 16:04:01 +01:00
Abinav Puthan Purayil 0baace5379 [DAGCombine] Add node level checks for fp-contract and fp-ninf in visitFMULForFMADistributiveCombine().
Differential Revision: https://reviews.llvm.org/D107551
2021-09-02 11:33:14 +05:30
Craig Topper 705d005781 [DAGCombiner][RISCV] Don't use vector types in DAGCombiner::tryStoreMergeOfLoads if we need a rotate.
The check for whether a rotate is possible occurs before the
memory legality checks for the integer type. So it's possible we
decide we can use a rotate, but then fail the legality checks. If
that happens we should not fall back to a vector type. This triggers
an assertion in the rotate handling when it finds a vector type
instead of an integer type.

In theory we could use a shufflevector in place of the rotate, but
right now I'd just like to fix the crash.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D108839
2021-08-30 08:47:15 -07:00
Carl Ritson 5d9de3ea18 [DAGCombine] Allow FMA combine with both FMA and FMAD
Without this change only the preferred fusion opcode is tested
when attempting to combine FMA operations.
If both FMA and FMAD are available then FMA ops formed prior to
legalization will not be merged post legalization as FMAD becomes
the preferred fusion opcode.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108619
2021-08-27 19:49:35 +09:00
Sanjay Patel e728d1a3e8 [DAGCombiner] create binop nodes with all of expected values
This is another bug exposed by https://llvm.org/PR51612
(and the one that triggered the initial assertion) in the report.

That example was suppressed with:
985b48f183

...but these would still crash because we created nodes
like UADDO without the expected 2 output values.
2021-08-25 16:14:22 -04:00
Sanjay Patel 985b48f183 [DAGCombiner] check uses more strictly on select-of-binop fold
There are 2 bugs here:
1. We were not checking uses of operand 2 (the false value of the select).
2. We were not checking for multiple uses of nodes that produce >1 result.

Correcting those is enough to avoid the crash in the reduced test based on:
https://llvm.org/PR51612

The additional use check on operand 0 (the condition value of the select)
should not strictly be necessary because we are only replacing one use
with another (whether it makes performance sense to do the transform with
that pattern is not clear). But as noted in the TODO, changing that
uncovers another bug.

Note: there's at least one more bug here - we aren't propagating EVTs
correctly, but I plan to fix that in another patch.
2021-08-25 14:14:41 -04:00
Peilin Guo 4c4dbeeeea [DAGCombine] Check the legality of the index of EXTRACT_SUBVECTOR
For ISD::EXTRACT_SUBVECTOR, its second operand must be a constant
multiple of the known-minimum vector length of the result type.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107795
2021-08-25 19:33:39 +08:00
Simon Pilgrim 194b08000c [DAG] LoadedSlice::canMergeExpensiveCrossRegisterBankCopy - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.
2021-08-24 15:28:30 +01:00
Simon Pilgrim 6de0b55188 [DAG] TransformFPLoadStorePair - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for fp->int load/store copies) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

Differential Revision: https://reviews.llvm.org/D108318
2021-08-24 13:11:27 +01:00
Simon Pilgrim e431b280c9 [DAG] CombineConsecutiveLoads - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for ISD::BUILD_PAIR) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

This helps in particular for 32-bit X86 cases loading 64-bit size data, reducing codegen diffs vs x86_64.

Differential Revision: https://reviews.llvm.org/D108307
2021-08-24 12:31:22 +01:00
Ben Shi f69fb7ac72 [DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)
Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27

Differential Revision: https://reviews.llvm.org/D107711
2021-08-22 16:53:32 +08:00
Simon Pilgrim d6fe8d37c6 [DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b)
Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal.

This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands.

Differential Revision: https://reviews.llvm.org/D107597
2021-08-16 16:06:54 +01:00
Paul Walker cd0e196413 [DAGCombiner] Stop visitEXTRACT_SUBVECTOR creating illegal BITCASTs post legalisation.
visitEXTRACT_SUBVECTOR can sometimes create illegal BITCASTs when
removing "redundant" INSERT_SUBVECTOR operations.  This patch adds
an extra check to ensure such combines only occur after operation
legalisation if any resulting BITBAST is itself legal.

Differential Revision: https://reviews.llvm.org/D108086
2021-08-15 18:25:49 +01:00
Luo, Yuanke 53642d5b80 [NFC] Fix the formula for reciprocal calculation.
Differential Revision: https://reviews.llvm.org/D107713
2021-08-09 16:03:56 +08:00
Amara Emerson 2b067e3335 Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG.
DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.
2021-08-06 12:57:53 -07:00
Craig Topper f7076cfd3a [DAGCombiner][RISCV][AMDGPU] Call SimplifyDemandedBits at the end of visitMULHU to enable known bits contant folding.
We don't have real demanded bits support for MULHU, but we can
still use the known bits based constant folding support at the end
of SimplifyDemandedBits to simplify a MULHU. This helps with cases
where we know the LHS and RHS have enough leading zeros so that
the high multiply result is always 0.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D106471
2021-08-05 08:31:26 -07:00
Simon Pilgrim 2cbf9fd402 [DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns
IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes.

This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector.

This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector.

Fixes PR50053

Differential Revision: https://reviews.llvm.org/D107068
2021-08-05 15:40:48 +01:00
Craig Topper c23405174a [DAGCombiner][AMDGPU] Canonicalize constants to the RHS of MULHU/MULHS.
This allows special constants like to 0 to be recognized. It's also
expected by isel patterns if a target had a mulh with immediate instructions.
The commuting done by tablegen won't commute patterns with immediates since it
expects DAGCombine to have done it.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107486
2021-08-04 11:39:23 -07:00
Simon Pilgrim 11396641e4 [DAG] Cleanup DAGCombiner::CombineConsecutiveLoads early-outs. NFCI.
We had some similar hasOneUse/isNON_EXTLoad early-outs spread out over different parts of the method - we should pull them all together.

Noticed while triaging PR45116
2021-08-03 13:47:55 +01:00
Sanjay Patel fa6b2c9915 [DAGCombiner] don't try to partially reduce add-with-overflow ops
This transform was added with D58874, but there were no tests for overflow ops.
We need to change this one way or another because it can crash as shown in:
https://llvm.org/PR51238

Note that if there are no uses of an overflow op's bool overflow result, we
reduce it to a regular math op, so we continue to fold that case either way.
If we have uses of both the math and the overflow bool, then we are likely
not saving anything by creating an independent sub instruction as seen in
the test diffs here.

This patch makes the behavior in SDAG consistent with what we do in
instcombine AFAICT.

Differential Revision: https://reviews.llvm.org/D106983
2021-07-29 08:51:54 -04:00
Juneyoung Lee 4f71f59bf3 [DAGCombiner] Fold SETCC(FREEZE(x),const) to FREEZE(SETCC(x,const)) if SETCC is used by BRCOND
This patch adds a peephole optimization `SETCC(FREEZE(x),const)` => `FREEZE(SETCC(x,const))`
if the SETCC is only used by BRCOND.

Combined with `BRCOND(FREEZE(X)) => BRCOND(X)`, this leads to a nice improvement in the generated assembly when x is a masked loaded value.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D105344
2021-07-28 09:22:15 +09:00
Fraser Cormack 7b33b849bd [SelectionDAG] Support scalable splats in U(ADD|SUB)SAT combines
This patch builds on top of D106575 in which scalable-vector splats were
supported in `ISD::matchBinaryPredicate`. It teaches the DAGCombiner how
to perform a variety of the pre-existing saturating add/sub combines on
scalable-vector types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106652
2021-07-27 10:52:34 +01:00
Fraser Cormack f924a3d474 [SelectionDAG] Support scalable-vector splats in yet more cases
This patch extends support for (scalable-vector) splats in the
DAGCombiner via the `ISD::matchBinaryPredicate` function, which enable a
variety of simple combines of constants.

Users of this function may now have to distinguish between
`BUILD_VECTOR` and `SPLAT_VECTOR` vector operands. The way of dealing
with this in-tree follows the approach added for
`ISD::matchUnaryPredicate` implemented in D94501.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106575
2021-07-26 10:15:08 +01:00
Simon Pilgrim c261a06b7a [DAG] Add initial SelectionDAG::isGuaranteedNotToBeUndefOrPoison framework (PR51129)
I've setup the basic framework for the isGuaranteedNotToBeUndefOrPoison call and updated DAGCombiner::visitFREEZE to use it, further Opcodes can be handled when we have test coverage.

I'm not aware of any vector test freeze coverage so the DemandedElts (and the Depth) args are not being used yet - but they are in place.

SelectionDAG::isGuaranteedNotToBePoison wrappers have also been added.

Differential Revision: https://reviews.llvm.org/D106668
2021-07-24 11:36:35 +01:00
David Truby 1528a4d400 [llvm][sve] Lowering for VLS truncating stores
This adds custom lowering for truncating stores when operating on
fixed length vectors in SVE. It also includes a DAG combine to
fold extends followed by truncating stores into non-truncating
stores in order to prevent this pattern appearing once truncating
stores are supported.

Currently truncating stores are not used in certain cases where
the size of the vector is larger than the target vector width.

Differential Revision: https://reviews.llvm.org/D104471
2021-07-23 14:04:55 +01:00
Paulo Matos 46667a1003 [WebAssembly] Implementation of global.get/set for reftypes in LLVM IR
Reland of 31859f896.

This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D104797
2021-07-22 22:07:24 +02:00
Eli Friedman 0ca46a1757 [SelectionDAG] Fix the representation of ISD::STEP_VECTOR.
The existing rule about the operand type is strange.  Instead, just say
the operand is a TargetConstant with the right width.  (Legalization
ignores TargetConstants, so it doesn't matter if that width is legal.)

Highlights:

1. I had to substantially rewrite the AArch64 isel patterns to expect a
TargetConstant.  Nothing too exotic, but maybe a little hairy. Maybe
worth considering a target-specific node with some dagcombines instead
of this complicated nest of isel patterns.
2. Our behavior on RV32 for vectors of i64 has changed slightly. In
particular, we correctly preserve the width of the arithmetic through
legalization.  This changes the DAG a bit. Maybe room for
improvement here.
3. I explicitly defined the behavior around overflow. This is necessary
to make the DAGCombine transforms legal, and I don't think it causes any
practical issues.

Differential Revision: https://reviews.llvm.org/D105673
2021-07-21 10:58:40 -07:00
Amy Huang fd972bb9fd Revert "[llvm][sve] Lowering for VLS truncating stores" because it
causes a seg fault (see https://reviews.llvm.org/D104471).

This reverts commit c305557acd.
2021-07-19 11:03:33 -07:00
Simon Pilgrim fd7a54c709 [DAG] DAGCombiner::foldSelectOfBinops - propagate the common flags to the merged binop
As discussed on D106058 - we were failing to keep the common flags. This matches the behaviour in InstCombinerImpl::foldSelectOpOp.
2021-07-18 18:38:59 +01:00
Simon Pilgrim 5643be96bc [DAG] Enable foldSelectOfBinops on select(setcc(),binop(),binop()) calls 2021-07-18 18:38:59 +01:00
Simon Pilgrim 1a6a8443c2 [DAG] Move select(cc, binop(), binop()) folds into DAGCombiner::foldSelectOfBinops. NFCI.
I'm going to extend the functionality started in D106058 so move the folds into their own method to reduce the amount of code in DAGCombiner::visitSELECT
2021-07-18 14:54:41 +01:00
Simon Pilgrim 0aece73aba [DAG] Fold select(cond,binop(x,y),binop(x,z)) -> binop(x,select(cond,y,z))
Similar to the folds performed in InstCombinerImpl::foldSelectOpOp, this attempts to push a select further up to help merge a pair of binops.

I'm primarily interested in select(cond,add(x,y),add(x,z)) folds to help expose pointer math (see https://bugs.llvm.org/show_bug.cgi?id=51069 etc.) but I've tried to use the more generic isBinOp().

Differential Revision: https://reviews.llvm.org/D106058
2021-07-15 16:08:30 +01:00
Qiu Chaofan 954a15d639 [SelectionDAG] Check use before combining into USUBSAT
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D105789
2021-07-13 14:50:26 +08:00
David Truby c305557acd [llvm][sve] Lowering for VLS truncating stores
This adds custom lowering for truncating stores when operating on
fixed length vectors in SVE. It also includes a DAG combine to
fold extends followed by truncating stores into non-truncating
stores in order to prevent this pattern appearing once truncating
stores are supported.

Currently truncating stores are not used in certain cases where
the size of the vector is larger than the target vector width.

Differential Revision: https://reviews.llvm.org/D104471
2021-07-12 11:14:17 +01:00
David Green 4ce26deac2 [DAG] Reassociate Add with Or
We already have reassociation code for Adds and Ors separately in DAG
combiner, this adds it for the combination of the two where Ors act like
Adds. It reassociates (add (or (x, c), y) -> (add (add (x, y), c)) where
we know that the Ors operands have no common bits set, and the Or has
one use.

Differential Revision: https://reviews.llvm.org/D104765
2021-07-07 10:21:07 +01:00
David Stuttard 83cb9632a1 [DAGCombiner] Add support for mulhi const folding in DAGCombiner
Differential Revision: https://reviews.llvm.org/D103323

Change-Id: I4ffaaa32301795ba8a339567a68e77fe0862b869
2021-07-05 12:01:26 +01:00
Paul Walker 287d39dd5a [NFC] Fix a few whitespace issues and typos. 2021-07-04 11:49:58 +01:00
Craig Topper af331e8284 [SelectionDAG] Rename memory VT argument for getMaskedGather/getMaskedScatter from VT to MemVT.
Use getMemoryVT() in MGATHER/MSCATTER DAG combines instead of
using the passthru or store value VT for this argument.
2021-07-02 17:37:40 -07:00
Roman Lebedev c2c0d3ea89
Revert "[WebAssembly] Implementation of global.get/set for reftypes in LLVM IR"
This reverts commit 4facbf213c.

```
********************
FAIL: LLVM :: CodeGen/WebAssembly/funcref-call.ll (44466 of 44468)
******************** TEST 'LLVM :: CodeGen/WebAssembly/funcref-call.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /builddirs/llvm-project/build-Clang12/bin/llc < /repositories/llvm-project/llvm/test/CodeGen/WebAssembly/funcref-call.ll --mtriple=wasm32-unknown-unknown -asm-verbose=false -mattr=+reference-types | /builddirs/llvm-project/build-Clang12/bin/FileCheck /repositories/llvm-project/llvm/test/CodeGen/WebAssembly/funcref-call.ll
--
Exit Code: 2

Command Output (stderr):
--
llc: /repositories/llvm-project/llvm/include/llvm/Support/LowLevelTypeImpl.h:44: static llvm::LLT llvm::LLT::scalar(unsigned int): Assertion `SizeInBits > 0 && "invalid scalar size"' failed.

```
2021-07-02 11:49:51 +03:00
Paulo Matos 4facbf213c [WebAssembly] Implementation of global.get/set for reftypes in LLVM IR
Reland of 31859f896.

This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.

Differential Revision: https://reviews.llvm.org/D104797
2021-07-02 09:46:28 +02:00
David Green 2887f14639 [ISel] Port AArch64 SABD and UABD to DAGCombine
This ports the AArch64 SABD and USBD over to DAG Combine, where they can be
used by more backends (notably MVE in a follow-up patch). The matching code
has changed very little, just to handle legal operations and types
differently. It selects from (ABS (SUB (EXTEND a), (EXTEND b))), producing
a ubds/abdu which is zexted to the original type.

Differential Revision: https://reviews.llvm.org/D91937
2021-06-26 19:34:16 +01:00
David Green b8c8bb0769 [DAG] Fold neg(splat(neg(x)) -> splat(x)
This add as a fold of sub(0, splat(sub(0, x))) -> splat(x). This can
come up in the lowering of right shifts under AArch64, where we generate
a shift left of a negated number.

Differential Revision: https://reviews.llvm.org/D103755
2021-06-25 19:53:29 +01:00
Jinsong Ji c125af82a5 [DAGCombine] Check reassoc flags in aggressive fsub fusion
The is from discussion in https://reviews.llvm.org/D104247#inline-993387

The contract and reassoc flags shouldn't imply each other .

All the aggressive fsub fusion reassociate operations,
we should guard them with reassoc flag check.

Reviewed By: mcberg2017

Differential Revision: https://reviews.llvm.org/D104723
2021-06-23 13:59:40 +00:00
Jinsong Ji 3996311ee1 [DAGCombine] reassoc flag shouldn't enable contract
According to IR LangRef, the FMF flag:

contract
Allow floating-point contraction (e.g. fusing a multiply followed by an
addition into a fused multiply-and-add).

reassoc
Allow reassociation transformations for floating-point instructions.
This may dramatically change results in floating-point.

My understanding is that these two flags shouldn't imply each other,
as we might have a SDNode that can be reassociated with others, but
not contractble.

eg: We may want following fmul/fad/fsub to freely reassoc, but don't
want fma being generated here.

   %F = fmul reassoc double %A, %B         ; <double> [#uses=1]
   %G = fmul reassoc double %C, %D         ; <double> [#uses=1]
   %H = fadd reassoc double %F, %G         ; <double> [#uses=1]
   %I = fsub reassoc double %H, %E         ; <double> [#uses=1]

Before https://reviews.llvm.org/D45710, `reassoc` flag actually
did not imply isContratable either.

The current implementation also only check the flag in fadd node,
ignoring fmul node, this patch update that as well.

Reviewed By: spatel, qiucf

Differential Revision: https://reviews.llvm.org/D104247
2021-06-21 21:15:43 +00:00
Saleem Abdulrasool 5b5833b9e0 SelectionDAG: repair the Windows build
6e5628354e regressed the Windows build as
the return type no longer matched in both branches for the return value
type deduction.  This uses a bit more compiler magic to deal with that.
2021-06-14 08:25:36 -07:00
Roman Lebedev 0f94c3c80d
[NFC][DAGCombine] Extract getFirstIndexOf() lambda back into a function
Not all supported compilers like such lambdas, at least one buildbot is unhappy.
2021-06-14 16:25:59 +03:00
Roman Lebedev 6e5628354e
[DAGCombine] reduceBuildVecToShuffle(): sort input vectors by decreasing size
The sorting, obviously, must be stable, else we will have random assembly fluctuations.

Apparently there was no test coverage that would benefit from that,
so i've added one test.

The sorting consists of two parts - just sort the input vectors,
and recompute the shuffle mask -> input vector mapping.
I don't believe we need to do anything else.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D104187
2021-06-14 16:18:37 +03:00
Carl Ritson cfbb92441f [SDAG] Fix pow2 assumption when splitting vectors
When reducing vector builds to shuffles it possible that
the DAG combiner may try to extract invalid subvectors.

This happens as the existing code assumes vectors will be power
of 2 sizes, which is already untrue, but becomes more noticable
with v6 and v7 types.
Specifically the existing code assumes that half PowerOf2Ceil of
a given vector index will fit twice into a given vector.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103880
2021-06-11 08:58:16 +09:00
David Spickett 64de8763aa Revert "Implementation of global.get/set for reftypes in LLVM IR"
This reverts commit 31859f896c.

Causing SVE and RISCV-V test failures on bots.
2021-06-10 10:11:17 +00:00
Paulo Matos 31859f896c Implementation of global.get/set for reftypes in LLVM IR
This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D95425
2021-06-10 10:07:45 +02:00
Sanjay Patel dd763ac791 [SDAG] fix miscompile from merging stores of different sizes
As shown in:
https://llvm.org/PR50623
...and the similar tests here, we were not accounting for
store merging of different sizes that do not cover the
entire range of the wide value to be stored.

This is the easy fix: just make sure that all of the
original stores are the same size, so when we calculate
the wide width, it's a simple N * M check.

This still allows all of the motivating optimizations from:
D86420 / 54a5dd485c
D87112 / 7a06b166b1

We could enhance this code to track individual bytes and
allow merging multiple sizes.
2021-06-09 09:51:39 -04:00
Simon Pilgrim 61a2d6bfe4 [DAG] foldShuffleOfConcatUndefs - ensure shuffles of upper (undef) subvector elements is undef (PR50609)
shuffle(concat(x,undef),concat(y,undef)) -> concat(shuffle(x,y),shuffle(x,y))

If the original shuffle references any of the upper (undef) subvector elements, ensure the split shuffle masks uses undef instead of an out-of-bounds value.

Fixes PR50609
2021-06-08 15:49:41 +01:00
Sanjay Patel 0718ac706d [SDAG] allow cast folding for vector sext-of-setcc with signed compare
This extends 434c8e013a and ede3982792 to handle signed
predicates by sign-extending the setcc operands.

This is not shown directly in https://llvm.org/PR50055 ,
but the pattern is visible by changing the unsigned convert
to signed in the source code.
2021-06-02 15:05:02 -04:00
Sanjay Patel ede3982792 [SDAG] allow more cast folding for vector sext-of-setcc
This is a follow-up to D103280 that eases the use restrictions,
so we can handle the motivating case from:
https://llvm.org/PR50055

The loop code is adapted from similar use checks in
ExtendUsesToFormExtLoad() and SliceUpLoad(). I did not see an
easier way to filter out non-chain uses of load values.

Differential Revision: https://reviews.llvm.org/D103462
2021-06-02 13:14:49 -04:00
Sanjay Patel 1b14f3951a [SDAG] add helper function for sext-of-setcc folds; NFC
Try to make this easier to read as noted in D103280
2021-06-01 08:07:17 -04:00
Sanjay Patel 63fe4cb082 [SDAG] add check to sext-of-setcc fold to bypass changing a legal op
I accidentaly pushed a draft of D103280 that was discussed
during the review, but it was not supposed to be the final
version.

Rather than revert and recommit, I'm updating the existing
code. This way we have a record of the codegen diff that
would result if we decide to remove this predicate in the
future.
2021-05-31 08:58:11 -04:00
Sanjay Patel 434c8e013a [SDAG] try harder to fold casts into vector compare
sext (vsetcc X, Y) --> vsetcc (zext X), (zext Y) --
(when the zexts are free and a bunch of other conditions)

We have a couple of similar folds to this already for vector selects,
but this pattern slips through because it is only a setcc.

The tests are based on the motivating case from:
https://llvm.org/PR50055
...but we need extra logic to get that example, so I've left that as
a TODO for now.

Differential Revision: https://reviews.llvm.org/D103280
2021-05-31 07:14:01 -04:00
Florian Hahn 126f90b252
[DAGCombine] Poison-prove scalarizeExtractedVectorLoad.
extractelement is poison if the index is out-of-bounds, so just
scalarizing the load may introduce an out-of-bounds load, which is UB.

To avoid introducing new UB, we can mask the index so it only contains
valid indices.

Fixes PR50382.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D103077
2021-05-30 11:40:55 +01:00
Fraser Cormack b7101e218c [DAGCombine][RISCV] Don't try to trunc-store combined vector stores
DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told
whether it's to use vector types and also whether it's to issue a
truncating store. However, the truncating store code path assumes a
scalar integer `ConstantSDNode`, and when using vector types it creates
either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which
is a constant.

The `riscv64` target is able to expose a crash here because it switches
on both code paths at the same time. The `f32` is stored as `i32` which
must be promoted to `i64`, necessitating a truncating store.
It also decides later that it prefers a vector store of `v2f32`.

While vector truncating stores are legal, this combine is not able to
emit them. We also don't have a test case. This patch adds an assert to
catch this case more gracefully, and updates one of the caller functions
to the function to turn off the use of truncating stores when preferring
vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103173
2021-05-27 14:16:32 +01:00
Fraser Cormack 85e31eddf2 [DAGCombiner] Relax an assertion to an early return
The select-of-constants transform was asserting that its constant vector
inputs did not implicitly truncate their input without that as an
explicit precondition to the function. This patch relaxes that assertion
into an early return to skip the optimization.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D102393
2021-05-17 09:15:55 +01:00
Sanjay Patel 9dfd7f9b67 [SDAG] reduce code duplication for extend_vec_inreg combines; NFC
These are identical so far, and I was looking at adding a fold
for a pattern with scalar_to_vector which would also nd up duplicated.
2021-05-14 08:29:57 -04:00
Hendrik Greving 762ac725bf [DAGCombiner] Fix DAG combine store elimination, different address space.
Fixes a bug in the DAG combiner that eliminates the stores because it missed
to inspect the address space of the pointers.

%v = load %ptr_as1
// no chain side effect
store %v, %ptr_as2

As well as

store %v, %ptr_as1
store %v, %ptr_as2

Fixes a test for above in X86.

Differential Revision: https://reviews.llvm.org/D102096
2021-05-12 07:14:22 -07:00
Bradley Smith 635164b95a [AArch64][SVE] Improve SVE codegen for fixed length BITCAST
Expanding a fixed length operation involves wrapping the operation in an
insert/extract subvector pair, as such, when this is done to bitcast we
end up with an extract_subvector of a bitcast. DAGCombine tries to
convert this into a bitcast of an extract_subvector which restores the
initial fixed length bitcast, causing an infinite loop of legalization.

As part of this patch, we must make sure the above DAGCombine does not
trigger after legalization if the created bitcast would not be legal.

Differential Revision: https://reviews.llvm.org/D101990
2021-05-10 14:43:53 +01:00
Craig Topper 0c330afdfa [RISCV] Enable SPLAT_VECTOR for fixed vXi64 types on RV32.
This replaces D98479.

This allows type legalization to form SPLAT_VECTOR_PARTS so we don't
lose the splattedness when the scalar type is split.

I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so
we can continue using non-VL nodes for scalable vectors.

I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes
to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR
with other operations. Especially interesting is a splat BUILD_VECTOR of
the extract_vector_elt which can become a splat shuffle, but won't if
we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR
or add visitSPLAT_VECTOR.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100803
2021-04-29 08:20:09 -07:00
Jun Ma 978eb3f168 [DAGCombiner] Allow operand of step_vector to be negative.
It is proper to relax non-negative limitation of step_vector.
Also this patch adds more combines for step_vector:
(sub X, step_vector(C)) -> (add X,  step_vector(-C))

Differential Revision: https://reviews.llvm.org/D100812
2021-04-22 20:58:03 +08:00
Fraser Cormack c141bd3cf9 [DAGCombiner] Support all-ones/all-zeros SPLAT_VECTOR in more combines
This patch adds incrementally-better support for SPLAT_VECTOR in a
handful of vector combines by changing a few more
isBuildVectorAllOnes/isBuildVectorAllZeros to the equivalent
isConstantSplatVectorAllOnes/Zeros calls.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D100851
2021-04-21 11:05:37 +01:00
Jun Ma 1ef5699d1a [DAGCombiner] Support fold zero scalar vector.
This patch changes ISD::isBuildVectorAllZeros to
ISD::isConstantSplatVectorAllZeros which handles zero sclar vector.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D100813
2021-04-20 16:28:43 +08:00
David Sherwood 83f5fa519e [CodeGen] Improve code generation for clamping of constant indices with scalable vectors
When trying to clamp a constant index into a scalable vector we can
test if the index is less than the minimum number of elements in the
vector. If so, we can simply return the index because we know it is
guaranteed to fit inside the vector.

Differential Revision: https://reviews.llvm.org/D100639
2021-04-19 08:34:17 +01:00
Jun Ma 7e1422c1e4 [DAGCombiner] Fold step_vector with add/mul/shl
This patch implements some DAG combines for STEP_VECTOR:
add step_vector(C1), step_vector(C2) -> step_vector(C1+C2)
add (add X step_vector(C1)), step_vector(C2) -> add X step_vector(C1+C2)
mul step_vector(C1), C2 -> step_vector(C1*C2)
shl step_vector(C1), C2 -> step_vector(C1<<C2)

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D100088
2021-04-15 18:06:35 +08:00
dfukalov d066079728 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
Main reason is preparation to transform AliasResult to class that contains
offset for PartialAlias case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Simon Pilgrim 77d625f8d8 [DAG] MergeInnerShuffle with BinOps - sometimes accept undef mask elements
If the inner shuffle already contains undef elements, then accept them in the merged shuffle as well.

This helps some X86 HADD/SUB patterns where slow targets were ending up with HADD/SUB because the (un)merged shuffles were stuck either side of the ADD/SUB - meaning we ended up with a total cost much higher than the "2*shuffle+add" that a slow target usually expands a HADD/SUB to.
2021-04-01 14:33:00 +01:00
Florian Hahn eb3d9f2eb6
[SelDag] Add isIntOrFPConstant helper function.
This patch adds a new isIntOrFPConstant  helper function to check if a
SDValue is a integer of FP constant. This pattern is used in various
places.

There also are places that incorrectly just check for integer constants,
e.g. D99384, so hopefully this helper will help people avoid that issue.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D99428
2021-03-28 12:48:58 +01:00
Craig Topper 30080b003e [DAGCombiner] Minor compile time improvement to (sext_in_reg (sign_extend_vector_inreg x)) optimization.
Don't bother calling ComputeNumSignBits if N00Bits < ExtVTBits. No
matter what answer we get back this will be true:
(N00Bits - DAG.ComputeNumSignBits(N00, DemandedSrcElts)) < ExtVTBits)

So we might as well save the computation. This makes the code more
consistent with the similar (sext_in_reg (sext x)) handling above.
2021-03-21 11:16:41 -07:00
Simon Pilgrim 64c2641c89 [DAG] Limit (sext_in_reg (zero_extend_vector_inreg x)) to exact sign extension
As commented by @craig.topper on rG1ba5c550d418, we can't guarantee that we'll be extending zero bits, just sign bit. So, revert to the old code for zero_extend_vector_inreg cases.
2021-03-21 14:01:37 +00:00
Simon Pilgrim ffb2887103 [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),undef) -> bop(shuffle'(x,y),shuffle'(z,w))
Followup to D96345, handle unary shuffles of binops (as well as binary shuffles) if we can merge the shuffle with inner operand shuffles.

Differential Revision: https://reviews.llvm.org/D98646
2021-03-19 14:14:56 +00:00
Craig Topper 182b831aeb [DAGCombiner][RISCV] Teach visitMGATHER/MSCATTER to remove gather/scatters with all zeros masks that use SPLAT_VECTOR.
Previously only all zeros BUILD_VECTOR was recognized.
2021-03-18 15:34:14 -07:00
Simon Pilgrim 1ba5c550d4 [DAG] Improve folding (sext_in_reg (*_extend_vector_inreg x)) -> (sext_vector_inreg x)
Extend this to support ComputeNumSignBits of the (used) source vector elements so that we can handle more than just the case where we're sext_in_reg from the source element signbit.

Noticed while investigating the poor codegen in D98587.
2021-03-18 15:34:53 +00:00
Craig Topper 5b825433d7 [DAGCombiner] Optimize 1-bit smulo to AND+SETNE.
A 1-bit smulo overflows is both inputs are -1 since the result
should be +1 which can't be represented in a signed 1 bit value.

We can detect this with an AND and a setcc. The multiply result
can also use the same AND.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97634
2021-03-13 09:39:36 -08:00
Craig Topper 2ea7014089 [DAGCombiner] Use isConstantSplatVectorAllZeros/Ones instead of isBuildVectorAllZeros/Ones in visitMSTORE and visitMLOAD.
This allows us to optimize when the mask is a splat_vector in
addition to build_vector.
2021-03-12 12:14:56 -08:00
Sanjay Patel 415c67ba4c [SDAG] allow partial undef vector constants with select->logic folds
This is an enhancement suggested in the original review/commit:
D97730 / 7fce3322a2
2021-03-02 14:29:15 -05:00
Sanjay Patel 7fce3322a2 [SDAG] allow vector types for select->logic folds
This prepares codegen for a change that will remove the identical
folds from IR because they are not poison-safe. See
D93065 / D97360
for details.

We already generically support scalar types, and there are various
target-specific transforms that overlap the vector folds. For example,
x86 recognizes the and patterns, but not or. We can end up with 1
extra instruction there, but I think that is still preferred over the
blendv alternative that loads a constant vector.

If this is not optimal, then it should be fixed with a later transform
(this change is not expected to result in any regressions because
InstCombine currently does the same thing).

Removing custom code and supporting undefs in constant-pattern-matching
can be follow-up changes.

Differential Revision: https://reviews.llvm.org/D97730
2021-03-02 09:25:10 -05:00
Simon Pilgrim c0d4b44e6a [DAG] DAGCombiner::tryStoreMergeOfLoads - remove unused StartAddress variable. NFCI.
Noticed in "initialization is never read" clang-tidy warning - the only StartAddress set/used is inside the load combine loop.
2021-03-02 13:29:31 +00:00
Sanjay Patel 154c47dc06 [SDAG] add helper for select->logic folds; NFC
This set of transforms should be extended to handle vector types.
2021-03-01 16:24:15 -05:00
Simon Pilgrim 9dd83f5ee8 [DAG] visitVECTOR_SHUFFLE - attempt to match commuted shuffles with MergeInnerShuffle.
Try to match "shuffle(C, shuffle(A, B, M0), M1) -> shuffle(A, B, M2)" etc. by using MergeInnerShuffle's commuted inner shuffle mode.
2021-03-01 10:42:11 +00:00
Simon Pilgrim 64c41301ce [DAG] visitVECTOR_SHUFFLE - move shuffle canonicalization/merges all under the same legality test. NFCI.
Minor cleanup to move related combines closer together to make it more coherent, without changing the ordering.
2021-03-01 09:42:00 +00:00
Craig Topper 5de09ef02e [DAGCombiner][X86] Don't peek through ANDs on the shift amount in matchRotateSub when called from MatchFunnelPosNeg.
Peeking through AND is only valid if the input to both shifts is
the same. If the inputs are different, then the original pattern
ORs the two values when the masked shift amount is 0. This is ok
if the values are the same since the OR would be a NOP which is
why its ok for rotate.

Fixes PR49365 and reverts PR34641

Differential Revision: https://reviews.llvm.org/D97637
2021-02-28 12:58:00 -08:00
Craig Topper ca5247bb17 [DAGCombiner] Don't skip no overflow check on UMULO if the first computeKnownBits call doesn't return any 0 bits.
Even if the first computeKnownBits call doesn't have any zero
bits it is possible the other operand has bitwidth-1 leading zero.
In that case overflow is still impossible. So always call computeKnownBits
for both operands.
2021-02-28 08:26:22 -08:00
Craig Topper eea53b142d [DAGCombiner] Optimize SMULO/UMULO if we can prove that overflow is impossible.
Using ComputeNumSignBits or computeKnownBits we might be able
to determine that overflow is impossible.

This especially helps after type legalization if the type was
promoted from a type with half the bits or more. Type legalization
conservatively creates a promoted smulo/umulo and an overflow
check for the promoted bits. The overflow from the promoted
smulo/umulo is ORed with the result of the promoted bits
overflow check. Proving that the promoted smulo/umulo can never
overflow will leave us with just the promoted bits overflow check.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97160
2021-02-26 14:50:03 -08:00
Simon Pilgrim 9490b9f14b [DAG] Move simplification of SADDSAT/SSUBSAT/UADDSAT/USUBSAT of vXi1 to getNode()
As discussed on D97276 we should be able to always do this in node creation, we don't need a combine.
2021-02-25 17:49:26 +00:00
Simon Pilgrim 8082bfe7e5 [DAG] Add basic mul-with-overflow constant folding support
As noticed on D97160
2021-02-24 11:09:02 +00:00
Simon Pilgrim 38ab47c813 [DAG] Match USUBSAT patterns through zext/trunc
This patch handles usubsat patterns hidden through zext/trunc and uses the getTruncatedUSUBSAT helper to determine if the USUBSAT can be correctly performed in the truncated form:

zext(x) >= y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit)))
zext(x) >  y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit)))

Based on original examples:

void foo(unsigned short *p, int max, int n) {
    int i;
    unsigned m;
    for (i = 0; i < n; i++) {
        m = *--p;
        *p = (unsigned short)(m >= max ? m-max : 0);
    }
}

Differential Revision: https://reviews.llvm.org/D25987
2021-02-21 15:26:54 +00:00
Simon Pilgrim 761bbed264 [DAG] foldSubToUSubSat - fold sub(a,trunc(umin(zext(a),b))) -> usubsat(a,trunc(umin(b,SatLimit)))
This moves the last custom x86 USUBSAT fold to generic DAGCombine.

Completes PR40111

Differential Revision: https://reviews.llvm.org/D96703
2021-02-20 12:02:07 +00:00
Simon Pilgrim 5d3930bb8f [DAG] visitTRUNCATE - attempt to truncate USUBSAT
Fold trunc(usubsat(zext(x),y)) -> usubsat(x,trunc(umin(y,satlimit)))
2021-02-19 14:26:05 +00:00
Simon Pilgrim 53e83afcaf [DAG] getTruncatedUSUBSAT - always truncate operands. NFCI.
As noticed on D96703, we're always truncating the operands so should use getNode(ISD::TRUNCATE) instead of getZExtOrTrunc.
2021-02-18 21:28:55 +00:00
Guozhi Wei 66f2d09ebf [DAGCombiner] Transform (zext (select c, load1, load2)) -> (select c, zextload1, zextload2)
If extload is legal, following transform
    (zext (select c, load1, load2)) -> (select c, zextload1, zextload2)
can save one ext instruction.

Differential Revision: https://reviews.llvm.org/D95086
2021-02-18 13:15:20 -08:00
Bradley Smith 8bad8a43c3 [AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD
Adjust generateFMAsInMachineCombiner to return false if SVE is present
in order to combine fmul+fadd into fma. Also add new pseudo instructions
so as to select the most appropriate of FMLA/FMAD depending on register
allocation.

Depends on D96599

Differential Revision: https://reviews.llvm.org/D96424
2021-02-18 16:55:16 +00:00
Simon Pilgrim 87fbc06d06 [DAG] Pull out getTruncatedUSUBSAT helper from foldSubToUSubSat. NFCI.
This will simplify an incoming generic implementation of D25987.

I'll rebase D96703 shortly to support this.
2021-02-17 12:17:08 +00:00
Simon Pilgrim 05c64ea672 [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) (REAPPLIED)
Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))

Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles.

Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle.

This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise.

Fixes issue raised by @saugustine in rG5aa8f4c0843a where we were failing to replace null shuffle operands from MergeInnerShuffle to UNDEFs.

Differential Revision: https://reviews.llvm.org/D96345
2021-02-17 11:42:43 +00:00
Sterling Augustine 5aa8f4c084 Revert "[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))"
This reverts commit 5dfba562dd.

That commit causes an assertion failure with the following repro:

typedef long b __attribute__((__vector_size__(16)));
b *d;
b e;
b __attribute__((__always_inline__)) c(b h, b i) {
  return (__attribute__((__vector_size__(8 * sizeof(short)))) short)h + i;
}
j() {
  b k, l, m, n, o[6], p, q;
  m = d[5];
  b r = m;
  b s = f(r, 8);
  q = s;
  l = d[1];
  p = l;
  t(q);
  n = c(m, l);
  o[1] = c(s, f(p, 8));
  k = __builtin_shufflevector(n, o[1], 0, 2);
  e = __builtin_ia32_psrlwi128(k, j);
}

./bin/clang -cc1 -triple x86_64-grtev4-linux-gnu -emit-obj -O1 -std=c99 test.c
2021-02-16 12:48:15 -08:00
Simon Pilgrim 5dfba562dd [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))
Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))

Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles.

Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle.

This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise.

Differential Revision: https://reviews.llvm.org/D96345
2021-02-16 15:46:34 +00:00
Simon Pilgrim e47f21da61 [DAG] visitVSELECT - move OpLHS == LHS into inner if() in USUBSAT matching. NFCI.
This will be necessary for the update of D25987 where we'll need to match OpLHS against other ops.
2021-02-15 18:27:00 +00:00
Simon Pilgrim 6f5a805bbb [DAG] Fold i1/vXi1 saddsat/uaddsat(x,y) -> or(x,y)
Alive2: https://alive2.llvm.org/ce/z/FzcrpH
2021-02-13 15:02:01 +00:00
Simon Pilgrim 0df15e5eff [DAG] Fold i1/vXi1 ssubsat/usubsat(x,y) -> and(x,~y)
Alive2: https://alive2.llvm.org/ce/z/4nkNGh
2021-02-13 13:21:15 +00:00
Simon Pilgrim 4841a225b7 [DAG] Move basic USUBSAT pattern matches from X86 to DAGCombine
Begin transitioning the X86 vector code to recognise sub(umax(a,b) ,b) or sub(a,umin(a,b)) USUBSAT patterns to make it more generic and available to all targets.

This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to so we can be reasonably relaxed about matching these pre-legalization.

We can handle the trunc(sub(..))) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation.

The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987.

Differential Revision: https://reviews.llvm.org/D96413
2021-02-12 18:22:57 +00:00
Simon Pilgrim 5beebf9c58 [DAG] foldLogicOfSetCCs - Generalize and/or (setcc X, CMax, ne), (setcc X, CMin, ne/eq) fold. NFCI.
Prep work to add support for non-uniform vectors - replace APInt values with using the SDValue ops directly.
2021-02-11 17:09:01 +00:00
Luís Marques acac29ca42 [DAGCombiner] Don't fold FCOPYSIGN vector sign operand casts
Avoid doing the following combine for vector types:

```
copysign(x, fp_extend(y)) -> copysign(x, y)
copysign(x, fp_round(y)) -> copysign(x, y)
```

That combine seemed to impede the selection of vector instruction and cause
a mess in some circumstances.

Differential Revision: https://reviews.llvm.org/D96037
2021-02-10 14:25:24 +00:00
Kazu Hirata 7e75f6fc1d [SelectionDAG] Use range-based for loops (NFC) 2021-02-09 22:14:30 -08:00
Nemanja Ivanovic a5222aa085 [DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets
As of commit 284f2bffc9, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.

This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092

Differential revision: https://reviews.llvm.org/D96283
2021-02-09 06:33:48 -06:00
Simon Pilgrim c5c690a835 [DAG] visitVECTOR_SHUFFLE - move shuffle legality check into MergeInnerShuffle lamda. NFCI.
This is going to be necessary for a future reuse of MergeInnerShuffle
2021-02-08 14:25:16 +00:00
Huihui Zhang 1b81117f88 [DAGCombiner][SVE] Fix invalid use of getVectorNumElements() in visitSRA.
Make sure scalable property is preserved by using getVectorElementCount().

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D95967
2021-02-05 09:56:49 -08:00
Craig Topper 34da12dd1f [DAGCombiner] Remove (sra (shl X, C), C) if X has more than C sign bits.
If sext_inreg is supported, we will turn this into sext_inreg. That
will then remove it if there are enough sign bits. But if sext_inreg
isn't supported, we can still remove the shift pair based on sign
bits.

Split from D95890.
2021-02-03 10:18:40 -08:00
Fraser Cormack fde2466171 [SelectionDAG] Support scalable-vector splats in more cases
This patch adds support for scalable-vector splats in DAGCombiner's
`isConstantOrConstantVector` and `ISD::matchUnaryPredicate` functions,
which enable the SelectionDAG div/rem-by-constant optimizations for
scalable vector types.

It also fixes up one case where the UDIV optimization was generating a
SETCC without first consulting the target for its preferred SETCC result
type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94501
2021-01-25 10:58:15 +00:00
QingShan Zhang ffc3e800c6 [NFC] [DAGCombine] Correct the result for sqrt even the iteration is zero
For now, we correct the result for sqrt if iteration > 0. This doesn't make
sense as they are not strict relative.

Reviewed By: dmgreen, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D94480
2021-01-25 04:02:44 +00:00
Kazu Hirata 16baad8f4e [llvm] Use pop_back_val (NFC) 2021-01-24 12:18:57 -08:00
Simon Pilgrim 5dbe5d2c91 [DAG] Commute shuffle(splat(A,u), shuffle(C,D)) -> shuffle'(shuffle(C,D), splat(A,u))
We only merge shuffles if the inner (LHS) shuffle is a non-splat, so commute these shuffles to improve merging of multiple shuffles.
2021-01-22 11:43:18 +00:00
Simon Pilgrim 69bc0990a9 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED).
Add DemandedElts support inside the TRUNCATE analysis.

REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d

Differential Revision: https://reviews.llvm.org/D56387
2021-01-21 13:01:34 +00:00
Simon Pilgrim bc9ab9a5cd [DAG] CombineToPreIndexedLoadStore - use const APInt& for getAPIntValue(). NFCI.
Cleanup some code to use auto* properly from cast, and use const APInt& for getAPIntValue() to avoid an unnecessary copy.
2021-01-21 11:04:09 +00:00
Hans Wennborg a51226057f Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE"
It caused "Vector shift amounts must be in the same as their first arg"
asserts in Chromium builds. See the code review for repro instructions.

> Add DemandedElts support inside the TRUNCATE analysis.
>
> Differential Revision: https://reviews.llvm.org/D56387

This reverts commit cad4275d69.
2021-01-20 20:06:55 +01:00
Simon Pilgrim cad4275d69 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE
Add DemandedElts support inside the TRUNCATE analysis.

Differential Revision: https://reviews.llvm.org/D56387
2021-01-20 15:39:58 +00:00
Kazu Hirata b023cdeacc [llvm] Use llvm::all_of (NFC) 2021-01-19 20:19:17 -08:00
Kazu Hirata 8857202489 [llvm] Use llvm::find (NFC) 2021-01-19 20:19:14 -08:00
Simon Pilgrim 207f32948b [DAG] SimplifyDemandedBits - use KnownBits comparisons to remove ISD::UMIN/UMAX ops
Use the KnownBits icmp comparisons to determine when a ISD::UMIN/UMAX op is unnecessary should either op be known to be ULT/ULE or UGT/UGE than the other.

Differential Revision: https://reviews.llvm.org/D94532
2021-01-18 10:29:23 +00:00
Simon Pilgrim 46aa3c6c33 [DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - improve shuffle(shuffle(x,y),shuffle(x,y)) merging
MergeInnerShuffle currently attempts to merge shuffle(shuffle(x,y),z) patterns into a single shuffle, using 1 or 2 of the x,y,z ops.

However if we already match 2 ops we might be able to handle the third op if its also a shuffle that references one of the previous ops, allowing us to handle some cases like:

shuffle(shuffle(x,y),shuffle(x,y))
shuffle(shuffle(shuffle(x,z),y),z)
shuffle(shuffle(x,shuffle(x,y)),z)
etc.

This isn't an exhaustive match and is dependent on the order the candidate ops are encountered - if one of the matched ops was a shuffle that was peek-able we don't go back and try to split that, I haven't found much need for that amount of analysis yet.

This is a preliminary patch that will allow us to later improve x86 HADD/HSUB matching - but needs to be reviewed separately as its in generic code and affects existing Thumb2 tests.

Differential Revision: https://reviews.llvm.org/D94671
2021-01-15 15:08:31 +00:00
Simon Pilgrim 7c30c05ff7 [DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - reset shuffle ops and reorder early-out and second op matching. NFCI.
I'm hoping to reuse MergeInnerShuffle in some other folds - so ensure the candidate ops/mask are reset at the start of each run.

Also, move the second op matching before bailing to make it simpler to try to match other things afterward.
2021-01-14 11:55:20 +00:00
Simon Pilgrim af8d27a7a8 [DAG] visitVECTOR_SHUFFLE - pull out shuffle merging code into lambda helper. NFCI.
Make it easier to reuse in a future patch.
2021-01-14 11:05:19 +00:00
Kazu Hirata 5c1c39e8d8 [llvm] Use *Set::contains (NFC) 2021-01-13 19:14:41 -08:00
Simon Pilgrim 993c488ed2 [DAG] visitVECTOR_SHUFFLE - use all_of to check for all-undef shuffle mask. NFCI. 2021-01-13 17:19:41 +00:00
Juneyoung Lee 25eb7b08ba [DAGCombiner] Fold BRCOND(FREEZE(COND)) to BRCOND(COND)
This patch resolves the suboptimal codegen described in http://llvm.org/pr47873 .
When CodeGenPrepare lowers select into a conditional branch, a freeze instruction is inserted.
It is then translated to `BRCOND(FREEZE(SETCC))` in SelDag.
The `FREEZE` in the middle of `SETCC` and `BRCOND` was causing a suboptimal code generation however.
This patch adds `BRCOND(FREEZE(cond))` -> `BRCOND(cond)` fold to DAGCombiner to remove the `FREEZE`.

To make this optimization sound, `BRCOND(UNDEF)` simply should nondeterministically jump to the branch or not, rather than raising UB.
It wasn't clear what happens when the condition was undef according to the comments in ISDOpcodes.h, however.
I updated the comments of `BRCOND` to make it explicit (as well as `BR_CC`, which is also a conditional branch instruction).

Note that it diverges from the semantics of `br` instruction in IR, which is explicitly UB.
Since the UB semantics was necessary to explain optimizations that use branching conditions, and SelDag doesn't seem to have such optimization, I think this divergence is okay.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D92015
2021-01-13 09:36:52 +09:00
Craig Topper df74c001fa [DAGCombiner] Replace static helper function isConstantFPBuildVectorOrConstantFP with the identical version in SelectionDAG. NFC 2021-01-11 23:41:40 -08:00
Joe Ellis 007358239d [DAGCombiner] Use getVectorElementCount inside visitINSERT_SUBVECTOR
This avoids TypeSize-/ElementCount-related warnings.

Differential Revision: https://reviews.llvm.org/D92747
2021-01-11 14:15:11 +00:00
QingShan Zhang 7539c75bb4 [DAGCombine] Remove the check for unsafe-fp-math when we are checking the AFN
We are checking the unsafe-fp-math for sqrt but not for fpow, which behaves inconsistent.
As the direction is to remove this global option, we need to remove the unsafe-fp-math
check for sqrt and update the test with afn fast-math flags.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D93891
2021-01-11 02:25:53 +00:00
Craig Topper 4ef91f5871 [DAGCombiner] Don't speculatively create an all ones constant in visitREM that might not be used.
This looks to have been done to save some duplicated code under
two different if statements, but it ends up being harmful to D94073.
This speculative constant can be called on a scalable vector type
with i64 element size when i64 scalars aren't legal. The code tries
and fails to find a vector type with i32 elements that it can use.

So only create the node when we know it will be used.
2021-01-05 12:45:57 -08:00
Cameron McInally 92be640bd7 [FPEnv][AMDGPU] Disable FSUB(-0,X)->FNEG(X) DAGCombine when subnormals are flushed
This patch disables the FSUB(-0,X)->FNEG(X) DAG combine when we're flushing subnormals. It requires updating the existing AMDGPU tests to use the fneg IR instruction, in place of the old fsub(-0,X) canonical form, since AMDGPU is the only backend currently checking the DenormalMode flags.

Note that this will require follow-up optimizations to make sure the FSUB(-0,X) form is handled appropriately

Differential Revision: https://reviews.llvm.org/D93243
2021-01-04 14:44:10 -06:00
Layton Kifer d29f93bda5 [DAGCombiner] Don't create sexts of deleted xors when they were in-visit replaced
Fixes a bug introduced by D91589.

When folding `(sext (not i1 x)) -> (add (zext i1 x), -1)`, we try to replace the not first when possible. If we replace the not in-visit, then the now invalidated node will be returned, and subsequently we will return an invalid sext. In cases where the not is replaced in-visit we can simply return SDValue, as the not in the current sext should have already been replaced.

Thanks @jgorbe, for finding the below reproducer.

The following reduced test case crashes clang when built with `clang -O1 -frounding-math`:

```
template <class> class a {
  int b() { return c == 0.0 ? 0 : -1; }
  int c;
};
template class a<long>;
```

A debug build of clang produces this "assertion failed" error:
```
clang: /home/jgorbe/code/llvm/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:264: void {anonymous}::DAGCombiner::AddToWorklist(llvm::
SDNode*): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed.
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93274
2020-12-23 16:16:26 -08:00
Layton Kifer 385e9a2a04 [DAGCombiner] Improve shift by select of constant
Clean up a TODO, to support folding a shift of a constant by a
select of constants, on targets with different shift operand sizes.

Reviewed By: RKSimon, lebedev.ri

Differential Revision: https://reviews.llvm.org/D90349
2020-12-18 02:21:42 +00:00
Kerry McLaughlin 05edfc5475 [SVE][CodeGen] Add DAG combines for s/zext_masked_gather
This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true:
 - fold (and (masked_gather x)) -> (zext_masked_gather x)
 - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x)

LowerMGATHER has also been updated to fetch the LoadExtType associated with the
gather and also use this value to determine the correct masked gather opcode to use.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92230
2020-12-09 11:53:19 +00:00
Kerry McLaughlin 4519ff4b6f [SVE][CodeGen] Add the ExtensionType flag to MGATHER
Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode.
Also updated SelectionDAGDumper::print_details so that details of the gather
load (is signed, is scaled & extension type) are printed.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91084
2020-12-09 11:19:08 +00:00
Huihui Zhang 8e6fc1f97e [AArch64][SVE] Add lowering for llvm.maxnum|minnum for scalable type.
LLVM intrinsic llvm.maxnum|minnum is overloaded intrinsic, can be used on any
floating-point or vector of floating-point type.
This patch extends current infrastructure to support scalable vector type.

This patch also fix a warning message of incorrect use of EVT::getVectorNumElements()
for scalable type, when DAGCombiner trying to split scalable vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92607
2020-12-08 09:35:53 -08:00
Kai Luo 44bd8ea167 [DAGCombine][PowerPC] Simplify nabs by using legal `smin` operation
Convert `0 - abs(x)` to `smin (x, -x)` if `smin` is a legal operation.

Verification: https://alive2.llvm.org/ce/z/vpquFR

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92637
2020-12-08 03:24:07 +00:00
Simon Pilgrim b6e847c396 [DAG] Cleanup by folding some single use VT.getScalarSizeInBits() calls into its comparison. NFCI. 2020-12-07 18:23:54 +00:00
Kerry McLaughlin 111f559bbd [SVE][CodeGen] Call refineIndexType & refineUniformBase from visitMGATHER
The refineIndexType & refineUniformBase functions added by D90942 can also be used to
improve CodeGen of masked gathers.

These changes were split out from D91092

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92319
2020-12-07 13:20:19 +00:00
Bing1 Yu eee30a6dce [CodeGen] Modify the refineIndexType(...)'s code to fix a bug in D90942.
In previous code, when refineIndexType(...) is called and Index is undef, Index.getOperand(0) will raise a assertion fail.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D92548
2020-12-07 08:49:07 +08:00
Layton Kifer ac522f8700 [DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1)
Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets.

Differential Revision: https://reviews.llvm.org/D91589
2020-12-06 11:52:10 -05:00
Simon Pilgrim 6f4ee6f870 [DAGCombiner] Use const APInt& for getConstantOperandAPInt results. NFCI.
Avoid unnecessary instantiation.

Noticed while removing unnecessary autos
2020-12-04 09:44:58 +00:00
dfukalov 2ce38b3f03 [NFC] Reduce include files dependency.
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92489
2020-12-03 18:25:05 +03:00
Joe Ellis 78c0ea54a2 [DAGCombine] Fix TypeSize warning in DAGCombine::visitLIFETIME_END
Bail out early if we encounter a scalable store.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D92392
2020-12-03 12:12:41 +00:00
Layton Kifer d7fec38f05
[DAGCombiner][NFC] Replace duplicate implementation flipBoolean with DAG.getLogicalNOT
Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D92246
2020-12-01 22:23:04 +03:00
Benjamin Kramer 107e92dff8 [DAG] Remove unused variable. NFC. 2020-12-01 16:29:02 +01:00
Simon Pilgrim 1b209ff9e3 [DAG] Move vselect(icmp_ult, 0, sub(x,y)) -> usubsat(x,y) to DAGCombine (PR40111)
Move the X86 VSELECT->USUBSAT fold to DAGCombiner - there's nothing target specific about these folds.
2020-12-01 14:25:29 +00:00
Simon Pilgrim 6dbd0d36a1 [DAG] Move vselect(icmp_ult, -1, add(x,y)) -> uaddsat(x,y) to DAGCombine (PR40111)
Move the X86 VSELECT->UADDSAT fold to DAGCombiner - there's nothing target specific about these folds.

The SSE42 test diffs are relatively benign - its avoiding an extra constant load in exchange for an extra xor operation - there are extra register moves, which is annoying as all those operations should commute them away.

Differential Revision: https://reviews.llvm.org/D91876
2020-12-01 11:56:26 +00:00
QingShan Zhang 4d83aba422 [DAGCombine] Adding a hook to improve the precision of fsqrt if the input is denormal
For now, we will hardcode the result as 0.0 if the input is denormal or 0. That will
have the impact the precision. As the fsqrt added belong to the cold path of the
cmp+branch, it won't impact the performance for normal inputs for PowerPC, but improve
the precision if the input is denormal.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D80974
2020-11-27 02:10:55 +00:00
QingShan Zhang 9c588f53fc [DAGCombine] Add hook to allow target specific test for sqrt input
PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root.
LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to
target that has test instructions.

Reviewed By: Spatel, Chen Zheng, Qiu Chao Fang

Differential Revision: https://reviews.llvm.org/D80706
2020-11-25 05:37:15 +00:00
Kai Luo 5931be60b5 [DAGCombine][PowerPC] Convert negated abs to trivial arithmetic ops
This patch converts `0 - abs(x)` to `Y = sra (X, size(X)-1); sub (Y, xor (X, Y))` for better codegen.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D91120
2020-11-24 09:43:35 +00:00
Kerry McLaughlin 306c8ab208 [SVE][CodeGen] Improve codegen of scalable masked scatters
If the scatter store is able to perform the sign/zero extend of
its index, this is folded into the instruction with refineIndexType().
Additionally, refineUniformBase() will return the base pointer and index
from an add + splat_vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90942
2020-11-13 11:19:36 +00:00
Francesco Petrogalli fc2fe6817e [llvm][AArch64] Simplify (and (sign_extend..) #bitmask).
Fold

    VT = (and (sign_extend NarrowVT to VT) #bitmask)

into

    VT = (zero_extend NarrowVT)

With this combine, the test replaces a sign extended load + an
unsigned extention with a zero extended load to render one of the
operands of the last multiplication.

  BEFORE                       |  AFTER
    f_i16_i32:                 |    f_i16_i32:
         .fnstart              |           .fnstart
         ldrsh   r0, [r0]      |           ldrh    r1, [r1]
         ldrsh   r1, [r1]      |           ldrsh   r0, [r0]
         smulbb  r0, r1, r0    |           smulbb  r0, r0, r1
         uxth    r1, r1        |           mul     r0, r0, r1
         mul     r0, r0, r1    |           bx      lr
         bx      lr            |

Reviewed By: resistor

Differential Revision: https://reviews.llvm.org/D90605
2020-11-09 12:53:36 +00:00
Fraser Cormack f99580c1e5 [DAGCombine] Fix bug in load scalarization
Summary:
For vector element types which are not byte-sized, we would generate
incorrect scalar offsets and produce incorrect codegen.

This optimization could potentially be supported in the future, e.g. by
loading in bytes, then shifting and masking out the remaining bits of
the vector element. However, without an upstream target to test against
it's best to avoid the bad codegen in the simplest possible way.

Related to this bug:

  https://bugs.llvm.org/show_bug.cgi?id=27600

Reviewed by: foad

Differential Revision: https://reviews.llvm.org/D78568
2020-11-04 19:02:40 +00:00
Kerry McLaughlin f2412d372d [SVE][CodeGen] Lower scalable integer vector reductions
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.

Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D89382
2020-11-04 11:38:49 +00:00
Simon Pilgrim f53d7f55f1 [DAG] Move canFoldInAddressingMode before foldBinOpIntoSelect. NFC.
Reduces the diff in D90113.
2020-10-28 12:16:05 +00:00
Peter Waller 5b742a0c10 [SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination
The modified code in visitSTORE was missing a scalable vector check, and still
using the now deprecated implicit cast of TypeSize to uint64_t through the
overloaded operator. This patch fixes these issues.

This brings the logic in line with the comment on the context line immediately
above the added precondition.

Add a test in sve-redundant-store.ll that the warning is not triggered.

Differential Revision: https://reviews.llvm.org/D89701
2020-10-26 16:37:48 +00:00
Peter Waller 6536d6040f Revert "[SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination"
This reverts commit 4604441386.

Reverting because it was not the intended version of the patch, which
follows this patch.
2020-10-26 16:37:00 +00:00
Peter Waller 4604441386 [SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination
The modified code in visitSTORE was missing a scalable vector check, and still
using the now deprecated implicit cast of TypeSize to uint64_t through the
overloaded operator. This patch fixes these issues.

This brings the logic in line with the comment on the context line immediately
above the added precondition.

Add a test in Redundantstores.ll that the warning is not triggered.
2020-10-26 16:23:42 +00:00
Qiu Chaofan 1b2fe71ecf [DAGCombiner] Tighten reasscociation of visitFMA
From LangRef, FMF contract should not enable reassociating to form
arbitrary contractions. So it should not help rearrange nodes like
(fma (fmul x, c1), c2, y) into (fma x, c1*c2, y).

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89527
2020-10-20 10:13:01 +08:00
Amy Kwan 6a946fd06f [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use isOperationLegalOrCustom directly instead.
MULH is often expanded on targets.
This patch removes the isMulhCheaperThanMulShift hook and uses
isOperationLegalOrCustom instead.

Differential Revision: https://reviews.llvm.org/D80485
2020-10-19 12:23:04 -05:00
David Sherwood 3945b69e81 [SVE][CodeGen] Replace more TypeSize comparison operators with their scalar equivalents
In certain places in llvm/lib/CodeGen we were relying upon the TypeSize
comparison operators when in fact the code was only ever expecting
either scalar values or fixed width vectors. This patch changes a few
functions that were always expecting to work on scalar or fixed width
types:

1. DAGCombiner::mergeTruncStores - deals with scalar integers only.
2. DAGCombiner::ReduceLoadWidth - not valid for vectors.
3. DAGCombiner::createBuildVecShuffle - should only be used for
   fixed width vectors.
4. SelectionDAGLegalize::ExpandFCOPYSIGN and
   SelectionDAGLegalize::getSignAsIntValue - only work on scalars.

Differential Revision: https://reviews.llvm.org/D88562
2020-10-19 08:38:50 +01:00
David Sherwood 35a531fb45 [SVE][CodeGen][NFC] Replace TypeSize comparison operators with their scalar equivalents
In certain places in llvm/lib/CodeGen we were relying upon the TypeSize
comparison operators when in fact the code was only ever expecting
either scalar values or fixed width vectors. I've changed some of these
places to use the equivalent scalar operator.

Differential Revision: https://reviews.llvm.org/D88482
2020-10-19 08:30:31 +01:00
David Sherwood f693f915a0 [SVE][CodeGen] Replace uses of TypeSize comparison operators
In certain places in the code we can never end up in a situation where
we're mixing fixed width and scalable vector types. For example,
we can't have truncations and extends that change the lane count. Also,
in other places such as GenWidenVectorStores and GenWidenVectorLoads we
know from the behaviour of FindMemType that we can never choose a vector
type with a different scalable property.

In various places I have used EVT::bitsXY functions instead of
TypeSize::isKnownXY, where it probably makes sense to keep an assert
that scalable properties match.

Differential Revision: https://reviews.llvm.org/D88654
2020-10-19 08:08:41 +01:00
Craig Topper 1687a8d83b [X86][SelectionDAG] Add SADDO_CARRY and SSUBO_CARRY to support multipart signed add/sub overflow legalization.
This passes existing X86 test but I'm not sure if it handles all type
legalization cases it needs to.

Alternative to D89200

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D89222
2020-10-12 23:18:29 -07:00
David Sherwood c5ba0d33cc [SVE] Make ElementCount and TypeSize use a new PolySize class
I have introduced a new template PolySize class, where the template
parameter determines the type of quantity, i.e. for an element
count this is just an unsigned value. The ElementCount class is
now just a simple derivation of PolySize<unsigned>, whereas TypeSize
is more complicated because it still needs to contain the uint64_t
cast operator, since there are still many places in the code that
rely upon this implicit cast. As such the class also still needs
some of it's own operators.

I've tried to minimise the amount of code in the base PolySize
class, which led to a couple of changes:

1. In some places we were relying on '==' operator comparisons
between ElementCounts and the scalar value 1. I didn't put this
operator in the new PolySize class, and thought it was actually
clearer to use the isScalar() function instead.
2. I removed the isByteSized function and replaced it with calls
to isKnownMultipleOf(8).

I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so
that it's more consistent with coefficientDivideBy.

Differential Revision: https://reviews.llvm.org/D88409
2020-10-12 08:23:38 +01:00
Esme-Yi e9fd8823ba [DAGCombiner] Add decomposition patterns for Mul-by-Imm.
Summary: This patch is derived from D87384.
In this patch we expand the existing decomposition of mul-by-constant to be more general by implementing 2 patterns:
```
  mul x, (2^N + 2^M) --> (add (shl x, N), (shl x, M))
  mul x, (2^N - 2^M) --> (sub (shl x, N), (shl x, M))
```
The conversion will be trigged if the multiplier is a big constant that the target can't use a single multiplication instruction to handle. This is controlled by the hook `decomposeMulByConstant`.
More over, the conversion benefits from an ILP improvement since the instructions are independent. A case with the sequence like following also gets benefit since a shift instruction is saved.

```
*res1 = a * 0x8800;
*res2 = a * 0x8080;
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D88201
2020-10-09 08:51:40 +00:00
David Sherwood 4ed47d50ea [SVE][CodeGen] Fix DAGCombiner::ForwardStoreValueToDirectLoad for scalable vectors
In DAGCombiner::ForwardStoreValueToDirectLoad I have fixed up some
implicit casts from TypeSize -> uint64_t and replaced calls to
getVectorNumElements() with getVectorElementCount(). There are some
simple cases of forwarding that we can definitely support for
scalable vectors, i.e. when the store and load are both scalable
vectors and have the same size. I have added tests for the new
code paths here:

  CodeGen/AArch64/sve-forward-st-to-ld.ll

Differential Revision: https://reviews.llvm.org/D87098
2020-10-06 08:04:03 +01:00
Craig Topper 1127662c6d [SelectionDAG] Make sure FMF are propagated when getSetcc canonicalizes FP constants to RHS.
getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter when can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.

Differential Revision: https://reviews.llvm.org/D88063
2020-10-05 14:55:23 -07:00
Sanjay Patel 2ccbf3dbd5 [SDAG] fold x * 0.0 at node creation time
In the motivating case from https://llvm.org/PR47517
we create a node that does not get constant folded
before getNegatedExpression is attempted from some
other node, and we crash.

By moving the fold into SelectionDAG::simplifyFPBinop(),
we get the constant fold sooner and avoid the problem.
2020-10-04 11:31:57 -04:00
David Sherwood bafdd11326 [SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy
After some recent upstream discussion we decided that it was best
to avoid having the / operator for both ElementCount and TypeSize,
since this could give the impression that these classes can be used
in the same way as basic integer integer types. However, division
for scalable types is a bit odd because we are only dividing the
minimum quantity by a value, as opposed to something like:

  (MinSize * Vscale) / SomeValue

This is why when performing division it's important the caller
first establishes whether the operation makes sense, perhaps by
calling isKnownMultipleOf() prior to division. The caller must now
explictly call divideCoefficientBy() on the class to perform the
operation.

Differential Revision: https://reviews.llvm.org/D87700
2020-09-28 08:03:00 +01:00
Simon Pilgrim a61272a900 [DAG] Fold vector mul(x,0)/mul(x,1) to a clearing mask
If we're multiplying all elements of a vector by '0' or '1' then we can more efficiently perform this as a clearing mask (that is likely to further simplify to a shuffle blend).

This was noticed when reviewing D87502 but seems to help idiv/irem by constant cases even more as '0'/'1' values are often used for 'passthrough' cases.

Differential Revision: https://reviews.llvm.org/D88225
2020-09-26 14:31:57 +01:00
Qiu Chaofan c0f8e4c06c [SelectionDAG] Add guard to automatically insert flags
This is like FastMathFlagGuard in IR. Since we use SDAG instance to get
values, it's with SelectionDAG. By creating a FlagInserter in current
scope, all values created by getNode will get the flags if no Flags
argument provided.

In this patch, I applied it to floating point operations folding part in
DAG combiner, and removed Flags passing to getNode to show its effect.
Other places in DAG combiner and other helper methods similar to getNode
also need this. They can be done in follow-up patches.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D87361
2020-09-26 13:57:52 +08:00
David Sherwood e077367a28 [SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits
An existing function Type::getScalarSizeInBits returns a uint64_t
instead of a TypeSize class because the caller is requesting a
scalar size, which cannot be scalable. This patch makes other
similar functions requesting a scalar size consistent with that,
thereby eliminating more than 1000 implicit TypeSize -> uint64_t
casts.

Differential revision: https://reviews.llvm.org/D87889
2020-09-23 09:20:08 +01:00
Craig Topper e30371d99d [DAGCombiner] Teach visitMSTORE to replace an all ones mask with an unmasked store.
Similar to what done in D87788 for MLOAD.

Again I've skipped indexed, truncating, and compressing stores.
2020-09-16 16:42:22 -07:00
Craig Topper 89ee4c0314 [DAGCombiner] Teach visitMLOAD to replace an all ones mask with an unmasked load
If we have an all ones mask, we can just a regular masked load. InstCombine already gets this in IR. But the all ones mask can appear after type legalization.

Only avx512 test cases are affected because X86 backend already looks for element 0 and the last element being 1. It replaces this with an unmasked load and blend. The all ones mask is a special case of that where the blend will be removed. That transform is only enabled on avx2 targets. I believe that's because a non-zero passthru on avx2 already requires a separate blend so its more profitable to handle mixed constant masks.

This patch adds a dedicated all ones handling to the target independent DAG combiner. I've skipped extending, expanding, and index loads for now. X86 doesn't use index so I don't know much about it. Extending made me nervous because I wasn't sure I could trust the memory VT had the right element count due to some weirdness in vector splitting. For expanding I wasn't sure if we needed different undef handling.

Differential Revision: https://reviews.llvm.org/D87788
2020-09-16 13:21:16 -07:00
Simon Pilgrim 3f682611ab [DAG] Remover getOperand() call. NFCI. 2020-09-16 11:18:58 +01:00
Craig Topper c193a689b4 [SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore.
The versions that take 'unsigned' will be removed in the future.

I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87592
2020-09-14 13:54:50 -07:00
Nikita Popov 8e69c3cde8 [DAGCombiner] Fold fmin/fmax with INF / FLT_MAX
Similar to D87415, this folds the various float min/max opcodes
with a constant INF or -INF operand, or FLT_MAX / -FLT_MAX operand
if the ninf flag is set. Some of the folds are only possible under
nnan.

The fminnum(X, INF) with nnan and fmaxnum(X, -INF) with nnan cases
are needed to improve the VECREDUCE_FMIN/FMAX lowerings on X86,
the rest is here for the sake of completeness.

Differential Revision: https://reviews.llvm.org/D87571
2020-09-14 19:59:33 +02:00
Craig Topper 56b33391d3 [SelectionDAG] Move ISD:PARITY formation from DAGCombine to SimplifyDemandedBits.
Previously, we formed ISD::PARITY by looking for (and (ctpop X), 1)
but the AND might be separated from the ctpop. For example if the
parity result is multiplied by 2, we'll pull the AND through the
shift.

So to handle more cases, move to SimplifyDemandedBits where we
can handle more cases that result in only the LSB of the CTPOP
being used.
2020-09-13 21:04:13 -07:00
Qiu Chaofan a4c5351986 [DAGCombiner] Propagate FMF flags in FMA folding
DAG combiner folds (fma a 1.0 b) into (fadd a b) but the flag isn't
propagated into new fadd. This patch fixes that.

Some code in visitFMA is redundant and such support for vector constants
is missing. Need follow-up patch to clean.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D87037
2020-09-14 00:19:06 +08:00
Craig Topper ad3d6f993d [SelectionDAG][X86][ARM][AArch64] Add ISD opcode for __builtin_parity. Expand it to shifts and xors.
Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop
isn't natively supported by the target, this leads to poor codegen
due to the expansion of ctpop being more complex than what is needed
for parity.

This adds a DAG combine to convert the pattern to ISD::PARITY
before operation legalization. Type legalization is updated
to handled Expanding and Promoting this operation. If after type
legalization, CTPOP is supported for this type, LegalizeDAG will
turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a
series of shifts and xors followed by an AND with 1.

I've avoided vectors in this patch to avoid more legalization
complexity for this patch.

X86 previously had a custom DAG combiner for this. This is now
moved to Custom lowering for the new opcode. There is a minor
regression in vector-reduce-xor-bool.ll, but a follow up patch
can easily fix that.

Fixes PR47433

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87209
2020-09-12 11:42:18 -07:00