Commit Graph

3425 Commits

Author SHA1 Message Date
Sven van Haastregt c8d91b07bb Reassoc FMF should not optimize FMA(a, 0, b) to (b)
Optimizing (a * 0 + b) to (b) requires assuming that a is finite and not
NaN. DAGCombiner will do this optimization when the reassoc fast math
flag is set, which is not correct. Change DAGCombiner to only consider
UnsafeMath for this optimization.
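
A standalone C++ sketch (illustrative only, not part of the patch) of why the finiteness assumption matters: with an infinite or NaN multiplicand, a * 0 + b is NaN rather than b.

  #include <cassert>
  #include <cmath>
  #include <limits>

  int main() {
    double b = 1.0;
    // For finite, non-NaN a the fold is fine: a * 0.0 + b == b.
    assert(42.0 * 0.0 + b == b);
    // inf * 0.0 is NaN, so the folded result would differ from b.
    double inf = std::numeric_limits<double>::infinity();
    assert(std::isnan(inf * 0.0 + b));
    // NaN propagates as well.
    double nan = std::numeric_limits<double>::quiet_NaN();
    assert(std::isnan(nan * 0.0 + b));
    return 0;
  }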

Differential Revision: https://reviews.llvm.org/D130232

Co-authored-by: Andrea Faulds <andrea.faulds@arm.com>
2022-07-26 09:39:12 +01:00
jacquesguan cb370cf413 [DAGCombiner] Teach scalarizeExtractedBinop to support scalable splat.
This patch supports the scalable splat part for scalarizeExtractedBinop.

Differential Revision: https://reviews.llvm.org/D129725
2022-07-26 09:31:45 +08:00
Simon Pilgrim 562ee7cc5f [DAG] visitSMUL_LOHI/visitUMUL_LOHI - ensure we canonicalize constants to the RHS 2022-07-24 16:09:56 +01:00
Simon Pilgrim 5f89d2bae9 [DAG] Move OR(AND(X,C1),AND(OR(X,Y),C2)) -> OR(AND(X,OR(C1,C2)),AND(Y,C2)) fold to SimplifyDemandedBits
This will fix the SystemZ v3i31 memcpy regression in D77804 (with the help of D129765 as well....).

It should also allow us to /bend/ the oneuse limitation for cases where we can use demanded bits to safely peek through multiple uses of the AND ops.
2022-07-23 13:17:24 +01:00
Craig Topper be208b40c1 [DAGCombiner] Simplify code around call to reduceLoadWidth in visitAND. NFC
We were looking for loads or any_extend+load. reduceLoadWidth
hasn't known how to look through such an any_extend to find the
load since D40667 almost 5 years ago.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130333
2022-07-22 08:36:56 -07:00
Cullen Rhodes bf268a05cd [AArch64] Emit vector FP cmp when LE is used with fast-math
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130093
2022-07-22 07:53:55 +00:00
jacquesguan e60eb7053d recommit "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat."
With a fix for the AArch64 and Hexagon test cases.
2022-07-21 17:34:34 +08:00
David Truby 4c82f56d8f [llvm][SVE] Remove redundant and when comparing against extending load
When determining if an `and` should be merged into an extending load
the constant argument to the `and` is currently not checked if the
argument requires truncation. This prevents the combine happening when
the vector width is half the normal available vector width for SVE VLA
vectors.

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D129281
2022-07-19 17:08:32 +01:00
Simon Pilgrim 71c502cbca [DAG] Call SimplifyDemandedBits from ISD::MUL nodes
Noticed while triaging D129765.
2022-07-19 14:11:04 +01:00
Max Kazantsev 69b284aaf6 Revert "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat."
This reverts commit 58dfaaaace.

Massive AArch64 test failures in buildbot.
2022-07-19 13:41:52 +07:00
jacquesguan 58dfaaaace [DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat.
This revision supports scalarizing a binary operation of two scalable splat vectors.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122791
2022-07-19 11:20:51 +08:00
Itay Bookstein 2570f226d1 [SDAG] Remove single-result restriction on commutative CSE
The DAG Combiner unnecessarily restricts commutative CSE
to nodes with a single result value. This commit removes
that restriction.

Signed-off-by: Itay Bookstein <ibookstein@gmail.com>

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D129666
2022-07-18 19:19:13 +03:00
Simon Pilgrim 53b90dd372 [DAG] Fold (or (and X, C1), (and (or X, Y), C2)) -> (or (and X, C1|C2), (and Y, C2))
Pulled out of D77804

Alive2: https://alive2.llvm.org/ce/z/g61VRe
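
A brute-force C++ sketch (my own check, mirroring the Alive2 proof above) that exhaustively verifies the identity on 8-bit values for a handful of constants:

  #include <cassert>
  #include <cstdint>

  int main() {
    // Check (X & C1) | ((X | Y) & C2) == (X & (C1 | C2)) | (Y & C2).
    const uint8_t constants[] = {0x0F, 0xF0, 0x33, 0xAA, 0x00, 0xFF};
    for (uint8_t c1 : constants)
      for (uint8_t c2 : constants)
        for (unsigned x = 0; x < 256; ++x)
          for (unsigned y = 0; y < 256; ++y) {
            uint8_t lhs = (uint8_t)((x & c1) | ((x | y) & c2));
            uint8_t rhs = (uint8_t)((x & (c1 | c2)) | (y & c2));
            assert(lhs == rhs);
          }
    return 0;
  }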
2022-07-17 18:51:41 +01:00
Kazu Hirata 9e6d1f4b5d [CodeGen] Qualify auto variables in for loops (NFC) 2022-07-17 01:33:28 -07:00
Sanjay Patel 7ca3e23f25 [SDAG] narrow truncated sign_extend_inreg
trunc (sign_ext_inreg X, iM) to iN --> sign_ext_inreg (trunc X to iN), iM
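
A small standalone C++ check of the equivalence for an assumed i32/i16/i8 instance (illustrative only, not the DAG code):

  #include <cassert>
  #include <cstdint>

  // Sign-extend the low FromBits bits of V within a 32-bit value.
  static uint32_t signExtInReg32(uint32_t V, unsigned FromBits) {
    unsigned Shift = 32 - FromBits;
    return (uint32_t)((int32_t)(V << Shift) >> Shift);
  }

  // Same, but within a 16-bit value.
  static uint16_t signExtInReg16(uint16_t V, unsigned FromBits) {
    unsigned Shift = 16 - FromBits;
    return (uint16_t)((int16_t)(uint16_t)(V << Shift) >> Shift);
  }

  int main() {
    // trunc(sign_ext_inreg(X, i8) to i16) == sign_ext_inreg(trunc(X to i16), i8)
    for (uint32_t X = 0; X <= 0xFFFF; ++X)
      assert((uint16_t)signExtInReg32(X, 8) == signExtInReg16((uint16_t)X, 8));
    return 0;
  }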

There are improvements on existing tests from this, and there are a pair
of large regressions in D127115 for Thumb2 caused by not folding this
pattern.

Differential Revision: https://reviews.llvm.org/D129890
2022-07-16 16:29:15 -04:00
Simon Pilgrim a44bdf9bc1 [DAG] visitINSERT_VECTOR_ELT - refactor BUILD_VECTOR creation from INSERT_VECTOR_ELT chain.
D127595 added the ability to recurse up a (one-use) INSERT_VECTOR_ELT chain to create a BUILD_VECTOR before other combines manage to break the chain, something that is particularly bad in D127115.

The patch generalises this so it doesn't have to build the chain starting from the last element insertion; instead it can now start from any insertion and will recurse up the chain until it finds all elements or finds a UNDEF/BUILD_VECTOR/SCALAR_TO_VECTOR which represents the start of the chain.

Fixes several regressions in D127115
2022-07-16 16:37:31 +01:00
Simon Pilgrim 52b6168c16 [DAG] visitINSERT_VECTOR_ELT - remove duplicate VT.getVectorNumElements() call. NFC. 2022-07-16 16:20:49 +01:00
Simon Pilgrim 2bb6b03d71 Fix signed/unsigned mismatch 2022-07-16 11:48:41 +01:00
Simon Pilgrim a5d0122f75 [DAG] Canonicalize non-inlane shuffle -> AND if all non-inlane referenced elements are known zero
As mentioned on D127115, this patch attempts to recognise shuffle masks that could be simplified to an AND mask - we already have a similar transform that will fold AND -> 'clear mask' shuffle, but this patch handles cases where the referenced elements are not from the same lane indices but are known to be zero.

Differential Revision: https://reviews.llvm.org/D129150
2022-07-16 11:38:24 +01:00
Simon Pilgrim 1cb7416ee3 [DAG] combineShiftAnd1ToBitTest - match "and (srl (not X), C)), 1 --> (and X, 1<<C) == 0" patterns
combineShiftAnd1ToBitTest already matches "and (not (srl X, C)), 1 --> (and X, 1<<C) == 0" patterns, but we can end up with situations where the not is before the shift.
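
A quick exhaustive C++ check of both forms over 8-bit values (an illustrative sketch, not the DAG code):

  #include <cassert>
  #include <cstdint>

  int main() {
    for (unsigned x = 0; x < 256; ++x)
      for (unsigned c = 0; c < 8; ++c) {
        uint8_t X = (uint8_t)x;
        // Existing pattern: and (not (srl X, C)), 1
        uint8_t a = (uint8_t)~(X >> c) & 1;
        // Newly handled pattern: and (srl (not X), C), 1
        uint8_t b = (uint8_t)((uint8_t)~X >> c) & 1;
        // Both test the same thing: (X & (1 << C)) == 0
        uint8_t expected = (X & (1u << c)) == 0 ? 1 : 0;
        assert(a == expected && b == expected);
      }
    return 0;
  }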

Part of some yak shaving for D127115 to generalise the "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold.
2022-07-16 11:00:07 +01:00
Simon Pilgrim 3c8bf29696 [DAG] Move "xor (X logical_shift ShiftC), XorC --> (not X) logical_shift ShiftC" fold into SimplifyDemandedBits
SimplifyDemandedBits is called slightly later which allows the not(sext(x)) -> sext(not(x)) fold to occur via foldLogicOfShifts

As mentioned on D127115, we should be able to further generalise this based off the demanded bits.
2022-07-15 13:10:15 +01:00
Simon Pilgrim d172842b51 [DAG] SimplifyDemandedVectorElts - adjust demanded elements for selection mask for known zero results
If an element is known zero from both selections then it shouldn't matter what the selection mask element is.
2022-07-13 17:36:05 +01:00
Philip Reames fd67992f9c [DAGCombine] fold (urem x, (lshr pow2, y)) -> (and x, (add (lshr pow2, y), -1))
We have the same fold in InstCombine - though implemented via OrZero flag on isKnownToBePowerOfTwo. The reasoning here is that either a) the result of the lshr is a power-of-two, or b) we have a div-by-zero triggering UB which we can ignore.
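
A standalone C++ sketch exhaustively checking the identity on 8-bit values, skipping the div-by-zero case noted above:

  #include <cassert>
  #include <cstdint>

  int main() {
    for (unsigned x = 0; x < 256; ++x)
      for (unsigned pow2 = 1; pow2 <= 128; pow2 <<= 1)
        for (unsigned y = 0; y < 8; ++y) {
          uint8_t d = (uint8_t)(pow2 >> y); // lshr pow2, y
          if (d == 0)
            continue; // urem by zero is UB anyway (case b above)
          uint8_t lhs = (uint8_t)(x % d);
          uint8_t rhs = (uint8_t)(x & (d - 1)); // and x, (d + -1)
          assert(lhs == rhs);
        }
    return 0;
  }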

Differential Revision: https://reviews.llvm.org/D129606
2022-07-13 08:34:38 -07:00
Sanjay Patel d0eec5f7e7 [SDAG] enhance sub->xor fold to ignore signbit
As suggested in the post-commit feedback for D128123,
we can ease the mask constraint to ignore the MSB
(and make the code easier to read by adjusting the check).

https://alive2.llvm.org/ce/z/bbvqWv
2022-07-11 12:37:50 -04:00
Kazu Hirata 1fd6611fc8 [SelectionDAG] Restore calls to has_value (NFC)
This patch restores calls to has_value to make it clear that we are
checking the presence of an optional value, not the underlying value.

This patch partially reverts d08f34b592.

Differential Revision: https://reviews.llvm.org/D129454
2022-07-10 14:37:23 -07:00
Craig Topper 40866b74bd [DAGCombiner][X86] Fold sra (sub AddC, (shl X, N1C)), N1C --> sext (sub AddC1',(trunc X to (width - N1C)))
We already handled this case for add with a constant RHS. A
similar pattern can occur for sub with a constant left hand side.

Test cases use add and a mul representing (neg (shl X, C)) because
that's what I saw in the wild. The mul will be decomposed and then
the new transform can kick in.

Tests have not been committed, but this patch shows the changes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128769
2022-07-09 11:53:44 -07:00
Sanjay Patel 8b75671314 [SDAG] try to replace subtract-from-constant with xor
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.

This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.

This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_
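
The underlying scalar identity, checked exhaustively on 8-bit values in a standalone C++ sketch; the precondition below (every bit that can be set in X is also set in C, so the subtraction never borrows) is my reading, whereas the DAG code expresses it via known bits:

  #include <cassert>
  #include <cstdint>

  int main() {
    for (unsigned c = 0; c < 256; ++c)
      for (unsigned x = 0; x < 256; ++x) {
        if (x & ~c & 0xFF)
          continue; // outside the assumed precondition
        // With no borrows, C - X == C ^ X.
        assert((uint8_t)(c - x) == (uint8_t)(c ^ x));
      }
    return 0;
  }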

Differential Revision: https://reviews.llvm.org/D128123
2022-07-08 08:14:24 -04:00
Simon Pilgrim 7068c843d2 [DAG] visitREM - use isAllOnesOrAllOnesSplat instead of isConstOrConstSplat
We were only using the N1C scalar/splat value once, so for clarity use isAllOnesOrAllOnesSplat, which directly checks for the all-ones value we actually need.
2022-07-05 16:44:31 +01:00
Simon Pilgrim e7a0fa4df0 [DAG] foldAddSubOfSignBit - don't bother creating the new shift node unless constant folding succeeds
Noticed by inspection - the new shift is only ever used if the constant fold occurs
2022-07-05 16:44:31 +01:00
Simon Pilgrim cce64e7a9c [DAG] visitTRUNCATE - move GetDemandedBits AFTER SimplifyDemandedBits.
Another cleanup step before removing GetDemandedBits entirely.
2022-07-04 11:25:40 +01:00
Kazu Hirata 94460f5136 Don't use Optional::hasValue (NFC)
This patch replaces x.hasValue() with x where x is contextually
convertible to bool.
2022-06-26 19:54:41 -07:00
Kazu Hirata d08f34b592 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-26 18:31:51 -07:00
Kazu Hirata 3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3.
2022-06-25 11:56:50 -07:00
Kazu Hirata aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
chenglin.bi 8c74205642 [SelectionDAG][DAGCombiner] Reuse exist node by reassociate
When we already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the existing (op N0, N2).

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122539
2022-06-24 23:15:06 +08:00
chenglin.bi 9c2bf534f5 Revert "[SelectionDAG][DAGCombiner] Reuse exist node by reassociate"
This reverts commit 6c951c5ee6.
2022-06-23 13:21:51 +08:00
Simon Pilgrim 1c2b756cd6 [DAG] visitTRUNCATE - move TRUNCATE(ADDE/ADDCARRY) folds to switch statement handling the other binops. NFC. 2022-06-21 22:07:41 +01:00
Kazu Hirata 7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
chenglin.bi 6c951c5ee6 [SelectionDAG][DAGCombiner] Reuse exist node by reassociate
When we already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the existing (op N0, N2).

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122539
2022-06-21 09:45:19 +08:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Simon Pilgrim e4a124dda5 [DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m)
Similar to the existing (shl (srl x, c1), c2) fold
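
For reference, a standalone C++ sketch of the scalar identity, with c3 and the mask chosen from the relative shift amounts (an assumed scalar model, not the DAG code):

  #include <cassert>
  #include <cstdint>

  int main() {
    // (srl (shl x, c1), c2) == and(shl/srl(x, c3), m)
    // where c3 = |c1 - c2| and m = all-ones >> c2.
    for (unsigned x = 0; x < 256; ++x)
      for (unsigned c1 = 0; c1 < 8; ++c1)
        for (unsigned c2 = 0; c2 < 8; ++c2) {
          uint8_t lhs = (uint8_t)((uint8_t)(x << c1) >> c2);
          uint8_t shifted = c1 >= c2 ? (uint8_t)(x << (c1 - c2))
                                     : (uint8_t)((uint8_t)x >> (c2 - c1));
          uint8_t mask = (uint8_t)(0xFFu >> c2);
          assert(lhs == (uint8_t)(shifted & mask));
        }
    return 0;
  }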

Part of the work to fix the regressions in D77804

Differential Revision: https://reviews.llvm.org/D125836
2022-06-20 08:37:38 +01:00
Craig Topper 314dbde12c [DAGCombiner][ARM][RISCV] Teach ShrinkLoadReplaceStoreWithStore to use truncstore.
The VT we want to shrink to may not be legal especially after type
legalization.

Fixes PR56110.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128135
2022-06-19 15:50:15 -07:00
Benjamin Kramer 8c4a07c61f [DAGCombiner] Fold fold (fp_to_bf16 (bf16_to_fp op)) -> op 2022-06-15 19:54:39 +02:00
Simon Pilgrim f096d5926d [DAG] Fix SDLoc mismatch in (shl (srl x, c1), c2) -> and(shift(x,c3)) fold
Noticed by @craig.topper on D125836 which uses a tweaked copy of the same code.

Differential Revision: https://reviews.llvm.org/D127772
2022-06-15 11:07:59 +01:00
Simon Pilgrim 7d8fd4f5db [DAG] visitINSERT_VECTOR_ELT - attempt to reconstruct BUILD_VECTOR before other fold interfere
Another issue unearthed by D127115

We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with).

D127115 makes this particularly difficult as we're almost guaranteed to have lost the sequence before all possible insertions have been folded.

This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before it's too late.

Differential Revision: https://reviews.llvm.org/D127595
2022-06-13 11:48:18 +01:00
Simon Pilgrim 54ae4ca755 [DAG] visitSRL - pull out ShiftVT. NFC. 2022-06-12 14:02:23 +01:00
Simon Pilgrim cf5c63d187 [DAG] visitVECTOR_SHUFFLE - fold splat(insert_vector_elt()) and splat(scalar_to_vector()) to build_vector splats
Addresses a number of regressions identified in D127115
2022-06-11 21:06:42 +01:00
Simon Pilgrim 44a0cd25df [DAG] visitINSERT_VECTOR_ELT - add <1 x ???> insert_vector_elt(v0,extract_vector_elt(v1,0),0) special case handling
Check if we're just replacing one v1x?? vector with another
2022-06-11 19:30:00 +01:00
Simon Pilgrim a71ad6a3c8 [DAG] visitINSERT_VECTOR_ELT - fold insert_vector_elt(scalar_to_vector(x),v,i) -> build_vector()
Allow scalar_to_vector nodes to be used for the start of a build_vector creation
2022-06-11 15:29:22 +01:00
Simon Pilgrim 693f4db1ec [DAG] visitINSERT_VECTOR_ELT - refactor BUILD_VECTOR insertion to remove early-out. NFCI.
Remove the early-out cases so we can more easily add additional folds in the future.
2022-06-11 12:01:13 +01:00
Simon Pilgrim 7dbfcfa735 [DAG] combineInsertEltToShuffle - if EXTRACT_VECTOR_ELT fails to match an existing shuffle op, try to replace an undef op if there is one.
This should fix a number of shuffle regressions in D127115 where the re-ordered combines mean we fail to fold a EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT sequence into a BUILD_VECTOR if we extract from more than one vector source.
2022-06-09 14:56:14 +01:00
Simon Pilgrim b84c10d4bc [DAG] visitVSELECT - don't wait for truncation of sub before attempting to match with getTruncatedUSUBSAT
Fixes some X86 PSUBUS regressions encountered in D127115 where the truncate was being replaced with a PACKSS/PACKUS before the fold got called again
2022-06-08 16:16:35 +01:00
Simon Pilgrim a083f3caa1 [DAG] combineShuffleOfSplatVal - fold shuffle(splat,undef) -> splat, iff the splat contains no UNDEF elements
As noticed on D127115 - we were missing this fold, instead just having the shuffle(shuffle(x,undef,splatmask),undef) fold. We should be able to merge these into one using SelectionDAG::isSplatValue, but we'll need to match the shuffle's undef handling first.

This also exposed an issue in SelectionDAG::isSplatValue which was incorrectly propagating the undef mask across a bitcast (it was trying to just bail out with an APInt::isSubsetOf check if it found any undefs, but that was actually the wrong way around so it didn't fire for partial undef cases).
2022-06-07 16:42:24 +01:00
Guillaume Chatelet 0788186182 [Alignment][NFC] Remove usage of MemSDNode::getAlignment
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.

Differential Revision: https://reviews.llvm.org/D126910
2022-06-07 13:52:20 +00:00
Nikita Popov 5a64bc207e [DAGCombiner] Remove overzealous assertion when folding assert+trunc+assert (PR55846)
These assert that there are no "useless" assertzext/assertsext nodes
(that assert a wider width than a following trunc), but I don't think
there is anything preventing such nodes from reaching this code.
I don't think the assertion is relevant for correctness of this
transform either -- if such an assert is present, then the other
one will always be to a smaller width, and we'll pick that one.
The assertion dates back to D37017.

Fixes https://github.com/llvm/llvm-project/issues/55846.

Differential Revision: https://reviews.llvm.org/D126952
2022-06-07 09:50:26 +02:00
Benjamin Kramer e8e4b741dd [DAGCombiner] Add bf16 to the matrix of types that we don't promote to integer stores
Remove a few stray semicolons while there.
2022-06-03 13:28:34 +02:00
Nikita Popov ad742cf85d [DAGCombine] Handle promotion of shift with both operands the same
When promoting a shift, make sure we only fetch the second operand
after promoting the first. Load promotion may replace users of the
old load, and we don't want to be left with a dangling reference to
the old load instruction.

The crashing test case is from https://reviews.llvm.org/D126689#3553212.

Differential Revision: https://reviews.llvm.org/D126886
2022-06-03 10:00:44 +02:00
Ping Deng ae8ae45e2a [DAGCombine][NFC] Add braces to 'else' to match braced 'if'
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D126624
2022-06-01 07:54:05 +00:00
Simon Pilgrim f366acdbf6 [DAG] Generalize (sra (trunc (sra x, c1)), c2) -> (trunc (sra x, c1 + c2)) constant folding
Remove local (uniform) constant folding and rely on getNode() to perform it

Minor cleanup step toward adding non-uniform shift amount support
2022-05-26 14:05:09 +01:00
Simon Pilgrim 7b617eef80 [DAG] Cleanup "and/or of cmp with single bit diff" fold to use ISD::matchBinaryPredicate
Prep work as I'm investigating some cases where TLI::convertSetCCLogicToBitwiseLogic should accept vectors.
2022-05-26 12:34:09 +01:00
Craig Topper 569d8945f3 [DAGCombiner][AArch64] Don't fold (smulo x, 2) -> (saddo x, x) if VT is i2.
If the VT is i2, then 2 is really -2.

The test has not been committed yet, but the diff shows the change.

Fixes PR55644.

Differential Revision: https://reviews.llvm.org/D126213
2022-05-23 11:13:57 -07:00
Paul Walker 258dac43d6 [SVE] Enable use of 32bit gather/scatter indices for fixed length vectors
Differential Revision: https://reviews.llvm.org/D125193
2022-05-22 12:32:30 +01:00
Jay Foad 6bec3e9303 [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.

Differential Revision: https://reviews.llvm.org/D125557
2022-05-19 11:23:13 +01:00
Craig Topper 46eef76876 [DAGCombiner] Fix bug in MatchBSwapHWordLow.
This function tries to match (a >> 8) | (a << 8) as (bswap a) >> 16.

If the SRL isn't masked and the high bits aren't demanded, we still
need to ensure that bits 23:16 are zero. After the right shift they
will be in bits 15:8 which is where the important bits from the SHL
end up. It's only a bswap if the OR on bits 15:8 only takes the bits
from the SHL.
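
A standalone C++ sketch (not LLVM code) demonstrating the low-16-bit mismatch when bits 23:16 are set:

  #include <cassert>
  #include <cstdint>

  // Portable 32-bit byte swap.
  static uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
  }

  int main() {
    // Only the low 16 bits are demanded by the matched pattern.
    // With bits 23:16 clear, (a >> 8) | (a << 8) agrees with (bswap a) >> 16.
    uint32_t ok = 0x000012ABu;
    assert((((ok >> 8) | (ok << 8)) & 0xFFFFu) ==
           ((bswap32(ok) >> 16) & 0xFFFFu));
    // With bits 23:16 set, the unmasked SRL leaks them into bits 15:8,
    // so the rewrite would be wrong - the case this patch now rejects.
    uint32_t bad = 0x00FF12ABu;
    assert((((bad >> 8) | (bad << 8)) & 0xFFFFu) !=
           ((bswap32(bad) >> 16) & 0xFFFFu));
    return 0;
  }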

Fixes PR55484.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D125641
2022-05-18 09:23:18 -07:00
Simon Pilgrim d40b7f0d5a [DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses
If we're using shift pairs to mask, then relax the one use limit if the shift amounts are equal - we'll only be generating a single AND node.
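
The scalar identity behind the single AND, exhaustively checked in a standalone C++ sketch:

  #include <cassert>
  #include <cstdint>

  int main() {
    // (shl (srl x, c), c) == and(x, m) where m clears the low c bits.
    for (unsigned x = 0; x < 256; ++x)
      for (unsigned c = 0; c < 8; ++c) {
        uint8_t lhs = (uint8_t)((x >> c) << c);
        uint8_t mask = (uint8_t)(0xFFu << c);
        assert(lhs == (uint8_t)(x & mask));
      }
    return 0;
  }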

AArch64 has a couple of regressions due to this, so I've enforced the existing one use limit inside an AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback.

Part of the work to fix the regressions in D77804

Differential Revision: https://reviews.llvm.org/D125607
2022-05-17 13:40:11 +01:00
Paul Walker 7dd05ba9ed [SelectionDAG] Remove duplicate "is scaled" information from gather/scatter SDNodes.
During early gather/scatter enablement two different approaches
were taken to represent scaled indices:

* A Scale operand whereby byte_offsets = Index * Scale
* An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType)

Having multiple representations is bad as shown by this patch which
fixes instances where the two are out of sync. The dedicated scale
operand is more flexible and pervasive so this patch removes the
UNSCALED values from IndexType. This means all indices are scaled
but the scale can be one, hence unscaled. SDNodes now use the scale
operand to answer the "isScaledIndex" question.

I toyed with the idea of keeping the UNSCALED enums and helper
functions but because they will have no uses and force SDNodes to
validate the set of supported values I figured it's best to remove
them. We can re-add them if there's a real need. For similar
reasons I've kept the IndexType enum when a bool could be used as I
think being explicit looks better.

Depends On D123347

Differential Revision: https://reviews.llvm.org/D123381
2022-05-16 20:47:52 +01:00
Craig Topper e6fc8454be [DAGCombiner] Fix incorrect indentation. NFC 2022-05-16 09:27:15 -07:00
Bradley Smith 7ff5148d64 [DAGCombine] Support splat_vector nodes in (and (extload)) dagcombine
Differential Revision: https://reviews.llvm.org/D125367
2022-05-16 11:25:20 +00:00
Simon Pilgrim f4eac6e5f6 [DAG] visitOR - merge isa/cast<ShuffleVectorSDNode> into dyn_cast<ShuffleVectorSDNode>. NFC.
Also, initialize entire mask to -1 to simplify undefined cases.
2022-05-14 20:49:26 +01:00
Simon Pilgrim 95cdd63b87 [DAG] visitADDLike - use SelectionDAG::FoldConstantArithmetic directly to match constant operands
SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.
2022-05-14 18:39:41 +01:00
Simon Pilgrim 8db72d9d04 [DAG] visitMUL - pull out repeated SDLoc() calls. NFC. 2022-05-14 14:28:39 +01:00
Simon Pilgrim 8d4d4988e4 [DAG] Use SelectionDAG::FoldConstantArithmetic directly to match constant operands
SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.
2022-05-14 14:19:12 +01:00
Simon Pilgrim 3fc33ced10 DAGCombiner.cpp - break if-else chains that always return (style) 2022-05-13 18:31:39 +01:00
Sanjay Patel e52e1dab2a [SDAG] freeze operand when expanging urem
This is a potential miscompile as discussed in issue #55291.

The related IR transform was patched with:
d428f09b2c
2022-05-13 10:55:14 -04:00
David Green 2cfb243bcd [DAG] Use isAnyConstantBuildVector. NFC
As suggested from 02f8519502, this uses the
isAnyConstantBuildVector method in lieu of separate
isBuildVectorOfConstantSDNodes calls. It should
otherwise be an NFC.
2022-05-09 14:13:03 +01:00
David Green 02f8519502 [DAG] Prevent infinite loop combining bitcast shuffle
This prevents an infinite loop from D123801, where code trying to reduce
the total number of bitcasts, but also handling constants, could create
the opposite transform. Prevent the transform in these cases to let the
bitcast of a constant transform naturally.

Fixes #55345
2022-05-09 09:36:22 +01:00
Simon Pilgrim 800d36cf32 [DAG] Only perform the fold (A-B)+(C-D) --> (A+C)-(B+D) when both inner subs have one use
Fixes #51381
2022-05-08 13:51:58 +01:00
Amaury Séchet 06fad8bc05 [DAGCombine] Add node in the worklist in topological order in CombineTo
This is part of an ongoing effort toward making DAGCombine process the nodes in topological order.

This is able to discover a couple of new optimizations, but also causes a couple of regressions. I nevertheless chose to submit this patch for review so as to start the discussion with people working on the backend so we can find a good way forward.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124743
2022-05-07 16:24:31 +00:00
Paul Walker 702c4ade22 [ISD::IndexType] Helper functions for common queries.
Add helper functions to query the signed and scaled properties
of ISD::IndexType along with functions to change them.

Remove setIndexType from MaskedGatherSDNode because it only has
one usage and typically should only be changed alongside its
index operand.

Minimise the direct use of the enum values to lay the groundwork
for more refactoring.

Differential Revision: https://reviews.llvm.org/D123347
2022-05-07 11:23:42 +01:00
David Green 5930691ee1 Revert "[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific"
This reverts commit 891c3cf99e as it turns
out that the error was not caused by this commit, the error coming
from D124526 instead.
2022-05-06 21:03:22 +01:00
David Green 891c3cf99e [DAGCombine] Make combineShuffleOfBitcast LittleEndian specific
Something is going wrong with the BigEndian PowerPC bot. It is hard to
tell what is wrong from here, but attempt to fix it by disabling the
combineShuffleOfBitcast combine for bigendian.
2022-05-06 18:42:44 +01:00
Simon Pilgrim c0bebc12f0 [DAG] visitREM - merge buildOptimizedSREM into if(). NFCI. 2022-05-06 15:39:17 +01:00
David Green 115c188807 [DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask'))
If the mask is made up of elements that form a mask in the higher type
we can convert the shuffle(bitcast(X)) into a shuffle of the bitcast type, simplifying the
instruction sequence. A v4i32 2,3,0,1 for example can be treated as a
1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load
combines, along with helping simplify a number of other tests.

The PowerPC combine for v16i8 splat vector loads needed some fixes to
keep it working for v16i8 vectors. This improves the handling of v2i64
shuffles to match too, hopefully improving them in general.

Differential Revision: https://reviews.llvm.org/D123801
2022-05-06 10:50:31 +01:00
Craig Topper 4e2d1a6c18 [DAGCombiner] Fold (sext/zext undef) -> 0 and aext(undef) -> undef.
Differential Revision: https://reviews.llvm.org/D124988
2022-05-05 09:34:18 -07:00
Craig Topper fd13192aa5 [DAGCombiner] Fold (max/min X, X) -> X.
Differential Revision: https://reviews.llvm.org/D124951
2022-05-05 09:34:17 -07:00
Nikita Popov 9678936f18 [DAGCombine] Fold (X & ~Y) | Y with truncated not
This extends the (X & ~Y) | Y to X | Y fold to also work if ~Y is
a truncated not (when taking into account the mask X). This is
done by exporting the infrastructure added in D124856 and reusing
it here.

I've retained the old value of AllowUndefs=false, though probably
this can be switched to true with extra test coverage.

Differential Revision: https://reviews.llvm.org/D124930
2022-05-05 11:10:11 +02:00
Simon Pilgrim faa35fc873 [DAG] Fix issue with rot(rot(x,c1),c2) -> rot(x,c1+c2) fold with unnormalized rotation amounts
Don't assume the rotation amounts have been correctly normalized - do it as part of the constant folding.

Also, the normalization should be performed with UREM not SREM.
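
A standalone C++ sketch of the composed-rotate identity, reducing the combined amount with an unsigned remainder (an illustrative model only):

  #include <cassert>
  #include <cstdint>

  // Rotate an 8-bit value left; the amount is normalized modulo the width.
  static uint8_t rotl8(uint8_t x, unsigned c) {
    c &= 7;
    return (uint8_t)((x << c) | (x >> ((8 - c) & 7)));
  }

  int main() {
    for (unsigned x = 0; x < 256; ++x)
      for (unsigned c1 = 0; c1 < 16; ++c1)
        for (unsigned c2 = 0; c2 < 16; ++c2) {
          // c1 + c2 may exceed the bit width; normalize it with urem.
          uint8_t folded = rotl8((uint8_t)x, (c1 + c2) % 8u);
          assert(rotl8(rotl8((uint8_t)x, c1), c2) == folded);
        }
    return 0;
  }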
2022-05-03 17:16:26 +01:00
Craig Topper 5f057eaa0d [DAGCombiner] reassociationCanBreakAddressingModePattern should check uses of the outer add.
When looking for memory uses,
reassociationCanBreakAddressingModePattern should check uses of
the outer ADD rather than the inner ADD. We want to know if the
two ops we're reassociating are used by a load/store.

In practice, the existing check usually works because CodeGenPrepare
will make one of the load/stores have an offset of 0 relative to
the split GEP. That will make the inner add have a memory use.

To test this, I've manually split the GEPs so there is no 0 offset
store.

This issue was recently discussed in the original review D60294.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D124644
2022-05-02 16:38:53 -07:00
Sanjay Patel 747c6a0c73 [SDAG] fix miscompile when casting int->FP->int
This is the codegen equivalent of D124692.

As shown in https://github.com/llvm/llvm-project/issues/55150 -
the existing fold may be wrong when converting to a signed value.
This is a quick fix to avoid the miscompile.
https://alive2.llvm.org/ce/z/KtaDmd

Differential Revision: https://reviews.llvm.org/D124771
2022-05-02 14:57:27 -04:00
Simon Pilgrim ae8b10e543 [DAG] (style) Break apart if-else chain as they all return 2022-05-01 17:56:59 +01:00
Craig Topper 6affe87bda [DAGCombiner] When matching a disguised rotate by constant don't forget to apply LHSMask/RHSMask.
We try to match as a disguised rotate by constant of these forms
(shl (X | Y), C1) | (srl X, C2) --> (rotl X, C1) | (shl Y, C1)
(shl X, C1) | (srl (X | Y), C2) --> (rotl X, C1) | (srl Y, C2)

We may have also looked through an AND to find the shift. If we
did, we need to apply a mask to the result.

I'll add an AArch64 test and pre-commit it and the RISC-V test
tomorrow.

Fixes PR55201.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124711
2022-04-30 11:02:30 -07:00
Paul Walker 23c509754d [DAGCombiner] Stop invalid sign conversion in refineIndexType.
When looking through extends of gather/scatter indices it's safe
to convert a known positive signed index to unsigned, but unsigned
indices must remain unsigned.

Depends On D123318

Differential Revision: https://reviews.llvm.org/D123326
2022-04-29 14:20:13 +01:00
Paul Walker 7a0b897e86 [DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling
refineUniformBase and selectGatherScatterAddrMode both attempt the
transformation:

  base(0) + index(A+splat(B)) => base(B) + index(A)

However, this is only safe when index is not implicitly scaled.
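
A scalar C++ sketch (my own illustration, not code from the patch) of why implicit scaling breaks the rewrite:

  #include <cassert>
  #include <cstdint>

  // Effective byte offset of one gather/scatter lane: base + index * scale.
  static uint64_t lane(uint64_t base, uint64_t index, uint64_t scale) {
    return base + index * scale;
  }

  int main() {
    uint64_t A = 5, B = 3;
    // Unscaled (scale == 1): folding splat(B) into the base is fine.
    assert(lane(0, A + B, 1) == lane(B, A, 1));
    // Scaled (e.g. scale == 4): the naive rewrite drops a factor of scale,
    // because B moves out from under the multiply.
    assert(lane(0, A + B, 4) != lane(B, A, 4));
    // It would only be correct with a base of B * scale.
    assert(lane(0, A + B, 4) == lane(B * 4, A, 4));
    return 0;
  }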

Differential Revision: https://reviews.llvm.org/D123222
2022-04-29 12:35:16 +01:00
Simon Pilgrim 34e7243464 [DAG] Fold freeze(bitcast(x)) -> bitcast(freeze(x))
This is a very specific fold to fix an upstream poor codegen issue.

InstCombine has the much more flexible pushFreezeToPreventPoisonFromPropagating but I don't think we're quite there with DAG/TLI handling for canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison value tracking yet.

Fixes #54911

Differential Revision: https://reviews.llvm.org/D124185
2022-04-22 16:39:25 +01:00
Alexey Bataev 2cca53c815 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
representation into account. We can build a more correct representation of
register shuffles and improve the number of recognised buildvector sequences.
Also, the same function can be used to improve the cost model for the
shuffles in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 09:37:16 -07:00
Alexey Bataev 5f7ac15912 Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer."
This reverts commit 2f49163b33 to fix
a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284
2022-04-20 06:35:55 -07:00
Alexey Bataev 2f49163b33 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
representation into account. We can build a more correct representation of
register shuffles and improve the number of recognised buildvector sequences.
Also, the same function can be used to improve the cost model for the
shuffles in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 05:32:56 -07:00
chenglin.bi 222adf338a [Arch64][SelectionDAG] Add target-specific implementation of srem
1. Expanding X%C to the equivalent of X-X/C*C is not always the fastest path if no SDIV pair exists, so first check whether the target has a faster lowering for SREM alone.
2. Add an AArch64 fast path for the SREM-only pow2 case.

Fix https://github.com/llvm/llvm-project/issues/54649

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122968
2022-04-19 02:49:42 +08:00
chenglin.bi acfc025a72 Revert "[Arch64][SelectionDAG] Add target-specific implementation of srem"
This reverts commit 9d9eddd3dd.
2022-04-18 10:35:09 +08:00
chenglin.bi 9d9eddd3dd [Arch64][SelectionDAG] Add target-specific implementation of srem
Expanding X%C to the equivalent of X-X/C*C is not always the fastest path if no SDIV pair exists, so first check whether the target has a faster lowering for SREM alone. Add an AArch64 fast path for the SREM-only pow2 case.

Fix https://github.com/llvm/llvm-project/issues/54649

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122968
2022-04-16 12:29:11 +08:00
Craig Topper c6dc229a6d [DAGCombiner] Move call to hasOneUse after opcode checks. NFC
Checking the opcode is cheap, counting the number of uses is not.
2022-04-15 17:02:16 -07:00
Craig Topper a7b9d75e7a [DAGCombiner] Move or/xor/and opcode check in ReduceLoadOpStoreWidth before hasOneUse check.
hasOneUse is not cheap on nodes with chain results that might have
many uses. By checking the opcode first, we can avoid a costly walk
of the use list on nodes we aren't interested in.

Found by investigating calls to hasNUsesOfValue from the example
provided in D123857.
2022-04-15 16:38:27 -07:00
Simon Pilgrim fef221bf1f [DAG] Enable SimplifyVBinOp folds on add/sub sat intrinsics 2022-04-13 12:53:23 +01:00
Simon Pilgrim cfb3ee2185 [DAG] Add non-uniform vector support to (shl (srl x, c1), c2) -> (and (shift x, c3))
Another part of D77804 yak shaving

Differential Revision: https://reviews.llvm.org/D123523
2022-04-13 11:37:33 +01:00
Simon Pilgrim bc32a1dd76 [DAG] Add non-uniform vector support to (shl (sr[la] exact X, C1), C2) folds 2022-04-12 12:57:56 +01:00
Craig Topper 35be4a7af3 [SelectionDAG] Remove unecessary null check after call to getNode. NFC
As far as I know getNode will never return a null SDValue.

I'm guessing this was modeled after the FoldConstantArithmetic
call earlier.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D123550
2022-04-11 18:03:44 -07:00
Craig Topper 5b5f59428c [DAGCombiner] Replace call getSExtOrTrunc with a truncate. NFC
The extend case should never occur. The sign extend would be an
arbitrary choice, remove it to avoid confusion.
2022-04-06 09:59:45 -07:00
Paul Walker 7d3af9ef0f [DAGCombine] insert_subvector undef, (splat X), N2 -> splat X
Differential Revision: https://reviews.llvm.org/D120328
2022-04-06 17:15:38 +01:00
zhongyunde 19e5235147 [AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD
According to the discussion in D122281, we are missing an ISD::AND combine for MLOAD
because it relies on BuildVectorSDNode, which fails for scalable vectors.
This patch is intended to handle that, so we can circle back to the type MVT::nxv2i32.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D122703
2022-04-06 20:50:42 +08:00
Simon Pilgrim 3369e474bb [DAG] Allow XOR(X,MIN_SIGNED_VALUE) to perform AddLike folds
As raised on PR52267, XOR(X,MIN_SIGNED_VALUE) can be treated as ADD(X,MIN_SIGNED_VALUE), so let these cases use the 'AddLike' folds, similar to how we perform no-common-bits OR(X,Y) cases.

define i8 @src(i8 %x) {
  %r = xor i8 %x, 128
  ret i8 %r
}
=>
define i8 @tgt(i8 %x) {
  %r = add i8 %x, 128
  ret i8 %r
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/qV46E2

Differential Revision: https://reviews.llvm.org/D122754
2022-04-06 10:37:11 +01:00
Sanjay Patel e18cc5277f [SDAG] try to canonicalize logical shift after bswap
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C
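
A standalone C++ sketch checking the identity for byte-multiple shift amounts (illustrative only):

  #include <cassert>
  #include <cstdint>

  // Portable 32-bit byte swap.
  static uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
  }

  int main() {
    // For byte-multiple C: bswap(x << C) == bswap(x) >> C and
    //                      bswap(x >> C) == bswap(x) << C.
    const uint32_t samples[] = {0x00000000u, 0x12345678u, 0xDEADBEEFu, 0xFFFFFFFFu};
    for (uint32_t x : samples)
      for (unsigned c = 0; c < 32; c += 8) {
        assert(bswap32(x << c) == (bswap32(x) >> c));
        assert(bswap32(x >> c) == (bswap32(x) << c));
      }
    return 0;
  }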

This is the backend version of D122010 and an alternative
suggested in D120648.
There's an extra check to make sure the shift amount is
valid that was not in the rough draft.

I'm not sure if there is a larger motivating case for RISCV (bug report?),
but the ARM diffs show a benefit from having a late version of the
transform (because we do not combine the loads in IR).

Differential Revision: https://reviews.llvm.org/D122655
2022-03-30 09:29:32 -04:00
Craig Topper e68257fcee [RISCV][SelectionDAG] Enable TargetLowering::hasBitTest for masks that fit in ANDI.
Modified DAGCombiner to pass the bittest input and the shift amount
to hasBitTest. This matches the other call to hasBitTest in TargetLowering.h.

This is an alternative to D122454.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D122458
2022-03-28 12:46:36 -07:00
Simon Pilgrim e209190c2d [SDAG] enable binop identity constant folds for multiplies
Add mul to the list of ops that we canonicalize with a select to expose an identity merge

Differential Revision: https://reviews.llvm.org/D122071
2022-03-25 11:07:04 +00:00
zhongyunde 828b89bc0b [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads
Trying to reduce the number of masked loads in favour of more unpklo/hi
instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported for extensions
from legal types.

Test cases for both normal and masked loads were added to guard against compile crashes.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D120953
2022-03-21 23:47:33 +08:00
Simon Pilgrim 35a7be6ccb [SDAG] enable binop identity constant folds for shifts
Add shl/srl/sra to the list of ops that we canonicalize with a select to expose an identity merge

Differential Revision: https://reviews.llvm.org/D122070
2022-03-21 13:02:50 +00:00
Luo, Yuanke 10bb623192 enable binop identity constant folds for add
Differential Revision: https://reviews.llvm.org/D119654
2022-03-20 19:07:16 +08:00
Craig Topper ad94dfb9a0 [DAGCombiner][RISCV] Adjust (aext (and (trunc x), cst)) -> (and x, cst) to sext cst based on target preference
RISCV strong prefers i32 values be sign extended to i64. This combine
was always zero extending the constant using APInt methods.

This adjusts the code so that it calls getNode using ISD::ANY_EXTEND instead.
getNode will call TLI.isSExtCheaperThanZExt to decide how to handle
the constant.

Tests were copied from D121598 where I noticed that we were creating
constants that were hard to materialize.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D121650
2022-03-15 08:26:47 -07:00
Sanjay Patel c2592c374e [SDAG] simplify bitwise logic with repeated operand
We do not have general reassociation here (and probably
do not need it), but I noticed these were missing in
patches/tests motivated by D111530, so we can at
least handle the simplest patterns.

The VE test diff looks correct, but we miss that
pattern in IR currently:
https://alive2.llvm.org/ce/z/u66_PM
2022-03-13 11:12:30 -04:00
serge-sans-paille ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Sanjay Patel 341623653d [SDAG] match rotate pattern with extra 'or' operation
This is another fold generalized from D111530.
We can find a common source for a rotate operation hidden inside an 'or':
https://alive2.llvm.org/ce/z/9pV8hn

Deciding when this is profitable vs. a funnel-shift is tricky, but this
does not show any regressions: if a target has a rotate but it does not
have a funnel-shift, then try to form the rotate here. That is why we
don't have x86 test diffs for the scalar tests that are duplicated from
AArch64 ( 74a65e3834 ) - shld/shrd are available. That also makes it
difficult to show vector diffs - the only case where I found a diff was
on x86 AVX512 or XOP with i64 elements.

There's an additional check for a legal type to avoid a problem seen
with x86-32 where we form a 64-bit rotate but then it gets split
inefficiently. We might avoid that by adding more rotate folds, but
I didn't check to see what is missing on that path.

This gets most of the motivating patterns for AArch64 / ARM that are in
D111530.

We still need a couple of enhancements to setcc pattern matching with
rotate/funnel-shift to get the rest.

Differential Revision: https://reviews.llvm.org/D120933
2022-03-09 13:19:00 -05:00
David Green 4388f4f776 [DAG] Don't convert undef to 0 when creating buildvector
When inserting undef into buildvectors created from shuffles of
buildvectors, we convert elements to the largest needed type. This had
the effect of converting undef into 0, which isn't needed as the
buildvector implicitly truncates and trunc(zext(undef)) == undef.

Differential Revision: https://reviews.llvm.org/D121002
2022-03-06 18:35:34 +00:00
Sanjay Patel f4b53972ce [SDAG] fold bitwise logic with shifted operands
This extends acb96ffd14 to 'and' and 'xor' opcodes.

Copying from that message:

LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z

https://alive2.llvm.org/ce/z/QmR9rR

This is a reassociation + factoring fold. The common shift operation is moved
after a bitwise logic op on 2 input operands.
We get simpler cases of these patterns in IR, but I suspect we would miss all
of these exact tests in IR too. We also handle the simpler form of this plus
several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands().
2022-03-05 11:14:45 -05:00
Paul Walker 42b4a6227e [DAGCombine] Prevent illegal ISD::SPLAT_VECTOR operations post legalisation.
When triggered during operation legalisation the affected combine
generates a splat_vector that when custom lowered for SVE fixed
length code generation, results in the original precombine sequence
and thus we enter a legalisation/combine hang.

NOTE: The patch contains no tests because I observed this issue
only when combined with other work that might never become public.
The current way AArch64 lowers ISD::SPLAT_VECTOR meant a specific
test was not possible so I'm hoping the DAGCombiner fix can be seen
as obvious. The AArch64ISelLowering change is requirted to maintain
existing code quality.

Differential Revision: https://reviews.llvm.org/D120735
2022-03-04 11:54:03 +00:00
Craig Topper bf8054644d [DAGCombiner] Don't expand (neg (abs x)) if the abs has an additional user.
If the types aren't legal, the expansions may get type legalized in a
different way preventing code sharing. If the type is legal, we will
share some instructions between the two expansions, but we will need an
extra register.

Since we don't appear to fold (neg (sub A, B)) if the sub has an
additional user, I think it makes sense not to expand NABS.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120513
2022-03-01 07:32:07 -08:00
Sanjay Patel acb96ffd14 [SDAG] fold bitwise logic with shifted operands
LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z

https://alive2.llvm.org/ce/z/QmR9rR

This is a reassociation + factoring fold. The common shift operation is moved
after a bitwise logic op on 2 input operands.
We get simpler cases of these patterns in IR, but I suspect we would miss all
of these exact tests in IR too. We also handle the simpler form of this plus
several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands().
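
A standalone C++ sketch exhaustively checking the 'or'/'shl' instance of the fold on 8-bit values, with Z fixed to an arbitrary constant (illustrative only):

  #include <cassert>
  #include <cstdint>

  int main() {
    // or(or(shl(X0, Y), Z), shl(X1, Y)) == or(shl(or(X0, X1), Y), Z)
    const uint8_t Z = 0x5A; // arbitrary extra operand
    for (unsigned x0 = 0; x0 < 256; ++x0)
      for (unsigned x1 = 0; x1 < 256; ++x1)
        for (unsigned y = 0; y < 8; ++y) {
          uint8_t lhs = (uint8_t)(((uint8_t)(x0 << y) | Z) | (uint8_t)(x1 << y));
          uint8_t rhs = (uint8_t)((uint8_t)((x0 | x1) << y) | Z);
          assert(lhs == rhs);
        }
    return 0;
  }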

This is a partial implementation of a transform suggested in D111530
(only handles 'or' bitwise logic as a first step - need to stamp out more
tests for other opcodes).
Several of the same tests added for D111530 are altered here (but not
fully optimized). I'm not sure yet if this would help/hinder that patch,
but this should be an improvement for all tests added with ecf606cb43
since it removes a shift operation in those examples.

Differential Revision: https://reviews.llvm.org/D120516
2022-02-27 09:54:12 -05:00
Simon Pilgrim fadd20f80d [DAG] Ensure type is legal for bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2)))) fold
As reported on D120192
2022-02-27 11:25:22 +00:00
Simon Pilgrim 370ebc9d9a [DAG] Attempt to fold bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2))))
If the shl is at least half the bitwidth (i.e. the lower half of the bswap source is zero), then we can reduce the shift and perform the bswap at half the bitwidth and just zero extend.

Based off PR51391 + PR53867

Differential Revision: https://reviews.llvm.org/D120192
2022-02-24 19:33:51 +00:00
Sanjay Patel 4a3708cd6b [SDAG] remove shift that is redundant with part of funnel shift
This is the SDAG translation of D120253 :
https://alive2.llvm.org/ce/z/qHpmNn

The SDAG nodes can have different operand types than the result value.
We can see an example of that with AArch64 - the funnel shift amount
is an i64 rather than i32.

We may need to make that match even more flexible to handle
post-legalization nodes, but I have not stepped into that yet.

Differential Revision: https://reviews.llvm.org/D120264
2022-02-24 11:25:46 -05:00
Craig Topper c7d6448d03 [DAGCombiner][TargetLowering] Pass SDValue by value to isMulAddWithConstProfitable.
Internally to DAGCombiner the SDValues were passed by non-const
reference despite not being modified. They were then passed by
const reference to TLI.

This patch passes them by value which is consistent with the vast
majority of code.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120420
2022-02-23 12:40:45 -08:00
Paweł Bylica afdaa86b77
[DAGCombine] Extend combineCarryDiamond()
In combineCarryDiamond() use getAsCarry() to find more candidates for being a carry flag.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D118362
2022-02-23 21:37:49 +01:00
Paweł Bylica df0c16ce00
[NFC][DAGCombine] Use isOperandOf() in combineCarryDiamond
Pre-commit for https://reviews.llvm.org/D118362.
2022-02-21 21:41:31 +01:00
Simon Pilgrim 46f1e8359e [DAG] visitBSWAP - pull out repeated SDLoc. NFC
Cleanup for D120192
2022-02-21 13:08:01 +00:00
Chen Zheng efe5b8ad90 [ISEL] remove unnecessary getNode(); NFC
Reviewed By: RKSimon, craig.topper

Differential Revision: https://reviews.llvm.org/D120049
2022-02-20 21:08:49 -05:00
Luo, Yuanke 67ef63138b [SDAG] enable binop identity constant folds for sub
This patch extract the sub folding from D119654 and leave only add
folding in that patch.

Differential Revision: https://reviews.llvm.org/D120116
2022-02-21 09:37:36 +08:00
Sanjay Patel a2963d871e [SDAG] fold sub-of-shift to add-of-shift
This fold is done in IR:
https://alive2.llvm.org/ce/z/jWyFrP

There is an x86 test that shows an improvement
from the added flexibility of using add (commutative).

The other diffs are presumed neutral.

Note that this could also be folded to an 'xor',
but I'm not sure if that would be universally better
(eg, x86 can convert adds more easily into LEA).

This helps prevent regressions from a potential fold for
issue #53829.
2022-02-18 11:55:50 -05:00
Paul Walker 6457f42bde [DAGCombiner] Extend ISD::ABDS/U combine to handle more cases.
The current ABD combine doesn't quite work for SVE because only a
single scalable vector per scalar integer type is legal (e.g. for
i32, <vscale x 4 x i32> is the only legal scalable vector type).

This patch extends the combine to also trigger for the cases when
operand extension must be retained.

Differential Revision: https://reviews.llvm.org/D115739
2022-02-17 13:32:20 +00:00
David Green 655d0d86f9 [DAGCombine] Move AVG combine to SimplifyDemandBits
This moves the matching of AVGFloor and AVGCeil into a place where
demand bit are available, so that it can detect more cases for more
folds. It changes the transform to start from a shift, not from a
truncate. We match the pattern shr(add(ext(A), ext(B)), 1), transforming
to ext(hadd(A, B)).
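
As a sanity check of the AVGFLOORU semantics, a standalone C++ sketch comparing the widened add-then-shift against a narrow-type floor average computed with the (a & b) + ((a ^ b) >> 1) identity; the identity is only my illustration, the combine itself emits AVG nodes rather than this expansion:

  #include <cassert>
  #include <cstdint>

  int main() {
    // lshr(add(zext(a), zext(b)), 1) fits in the narrow type.
    for (unsigned a = 0; a < 256; ++a)
      for (unsigned b = 0; b < 256; ++b) {
        uint8_t wide = (uint8_t)(((uint16_t)a + (uint16_t)b) >> 1);
        uint8_t narrow = (uint8_t)((a & b) + ((a ^ b) >> 1));
        assert(wide == narrow);
      }
    return 0;
  }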

For signed values, because only the bottom bits are demanded llvm will
transform the above to use a lshr too, as opposed to ashr. In order to
correctly detect the hadd we need to know the demanded bits to turn it
back. Depending on whether the shift is signed (ashr) or logical (lshr),
and the extensions are signed or unsigned we can create different nodes.
If the shift is signed:
  Needs >= 2 sign bits. https://alive2.llvm.org/ce/z/h4gQAW generating signed rhadd.
  Needs >= 2 zero bits. https://alive2.llvm.org/ce/z/B64DUA generating unsigned rhadd.
If the shift is unsigned:
  Needs >= 1 zero bits. https://alive2.llvm.org/ce/z/ByD8sj generating unsigned rhadd.
  Needs 1 demanded bit zero and >= 2 sign bits https://alive2.llvm.org/ce/z/hvPGxX and
    https://alive2.llvm.org/ce/z/32P5n1 generating signed rhadd.

Differential Revision: https://reviews.llvm.org/D119072
2022-02-15 10:17:02 +00:00
David Green 03380c70ed [DAGCombine] Basic combines for AVG nodes.
This adds very basic combines for AVG nodes, mostly for constant folding
and handling degenerate (zero) cases. The code performs mostly the same
transforms as visitMULHS, adjusted for AVG nodes.

Constant folding extends to a higher bitwidth and drops the lowest bit.
For undef nodes, `avg undef, x` is transformed to x.  There is also a
transform for `avgfloor x, 0` transforming to `shr x, 1`.

Differential Revision: https://reviews.llvm.org/D119559
2022-02-14 11:18:35 +00:00
Craig Topper e72fe654b7 [DAGCombiner] Use getShiftAmountConstant in DAGCombiner::foldSelectOfConstants.
This enables fshl to be matched earlier on X86

  %6 = lshr i32 %3, 1
  %7 = select i1 %4, i32 -2147483648, i32 0
  %8 = or i32 %6, %7

X86 uses i8 for shift amounts. SelectionDAGBuilder creates the
ISD::SRL with an i8 shift type. DAGCombiner turns the select into
an ISD::SHL. Prior to this patch it would use i32 for the shift
amount. fshl matching failed because the shift amounts have different
types. LegalizeDAG fixes the ISD::SHL shift amount to i8. This
allowed fshl matching to succeed.

With this patch, the ISD::SHL will be created with an i8 shift
amount. This allows the fshl to match immediately.

No test case because we still end up with a fshl either way.
2022-02-13 19:09:26 -08:00
Sanjay Patel 96b7e0b5a0 [SDAG] clean up scalarizing load transform
I have not found a way to expose a difference for this patch in a test
because it only triggers for a one-use load, but this is the code that
was adapted into D118376 and caused miscompiles. The new code pattern
is the same as what we do in narrowExtractedVectorLoad() (reduces load
width for a subvector extract).

This removes seemingly unnecessary manual worklist management and fixes
the chain updating via "SelectionDAG::makeEquivalentMemoryOrdering()".

Differential Revision: https://reviews.llvm.org/D119549
2022-02-12 11:41:19 -05:00
Sanjay Patel 429f10f5f2 [SDAG] reduce code duplication and fix formatting; NFC 2022-02-12 10:22:13 -05:00
David Green 4072e362c0 [ISel] Port AArch64 HADD and RHADD to ISel
This ports the aarch64 combines for HADD and RHADD over to DAG combine,
so that they can be used in more architectures (notably MVE in a
followup patch). They are renamed to AVGFLOOR and AVGCEIL in the
process, to avoid confusion with instructions such as X86 hadd. The code
was also rewritten slightly to remove the AArch64 idiosyncrasies.

The general pattern for a AVGFLOORS is
  %xe = sext i8 %x to i32
  %ye = sext i8 %y to i32
  %a = add i32 %xe, %ye
  %r = lshr i32 %a, 1
  %t = trunc i32 %r to i8

An AVGFLOORU is equivalent with zext. Because of the truncate,
lshr is equivalent to ashr, as the top bits are not demanded. An AVGCEIL also includes
an extra rounding, so includes an extra add of 1.

Differential Revision: https://reviews.llvm.org/D106237
2022-02-11 18:28:56 +00:00
Reid Kleckner b5a592a8e2 [DAG] Remove pointless std::function wrapper, NFC 2022-02-09 14:30:43 -08:00
Reid Kleckner f63c150187 Revert "[DagCombine] Increase depth by number of operands to avoid a pathological compile time."
Appears to be causing check-llvm to fail

This reverts commit 49ab760090.
2022-02-09 13:55:40 -08:00
Alina Sbirlea 49ab760090 [DagCombine] Increase depth by number of operands to avoid a pathological compile time.
We're hitting a pathological compile-time case, profiled to be in
DagCombiner::visitTokenFactor and many inserts into a SmallPtrSet.
It looks like one of the paths around findBetterNeighborChains is not
capped and leads to this.

This patch resolves the issue. Looking for feedback if this solution
looks reasonable.

Differential Revision: https://reviews.llvm.org/D118877
2022-02-09 13:31:28 -08:00
Sander de Smalen ec46232517 [DAGCombiner] Fold `ty1 extract_vector(ty2 splat(V)) -> ty1 splat(V)`
This seems like an obvious fold, which leads to a few improvements.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D118920
2022-02-09 14:30:01 +00:00
Sanjay Patel 905abc5b7d [SDAG] enable binop identity constant folds for fmul/fdiv
The test diffs are identical to D119111.

This only affects x86 currently because no other target
has an override for the TLI hook that controls this transform.
2022-02-08 10:52:28 -05:00
Sanjay Patel a68e098024 [SDAG] move x86 select-with-identity-constant fold behind a target hook; NFC
This is no-functional-change-intended because only the
x86 target enables the TLI hook currently.

We can add fmul/fdiv opcodes to the switch similar to the
proposal D119111, but we don't need to make other changes
like enabling target-specific combines.

We can also add integer opcodes (add, or, shl, etc.) to
the switch because this function is called from all of the
generic binary opcodes.

The goal is to incrementally enable the profitable diffs
from D90113 while avoiding regressions.

Differential Revision: https://reviews.llvm.org/D119150
2022-02-08 09:55:05 -05:00
Simon Pilgrim fd2bb51f1e [ADT] Add APInt/MathExtras isShiftedMask variant returning mask offset/length
In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask.

This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments.

I've updated a number of cases that were either performing separate size/position calculations or had created their own local wrapper versions of these.
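
The kind of query the new helpers answer, sketched with standard C++20 <bit> utilities; the function below is a hypothetical stand-in, not the actual APInt/MathExtras API:

  #include <bit>
  #include <cassert>
  #include <cstdint>

  // A shifted mask is a contiguous run of ones: ((1 << Length) - 1) << Offset.
  static bool isShiftedMask32(uint32_t V, unsigned &Offset, unsigned &Length) {
    if (V == 0)
      return false;
    unsigned TZ = (unsigned)std::countr_zero(V);
    uint32_t Run = V >> TZ;
    if ((Run & (Run + 1)) != 0)
      return false; // the set bits are not contiguous
    Offset = TZ;
    Length = (unsigned)std::popcount(V);
    return true;
  }

  int main() {
    unsigned Off = 0, Len = 0;
    assert(isShiftedMask32(0x00FF0000u, Off, Len) && Off == 16 && Len == 8);
    assert(!isShiftedMask32(0x00F0F000u, Off, Len)); // two separate runs
    return 0;
  }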

Differential Revision: https://reviews.llvm.org/D119019
2022-02-08 12:04:13 +00:00
Simon Pilgrim 74555fd367 [DAG] visitINSERT_VECTOR_ELT - break if-else chain as they both return (style). NFC. 2022-02-07 09:58:47 +00:00