Commit Graph

12252 Commits

Author SHA1 Message Date
Craig Topper 8eaf00e04d [TargetLowering][RISCV] Make expandCTLZ work for non-power of 2 types.
To convert CTLZ to popcount we do

x = x | (x >> 1);
x = x | (x >> 2);
...
x = x | (x >>16);
x = x | (x >>32); // for 64-bit input
return popcount(~x);

This smears the most significant set bit across all of the bits
below it then inverts the remaining 0s and does a population count.

To support non-power of 2 types, the last shift amount must be
more than half of the size of the type. For i15, the last shift
was previously a shift by 4, with this patch we add another shift
of 8.

Fixes PR56457.

Differential Revision: https://reviews.llvm.org/D129431
2022-07-12 11:36:37 -07:00
Simon Pilgrim ded62411f7 [DAG] SimplifyDemandedBits - AND/OR/XOR - attempt basic knownbits simplifications before calling SimplifyMultipleUseDemandedBits
Noticed while investigating the SystemZ regressions in D77804, prefer handling the knownbits analysis/simplification in the bitop nodes directly before falling back to SimplifyMultipleUseDemandedBits
2022-07-12 14:09:00 +01:00
Nikita Popov c64aba5d93 [SDAG] Don't duplicate ParseConstraints() implementation SDAGBuilder (NFCI)
visitInlineAsm() in SDAGBuilder was duplicating a lot of the code
in ParseConstraints(), in particular all the logic to determine the
operand value and constraint VT.

Rely on the data computed by ParseConstraints() instead, and update
its ConstraintVT implementation to match getCallOperandValEVT()
more precisely.
2022-07-12 10:42:02 +02:00
Craig Topper b05160dbdf [SelectionDAG] Simplify how we drop poison flags in SimplifyDemandedBits.
As far as I can tell what was happening in the original code is
that the getNode call receives the same operands as the original
node with different SDNodeFlags. The logic inside getNode detects
that the node already exists and intersects the flags into the
existing node and returns it. This results in Op and NewOp for the
TLO.CombineTo call always being the same node.

We may have already called CombineTo as part of the recursive handling.
A second call to CombineTo as we unwind the recursion overwrites
the previous CombineTo. I think this means any time we updated the
poison flags that was the only change that ends up getting made
and we relied on DAGCombiner to revisit and call SimplifyDemandedBits
again. The second time the poison flags wouldn't need to be dropped
and we would keep the CombineTo call from further down the recursion.

We can instead call setFlags to drop the poison flags and remove the
call to TLO.CombineTo. This way we keep the CombineTo from deeper in
the recursion which should be more efficient.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D129511
2022-07-11 13:42:33 -07:00
Sanjay Patel d0eec5f7e7 [SDAG] enhance sub->xor fold to ignore signbit
As suggested in the post-commit feedback for D128123,
we can ease the mask constraint to ignore the MSB
(and make the code easier to read by adjusting the check).

https://alive2.llvm.org/ce/z/bbvqWv
2022-07-11 12:37:50 -04:00
Kazu Hirata 1fd6611fc8 [SelectionDAG] Restore calls to has_value (NFC)
This patch restores calls to has_value to make it clear that we are
checking the presence of an optional value, not the underlying value.

This patch partially reverts d08f34b592.

Differential Revision: https://reviews.llvm.org/D129454
2022-07-10 14:37:23 -07:00
Nicolai Hähnle ede600377c ManagedStatic: remove many straightforward uses in llvm
(Reapply after revert in e9ce1a5880 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)

Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 10:29:15 +02:00
Nicolai Hähnle e9ce1a5880 Revert "ManagedStatic: remove many straightforward uses in llvm"
This reverts commit e6f1f06245.

Reverting due to a failure on the fuchsia-x86_64-linux buildbot.
2022-07-10 09:54:30 +02:00
Nicolai Hähnle e6f1f06245 ManagedStatic: remove many straightforward uses in llvm
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 09:15:08 +02:00
Craig Topper 40866b74bd [DAGCombiner][X86] Fold sra (sub AddC, (shl X, N1C)), N1C --> sext (sub AddC1',(trunc X to (width - N1C)))
We already handled this case for add with a constant RHS. A
similar pattern can occur for sub with a constant left hand side.

Test cases use add and a mul representing (neg (shl X, C)) because
that's what I saw in the wild. The mul will be decomposed and then
the new transform can kick in.

Tests have not been committed, but this patch shows the changes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128769
2022-07-09 11:53:44 -07:00
Simon Pilgrim b53046122f [DAG] SimplifyDemandedBits - fold AND(INSERT_SUBVECTOR(C,X,I),M) -> INSERT_SUBVECTOR(AND(C,M),X,I)
If all the demanded bits of the AND mask covering the inserted subvector 'X' are known to be one, then the mask isn't affecting the subvector at all.

In which case, if the base vector 'C' is undef/constant, then move the AND mask up to just (constant) fold it directly.

Addresses some of the regressions from D129150, particularly the cases where we're attempting to zero the upper elements of a widened vector.

Differential Revision: https://reviews.llvm.org/D129290
2022-07-08 16:08:31 +01:00
Sanjay Patel 8b75671314 [SDAG] try to replace subtract-from-constant with xor
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.

This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.

This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_

Differential Revision: https://reviews.llvm.org/D128123
2022-07-08 08:14:24 -04:00
OCHyams 6b62ca9043 [NFC][SelectionDAG] Fix debug prints in salvageUnresolvedDbgValue
The prints are printing pointer values - fix by dereferencing the pointers.
2022-07-08 12:09:30 +01:00
Sergei Barannikov 2247fdc84d [SelectionDAG] computeKnownBits / ComputeNumSignBits for the remaining overflow-aware nodes
Some overflow-aware nodes were missing from the switches in
computeKnownBits and ComputeNumSignBits.
2022-07-08 09:19:19 +01:00
Bradley Smith 60d6be5dd3 [LegalizeTypes] Replace vecreduce_xor/or/and with vecreduce_add/umax/umin if not legal
This is done during type legalization since the target representation of
these nodes may not be valid until after type legalization, and after
type legalization the fact that these are dealing with i1 types may be
lost.

Differential Revision: https://reviews.llvm.org/D128996
2022-07-07 09:33:54 +00:00
Sander de Smalen 15c3ba8a44 [AArc64] Legalisation of compares and truncates of nxv1i1 types.
Truncates and compares require some changes to generic legalisation functions
to use ElementCount instead of getNumElements.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D129082
2022-07-07 07:39:27 +00:00
Edd Barrett ed8ef65f3d
[stackmaps] Start legalizing live variable operands
Prior to this change, live variable operands passed to
`llvm.experimental.stackmap` would be emitted directly to target nodes,
meaning that they don't get legalised. The upshot of this is that LLVM
may crash when encountering illegally typed target nodes.

e.g. https://github.com/llvm/llvm-project/issues/21657

This change introduces a platform independent stackmap DAG node whose
operands are legalised as per usual, thus avoiding aforementioned
crashes.

Note that some kinds of argument are still not handled properly, namely
vectors, structs, and large integers, like i128s. These will need to be
addressed in follow-up changes.

Note also that this does not change the behaviour of
`llvm.experimental.patchpoint`. A follow up change will do the same for
this intrinsic.

Differential review:
https://reviews.llvm.org/D125680
2022-07-06 14:01:54 +01:00
Shilei Tian 1023ddaf77 [LLVM] Add the support for fmax and fmin in atomicrmw instruction
This patch adds the support for `fmax` and `fmin` operations in `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to CAS loop. There are already a couple of targets supporting the feature. I'll
create another patch(es) to enable them accordingly.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127041
2022-07-06 10:57:53 -04:00
Nikita Popov bb84e5eeff [SelectionDAGISel] Drop unused variable (NFC) 2022-07-06 10:46:13 +02:00
Nikita Popov 8ee913d83b [IR] Remove Constant::canTrap() (NFC)
As integer div/rem constant expressions are no longer supported,
constants can no longer trap and are always safe to speculate.
Remove the Constant::canTrap() method and its usages.
2022-07-06 10:36:47 +02:00
Simon Pilgrim 7068c843d2 [DAG] visitREM - use isAllOnesOrAllOnesSplat instead of isConstOrConstSplat
We were only using the N1C scalar/splat value once, so for clarity use isAllOnesOrAllOnesSplat instead if we actually need it.
2022-07-05 16:44:31 +01:00
Simon Pilgrim e7a0fa4df0 [DAG] foldAddSubOfSignBit - don't bother creating the new shift node unless constant folding succeeds
Noticed by inspection - the new shift is only ever used if the constant fold occurs
2022-07-05 16:44:31 +01:00
Simon Pilgrim cce64e7a9c [DAG] visitTRUNCATE - move GetDemandedBits AFTER SimplifyDemandedBits.
Another cleanup step before removing GetDemandedBits entirely.
2022-07-04 11:25:40 +01:00
Nikita Popov 7283f48a05 [IR] Remove support for insertvalue constant expression
This removes the insertvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
This is very similar to the extractvalue removal from D125795.
insertvalue is also not supported in bitcode, so no auto-ugprade
is necessary.

ConstantExpr::getInsertValue() can be replaced with
IRBuilder::CreateInsertValue() or ConstantFoldInsertValueInstruction(),
depending on whether a constant result is required (with the latter
being fallible).

The ConstantExpr::hasIndices() and ConstantExpr::getIndices()
methods also go away here, because there are no longer any constant
expressions with indices.

Differential Revision: https://reviews.llvm.org/D128719
2022-07-04 09:27:22 +02:00
Sander de Smalen 690db16422 [AArch64] Make nxv1i1 types a legal type for SVE.
One motivation to add support for these types are the LD1Q/ST1Q
instructions in SME, for which we have defined a number of load/store
intrinsics which at the moment still take a `<vscale x 16 x i1>` predicate
regardless of their element type.

This patch adds basic support for the nxv1i1 type such that it can be passed/returned
from functions, as well as some basic support to support some existing tests that
result in a nxv1i1 type. It also adds support for splats.

Other operations (e.g. insert/extract subvector, logical ops, etc) will be
supported in follow-up patches.

Reviewed By: paulwalker-arm, efriedma

Differential Revision: https://reviews.llvm.org/D128665
2022-07-01 15:11:13 +00:00
Xiang1 Zhang 72a23cef7e [ISel] Match all bits when merge undefs for DAG combine
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128570
2022-07-01 09:09:43 +08:00
Xiang1 Zhang 64f44a90ef Revert "[ISel] Match all bits when merge undef(s) for DAG combine"
This reverts commit 5fe5aa284e.
2022-07-01 08:59:04 +08:00
Xiang1 Zhang 5fe5aa284e [ISel] Match all bits when merge undef(s) for DAG combine 2022-07-01 08:58:00 +08:00
jeff 09424f802c [AMDGPU] Check for CopyToReg PhysReg clobbers in pre-RA-sched
Differential Revision: https://reviews.llvm.org/D128681
2022-06-30 09:18:04 -07:00
Nikita Popov 16033ffdd9 [ConstExpr] Remove more leftovers of extractvalue expression (NFC)
Remove some leftover bits of extractvalue handling after the
removal in D125795.
2022-06-29 10:45:19 +02:00
Tim Northover 4aafebce52 SelectionDAG: allow FP extensions when folding extract/insert.
Before, we were trying to sign extend half -> float, and asserted in getNode.
2022-06-28 12:08:35 +01:00
Guillaume Chatelet 3c126d5fe4 [Alignment] Replace commonAlignment with std::min
`commonAlignment` is a shortcut to pick the smallest of two `Align`
objects. As-is it doesn't bring much value compared to `std::min`.

Differential Revision: https://reviews.llvm.org/D128345
2022-06-28 07:15:02 +00:00
Bradley Smith a83aa33d1b [IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundemental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.

Differential Revision: https://reviews.llvm.org/D127976
2022-06-27 10:48:45 +00:00
Kazu Hirata 94460f5136 Don't use Optional::hasValue (NFC)
This patch replaces x.hasValue() with x where x is contextually
convertible to bool.
2022-06-26 19:54:41 -07:00
Kazu Hirata d08f34b592 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-26 18:31:51 -07:00
Kazu Hirata a7938c74f1 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-25 21:42:52 -07:00
Kazu Hirata 3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3.
2022-06-25 11:56:50 -07:00
Kazu Hirata aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
chenglin.bi 8c74205642 [SelectionDAG][DAGCombiner] Reuse exist node by reassociate
When already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the exist (op N0, N2)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122539
2022-06-24 23:15:06 +08:00
Nabeel Omer 0d41794335 [SLP] Add cost model for `llvm.powi.*` intrinsics (REAPPLIED)
Patch was reverted in 4c5f10a due to buildbot failures, now being
reapplied with updated AArch64 and RISCV tests.

This patch adds handling for the llvm.powi.* intrinsics in
BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization.
Closes #53887.

Differential Revision: https://reviews.llvm.org/D128172
2022-06-24 10:23:19 +00:00
Lian Wang 1ce30457c1 [LegalizeTypes][NFC] Add an assert to WidenVecRes_EXTRACT_SUBVECTOR and adjust some code
Reviewed By: craig.topper, david-arm

Differential Revision: https://reviews.llvm.org/D128038
2022-06-24 03:06:16 +00:00
Lian Wang 770fe864fe [SelectionDAG] Enable WidenVecOp_VECREDUCE for scalable vector
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128239
2022-06-24 02:32:53 +00:00
Craig Topper 8b10ffabae [RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f.
According to the vector spec, mf8 is not supported for i8 if ELEN
is 32. Similarily mf4 is not suported for i16/f16 or mf2 for i32/f32.

Since RVVBitsPerBlock is 64 and LMUL is calculated as
((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we
need to disable any type with MinNumElements==1.

For generic IR, these types will now be widened in type legalization.
For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan
to work on disabling the intrinsics in the riscv_vector.h header.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D128286
2022-06-23 08:49:18 -07:00
chenglin.bi 9c2bf534f5 Revert "[SelectionDAG][DAGCombiner] Reuse exist node by reassociate"
This reverts commit 6c951c5ee6.
2022-06-23 13:21:51 +08:00
Guillaume Chatelet 57ffff6db0 Revert "[NFC] Remove dead code"
This reverts commit 8ba2cbff70.
2022-06-22 14:55:47 +00:00
Guillaume Chatelet 8ba2cbff70 [NFC] Remove dead code 2022-06-22 13:33:58 +00:00
Simon Pilgrim 2c3a4a9334 [DAG] SelectionDAG::GetDemandedBits - don't recurse back into GetDemandedBits
Another minor cleanup as we work toward removing GetDemandedBits entirely - call SimplifyMultipleUseDemandedBits directly.
2022-06-22 13:48:57 +01:00
Simon Pilgrim 1c2b756cd6 [DAG] visitTRUNCATE - move TRUNCATE(ADDE/ADDCARRY) folds to switch statement handling the other binops. NFC. 2022-06-21 22:07:41 +01:00
Simon Pilgrim 8cecb6be56 [DAG] Remove SelectionDAG::GetDemandedBits DemandedElts variant. NFC.
We're slowly removing SelectionDAG::GetDemandedBits and replacing it with SimplifyMultipleUseDemandedBits, we no longer have any uses for the vector demanded elt variant.
2022-06-21 21:23:10 +01:00
Nabeel Omer 4c5f10aeeb Revert rGe6ccb57bb3f6b761f2310e97fd6ca99eff42f73e "[SLP] Add cost model for `llvm.powi.*` intrinsics"
This reverts commit e6ccb57bb3.
2022-06-21 15:05:55 +00:00
Nabeel Omer e6ccb57bb3 [SLP] Add cost model for `llvm.powi.*` intrinsics
This patch adds handling for the llvm.powi.* intrinsics in
BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization.
Closes #53887.

Differential Revision: https://reviews.llvm.org/D128172
2022-06-21 14:40:34 +00:00
Kazu Hirata 7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Kazu Hirata d66cbc565a Don't use Optional::hasValue (NFC) 2022-06-20 20:26:05 -07:00
Kazu Hirata 0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
chenglin.bi 6c951c5ee6 [SelectionDAG][DAGCombiner] Reuse exist node by reassociate
When already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the exist (op N0, N2)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122539
2022-06-21 09:45:19 +08:00
David Green c0ecbfa4fd [AArch64] Known bits for AArch64ISD::DUP
An AArch64ISD::DUP is just a splat, where the known bits for each lane
are the same as the input. This teaches that to computeKnownBitsForTargetNode.

Problems arise for constants though, as a constant BUILD_VECTOR can be
lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then
turn back into a constant BUILD_VECTOR leading to an infinite cycle.
This has been prevented by adding a isTargetCanonicalConstantNode node
to prevent the conversion back into a BUILD_VECTOR.

Differential Revision: https://reviews.llvm.org/D128144
2022-06-20 19:11:57 +01:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Guillaume Chatelet 03036061c7 [Alignment] Use 'previous()' method instead of scalar division
This is in preparation of integration with D128052.

Differential Revision: https://reviews.llvm.org/D128169
2022-06-20 11:01:43 +00:00
Simon Pilgrim e4a124dda5 [DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m)
Similar to the existing (shl (srl x, c1), c2) fold

Part of the work to fix the regressions in D77804

Differential Revision: https://reviews.llvm.org/D125836
2022-06-20 08:37:38 +01:00
Lian Wang ab25e263a9 [SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D127710
2022-06-20 06:30:26 +00:00
Craig Topper 314dbde12c [DAGCombiner][ARM][RISCV] Teach ShrinkLoadReplaceStoreWithStore to use truncstore.
The VT we want to shrink to may not be legal especially after type
legalization.

Fixes PR56110.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128135
2022-06-19 15:50:15 -07:00
Simon Pilgrim ba3f2667b6 [DAG] Add MaskedVectorIsZero helper
Equivalent to MaskedValueIsZero, except its checking if all of the demanded vectors elements are known to be zero
2022-06-19 17:56:30 +01:00
Simon Pilgrim 1ebe5cac46 [DAG] SimplifyDemandedBits - add DemandedElts handling to ISD::SIGN_EXTEND_INREG simplification 2022-06-19 15:35:29 +01:00
Simon Pilgrim db1be696c4 [DAG] SimplifyDemandedBits - add ISD::VSELECT handling 2022-06-19 15:18:25 +01:00
Kazu Hirata 129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
Paul Walker 0e21f1d56a [SelectionDAG] Extend WidenVecOp_INSERT_SUBVECTOR to cover more cases.
WidenVecOp_INSERT_SUBVECTOR only supported cases where widening
effectively converts the insert into a copy.  However, when the
widened subvector is no bigger than the vector being inserted into
and we can be sure there's no loss of data, we can simply emit
another INSERT_SUBVECTOR.

Fixes: #54982

Differential Revision: https://reviews.llvm.org/D127508
2022-06-17 12:39:42 +00:00
Lian Wang f2bcf33058 [LegalizeTypes][NFC] Merge promote SPLAT_VECTOR and promote SCALAR_TO_VECTOR to one function
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D127825
2022-06-17 02:43:52 +00:00
Lian Wang 16215eb979 [LegalizeTypes][RISCV][NFC] Modify assert in PromoteIntRes_STEP_VECTOR and add some tests for RISCV
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D127939
2022-06-17 02:26:09 +00:00
Craig Topper e6c7a3a54f [SelectionDAG] Don't apply MinRCSize constraint in InstrEmitter::AddRegisterOperand for IMPLICIT_DEF sources.
MinRCSize is 4 and prevents constrainRegClass from changing the
register class if the new class has size less than 4.

IMPLICIT_DEF gets a unique vreg for each use and will be removed
by the ProcessImplicitDef pass before register allocation. I don't
think there is any reason to prevent constraining the virtual register
to whatever register class the use needs.

The attached test case was previously creating a copy of IMPLICIT_DEF
because vrm8nov0 has 3 registers in it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D128005
2022-06-16 14:55:14 -07:00
Adrian Tong 55311801f0 Allow bitwidth difference when checking for isOneOrOneSplat.
This helps handling a case where the BUILD_VECTOR has i16 element type
and i32 constant operands

t2: v8i16 = setcc t8, t17, setult:ch
t3: v8i16 = BUILD_VECTOR Constant:i32<1>, ...
   t4: v8i16 = and t2, t3
      t5: v8i16 = add t8, t4

This can be turned into t5: v8i16 = sub t8, t2, and allows us to remove
t3 and t4 from the DAG.

Differential Revision: https://reviews.llvm.org/D127354
2022-06-16 16:04:20 +00:00
Benjamin Kramer 8c4a07c61f [DAGCombiner] Fold fold (fp_to_bf16 (bf16_to_fp op)) -> op 2022-06-15 19:54:39 +02:00
Benjamin Kramer ca50cb120b [SelectionDAG] Constant fold FP_TO_BF16 and BF16_TO_FP. 2022-06-15 18:51:32 +02:00
Paul Robinson 654a835c3f [PS5] Trap after noreturn calls, with special case for stack-check-fail 2022-06-15 09:02:17 -07:00
Benjamin Kramer fb34d531af Promote bf16 to f32 when the target doesn't support it
This is modeled after the half-precision fp support. Two new nodes are
introduced for casting from and to bf16. Since casting from bf16 is a
simple operation I opted to always directly lower it to integer
arithmetic. The other way round is more complicated if you want to
preserve IEEE semantics, so it's handled by a new __truncsfbf2
compiler-rt builtin.

This is of course very bare bones, but sufficient to get a semi-softened
fadd on x86.

Possible future improvements:
 - Targets with bf16 conversion instructions can now make fp_to_bf16 legal
 - The software conversion to bf16 can be replaced by a trivial
   implementation under fast math.

Differential Revision: https://reviews.llvm.org/D126953
2022-06-15 12:56:31 +02:00
Simon Pilgrim f096d5926d [DAG] Fix SDLoc mismatch in (shl (srl x, c1), c2) -> and(shift(x,c3)) fold
Noticed by @craig.topper on D125836 which uses a tweaked copy of the same code.

Differential Revision: https://reviews.llvm.org/D127772
2022-06-15 11:07:59 +01:00
Ping Deng c06f77ec0d [SelectionDAG] fold 'Op0 - (X * MulC)' to 'Op0 + (X << log2(-MulC))'
Reviewed By: craig.topper, spatel

Differential Revision: https://reviews.llvm.org/D127474
2022-06-15 05:50:18 +00:00
Guillaume Chatelet 5a293d21fc [NFC][Alignment] Use getAlign in SelectionDAGBuilder 2022-06-13 15:13:05 +00:00
Nikita Popov b9a7dea917 [SelectionDAG] Handle trapping aggregate (PR49839)
Call canTrap() on Constant to account for trapping
ConstantAggregate.
2022-06-13 15:06:53 +02:00
Simon Pilgrim 7d8fd4f5db [DAG] visitINSERT_VECTOR_ELT - attempt to reconstruct BUILD_VECTOR before other fold interfere
Another issue unearthed by D127115

We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with).

D127115 makes this particularly difficult as we're almost guaranteed to have the lost the sequence before all possible insertions have been folded.

This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before its too late.

Differential Revision: https://reviews.llvm.org/D127595
2022-06-13 11:48:18 +01:00
Simon Pilgrim 1cf9b24da3 [DAG] Enable ISD::FSHL/R SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.

This helps with several of the regressions from D125836
2022-06-12 19:25:20 +01:00
Simon Pilgrim 54ae4ca755 [DAG] visitSRL - pull out ShiftVT. NFC. 2022-06-12 14:02:23 +01:00
Simon Pilgrim cf5c63d187 [DAG] visitVECTOR_SHUFFLE - fold splat(insert_vector_elt()) and splat(scalar_to_vector()) to build_vector splats
Addresses a number of regressions identified in D127115
2022-06-11 21:06:42 +01:00
Simon Pilgrim 44a0cd25df [DAG] visitINSERT_VECTOR_ELT - add <1 x ???> insert_vector_elt(v0,extract_vector_elt(v1,0),0) special case handling
Check if we're just replacing one v1x?? vector with another
2022-06-11 19:30:00 +01:00
Simon Pilgrim a71ad6a3c8 [DAG] visitINSERT_VECTOR_ELT - fold insert_vector_elt(scalar_to_vector(x),v,i) -> build_vector()
Allow scalar_to_vector nodes to be used for the start of a build_vector creation
2022-06-11 15:29:22 +01:00
Simon Pilgrim 693f4db1ec [DAG] visitINSERT_VECTOR_ELT - refactor BUILD_VECTOR insertion to remove early-out. NFCI.
Remove the early-out cases so we can more easily add additional folds in the future.
2022-06-11 12:01:13 +01:00
Paul Walker 10d55c4634 [SelectionDAG] Remove invalid TypeSize conversion from WidenVecOp_BITCAST.
Differential Revision: https://reviews.llvm.org/D127322
2022-06-11 10:41:13 +01:00
Guillaume Chatelet 95083fa3b8 [NFC] Remove deadcode 2022-06-10 15:13:42 +00:00
Simon Pilgrim 91adbc3208 [DAG] SimplifyDemandedVectorElts - adding SimplifyMultipleUseDemandedVectorElts handling to ISD::CONCAT_VECTORS
Attempt to look through multiple use operands of ISD::CONCAT_VECTORS nodes

Another minor improvement for D127115
2022-06-10 16:06:43 +01:00
Guillaume Chatelet 38637ee477 [clang] Add support for __builtin_memset_inline
In the same spirit as D73543 and in reply to https://reviews.llvm.org/D126768#3549920 this patch is adding support for `__builtin_memset_inline`.

The idea is to get support from the compiler to easily write efficient memory function implementations.

This patch could be split in two:
 - one for the LLVM part adding the `llvm.memset.inline.*` intrinsics.
 - and another one for the Clang part providing the instrinsic as a builtin.

Differential Revision: https://reviews.llvm.org/D126903
2022-06-10 13:13:59 +00:00
Simon Pilgrim 7dbfcfa735 [DAG] combineInsertEltToShuffle - if EXTRACT_VECTOR_ELT fails to match an existing shuffle op, try to replace an undef op if there is one.
This should fix a number of shuffle regressions in D127115 where the re-ordered combines mean we fail to fold a EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT sequence into a BUILD_VECTOR if we extract from more than one vector source.
2022-06-09 14:56:14 +01:00
Guillaume Chatelet dc3367970e [SelectionDAG] Handle bzero/memset libcalls globally instead of per target
Differential Revision: https://reviews.llvm.org/D127279
2022-06-09 08:34:55 +00:00
Craig Topper 4bcfc41846 [SelectionDAG] Teach computeKnownBits that a nsw self multiply produce a positive value.
This matches what we do in IR. For the RISC-V test case, this allows
us to use -8 for the AND mask instead of materializing a constant in a register.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127335
2022-06-08 14:55:58 -07:00
Kai Nacke d897a14c2e [SystemZ] Fix check for zero size when lowering memcmp.
During lowering of memcmp/bcmp, the check for a size of 0 is done
in 2 different ways. In rare cases this can lead to a crash in
SystemZSelectionDAGInfo::EmitTargetCodeForMemcmp(). The root cause
is that SelectionDAGBuilder::visitMemCmpBCmpCall() checks for a
constant int value which is not yet evaluated. When the value is
turned into a SDValue, then the evaluation is done and results in
a ConstantSDNode. But EmitTargetCodeForMemcmp() expects the special
case of 0 length to be handled, which results in an assertion.

The fix is to turn the value into a SDValue, so that both functions
use the same check.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D126900
2022-06-08 14:52:13 -04:00
Simon Pilgrim b84c10d4bc [DAG] visitVSELECT - don't wait for truncation of sub before attempting to match with getTruncatedUSUBSAT
Fixes some X86 PSUBUS regressions encountered in D127115 where the truncate was being replaced with a PACKSS/PACKUS before the fold got called again
2022-06-08 16:16:35 +01:00
Paul Walker d88354213c [SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST.
Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work
with TypeSize directly rather than silently casting to unsigned.

To accomplish this I've extended TypeSize with an interface that
essentially allows TypeSize division when both operands have the
same number of dimensions.

There still exists combinations of scalable vector bitcasts that
cause compiler crashes. I call these out by adding "is missing"
entries to sve-bitcast.

Depends on D126957.
Fixes: #55114

Differential Revision: https://reviews.llvm.org/D127126
2022-06-08 10:30:07 +01:00
Paul Walker a1121c31d8 [SVE] Fix incorrect code generation for bitcasts of unpacked vector types.
Bitcasting between unpacked scalable vector types of different
element counts is not a NOP because the live elements are laid out
differently.
               01234567
e.g. nxv2i32 = XX??XX??
     nxv4f16 = X?X?X?X?

Differential Revision: https://reviews.llvm.org/D126957
2022-06-08 10:30:07 +01:00
Simon Pilgrim a083f3caa1 [DAG] combineShuffleOfSplatVal - fold shuffle(splat,undef) -> splat, iff the splat contains no UNDEF elements
As noticed on D127115 - we were missing this fold, instead just having the shuffle(shuffle(x,undef,splatmask),undef) fold. We should be able to merge these into one using SelectionDAG::isSplatValue, but we'll need to match the shuffle's undef handling first.

This also exposed an issue in SelectionDAG::isSplatValue which was incorrectly propagating the undef mask across a bitcast (it was trying to just bail with a APInt::isSubsetOf if it found any undefs but that was actually the wrong way around so didn't fire for partial undef cases).
2022-06-07 16:42:24 +01:00
Guillaume Chatelet 0788186182 [Alignment][NFC] Remove usage of MemSDNode::getAlignment
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.

Differential Revision: https://reviews.llvm.org/D126910
2022-06-07 13:52:20 +00:00
Nikita Popov 5a64bc207e [DAGCombiner] Remove overzealous assertion when folding assert+trunc+assert (PR55846)
These assert that there are no "useless" assertzext/assertsext nodes
(that assert a wider width than a following trunc), but I don't think
there is anything preventing such nodes from reaching this code.
I don't think the assertion is relevant for correctness of this
transform either -- if such an assert is present, then the other
one will always be to a smaller width, and we'll pick that one.
The assertion dates back to D37017.

Fixes https://github.com/llvm/llvm-project/issues/55846.

Differential Revision: https://reviews.llvm.org/D126952
2022-06-07 09:50:26 +02:00
Craig Topper be398100ea [SelectionDAG] Further improve computeKnownBits for (smax X, C) where C is non-negative.
Move the code that was added for D126896 after the normal recursive calls
to computeKnownBits. This allows us to calculate trailing zeros.
Previously we would break out of the switch before the recursive calls.
2022-06-06 09:59:23 -07:00