This extends the code in SearchForAndLoads to be able to look through
ANY_EXTEND nodes, which can be created from mismatching IR types where
the AND node we begin from only demands the low parts of the register.
That turns zext and sext into any_extends as only the low bits are
demanded. To be able to look through ANY_EXTEND nodes we need to handle
mismatching types in a few places, potentially truncating the mask to
the size of the final load.
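As a purely hypothetical source-level illustration (this example is mine,
not taken from the patch's tests), a pattern like the following can produce
that shape: the narrow load is extended to a wider register, but the AND
only demands the low bits, so the extend becomes an any_extend that the
search can now look through:
unsigned char low_bits(const unsigned short *p) {
  unsigned int x = *p;  /* narrow load extended to a wider type */
  return x & 0x3f;      /* the AND only demands the low 6 bits */
}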
Recommitted with a more conservative check for the type of the extend.
Differential Revision: https://reviews.llvm.org/D117457
This patch fixes a case where the 'align' parameter attribute on the
pointer operands to llvm.vp.gather and llvm.vp.scatter was being dropped
during the conversion to the SelectionDAG; the default alignment, equal
to the ABI type alignment of the vector type, was kept instead. It also updates
the documentation to reflect the fact that the parameter attribute is
now properly supported.
The default alignment of these intrinsics was previously documented as
being equal to the ABI alignment of the *scalar* type, when in fact that
wasn't the case: the ABI alignment of the vector type was used instead.
This has also been fixed in this patch.
Reviewed By: simoll, craig.topper
Differential Revision: https://reviews.llvm.org/D114423
This was noted as a potential cleanup in D117508.
getShiftAmountTy() already has checks for vector types, the legalization
phase, etc., so it should handle anything that the caller was trying to
account for.
When widening these intrinsics, we do not have to insert neutral
elements at the end of the vector as when widening vector.reduce.*
intrinsics, thanks to vector predication semantics.
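A minimal scalar sketch (my own illustration of the semantics, not code
from the patch): a VP reduction only reads the lanes selected by the mask
and the explicit vector length, so lanes added by widening are simply never
visited and no neutral elements are needed:
int vp_reduce_add(int start, const int *vec, const _Bool *mask, unsigned evl) {
  int acc = start;
  for (unsigned i = 0; i < evl; ++i)  /* widened lanes past evl are never read */
    if (mask[i])
      acc += vec[i];
  return acc;
}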
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D117467
This caused builds to fail with
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:5638:
bool (anonymous namespace)::DAGCombiner::BackwardsPropagateMask(llvm::SDNode *):
Assertion `NewLoad && "Shouldn't be masking the load if it can't be narrowed"' failed.
See the code review for a link to a reproducer.
> This extends the code in SearchForAndLoads to be able to look through
> ANY_EXTEND nodes, which can be created from mismatching IR types where
> the AND node we begin from only demands the low parts of the register.
> That turns zext and sext into any_extends as only the low bits are
> demanded. To be able to look through ANY_EXTEND nodes we need to handle
> mismatching types in a few places, potentially truncating the mask to
> the size of the final load.
>
> Differential Revision: https://reviews.llvm.org/D117457
This reverts commit 578008789f.
Split vp.reduction.* intrinsics by splitting the vector to reduce into
two halves, performing the reduction operation on each of them, and
accumulating the results of both operations.
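A scalar sketch of the strategy (my own illustration, assuming an integer
add reduction; the patch handles every vp.reduction.* opcode):
int reduce_add_split(int start, const int *vec, unsigned len) {
  unsigned half = len / 2;
  int lo = 0, hi = 0;
  for (unsigned i = 0; i < half; ++i) lo += vec[i];    /* first half */
  for (unsigned i = half; i < len; ++i) hi += vec[i];  /* second half */
  return start + lo + hi;  /* accumulate both partial results */
}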
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D117469
A possible codegen regression for PowerPC is noted in D117406
because we don't recognize a pattern that demands only 1 byte
from a bswap.
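As a hypothetical illustration (mine, not the PowerPC test from D117406),
this is the kind of pattern: only one byte of the byte-swapped value is
demanded, so the bswap can be folded into a shift of the original value:
unsigned char top_byte(unsigned int x) {
  /* the low byte of the bswap is the high byte of x, i.e. x >> 24 */
  return __builtin_bswap32(x) & 0xff;
}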
This fold has existed in IR since close to the beginning of LLVM:
https://github.com/llvm/llvm-project/blame/main/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L794
...so this patch copies that code as much as possible and adapts
it for SDAG.
The test for PowerPC that would change in D117406 is over-reduced
with undefs, so I recreated it for AArch64 and x86 by passing in
pointer args and renamed the values to make the logic clearer.
Differential Revision: https://reviews.llvm.org/D117508
This extends the code in SearchForAndLoads to be able to look through
ANY_EXTEND nodes, which can be created from mismatching IR types where
the AND node we begin from only demands the low parts of the register.
That turns zext and sext into any_extends as only the low bits are
demanded. To be able to look through ANY_EXTEND nodes we need to handle
mismatching types in a few places, potentially truncating the mask to
the size of the final load.
Differential Revision: https://reviews.llvm.org/D117457
When we know the value we're extending is a negative constant, then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.
New tests added here:
CodeGen/AArch64/sve-vector-splat.ll
I believe we see some code quality improvements in these existing
tests too:
CodeGen/AArch64/reduce-and.ll
CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll
The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.
Differential Revision: https://reviews.llvm.org/D114357
Update code comments in DAGCombiner::ReduceLoadWidth and refactor
the handling of SRL a bit. The refactoring is done with the intent
of adding support for folding away SRA by using SEXTLOAD in a
follow-up patch.
The function is also renamed as DAGCombiner::reduceLoadWidth.
Differential Revision: https://reviews.llvm.org/D117104
Use the AttributeSet constructor instead. There's no good reason
why AttrBuilder itself should extract the AttributeSet from the
AttributeList. Moving this out of the AttrBuilder generally results
in cleaner code.
Original patch by @hussainjk.
This patch was split off from D109377 to keep vector legalization
(widening/splitting) separate from vector element legalization
(promoting).
While the original patch added a third overload of
SelectionDAG::getVPStore, this patch takes the liberty of collapsing
them all down to one, as three overloads seem excessive for a
little-used node.
The original patch also used ModifyToType in places, but that method
still crashes on scalable vector types. Seeing as the other VP
legalization methods only work when all operands need identical
widening, this patch follows in that vein.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D117235
This seems to be a leftover from a long time ago when there was
an ISD::VBIT_CONVERT and an MVT::Vector. It looks like in those days
the vector type was carried in a VTSDNode.
As far as I know, these days ComputeValueTypes would have already
assigned "Result" the same type we're getting from TLI.getValueType
here. Thus the BITCAST is always a NOP. Verified by adding an assert
and running check-llvm.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D117335
This commit sometimes causes a crash when compiling a vtable thunk. E.g.:
clang '--target=aarch64-grtev4-linux-gnu' -xc++ - -c -o /dev/null <<EOF
struct a {
  virtual int f();
};
struct c {
  virtual int &g() const;
};
struct d : a, c {
  int &g() const;
};
int &d::g() const {}
EOF
Some follow-up commits have been reverted as well:
Revert "IR: Make getRetAlign check callee function attributes"
Revert "Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC."
Revert "Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC."
This reverts commit 4f414af6a7.
This reverts commit a5507d2e25.
This reverts commit 3d2d208f6a.
This reverts commit 07ddfa95e3.
When we know the value we're extending is a negative constant, then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.
New tests added here:
CodeGen/AArch64/sve-vector-splat.ll
I believe we see some code quality improvements in these existing
tests too:
CodeGen/AArch64/dag-numsignbits.ll
CodeGen/AArch64/reduce-and.ll
CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll
The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.
Differential Revision: https://reviews.llvm.org/D114357
This adds support for STRICT_FSETCC(quiet) and STRICT_FSETCCS(signaling).
FEQ matches well to STRICT_FSETCC oeq.
FLT/FLE matches well to STRICT_FSETCCS olt/ole.
Others require commuting operands or multiple instructions.
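As a hedged source-level example (assuming clang's strict FP mode via the
FENV_ACCESS pragma; not a test from this patch), a relational compare is an
ordered, signaling compare, so it should reach isel as STRICT_FSETCCS olt
and map directly to FLT:
#pragma STDC FENV_ACCESS ON
int less_than(double a, double b) {
  return a < b;  /* signaling: raises the invalid exception on NaN operands */
}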
STRICT_FSETCC olt/ole/ogt/oge/ult/ule/ugt/uge uses FLT/FLE,
but we need to save/restore FFLAGS around them to avoid spurious
exceptions. I've implemented pseudo instructions with a
CustomInserter to insert the save/restore CSR instructions.
Unfortunately, this doesn't honor exceptions for signaling NaNs,
but I'm not sure if signaling NaNs are really supported by the
constrained intrinsics.
STRICT_FSETCC one and ueq expand to a pair of FLT instructions
with a save/restore of fflags around each. This could be improved
in the future.
There may be some opportunities to generate better code for strict
comparisons mixed with nonans fast math flags. I've left FIXMEs in
the .td files for that.
Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D116694
Completely rework how we handle 'X'-constrained labels for inline asm.
'X' should really be treated as 'i'. Then existing tests can be moved to
use 'i' (D115410) and clang can just emit 'i' (D115311). (D115410 and
D115311 are about callbr, but this can be done for label inputs, too.)
Coincidentally, this simplification solves an ICE uncovered by D87279
based on assumptions made during D69868.
This is the third approach considered. See also discussions v1 (D114895)
and v2 (D115409).
Reported-by: kernel test robot <lkp@intel.com>
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1512
Reviewed By: void, jyknight
Differential Revision: https://reviews.llvm.org/D115688
I've changed the definition of the experimental.vector.splice
intrinsic to reject indices that are known to be, or may be,
out-of-bounds. In practice, this means changing the definition so that
the index is now only valid in the range [-VL, VL-1] where VL is the
known minimum vector length. We use the vscale_range attribute to
take the minimum vscale value into account so that we can permit
more indices when the attribute is present.
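For example (my worked illustration, not from the patch), with a
<vscale x 4 x i32> operand and a vscale_range minimum of 2, the known
minimum vector length is VL = 2 * 4 = 8, so the index must be in the
range [-8, 7].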
The splice intrinsic is currently only ever generated by the vectoriser,
which will never attempt to splice vectors with out-of-bounds values.
Changing the definition also makes things simpler for codegen since we
can always assume that the index is valid.
This patch was created in response to review comments on D115863.
Differential Revision: https://reviews.llvm.org/D115933
This commit fixes a missed opportunity in merging consecutive stores.
The code that searches for stores skipped the case of stores that
directly connect to the root. The comment above the implementation lists
this case but the code did not handle it. I found this pattern when
looking into the shared_ptr destructor. GCC generates the right
sequence. Here is a small repro:
int foo(int* buff) {
  buff[0] = 0;
  int x = buff[1];
  buff[1] = 0;
  return x;
}
Differential Revision: https://reviews.llvm.org/D116895
These nodes should saturate to their saturating VT. We can use this
information to know the bits past the VT are all zeros or all sign bits.
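For example (my worked illustration), a node that saturates to i8 but
produces an i32 result is known to have its top 24 bits zero in the
unsigned case; in the signed case the result lies in [-128, 127], so the
top 25 bits all match the sign bit.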
I think we might only have test coverage for the unsigned case. I'll
verify and add tests.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D116870
This is the last part of D116531. Fetch the type of the indirect
inline asm operand from the elementtype attribute, rather than
the pointer element type.
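A hedged illustration of such an operand (my example; the x86 asm string
is arbitrary): the "=m" output below is indirect, so the IR call carries a
pointer argument whose pointee type now comes from the elementtype
attribute:
void store_zero(int *p) {
  __asm__ volatile ("movl $0, %0" : "=m"(*p));
}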
Fixes https://github.com/llvm/llvm-project/issues/52928.
Split vp.select in a similar way to vselect, also splitting the length
parameter.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D116651
We can either check the opcode or the number of operands, or use
ISD::isVPOpcode inside the methods.
In some places I've used the number of operands, figuring that it is
cheaper than isVPOpcode. I've included isVPOpcode in an assert to
verify.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D116578
Unsigned compares work with either zero extended or sign extended
inputs just like equality comparisons. I didn't allow this when
I refactored the code in D116421 due to lack of tests. But I've
since found a simple C test case that demonstrates when this can be
useful.
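A hypothetical example (mine, not necessarily the test case referred to
above): whether the two narrow inputs are widened by zero- or by
sign-extension, the unsigned compare of the widened values gives the same
answer, because both extensions preserve unsigned order between values of
the same original width:
int below(unsigned short a, unsigned short b) {
  return a < b;
}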
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D116617
Use the LEGALPOS value that VPIntrinsics.def specifies for every VP
SDNode to determine which return or operand value type shall be used to
infer the legalization action.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D116594
This function returns an upper bound on the number of bits needed
to represent the signed value. Use "Max" to match similar functions
in KnownBits like countMaxActiveBits.
Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old
name around to keep this patch size down. Will do a bulk rename as
follow up.
Rename KnownBits::countMaxSignedBits->countMaxSignificantBits.
Reviewed By: lebedev.ri, RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D116522
This is similar to what is done for targets that prefer zero extend,
where we avoid using a zero extend if the promoted values are sign
extended.
We'll also check for zero-extended operands for ugt, ult, uge, and ule when the
target prefers sign extend. This is different from preferring zero extend, where
we only check for sign bits on equality comparisons.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D116421
The 'New' only makes sense in the context of these being
output arguments, but they are also used as inputs first.
Drop the 'New' and just call them LHS/RHS.
Factored out of D116421.
This patch adds isel support for STRICT_LRINT/LLRINT/LROUND/LLROUND.
It also adds test cases for f32 and f64 constrained intrinsics that
correspond to the intrinsics in float-intrinsics.ll and
double-intrinsics.ll. Support for promoting the integer argument of
STRICT_FPOWI was added.
I've skipped adding tests for f16 intrinsics, since we don't have libcalls
for them and we have inconsistent support for promoting them in LegalizeDAG.
This will need to be examined more closely.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D116323
getShiftAmountTy used to directly return the shift amount type from
the target, which could be too small for large illegal types. For
example, X86 always returns i8.
The code here detected this and used i32 instead if the type wouldn't fit. This
behavior was added to getShiftAmountTy in D112469, so we no longer need
this workaround.
Fix an issue in TargetLowering::expandROT where we only attempt to flip a rotation if the other direction has better support - this matches TargetLowering::expandFunnelShift.
This allows us to enable ISD::ROTR lowering on SSE targets, which particularly simplifies/improves codegen for splat amount and AVX2 per-element shifts.
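A hedged example (mine): the standard variable rotate-right idiom, the
kind of pattern that can now be lowered through ISD::ROTR when it appears
per-element in a vector on SSE/AVX2 targets:
unsigned rotr32(unsigned x, unsigned n) {
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));  /* well-defined for n == 0 */
}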
When the source has a series of assignments, users reasonably want to
have the debugger step through each one individually. Turn off the combine
for adjacent stores so we get this behavior at -O0.
Similar to D7181.
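A hedged illustration (my example): at -O0 each of these assignments now
keeps its own store, so the debugger can step to each line and observe its
effect individually rather than seeing one merged wider store:
struct pair { int a, b; };
void init(struct pair *p) {
  p->a = 1;
  p->b = 2;
}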
Reviewed By: spatel, xgupta
Differential Revision: https://reviews.llvm.org/D115808