This was noted as a potential cleanup in D117508.
getShiftAmountTy() has checks for vector, phase, etc. so it should
handle anything that the caller was trying to account for.
A possible codegen regression for PowerPC is noted in D117406
because we don't recognize a pattern that demands only 1 byte
from a bswap.
This fold has existed in IR since close to the beginning of LLVM:
https://github.com/llvm/llvm-project/blame/main/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L794
...so this patch copies that code as much as possible and adapts
it for SDAG.
The test for PowerPC that would change in D117406 is over-reduced
with undefs, so I recreated it for AArch64 and x86 by passing in
pointer args and renamed the values to make the logic clearer.
Differential Revision: https://reviews.llvm.org/D117508
When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.
New tests added here:
CodeGen/AArch64/sve-vector-splat.ll
I believe we see some code quality improvements in these existing
tests too:
CodeGen/AArch64/reduce-and.ll
CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll
The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.
Differential Revision: https://reviews.llvm.org/D114357
Use the AttributeSet constructor instead. There's no good reason
why AttrBuilder itself should exact the AttributeSet from the
AttributeList. Moving this out of the AttrBuilder generally results
in cleaner code.
When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.
New tests added here:
CodeGen/AArch64/sve-vector-splat.ll
I believe we see some code quality improvements in these existing
tests too:
CodeGen/AArch64/dag-numsignbits.ll
CodeGen/AArch64/reduce-and.ll
CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll
The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.
Differential Revision: https://reviews.llvm.org/D114357
Completely rework how we handle X constrained labels for inline asm.
X should really be treated as i. Then existing tests can be moved to use
i D115410 and clang can just emit i D115311. (D115410 and D115311 are
callbr, but this can be done for label inputs, too).
Coincidentally, this simplification solves an ICE uncovered by D87279
based on assumptions made during D69868.
This is the third approach considered. See also discussions v1 (D114895)
and v2 (D115409).
Reported-by: kernel test robot <lkp@intel.com>
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1512
Reviewed By: void, jyknight
Differential Revision: https://reviews.llvm.org/D115688
This is the last part of D116531. Fetch the type of the indirect
inline asm operand from the elementtype attribute, rather than
the pointer element type.
Fixes https://github.com/llvm/llvm-project/issues/52928.
This function returns an upper bound on the number of bits needed
to represent the signed value. Use "Max" to match similar functions
in KnownBits like countMaxActiveBits.
Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old
name around to keep this patch size down. Will do a bulk rename as
follow up.
Rename KnownBits::countMaxSignedBits->countMaxSignificantBits.
Reviewed By: lebedev.ri, RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D116522
getShiftAmountTy used to directly return the shift amount type from
the target which could be too small for large illegal types. For
example, X86 always returns i8.
The code here detected this and used i32 instead if it won't fit. This
behavior was added to getShiftAmountTy in D112469 so we no longer need
this workaround.
Fix issue in TargetLowering::expandROT where we only attempt to flip a rotation if the other direction has better support - this matches TargetLowering::expandFunnelShift
This allows us to enable ISD::ROTR lowering on SSE targets, which particularly simplifies/improves codegen for splat amount and AVX2 per-element shifts.
MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the
same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the
two halves. This was never treated as a legal type in llvm in the past
as there are not many 64bit instructions and no 64bit compares. There
are a few instructions that could use it though, notably a VSELECT (as
it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for
similar reasons, some gathers/scatter and long multiplies and VCTP64
instructions.
This patch goes through and makes v2i1 a legal type, handling all the
cases that fall out of that. It also makes VSELECT legal for v2i64 as a
side benefit. A lot of the codegen changes as a result - usually in way
that is a little better or a little worse, but still expensive. Costs
can change a little too in the process, again in a way that expensive
things remain expensive. A lot of the tests that changed are mainly to
ensure correctness - the code can hopefully be improved in the future
where it comes up in practice.
The intrinsics currently remain using the v4i1 they previously did to
emulate a v2i1. This will be changed in a followup patch but this one
was already large enough.
Differential Revision: https://reviews.llvm.org/D114449
This patch begins extending handling for peeking through bitcast nodes to big-endian targets as well as the existing little-endian case.
Differential Revision: https://reviews.llvm.org/D114676
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.
For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.
Reapplied with fix for AArch64 rev patterns to matching the ARM fix.
https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)
Differential Revision: https://reviews.llvm.org/D114354
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.
For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.
https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)
Differential Revision: https://reviews.llvm.org/D114354
Fixed the vector type issue that where we used getVectorNumElements()
should be replaced by getVectorElementCount() when lowering these
intrinsics.
This is similar to D94149
Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>
Reviewed By: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D109809
If we have a large enough floating point type that can exactly
represent the integer value, we can convert the value to FP and
use the exponent to calculate the leading/trailing zeros.
The exponent will contain log2 of the value plus the exponent bias.
We can then remove the bias and convert from log2 to leading/trailing
zeros.
This doesn't work for zero since the exponent of zero is zero so we
can only do this for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF. If we need
a value for zero we can use a vmseq and a vmerge to handle it.
We need to be careful to make sure the floating point type is legal.
If it isn't we'll continue using the integer expansion. We could split the vector
and concatenate the results but that needs some additional work and evaluation.
Differential Revision: https://reviews.llvm.org/D111904
If we've only demanded the 0'th element, and it comes from a (one-use) AND, try to convert the zero_extend_vector_inreg into a mask and constant fold it with the AND.
As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper.
I've included some usage, its not exhaustive, just the more obvious cases where the intention is obvious.
Differential Revision: https://reviews.llvm.org/D113396
Instead of returning a bool to indicate success and a separate
SDValue, return the SDValue and have the callers check if it is
null.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112331
There is no need to return a bool and have an SDValue output
parameter. Just return the SDValue and let the caller check if it
is null.
I have another patch to add more callers of these so I thought
I'd clean up the interface first.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112267
Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP
isn't legal or custom for a vector type we would scalarize the
CTLZ/CTTZ. This is different than CTPOP itself which would use a
vector expansion.
This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP
expansion instead of scalarizing. To do this I had to add additional
checks to make sure the operations used by CTPOP expansions are all
supported. Some of the operations were already needed for the CTLZ/CTTZ
expansion.
This is a huge improvement to the RISCV which doesn't have a scalar
ctlz or cttz in the base ISA.
For WebAssembly, I've added Custom lowering to keep the scalarizing
behavior. I've also extended the scalarizing to CTPOP.
Differential Revision: https://reviews.llvm.org/D111919
When inserting a scalable subvector into a scalable vector through
the stack, the index to store to needs to be scaled by vscale.
Before this patch, that didn't yet happen, so it would generate the
wrong offset, thus storing a subvector to the incorrect address
and overwriting the wrong lanes.
For some insert:
nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2)
The offset was not scaled by vscale:
orr x8, x8, #0x4
st1h { z0.h }, p0, [sp]
st1h { z1.d }, p1, [x8]
ld1h { z0.h }, p0/z, [sp]
And is changed to:
mov x8, sp
st1h { z0.h }, p0, [sp]
st1h { z1.d }, p1, [x8, #1, mul vl]
ld1h { z0.h }, p0/z, [sp]
Differential Revision: https://reviews.llvm.org/D111633
Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.....
Differential Revision: https://reviews.llvm.org/D111998
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.
Differential Revision: https://reviews.llvm.org/D110807
While these functions are only used in one location in upstream,
it has been reused in multiple downstreams. Restore this file to
a globally visibile location (outside of APInt.h) to eliminate
donwstream breakage and enable potential future reuse.
Additionally, this patch renames types and cleans up
clang-tidy issues.
APInt is used to describe a bit mask in a variety of value tracking and demanded bits/elts functions.
When traversing through dst/src operands, we have a number of places where these masks need to widened/narrowed to translate through bitcasts, reductions etc. to a different type.
This patch add a APIntOps::ScaleBitMask common helper, adds unit test coverage, and updates a number of cases to use the the helper instead of their own implementation.
This came up on D109065 where we currently have to add yet another implementation of the same code.
Differential Revision: https://reviews.llvm.org/D109683
Soft deprecrate isNullValue/isAllOnesValue and update in tree
callers. This matches the changes to the APInt interface from
D109483.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D109535
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
This moves one mid-size function out of line, inlines the
trivial tcAnd/tcOr/tcXor/tcComplement methods into their only
caller, and moves the magic/umagic functions into SelectionDAG
since they are implementation details of its algorithm. This
also removes the unit tests for magic, but these are already
tested in the divide lowering logic for various targets.
This also upgrades some C style comments to C++.
Differential Revision: https://reviews.llvm.org/D109476
Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 ->
set_cc setlt X, 0 to prevent some regressions later on when folding
select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C
Differential Revision: https://reviews.llvm.org/D109214
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)
TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.
While the end result of discussion may lead back to the current design,
it may also not lead to the current design.
Therefore i take it upon myself
to revert the tree back to last known good state.
This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
I believe, the profitability reasoning here is correct
"sub"reg is already located within the 0'th subreg of wider reg,
so if we have suvector insertion at index 0 into undef,
then it's always free do to.
After this, D109065 finally avoids the regression in D108382.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D109074
When expanding a SMULFIXSAT ISD node (usually originating from
a smul.fix.sat intrinsic) we've applied some optimizations for
the special case when the scale is zero. The idea has been that
it would be cheaper to use an SMULO instruction (if legal) to
perform the multiplication and at the same time detect any overflow.
And in case of overflow we could use some SELECT:s to replace the
result with the saturated min/max value. The only tricky part
is to know if we overflowed on the min or max value, i.e. if the
product is positive or negative. Unfortunately the implementation
has been incorrect as it has looked at the product returned by the
SMULO to determine the sign of the product. In case of overflow that
product is truncated and won't give us the correct sign bit.
This patch is adding an extra XOR of the multiplication operands,
which is used to determine the sign of the non truncated product.
This patch fixes PR51677.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D108938