https://alive2.llvm.org/ce/z/mJP7XP
This can be viewed as expanding the compare into and/or-of-compares:
https://alive2.llvm.org/ce/z/bkZYWE
followed by reduction of each compare.
This could be extended in several ways:
1. There's a (X & Y) == -1 sibling.
2. We can recurse through more than 1 'or'.
3. The fold could be generalized beyond rotates - any operation that
only changes the order of bits (bswap, bitreverse).
This is a transform noted in D111530.
This is the SDAG equivalent of an instcombine transform added with:
fd807601a7
This is another step towards solving #49541 and part of an alternative
set of more general transforms than what is proposed in D111530.
https://alive2.llvm.org/ce/z/ToxaE8
This is a fix for a regression discussed in:
https://github.com/llvm/llvm-project/issues/53829
We cleared more high multiplier bits with 995d400,
but that can lead to worse codegen because we would fail
to recognize the now disguised multiplication by neg-power-of-2
as a shift-left. The problem exists independently of the IR
change in the case that the multiply already had cleared high
bits. We also convert shl+sub into mul+add in instcombine's
negator.
This patch fills in the high-bits to see the shift transform
opportunity. Alive2 attempt to show correctness:
https://alive2.llvm.org/ce/z/GgSKVX
The AArch64, RISCV, and MIPS diffs look like clear wins. The
x86 code requires an extra move register in the minimal examples,
but it's still an improvement to get rid of the multiply on all
CPUs that I am aware of (because multiply is never as fast as a
shift).
There's a potential follow-up noted by the TODO comment. We
should already convert that pattern into shl+add in IR, so
it's probably not common:
https://alive2.llvm.org/ce/z/7QY_GaFixes#53829
Differential Revision: https://reviews.llvm.org/D120216
Previous we used sra (X, size(X)-1); xor (add (X, Y), Y).
By placing sub at the end, we allow RISCV to combine sign_extend_inreg
with it to form subw.
Some X86 tests for Z - abs(X) seem to have improved as well.
Other targets look to be a wash.
I had to modify ARM's abs matching code to match from sub instead of
xor. Maybe instead ISD::ABS should be made legal. I'll try that in
parallel to this patch.
This is an alternative to D119099 which was focused on RISCV only.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D119171
This moves the matching of AVGFloor and AVGCeil into a place where
demand bit are available, so that it can detect more cases for more
folds. It changes the transform to start from a shift, not from a
truncate. We match the pattern shr(add(ext(A), ext(B)), 1), transforming
to ext(hadd(A, B)).
For signed values, because only the bottom bits are demanded llvm will
transform the above to use a lshr too, as opposed to ashr. In order to
correctly detect the hadd we need to know the demanded bits to turn it
back. Depending on whether the shift is signed (ashr) or logical (lshr),
and the extensions are signed or unsigned we can create different nodes.
If the shift is signed:
Needs >= 2 sign bits. https://alive2.llvm.org/ce/z/h4gQAW generating signed rhadd.
Needs >= 2 zero bits. https://alive2.llvm.org/ce/z/B64DUA generating unsigned rhadd.
If the shift is unsigned:
Needs >= 1 zero bits. https://alive2.llvm.org/ce/z/ByD8sj generating unsigned rhadd.
Needs 1 demanded bit zero and >= 2 sign bits https://alive2.llvm.org/ce/z/hvPGxX and
https://alive2.llvm.org/ce/z/32P5n1 generating signed rhadd.
Differential Revision: https://reviews.llvm.org/D119072
None of the external users actual touch these (they're purely used internally down the recursive call) - its trivial to add another wrapper if anything ever does want to track known elements.
I have updated TargetLowering::isConstTrueVal to also consider
SPLAT_VECTOR nodes with constant integer operands. This allows the
optimisation to also work for targets that support scalable vectors.
Differential Revision: https://reviews.llvm.org/D117210
We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements.
This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.
Fixes parity codegen issue where we know all but the lowest bit is zero, we can replace the ICMPNE with 0 comparison with a ext/trunc
Differential Revision: https://reviews.llvm.org/D117983
Fixes parity codegen issue where we know all but the lowest bit is zero, we can replace the ICMPNE with 0 comparison with a ext/trunc
Differential Revision: https://reviews.llvm.org/D117983
This was noted as a potential cleanup in D117508.
getShiftAmountTy() has checks for vector, phase, etc. so it should
handle anything that the caller was trying to account for.
A possible codegen regression for PowerPC is noted in D117406
because we don't recognize a pattern that demands only 1 byte
from a bswap.
This fold has existed in IR since close to the beginning of LLVM:
https://github.com/llvm/llvm-project/blame/main/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L794
...so this patch copies that code as much as possible and adapts
it for SDAG.
The test for PowerPC that would change in D117406 is over-reduced
with undefs, so I recreated it for AArch64 and x86 by passing in
pointer args and renamed the values to make the logic clearer.
Differential Revision: https://reviews.llvm.org/D117508
When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.
New tests added here:
CodeGen/AArch64/sve-vector-splat.ll
I believe we see some code quality improvements in these existing
tests too:
CodeGen/AArch64/reduce-and.ll
CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll
The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.
Differential Revision: https://reviews.llvm.org/D114357
Use the AttributeSet constructor instead. There's no good reason
why AttrBuilder itself should exact the AttributeSet from the
AttributeList. Moving this out of the AttrBuilder generally results
in cleaner code.
When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.
New tests added here:
CodeGen/AArch64/sve-vector-splat.ll
I believe we see some code quality improvements in these existing
tests too:
CodeGen/AArch64/dag-numsignbits.ll
CodeGen/AArch64/reduce-and.ll
CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll
The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.
Differential Revision: https://reviews.llvm.org/D114357
Completely rework how we handle X constrained labels for inline asm.
X should really be treated as i. Then existing tests can be moved to use
i D115410 and clang can just emit i D115311. (D115410 and D115311 are
callbr, but this can be done for label inputs, too).
Coincidentally, this simplification solves an ICE uncovered by D87279
based on assumptions made during D69868.
This is the third approach considered. See also discussions v1 (D114895)
and v2 (D115409).
Reported-by: kernel test robot <lkp@intel.com>
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1512
Reviewed By: void, jyknight
Differential Revision: https://reviews.llvm.org/D115688
This is the last part of D116531. Fetch the type of the indirect
inline asm operand from the elementtype attribute, rather than
the pointer element type.
Fixes https://github.com/llvm/llvm-project/issues/52928.
This function returns an upper bound on the number of bits needed
to represent the signed value. Use "Max" to match similar functions
in KnownBits like countMaxActiveBits.
Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old
name around to keep this patch size down. Will do a bulk rename as
follow up.
Rename KnownBits::countMaxSignedBits->countMaxSignificantBits.
Reviewed By: lebedev.ri, RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D116522
getShiftAmountTy used to directly return the shift amount type from
the target which could be too small for large illegal types. For
example, X86 always returns i8.
The code here detected this and used i32 instead if it won't fit. This
behavior was added to getShiftAmountTy in D112469 so we no longer need
this workaround.
Fix issue in TargetLowering::expandROT where we only attempt to flip a rotation if the other direction has better support - this matches TargetLowering::expandFunnelShift
This allows us to enable ISD::ROTR lowering on SSE targets, which particularly simplifies/improves codegen for splat amount and AVX2 per-element shifts.
MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the
same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the
two halves. This was never treated as a legal type in llvm in the past
as there are not many 64bit instructions and no 64bit compares. There
are a few instructions that could use it though, notably a VSELECT (as
it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for
similar reasons, some gathers/scatter and long multiplies and VCTP64
instructions.
This patch goes through and makes v2i1 a legal type, handling all the
cases that fall out of that. It also makes VSELECT legal for v2i64 as a
side benefit. A lot of the codegen changes as a result - usually in way
that is a little better or a little worse, but still expensive. Costs
can change a little too in the process, again in a way that expensive
things remain expensive. A lot of the tests that changed are mainly to
ensure correctness - the code can hopefully be improved in the future
where it comes up in practice.
The intrinsics currently remain using the v4i1 they previously did to
emulate a v2i1. This will be changed in a followup patch but this one
was already large enough.
Differential Revision: https://reviews.llvm.org/D114449
This patch begins extending handling for peeking through bitcast nodes to big-endian targets as well as the existing little-endian case.
Differential Revision: https://reviews.llvm.org/D114676
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.
For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.
Reapplied with fix for AArch64 rev patterns to matching the ARM fix.
https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)
Differential Revision: https://reviews.llvm.org/D114354
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.
For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.
https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)
Differential Revision: https://reviews.llvm.org/D114354