Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)
This extends the transform added with 0353c2c996.
If the shuffle reduces vector length, the transform
reduces the width of the cast, so that should be a
win for most codegen (if not, it can be inverted).
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)
This extends the transform added with 0353c2c996.
If the casts are to a larger element type, the transform
reduces shuffle bit width, so that should be a win for
most codegen (if not, it can be inverted).
shuffle (cast X), (cast Y), Mask --> cast (shuffle X, Y, Mask)
This is similar to a recent transform with fneg ( b331a7ebc1 ),
but this is intentionally the most conservative first step to
try to avoid regressions in codegen. There are several
restrictions that could be removed as follow-up enhancements.
Note that a cast with a unary shuffle is currently canonicalized
in the other direction (shuffle after cast - D103038 ). We might
want to invert that to be consistent with this patch.
For the unary shuffle pattern, this is opposite to what we try
to do with binops, but it seems better to keep it consistent
with the motivating binary shuffle pattern. On that, it is
clearly better on the usual no-extra uses case.
There is a chance that this will pull an fneg away from some
other binop and cause a regression in codegen, but that should
be invertible in the backend. The transform is birectional:
https://alive2.llvm.org/ce/z/kKaKCUhttps://alive2.llvm.org/ce/z/3DesfwFixes#45631
Add the "(0 - X) --> (X * -1)" reverse identity to the list of alternate form binops.
We need a little hack to make the existing logic work because it does not expect to
move constants from op0 to op1, but the code comment hopefully makes that clear.
I don't think there are any other identities like that.
Fixes#54364
Differential Revision: https://reviews.llvm.org/D122390
The transform can require an optional shuffle instruction to be sound,
so we need to use Builder to create all values and then replace the
original instruction with whatever that final value is.
The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.
I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.
We chose the canonical type as i64 arbitrarily. We might consider changing this to something else in the future if we have cause.
Differential Revision: https://reviews.llvm.org/D115387
This reorders existing transforms to put demanded elements last. The reasoning here is that when we have an example which can be scalarized or handled via demanded bits, we should prefer scalarization as that doesn't require dropping flags on arithmetic instructions.
This doesn't show major changes in the tests today, but once I add support for fast math flags to dropPoisonGeneratingFlags this becomes glaringly obvious.
Differential Revision: https://reviews.llvm.org/D115394
In D71220 a pattern was added to replace shuffle's insertelement operand
if inserted scalar is not demanded. The pattern was added only for
the case where the shuffle's mask size is equal to element's vector size.
However, that condition is not required because the pattern does not
change the shuffle vector size.
This patch extends the pattern to also include cases where shuffle's mask
size is not equal to element's vector size.
Differential Revision: https://reviews.llvm.org/D112318
This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )
Differential Revision: https://reviews.llvm.org/D111891
We already handle more complicated cases like:
extelt (bitcast (inselt poison, X, 0)) --> trunc (lshr X)
But we missed this simpler pattern:
https://alive2.llvm.org/ce/z/D55h64 / https://alive2.llvm.org/ce/z/GKzzRq
This is part of solving:
https://llvm.org/PR52057
I made the transform depend on legal/desirable int type to avoid creating
a shift of an illegal type (for example i128). I'm not sure if that
restriction is actually necessary, but we can change that as a follow-up
if the backend can deal with integer ops on too-wide illegal types.
The pile of AVX512 test changes are all neutral AFAICT - the x86 backend
seems to know how to turn that into the expected "kmov" instructions.
Differential Revision: https://reviews.llvm.org/D111082
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineVectorOps.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110230
InstCombine's worklist can be re-used by other passes like
VectorCombine. Move it to llvm/Transform/Utils and rename it to
InstructionWorklist.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D110181
I was wondering how instcombine does on the examples in D109236,
and we're missing a basic transform:
inselt (ext X), (ext Y), Index --> ext (inselt X, Y, Index)
https://alive2.llvm.org/ce/z/z2aBu9
Note that there are several possible extensions of this fold
(see TODO comments).
Differential Revision: https://reviews.llvm.org/D109537
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
I have updated cheapToScalarize to also consider the case when
extracting lanes from a stepvector intrinsic. This required removing
the existing 'bool IsConstantExtractIndex' and passing in the actual
index as a Value instead. We do this because we need to know if the
index is <= known minimum number of elements returned by the stepvector
intrinsic. Effectively, when extracting lane X from a stepvector we
know the value returned is also X.
New tests added here:
Transforms/InstCombine/vscale_extractelement.ll
Differential Revision: https://reviews.llvm.org/D106358
In all of these, the value must be an instruction for us to succeed anyway,
so change it to maybe hopefully make further changes more straight-forward.
This patch allows folding stepvector + extract to the lane when the lane is
lower than the minimum size of the scalable vector. This fold is possible
because lane X of a stepvector is also X!
For instance, extracting element 3 of a <vscale x 4 x i64>stepvector is 3.
Differential Revision: https://reviews.llvm.org/D103153
This is a patch that replaces shufflevector and insertelement's placeholder value with poison.
Underlying motivation is to fix the semantics of shufflevector with undef mask to return poison instead
(D93818)
The consensus has been made in the late 2020 via mailing list as well as the thread in https://bugs.llvm.org/show_bug.cgi?id=44185 .
This patch is a simple syntactic change to the existing code, hence directly pushed as a commit.
We sometimes see code like this:
Case 1:
%gep = getelementptr i32, i32* %a, <2 x i64> %splat
%ext = extractelement <2 x i32*> %gep, i32 0
or this:
Case 2:
%gep = getelementptr i32, <4 x i32*> %a, i64 1
%ext = extractelement <4 x i32*> %gep, i32 0
where there is only one use of the GEP. In such cases it makes
sense to fold the two together such that we create a scalar GEP:
Case 1:
%ext = extractelement <2 x i64> %splat, i32 0
%gep = getelementptr i32, i32* %a, i64 %ext
Case 2:
%ext = extractelement <2 x i32*> %a, i32 0
%gep = getelementptr i32, i32* %ext, i64 1
This may create further folding opportunities as a result, i.e.
the extract of a splat vector can be completely eliminated. Also,
even for the general case where the vector operand is not a splat
it seems beneficial to create a scalar GEP and extract the scalar
element from the operand. Therefore, in this patch I've assumed
that a scalar GEP is always preferrable to a vector GEP and have
added code to unconditionally fold the extract + GEP.
I haven't added folds for the case when we have both a vector of
pointers and a vector of indices, since this would require
generating an additional extractelement operation.
Tests have been added here:
Transforms/InstCombine/gep-vector-indices.ll
Differential Revision: https://reviews.llvm.org/D101900
Some intrinsics wrapper code has the habit of ignoring the type of the
elements in vectors, thinking of vector registers as a "bag of bits". As
a consequence, some operations are shared between vectors of different
types are shared. For example, functions that rearrange elements in a
vector can be shared between vectors of int32 and float.
This can result in bitcasts in awkward places that prevent the backend
from recognizing some instructions. For AArch64 in particular, it
inhibits the selection of dup from a general purpose register (GPR), and
mov from GPR to a vector lane.
This patch adds a pattern in InstCombine to move the bitcasts past the
shufflevector if this is possible. Sometimes this even allows
InstCombine to remove the bitcast entirely, as in the included tests.
Alternatively this could be done with a few extra patterns in the
AArch64 backend, but InstCombine seems like a better place for this.
Differential Revision: https://reviews.llvm.org/D97397
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().
Differential Revision: https://reviews.llvm.org/D86065
Now that we no longer require for this map to have stable iteration order,
we no longer need to pay for keeping the iteration order stable,
so switch from `SmallMapVector` to `SmallDenseMap`.
While it may seem like we can just "deduplicate" the case where
some basic block happens to be a predecessor more than once,
which happens for e.g. switches, that is not correct thing to do.
We must actually add a PHI operand for each predecessor.
This was initially reported to me by David Major
as a clang crash during gecko build for android.
While the original implementation added in D85787 / ae7f08812e
is not incorrect, it is known to be suboptimal.
In particular, it is not incorrect to use the basic block
in which the original `insertvalue` instruction is located
as the merge point, that is not necessarily optimal,
as `@test6` shows.
We should look at all the AggElts, and, if they are all defined
in the same basic block, then that is the basic block we should use.
On RawSpeed library, this catches +4% (+50) more cases.
On vanilla LLVM test-suits, this catches +12% (+92) more cases.
In a following patch, UseBB will be detected later,
so capturing it is potentially error-prone (capture by ref vs by val).
Also, parametrized UseBB will likely be needed
for multiple levels of PHI indirections later on anyways.