Users, especially the Attributor, might replace multiple operands at
once. The actual implementation of simplifyWithOpReplaced is able to
handle that just fine, the interface was simply not allowing to replace
more than one operand at a time. This is exposing a more generic
interface without intended changes for existing code.
Differential Revision: https://reviews.llvm.org/D106189
Currently InstructionSimplify.cpp knows how to simplify floating point
instructions that have a NaN operand. It does not know how to handle the
matching constrained FP intrinsic.
This patch teaches it how to simplify so long as the exception handling
is not "fpexcept.strict".
Differential Revision: https://reviews.llvm.org/D103169
If any operand of a math op is poison, that takes
precedence over general undef/NaN.
This should not be visible with binary ops because
it requires 2 constant operands to trigger (and if
both operands of a binop are constant, that should
get handled first in ConstantFolding).
We already have a fold for variable index with constant vector,
but if we can determine a scalar splat value, then it does not
matter whether that value is constant or not.
We overlooked this fold in D102404 and earlier patches,
but the fixed vector variant is shown in:
https://llvm.org/PR50817
Alive2 agrees on that:
https://alive2.llvm.org/ce/z/HpijPC
The same logic applies to scalable vectors.
Differential Revision: https://reviews.llvm.org/D104867
This is the cause of the miscompile in:
https://llvm.org/PR50944
The problem has likely existed for some time, but it was made visible with:
5af8bacc94 ( D104661 )
handleOtherCmpSelSimplifications() assumed it can convert select of
constants to bool logic ops, but that does not work with poison.
We had a very similar construct in InstCombine, so the fix here
mimics the fix there.
The bug is in instsimplify, but I'm not sure how to reproduce it outside of
instcombine. The reason this is visible in instcombine is because we have a
hack (FIXME) to bypass simplification of a select when it has an icmp user:
955f125899/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (L2632)
So we get to an unusual case where we are trying to simplify an instruction
that has an operand that would have already simplified if we had processed
it in normal order.
Differential Revision: https://reviews.llvm.org/D105298
This adds more poison folding optimizations to InstSimplify.
Since all binary operators propagate poison, these are fine.
Also, the precondition of `select cond, undef, x` -> `x` is relaxed to allow the case when `x` is undef.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104661
We already have this fold:
fadd float poison, 1.0 --> poison
...via ConstantFolding, so this makes the behavior consistent
if the other operand(s) are non-constant.
The fold for undef was added before poison existed as a
value/type in IR.
This came up in D102673 / D103169
because we're trying to sort out the more complicated handling
for constrained math ops.
We should have the handling for the regular instructions done
first, so we can build on that (or diverge as needed).
Differential Revision: https://reviews.llvm.org/D104383
We can look through invariant group intrinsics for the purposes of
simplifying the result of a load.
Since intrinsics can't be constants, but we also don't want to
completely rewrite load constant folding, we convert the load operand to
a constant. For GEPs and bitcasts we just treat them as constants. For
invariant group intrinsics, we treat them as a bitcast.
Relanding with a check for self-referential values.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D101103
This patch allows that scalable vector can also use the fold that already
exists for fixed vector, only when the lane index is lower than the minimum
number of elements of the vector.
Differential Revision: https://reviews.llvm.org/D102404
We can look through invariant group intrinsics for the purposes of
simplifying the result of a load.
Since intrinsics can't be constants, but we also don't want to
completely rewrite load constant folding, we convert the load operand to
a constant. For GEPs and bitcasts we just treat them as constants. For
invariant group intrinsics, we treat them as a bitcast.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D101103
This is similar to the fix in c590a9880d ( PR49832 ), but
we missed handling the pattern for select of bools (no compare
inst).
We can't substitute a vector value because the equality condition
replacement that we are attempting requires that the condition
is true/false for the entire value. Vector select can be partly
true/false.
I added an assert for vector types, so we shouldn't hit this again.
Fixed formatting while auditing the callers.
https://llvm.org/PR50500
The semantics of select with undefined/poison condition
are not explicitly stated in the LangRef, but this matches
comments in the code and Alive2 appears to concur:
https://alive2.llvm.org/ce/z/KXytmd
We can find this pattern after demanded elements transforms.
As noted in D101191, fuzzers are finding infinite loops because
we may not account for this pattern in other passes.
The previous rule:
(insert_vector _, (extract_vector X, 0), 0) -> X
is not quite correct. The correct fold should be:
(insert_vector Y, (extract_vector X, 0), 0) -> X
where: Y is X, or Y is undef
This commit updates the pattern.
Reviewed By: peterwaller-arm, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D102699
This commit removes some redundant {insert,extract}_vector intrinsic
chains by implementing the following patterns as instsimplifies:
(insert_vector _, (extract_vector X, 0), 0) -> X
(extract_vector (insert_vector _, X, 0), 0) -> X
Reviewed By: peterwaller-arm
Differential Revision: https://reviews.llvm.org/D101986
This reverts commit ea1a0d7c9a.
While this is strictly more powerful, it is also strictly slower.
InstSimplify intentionally does not perform many folds that it
is allowed to perform, if doing so requires a KnownBits calculation
that will be repeated in InstCombine.
Maybe it's worthwhile to do this here, but that needs a more
explicitly stated motivation, evaluated in a review.
We already special-cased a few interesting patterns,
but that is strictly less powerful than using KnownBits.
So instead get the known bits for the operand of `and`,
and iff all the unset bits of the `and`-mask are known to be zeros
in the operand, we can omit said `and`.
Stepping through callstacks in the example from D99759 reveals
this potential compile-time improvement.
The savings come from avoiding ValueTracking's computing known
bits if we have already dealt with special-case patterns.
Further improvements in this direction seem possible.
This makes a degenerate test based on PR49785 about 40x faster
(25 sec -> 0.6 sec), but it does not address the larger question
of how to limit computeKnownBitsFromAssume(). Ie, the original
test there is still infinite-time for all practical purposes.
Differential Revision: https://reviews.llvm.org/D100408
"Does the predicate hold between two ranges?"
Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
"Does the predicate hold between two ranges?"
Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
The current code does not properly handle vector indices unless they are
the first index.
At the moment LangRef gives the impression that the vector index must be
the one and only index (https://llvm.org/docs/LangRef.html#getelementptr-instruction).
But vector indices can appear at any position and according to the
verifier there may be multiple vector indices. If that's the case, the
number of elements must match.
This patch updates SimplifyGEPInst to properly handle those additional
cases.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99961
This is the sibling fix to c590a9880d -
as there, we can't subsitute a vector value the equality
compare replacement that we are trying requires that the
comparison is true for the entire value. Vector select
can be partly true/false.
In order to bring up scalable vector support in LLVM incrementally,
we introduced behaviour to emit a warning, instead of an error, when
asking the wrong question of a scalable vector, like asking for the
fixed number of elements.
This patch puts that behaviour under a flag. The default behaviour is
that the compiler will always error, which means that all LLVM unit
tests and regression tests will now fail when a code-path is taken that
still uses the wrong interface.
The behaviour to demote an error to a warning can be individually enabled
for tools that want to support experimental use of scalable vectors.
This patch enables that behaviour when driving compilation from Clang.
This means that for users who want to try out scalable-vector support,
fixed-width codegen support, or build user-code with scalable vector
intrinsics, Clang will not crash and burn when the compiler encounters
such a case.
This allows us to do away with the following pattern in many of the SVE tests:
RUN: .... 2>%t
RUN: cat %t | FileCheck --check-prefix=WARN
WARN-NOT: warning: ...
The behaviour to emit warnings is only temporary and we expect this flag
to be removed in the future when scalable vector support is more stable.
This patch also has fixes the following tests:
unittests:
ScalableVectorMVTsTest.SizeQueries
SelectionDAGAddressAnalysisTest.unknownSizeFrameObjects
AArch64SelectionDAGTest.computeKnownBitsSVE_ZERO_EXTEND_VECTOR_INREG
regression tests:
Transforms/InstCombine/vscale_gep.ll
Reviewed By: paulwalker-arm, ctetreau
Differential Revision: https://reviews.llvm.org/D98856
This is an alternative to D98391/D98585, playing things more
conservatively. If AllowRefinement == false, then we don't use
InstSimplify methods at all, and instead explicitly implement a
small number of non-refining folds. Most cases are handled by
constant folding, and I only had to add three folds to cover
our unit tests / test-suite. While this may lose some optimization
power, I think it is safer to approach from this direction, given
how many issues this code has already caused.
Differential Revision: https://reviews.llvm.org/D99027
This is a follow-up to D98588, and fixes the inline `FIXME` about a GEP-related simplification not
preserving the provenance.
https://alive2.llvm.org/ce/z/qbQoAY
Additional tests were added in {rGf125f28afdb59eba29d2491dac0dfc0a7bf1b60b}
Depends on D98672
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D98611
In preparation for D98611, the upcoming change will need to apply additional checks to `P` and `V`,
and so this refactor paves the way for adding additional checks in a less awkward way.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D98672
The motivating pattern was handled in 0a2d69480d ,
but we should have this for symmetry.
But this really highlights that we could generalize for
any shifted constant if we match this in instcombine.
https://alive2.llvm.org/ce/z/MrmVNt
Add simplification of smul.fix and smul.fix.sat according to
X * 0 -> 0
X * undef -> 0
X * (1 << scale) -> X
This includes the commuted patterns and splatted vectors.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D98299
This is a patch that adds folding of two logical and/ors that share one variable:
a && (a && b) -> a && b
a && (a & b) -> a && b
...
This is towards removing the poison-unsafe select optimization (D93065 has more context).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D96945
Pulled out from D90479 - this recognises invalid nsw shl patterns with signbit changes that result in poison.
Differential Revision: https://reviews.llvm.org/D97305
This patch adds a new intrinsic experimental.vector.reduce that takes a single
vector and returns a vector of matching type but with the original lane order
reversed. For example:
```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```
The new intrinsic supports fixed and scalable vectors types.
The fixed-width vector relies on shufflevector to maintain existing behaviour.
Scalable vector uses the new ISD node - VECTOR_REVERSE.
This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].
Patch by Paul Walker (@paulwalker-arm).
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html
Differential Revision: https://reviews.llvm.org/D94883
We can fold x*C1/C2 <= x to true if C1 <= C2. This is valid even
if the multiplication is not nuw: https://alive2.llvm.org/ce/z/vULors
The multiplication or division can be replaced by shifts. We don't
handle the case where both are shifts, as that should get folded
away by InstCombine.
This is a partial fix for https://bugs.llvm.org/show_bug.cgi?id=44403.
Folding gep p, q-p to q is only legal if p and q have the same
provenance. This fold should probably be guarded by something like
getUnderlyingObject(p) == getUnderlyingObject(q).
This patch is a partial fix that removes the special handling for
gep p, 0-p, which will fold to a null pointer, which would certainly
not pass an underlying object check (unless p is also null, in which
case this would fold trivially anyway). Folding to a null pointer
is particularly problematic due to the special handling it receives
in many places, making end-to-end miscompiles more likely.
Differential Revision: https://reviews.llvm.org/D93820
I don't believe this has an observable effect, because the only
thing we care about here is replacing the operand with a constant
so following folds can apply. This change is just to make the
representation follow canonical unary shuffle form.
Calling null or undef results in immediate undefined behavior.
Return poison instead of undef in this case, similar to what
we do for immediate UB due to division by zero.
Make InstSimplify return poison rather than undef for out-of-bounds
shifts, as specified by LandRef:
> If op2 is (statically or dynamically) equal to or larger than the
> number of bits in op1, this instruction returns a poison value.
Differential Revision: https://reviews.llvm.org/D93998
As the comment already indicates, performing an operation with
nnan/ninf flags on a nan/inf or undef results in poison. Now that
we have a proper poison value, we no longer need to relax it to
undef.
Div/rem by zero is immediate undefined behavior and anything goes.
Currently we fold it to undef, this patch changes it to fold to
poison instead, which is slightly stronger.
Differential Revision: https://reviews.llvm.org/D93995
This is the same change as D93990, but for extractelement rather
than insertelement.
> If idx exceeds the length of val for a fixed-length vector, the
> result is a poison value. For a scalable vector, if the value of
> idx exceeds the runtime length of the vector, the result is a
> poison value.
This is a simple patch that updates InstSimplify to return poison if the index is/can be out-of-bounds
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93990
The last use of the function, located in RemovePredecessorAndSimplify,
was removed on Dec 25, 2020 in commit
46bea9b297.
The last use of RemovePredecessorAndSimplify was removed on Sep 29,
2010 in commit 99c985c37d.
The transform wasn't checking that the LHS of the comparison
*is* the `X` in question...
This is the miscompile that was holding up D87188.
Thanks to Dave Green for producing an actionable reproducer!
Folding a select of vector constants that include undef elements only
applies to fixed vectors, but there's no earlier check the type is not
scalable so it crashes for scalable vectors. This adds a check so this
optimization is only attempted for fixed vectors.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D92046
This extends D78430 to solve cases like:
https://llvm.org/PR47858
There are still missed opportunities shown in the tests,
and as noted in the earlier patches, we have related
functionality in InstCombine, so we may want to extend
other folds in a similar way.
A semi-random sampling of test diff proofs in this patch:
https://rise4fun.com/Alive/sS4C
This improves simplifications for pattern `icmp (X+Y), (X+Z)` -> `icmp Y,Z`
if only one of the operands has NSW set, e.g.:
icmp slt (x + 0), (x +nsw 1)
We can still safely rewrite this to:
icmp slt 0, 1
because we know that the LHS can't overflow if the RHS has NSW set and
C1 < C2 && C1 >= 0, or C2 < C1 && C1 <= 0
This simplification is useful because ScalarEvolutionExpander which is used to
generate code for SCEVs in different loop optimisers is not always able to put
back NSW flags across control-flow, thus inhibiting CFG simplifications.
Differential Revision: https://reviews.llvm.org/D89317
If SimplifyWithOpReplaced() cannot simplify the value, null should
be returned. Make sure this really does happen in all cases,
including those where SimplifyBinOp() returns the original value.
This does not matter for existing users, but does mattter for
D87480, which would go into an infinite loop otherwise.
If the constant operand is the opposite of the min/max value,
then the result must be the other value.
This is based on the similar codegen transform proposed in:
D87571
As discussed in the sibling codegen functionality patch D87571,
this transform was created with D52766, but it is not correct.
The incorrect test diffs were missed during review, but the
'TODO' comment about this functionality was still in the code -
we need 'nnan' to enable this fold.
This is a followup to D86834, which partially fixed this issue in
InstSimplify. However, InstCombine repeats the same transform while
dropping poison flags -- which does not cover cases where poison is
introduced in some other way.
The fix here is a bit more comprehensive, because things are quite
entangled, and it's hard to only partially address it without
regressing optimization. There are really two changes here:
* Export the SimplifyWithOpReplaced API from InstSimplify, with an
added AllowRefinement flag. For replacements inside the TrueVal
we don't actually care whether refinement occurs or not, the
replacement is always legal. This part of the transform is now
done in InstSimplify only. (It should be noted that the current
AllowRefinement check is not sufficient -- that's an issue we
need to address separately.)
* Change the InstCombine fold to work by temporarily dropping
poison generating flags, running the fold and then restoring the
flags if it didn't work out. This will ensure that the InstCombine
fold is correct as long as the InstSimplify fold is correct.
Differential Revision: https://reviews.llvm.org/D87445
If we know that the abs operand is known negative, we can replace
it with a neg.
To avoid computing known bits twice, I've removed the fold for the
non-negative case from InstSimplify. Both the non-negative and the
negative case are handled by InstCombine now, with one known bits call.
Differential Revision: https://reviews.llvm.org/D87196
This addresses the remaining issue from D87188. Due to a series of
folds, we may end up with abs-of-abs represented as
x == 0 ? -abs(x) : abs(x). Rather than recognizing this as a special
abs pattern and doing an abs-of-abs fold on it afterwards,
I'm directly folding this to one of the select operands in InstSimplify.
The general pattern falls into the "select with operand replaced"
category, but that fold is not powerful enough to recognize that
both hands of the select are the same for value zero.
Differential Revision: https://reviews.llvm.org/D87197
If we have a dominating condition that x >= y, then umax(x, y) is x,
etc. I'm doing this in InstSimplify as the corresponding transform
for the select form is also done there.
Differential Revision: https://reviews.llvm.org/D87168
Replace the check for poison-producing instructions in
SimplifyWithOpReplaced() with the generic helper canCreatePoison()
that properly handles poisonous shifts and thus avoids the problem
from PR47322.
This additionally fixes a bug in IIQ.UseInstrInfo=false mode, which
previously could have caused this code to ignore poison flags.
Setting UseInstrInfo=false should reduce the possible optimizations,
not increase them.
This is not a full solution to the problem, as poison could be
introduced more indirectly. This is just a minimal, easy to backport
fix.
Differential Revision: https://reviews.llvm.org/D86834
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().
Differential Revision: https://reviews.llvm.org/D86065
As pointed out in post-commit review, this can legally be called
on instructions that are not inserted into basic blocks,
so don't blindly assume that there is basic block.
Apparently, we don't do this, neither in EarlyCSE, nor in InstSimplify,
nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places..
While i could teach EarlyCSE how to hash PHI nodes,
we can't really do much (anything?) even if we find two identical
PHI nodes in different basic blocks, same-BB case is the interesting one,
and if we teach InstSimplify about it (which is what i wanted originally,
https://reviews.llvm.org/D86530), we get EarlyCSE support for free.
So i would think this is pretty uncontroversial.
On vanilla llvm test-suite + RawSpeed, this has the following effects:
```
| statistic name | baseline | proposed | Δ | % | \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instsimplify.NumPHICSE | 0 | 23779 | 23779 | 0.00% | 0.00% |
| asm-printer.EmittedInsts | 7942328 | 7942392 | 64 | 0.00% | 0.00% |
| assembler.ObjectBytes | 273069192 | 273084704 | 15512 | 0.01% | 0.01% |
| correlated-value-propagation.NumPhis | 18412 | 18539 | 127 | 0.69% | 0.69% |
| early-cse.NumCSE | 2183283 | 2183227 | -56 | 0.00% | 0.00% |
| early-cse.NumSimplify | 550105 | 542090 | -8015 | -1.46% | 1.46% |
| instcombine.NumAggregateReconstructionsSimplified | 73 | 4506 | 4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined | 3640264 | 3664769 | 24505 | 0.67% | 0.67% |
| instcombine.NumDeadInst | 1778193 | 1783183 | 4990 | 0.28% | 0.28% |
| instcount.NumCallInst | 1758401 | 1758799 | 398 | 0.02% | 0.02% |
| instcount.NumInvokeInst | 59478 | 59502 | 24 | 0.04% | 0.04% |
| instcount.NumPHIInst | 330557 | 330533 | -24 | -0.01% | 0.01% |
| instcount.TotalInsts | 8831952 | 8832286 | 334 | 0.00% | 0.00% |
| simplifycfg.NumInvokes | 4300 | 4410 | 110 | 2.56% | 2.56% |
| simplifycfg.NumSimpl | 1019808 | 999607 | -20201 | -1.98% | 1.98% |
```
I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call`
transforms, and counter-intuitively results in *more* instructions total.
That being said, the PHI count doesn't decrease that much,
and looking at some examples, it seems at least some of them
were previously getting PHI CSE'd in SimplifyCFG of all places..
I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time.
As a comment in `InstCombinerImpl::visitPHINode()` already stated,
there are no guarantees on the ordering of the operands of a PHI node,
so if we just naively compare them, we may false-negatively say that
the nodes are not equal when the only difference is operand order,
which is especially important since the fold is in InstSimplify,
so we can't rely on InstCombine sorting them beforehand.
Fixing this for the general case is costly (geomean +0.02%),
and does not appear to catch anything in test-suite, but for
the same-BB case, it's trivial, so let's fix at least that.
As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
this appears to cause geomean +0.03% compile time increase (regression),
but geomean -0.01%..-0.04% code size decrease (improvement).
This is a reboot of D84655, now performing the inner icmp
simplification query without undef folds.
It should be possible to handle the current foldMinMaxSharedOp()
fold based on this, by moving the logic into icmp of min/max instead,
making it more general. We can't drop the folds for constant operands,
because those also allow undef, which we exclude here.
The tests use assumes for exhaustive coverage, and have a few
more examples of misc folds we get based on icmp simplification.
Differential Revision: https://reviews.llvm.org/D85929
InstSimplify should do all transformations that ConstProp does, but
one thing that ConstProp does that InstSimplify wouldn't is inline
vector instructions that are constants, e.g. into a ret.
Previously vector instructions wouldn't be inlined in InstSimplify
because llvm::Simplify*Instruction() would return nullptr for specific
instructions, such as vector instructions that were actually constants,
if it couldn't simplify them.
This changes SimplifyInsertElementInst, SimplifyExtractElementInst, and
SimplifyShuffleVectorInst to return a vector constant when possible.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85946