Another cleanup for D90479 - handle the Known Ones/Zeros in a single callback, which will make it much easier to jump over to the KnownBits shift handling.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........
Avoid an expensive isKnownNonZero() call - this is a small cleanup before moving the extra NSW functionality from computeKnownBitsMul into KnownBits::computeForMul.
This reverts the revert commit a1b53db324.
This patch includes a fix for a reported issue, caused by
matchSelectPattern returning UMIN for selects of pointers in
some cases by looking to some connected casts.
For now, ensure integer instrinsics are only returned for selects of
ints or int vectors.
This reverts commit 1922570489.
This appears to cause a crash in the following example
a, b, c;
l() {
int e = a, f = l, g, h, i, j;
float *d = c, *k = b;
for (;;)
for (; g < f; g++) {
k[h] = d[i];
k[h - 1] = d[j];
h += e << 1;
i += e;
}
}
clang -cc1 -triple i386-unknown-linux-gnu -emit-obj -target-cpu pentium-m -O1 -vectorize-loops -vectorize-slp reduced.c
llvm::Type *llvm::Type::getWithNewBitWidth(unsigned int) const: Assertion `isIntOrIntVectorTy() && "Original type expected to be a vector of integers or a scalar integer."' failed.
Some architectures do not have general vector select instructions (e.g.
AArch64). But some cmp/select patterns can be vectorized using other
instructions/intrinsics.
One example is using min/max instructions for certain patterns.
This patch updates the cost calculations for selects in the SLP
vectorizer to consider using min/max intrinsics.
This patch does not change SLP vectorizer's codegen itself to actually
generate those intrinsics, but relies on the backends to lower the
vector cmps & selects. This keeps things simple on the SLP side and
works well in practice for AArch64.
This exposes additional SLP vectorization opportunities in some
benchmarks on AArch64 (-O3 -flto).
Metric: SLP.NumVectorInstructions
Program base slp diff
test-suite...ications/JM/ldecod/ldecod.test 502.00 697.00 38.8%
test-suite...ications/JM/lencod/lencod.test 1023.00 1414.00 38.2%
test-suite...-typeset/consumer-typeset.test 56.00 65.00 16.1%
test-suite...6/464.h264ref/464.h264ref.test 804.00 822.00 2.2%
test-suite...006/453.povray/453.povray.test 3335.00 3357.00 0.7%
test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2121.00 0.5%
test-suite...:: External/Povray/povray.test 2378.00 2382.00 0.2%
Reviewed By: RKSimon, samparker
Differential Revision: https://reviews.llvm.org/D89969
This patch is to add the support of the value tracking of the alignment assume bundle.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D88669
As discussed in D89952,
instcombine can sometimes find a way to reduce similar patterns,
but it is incomplete.
InstSimplify uses the computeConstantRange() ValueTracking analysis
via simplifyICmpWithConstant(), so we just need to fill in the max
value of cttz to process any "icmp pred cttz(X), C" pattern (the
min value is initialized to zero automatically).
https://alive2.llvm.org/ce/z/Z_SLWZ
Follow-up to D89976.
As discussed in D89952,
instcombine can sometimes find a way to reduce similar patterns,
but it is incomplete.
InstSimplify uses the computeConstantRange() ValueTracking analysis
via simplifyICmpWithConstant(), so we just need to fill in the max
value of ctlz to process any "icmp pred ctlz(X), C" pattern (the
min value is initialized to zero automatically).
Follow-up to D89976.
As discussed in D89952,
instcombine can sometimes find a way to reduce similar patterns,
but it is incomplete.
InstSimplify uses the computeConstantRange() ValueTracking analysis
via simplifyICmpWithConstant(), so we just need to fill in the max
value of ctpop to process any "icmp pred ctpop(X), C" pattern (the
min value is initialized to zero automatically).
Differential Revision: https://reviews.llvm.org/D89976
Prior to this patch, computeKnownBits would only try to deduce trailing zeros
bits for getelementptrs. This patch adds the logic to treat geps as a series
of add * scaling factor.
Thanks to this patch, using a gep or performing an address computation
directly "by hand" (ptrtoint followed by adds and mul followed by inttoptr)
offers the same computeKnownBits information.
Previously, the "by hand" approach would have given more information.
This is related to https://llvm.org/PR47241.
Differential Revision: https://reviews.llvm.org/D86364
This patch adds metadata !noundef and makes load instructions can optionally have it.
A load with !noundef always return a well-defined value (has no undef bit or isn't poison).
If the loaded value isn't well defined, the behavior is undefined.
This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values.
It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has well-defined value, and showing that a load from arbitrary location is well-defined is usually hard otherwise.
The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead.
The existing codebase already is stripping unknown metadata when doing code motion, so using metadata is UB-safe as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D89050
TypeSize comparisons using overloaded operators should be replaced by
the new isKnownXY comparators when the operands can be fixed-length or
scalable vectors.
In ValueTracking there are several uses of the overloaded operators in
`isKnownNonZero` and `ComputeMultiple`. In the former we already bail
out on scalable vectors since we currently have no way to represent
DemandedElts, and the latter is operating on scalar integers, so we can
assume fixed-size in both instances.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D89387
This patch refactors the logic in ValueTracking.cpp so that
computeKnownBitsForMul now uses a helper function from KnownBits.
NFC
Differential Revision: https://reviews.llvm.org/D88935
Handle the case when all inputs of phi are proven to be non zero.
Constants are checked in beginning of this method before check for depth of recursion,
so it is a partial case of non-constant phi.
Recursion depth is already handled by the function.
Reviewers: aqjune, nikic, efriedma
Reviewed By: nikic
Subscribers: dantrushin, hiraditya, jdoerfert, llvm-commits
Differential Revision: https://reviews.llvm.org/D88276
It was mentioned that D88276 that when a phi node is visited, terminators at their incoming edges should be used for CtxI.
This is a patch that makes two functions (ComputeNumSignBitsImpl, isGuaranteedNotToBeUndefOrPoison) to do so.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D88360
As discussed in D87877, instcombine already has this fold,
but it was missing from the more general ValueTracking logic.
https://alive2.llvm.org/ce/z/PumYZP
This is a patch that allows isGuaranteedNotToBeUndefOrPoison to return more precise result
when an argument is given, by looking through its uses at the entry block (and following blocks as well, if it is checking poison only).
This is useful when there is a function call with noundef arguments at the entry block.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D88207
This implements support for isKnownNonZero, computeKnownBits when freeze is involved.
```
br (x != 0), BB1, BB2
BB1:
y = freeze x
```
In the above program, we can say that y is non-zero. The reason is as follows:
(1) If x was poison, `br (x != 0)` raised UB
(2) If x was fully undef, the branch again raised UB
(3) If x was non-zero partially undef, say `undef | 1`, `freeze x` will return a nondeterministic value which is also non-zero.
(4) If x was just a concrete value, it is trivial
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D75808
This patch adds isGuaranteedNotToBePoison and programUndefinedIfUndefOrPoison.
isGuaranteedNotToBePoison will be used at D75808. The latter function is used at isGuaranteedNotToBePoison.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84242
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().
Differential Revision: https://reviews.llvm.org/D86065
For StackLifetime after finding alloca we need to check that
values ponting to the begining of alloca.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D86692
This patch adds NoUndef to Intrinsics.td.
The attribute is attached to llvm.assume's operand, because llvm.assume(undef)
is UB.
It is attached to pointer operands of several memory accessing intrinsics
as well.
This change makes ValueTracking::getGuaranteedNonPoisonOps' intrinsic check
unnecessary, so it is removed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86576
This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands.
Instead of special-casing llvm.assume, I think it is also a viable option to
add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86477
There's a potential motivating case to increase this limit in PR47191:
http://bugs.llvm.org/PR47191
But first we should make it less hacky. The limit in InstCombine is directly tied
to this value because an increase there can cause asserts in the underlying value
tracking calls if not changed together. The usage in VectorUtils is independent,
but the comment suggests that we should use the same value unless there's a known
reason to diverge. There are similar limits in codegen analysis, but I think we
should leave those independent in case we intentionally want the optimization
power/cost to be different there.
Differential Revision: https://reviews.llvm.org/D86113
In GlobalISel, if you have a load into a small type with a range, you'll hit
an assert if you try to compute known bits on it starting at a larger type.
e.g.
```
%x:_(s8) = G_LOAD %whatever(p0) :: (load 1 ... !range !n)
...
%y:_(s32) = G_SOMETHING %x
```
When we walk through G_SOMETHING and hit the load, the width of our known bits
is 32. However, the width of the range is going to be 8. This will cause us
to hit an assert.
To fix this, make computeKnownBitsFromRangeMetadata zero extend or truncate
the range type to match the bitwidth of the known bits we're calculating.
Add a testcase in CodeGen/GlobalISel/KnownBitsTest.cpp to reflect that this
works now.
https://reviews.llvm.org/D85375
Add the optimizations we have in the SelectionDAG version.
Known non-negative copies all known bits. Any known one other than
the sign bit makes result non-negative.
Differential Revision: https://reviews.llvm.org/D85000
If absolute value needs turn a negative number into a positive number it reduces the number of sign bits by at most 1.
Differential Revision: https://reviews.llvm.org/D84971
findAllocaForValue uses AllocaForValue to cache resolved values.
The function is used only to resolve arguments of lifetime
intrinsic which usually are not fare for allocas. So result reuse
is likely unnoticeable.
In followup patches I'd like to replace the function with
GetUnderlyingObjects.
Depends on D84616.
Differential Revision: https://reviews.llvm.org/D84617
This includes basic support for computeKnownBits on abs. I've left FIXMEs for more complicated things we could do.
Differential Revision: https://reviews.llvm.org/D84963
This is a simple patch that makes canCreateUndefOrPoison use
Instruction::isBinaryOp because BinaryOperator inherits Instruction.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84596
This is the first of two patches to address PR46753. We basically allow
mem2reg to promote allocas that are used in doppable instructions, for
now that means `llvm.assume`. The uses of the alloca (or a bitcast or
zero offset GEP from there) are replaced by `undef` in the droppable
instructions.
Reviewed By: Tyker
Differential Revision: https://reviews.llvm.org/D83976
Make sure we do not call
constainsConstantExpression/containsUndefElement on ConstantExpression,
which is not supported.
In particular, containsUndefElement/constainsConstantExpression are only
supported on constants which are supported by getAggregateElement.
Unfortunately there's no convenient way to check if a constant supports
getAggregateElement, so just check for non-constantexpressions with
vector type. Other users of those functions do so too.
Reviewers: spatel, nikic, craig.topper, lebedev.ri, jdoerfert, aqjune
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84512
.. in isGuaranteedNotToBeUndefOrPoison.
This caused early exit of isGuaranteedNotToBeUndefOrPoison, making it return
imprecise result.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84251
This patch
- adds `canCreateUndefOrPoison`
- refactors `canCreatePoison` so it can deal with constantexprs
`canCreateUndefOrPoison` will be used at D83926.
Reviewed By: nikic, jdoerfert
Differential Revision: https://reviews.llvm.org/D84007
When the byref attribute is added, there will need to be two similar
functions for the existing cases which have an associate value copy,
and byref which does not. Most, but not all of the existing uses will
use the existing version.
The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
The IR doesn't have a proper concept of invalid pointers, and "null"
constants are just all zeros (though it really needs one).
I think it's not possible to break this for AMDGPU due to the copy
semantics of byval. If you have an original stack object at 0, the
byval copy will be placed above it so I don't think it's really
possible to hit a 0 address.
GetUnderlyingObject() (and by required symmetry
DecomposeGEPExpression()) will call SimplifyInstruction() on the
passed value if other checks fail. This simplification is very
expensive, but has little effect in practice. This patch removes
the SimplifyInstruction call(), and replaces it with a check for
single-argument phis (which can occur in canonical IR in LCSSA
form), which is the only useful simplification case I was able to
identify.
At O3 the geomean CTMark improvement is -1.7%. The largest
improvement is SPASS with ThinLTO at -6%.
In test-suite, I see only two tests with a hash difference and
no code size difference (PAQ8p, Ptrdist), which indicates that
the simplification only ends up being useful very rarely. (I would
have liked to figure out which simplification is responsible here,
but wasn't able to spot it looking at transformation logs.)
The AMDGPU test case that is update was using two selects with
undef condition, in which case GetUnderlyingObject will return
the first select operand as the underlying object. This will of
course not happen with non-undef conditions, so this was not
testing anything realistic. Additionally this illustrates potential
unsoundness: While GetUnderlyingObject will pick the first operand,
the select might be later replaced by the second operand, resulting
in inconsistent assumptions about the undef value.
Differential Revision: https://reviews.llvm.org/D82261
This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven,
and performs rounding to the nearest integer value, rounding halfway
cases to even. The intrinsic represents the missed case of IEEE-754
rounding operations and now llvm provides full support of the rounding
operations defined by the standard.
Differential Revision: https://reviews.llvm.org/D75670
This patch updates computeConstantRange to optionally take an assumption
cache as argument and use the available assumptions to limit the range
of the result.
Currently this is limited to assumptions that are comparisons.
Reviewers: reames, nikic, spatel, jdoerfert, lebedev.ri
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D76193
If we don't know anything about the alignment of a pointer, Align(1) is
still correct: all pointers are at least 1-byte aligned.
Included in this patch is a bugfix for an issue discovered during this
cleanup: pointers with "dereferenceable" attributes/metadata were
assumed to be aligned according to the type of the pointer. This
wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
stop making this assumption. Frontends may need to be updated. I
updated clang's handling of C++ references, and added a release note for
this.
Differential Revision: https://reviews.llvm.org/D80072
computeKnownBitsFromAssume() currently asserts if m_V matches a
ptrtoint that changes the bitwidth. Because InstCombine
canonicalizes ptrtoint instructions to use explicit zext/trunc,
we never ran into the issue in practice. I'm adding unit tests,
as I don't know if this can be triggered via IR anywhere.
Fix this by calling anyextOrTrunc(BitWidth) on the computed
KnownBits. Note that we are going from the KnownBits of the
ptrtoint result to the KnownBits of the ptrtoint operand,
so we need to truncate if the ptrtoint zexted and anyext if
the ptrtoint truncated.
Differential Revision: https://reviews.llvm.org/D79234
Summary:
This fixes PR45885 by fixing isGuaranteedNotToBeUndefOrPoison so it does not look into dominating
branch conditions of V when V is an instruction in an unreachable block.
Reviewers: spatel, nikic, lebedev.ri
Reviewed By: nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79790
Summary:
This patch makes propagatesPoison be more accurate by returning true on
more bin ops/unary ops/casts/etc.
The changed test in ScalarEvolution/nsw.ll was introduced by
a19edc4d15 .
IIUC, the goal of the tests is to show that iv.inc's SCEV expression still has
no-overflow flags even if the loop isn't in the wanted form.
It becomes more accurate with this patch, so think this is okay.
Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, sanjoy
Reviewed By: spatel, nikic
Subscribers: regehr, nlopes, efriedma, fhahn, javed.absar, llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78615
Summary:
Any function in this module that make use of DemandedElts laregely does
not work with scalable vectors. DemandedElts is used to define which
elements of the vector to look at. At best, for scalable vectors, we can
express the first N elements of the vector. However, in practice, most
code that uses these functions expect to be able to talk about the
entire vector. In principle, this module should be able to be extended
to work with scalable vectors. However, before we can do that, we should
ensure that it does not cause code with scalable vectors to miscompile.
All functions that use a DemandedElts will bail out if the vector is
scalable. Usages of getNumElements() are updated to go through
FixedVectorType pointers.
Reviewers: rengolin, efriedma, sdesmalen, c-rhodes, spatel
Reviewed By: efriedma
Subscribers: david-arm, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79053
The 'nsz' flag is different than 'nnan' or 'ninf' in that it does not create poison.
Make that explicit in the LangRef and fix ValueTracking analysis that misinterpreted
the definition.
This manifests as bugs in InstSimplify shown in the test diffs and as discussed in
PR45778:
https://bugs.llvm.org/show_bug.cgi?id=45778
Differential Revision: https://reviews.llvm.org/D79422
Summary:
This patch helps isGuaranteedNotToBeUndefOrPoison look into more constants and instructions (bitcast/alloca/gep/fcmp).
To deal with bitcast, Depth is added to isGuaranteedNotToBeUndefOrPoison.
This patch is splitted from https://reviews.llvm.org/D75808.
Checked with Alive2
Reviewers: reames, jdoerfert
Reviewed By: jdoerfert
Subscribers: sanwou01, spatel, llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76010
Summary:
This is RFC for fixes in poison-related functions of ValueTracking.
These functions assume that a value can be poison bitwisely, but the semantics
of bitwise poison is not clear at the moment.
Allowing a value to have bitwise poison adds complexity to reasoning about
correctness of optimizations.
This patch makes the analysis functions simply assume that a value is
either fully poison or not, which has been used to understand the correctness
of a few previous optimizations.
The bitwise poison semantics seems to be only used by these functions as well.
In terms of implementation, using value-wise poison concept makes existing
functions do more precise analysis, which is what this patch contains.
Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr
Reviewed By: nikic
Subscribers: fhahn, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78503
Summary:
This is RFC for fixes in poison-related functions of ValueTracking.
These functions assume that a value can be poison bitwisely, but the semantics
of bitwise poison is not clear at the moment.
Allowing a value to have bitwise poison adds complexity to reasoning about
correctness of optimizations.
This patch makes the analysis functions simply assume that a value is
either fully poison or not, which has been used to understand the correctness
of a few previous optimizations.
The bitwise poison semantics seems to be only used by these functions as well.
In terms of implementation, using value-wise poison concept makes existing
functions do more precise analysis, which is what this patch contains.
Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr
Reviewed By: nikic
Subscribers: fhahn, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78503
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: sunfish, sdesmalen, efriedma
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77273
Summary:
There are at least three clients for KnownBits calculations:
ValueTracking, SelectionDAG and GlobalISel. To reduce duplication the
common logic should be moved out of these clients and into KnownBits
itself.
This patch does this for AND, OR and XOR calculations by implementing
and using appropriate operator overloads KnownBits::operator& etc.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74060
Use the simpler BitWidth constructor instead of the copy constructor to
make it clear when we don't actually need to copy an existing KnownBits
value. Split out from D74539. NFC.
The cmyk test is based on the known regression that resulted from:
rGf2fbdf76d8d0
This improves on the equivalent signed min/max change:
rG867f0c3c4d8c
The underlying icmp equivalence is:
~X pred ~Y --> Y pred X
For an icmp with constant, canonicalization results in a swapped pred:
~X < C --> X > ~C
The cmyk tests are based on the known regression that resulted from:
rGf2fbdf76d8d0
So this improvement in analysis might be enough to restore that commit.
D51664 added Instruction::comesBefore which should provide better
performance than the manual check.
Reviewers: rnk, nikic, spatel
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D76228
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.
This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.
I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors. Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it. Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.
Differential Revision: https://reviews.llvm.org/D72467
Optimize the common case of splat vector constant. For large vector
going through all elements is expensive. For splatr/broadcast cases we
can skip going through all elements.
Differential Revision: https://reviews.llvm.org/D76664
Summary:
Avoid blind cast from Operator to ExtractElementInst in
computeKnownBitsFromOperator. This resulted in some crashes
in downstream fuzzy testing. Instead we use getOperand directly
on the Operator when accessing the vector/index operands.
Haven't seen any problems with InsertElement and ShuffleVector,
but I believe those could be used in constant expressions as well.
So the same kind of fix as for ExtractElement was also applied for
InsertElement.
When it comes to ShuffleVector we now simply bail out if a dynamic
cast of the Operator to ShuffleVectorInst fails. I've got no
reproducer indicating problems for ShuffleVector, and a fix would be
slightly more complicated as getShuffleDemandedElts is involved.
Reviewers: RKSimon, nikic, spatel, efriedma
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76564
If one operand is unknown (and we don't have nowrap), don't compute
the second operand.
Also don't create an unnecessary extra KnownBits variable, it's
okay to reuse KnownOut.
This reduces instructions on libclamav_md5.c by 40%.
Summary:
DataLayout::getTypeStoreSize() returns TypeSize.
For cases where it can not be scalable vector (e.g., GlobalVariable),
explicitly call TypeSize::getFixedSize().
For cases where scalable property doesn't matter, (e.g., check for
zero-sized type), use TypeSize::isNonZero().
Reviewers: sdesmalen, efriedma, apazos, reames
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76454
Summary:
Return None if GEP index type is scalable vector. Size of scalable vectors
are multiplied by a runtime constant.
Avoid transforming:
%a = bitcast i8* %p to <vscale x 16 x i8>*
%tmp0 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 0
store <vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8>* %tmp0
%tmp1 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 1
store <vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8>* %tmp1
into:
%a = bitcast i8* %p to <vscale x 16 x i8>*
%tmp0 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 0
%1 = bitcast <vscale x 16 x i8>* %tmp0 to i8*
call void @llvm.memset.p0i8.i64(i8* align 16 %1, i8 0, i64 32, i1 false)
Reviewers: sdesmalen, efriedma, apazos, reames
Reviewed By: sdesmalen
Subscribers: tschuett, hiraditya, rkruppe, arphaman, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76464
The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range.
This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).
Summary:
DataLayout::getTypeAllocSize() return TypeSize. For cases where the
scalable property doesn't matter, we should explicitly call getKnownMinSize()
to avoid implicit type conversion to uint64_t, which is not valid for scalable
vector type.
Reviewers: sdesmalen, efriedma, apazos, reames
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76260
Because we have to use a ConstantExpr at some point, the canonical form
isn't set in stone, but this seems reasonable.
The pretty sizeof(<vscale x 4 x i32>) dumping is a relic of ancient
LLVM; I didn't have to touch that code. :)
Differential Revision: https://reviews.llvm.org/D75887
Summary:
```
br i1 c, BB1, BB2:
BB1:
use1(c)
BB2:
use2(c)
```
In BB1 and BB2, c is never undef or poison because otherwise the branch would have triggered UB.
This is a resubmission of 952ad47 with crash fix of llvm/test/Transforms/LoopRotate/freeze-crash.ll.
Checked with Alive2
Reviewers: xbolva00, spatel, lebedev.ri, reames, jdoerfert, nlopes, sanjoy
Reviewed By: reames
Subscribers: jdoerfert, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75401
Summary:
```
br i1 c, BB1, BB2:
BB1:
use1(c)
BB2:
use2(c)
```
In BB1 and BB2, c is never undef or poison because otherwise the branch would have triggered UB.
Checked with Alive2
Reviewers: xbolva00, spatel, lebedev.ri, reames, jdoerfert, nlopes, sanjoy
Reviewed By: reames
Subscribers: jdoerfert, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75401
Summary:
This patch helps getGuaranteedNonFullPoisonOp handle llvm.assume call.
Also, a comment about the semantics of branch is removed to prevent confusion.
As llvm.assume does, branching on poison directly raises UB (as LangRef says), and this allows transformations such as introduction of llvm.assume on branch condition at each successor, or freely replacing values after conditional branch (such as at loop exit).
Handling br is not addressed in this patch. It makes SCEV more accurate, causing existing LoopVectorize/IndVar/etc tests to fail.
Reviewers: spatel, lebedev.ri, nlopes
Reviewed By: nlopes
Subscribers: hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75397
Summary:
This patch helps isGuaranteedNotToBeUndefOrPoison return true if the value
makes the program always undefined.
According to value tracking functions' comments, it is not still in consensus
whether a poison value can be bitwise or not, so conservatively only the case with
i1 is considered.
Reviewers: spatel, lebedev.ri, reames, nlopes, regehr
Reviewed By: nlopes
Subscribers: uenoku, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75396
isKnownNonNaN() could not recognize a zero splat because that is a
ConstantAggregateZero which is-a ConstantData but not a ConstantDataVector.
Patch makes a ConstantAggregateZero return true.
Review: Thomas Lively
Differential Revision: https://reviews.llvm.org/D74263
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.
I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.
NFC.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74482
If we know that a >= b (unsigned), usub.with.overflow(a, b) cannot
overflow. Similarly, if b > a, the same expression overflows.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: nikic, Gerolf
Differential Revision: https://reviews.llvm.org/D74066
This patch adds versions of isImpliedCondition and
isImpliedByDomCondition that take a predicate, LHS and RHS operands as
instead of a Value representing the condition.
This allows using those functions to check conditions without having a
concrete ICmp instruction.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D74065
This patch adds initial support for a DemandedElts mask to the internal computeKnownBits/ComputeNumSignBits methods, matching the SelectionDAG and GlobalISel equivalents.
So far only a couple of instructions have been setup to handle the DemandedElts, the remainder still using the existing 'all elements' default. The plan is to extend support as we have test coverage.
Differential Revision: https://reviews.llvm.org/D73435
This addresses https://bugs.llvm.org/show_bug.cgi?id=42801.
The m_c_ICmp() matcher is changed to provide the swapped predicate
if the operands are swapped.
Existing uses of m_c_ICmp() fall in one of two categories: Working
on equality predicates only, where swapping is irrelevant.
Or performing a manual swap, in which case this patch removes it.
The only exception is the foldICmpWithLowBitMaskedVal() fold, which
does not swap the predicate, and instead reasons about whether
a swap occurred or not for each predicate. Getting the swapped
predicate allows us to merge the logic for pairs of predicates,
instead of duplicating it.
Differential Revision: https://reviews.llvm.org/D72976
Summary:
It is pretty common to assume that something is not zero.
Even optimizer itself sometimes emits such assumptions
(e.g. `addAssumeNonNull()` in `PromoteMemoryToRegister.cpp`).
But we currently don't deal with such assumptions :)
The only way `isKnownNonZero()` handles assumptions is
by calling `computeKnownBits()` which calls `computeKnownBitsFromAssume()`.
But `x != 0` does not tell us anything about set bits,
it only says that there are *some* set bits.
So naturally, `KnownBits` does not get populated,
and we fail to make use of this assumption.
I propose to deal with this special case by special-casing it
via adding a `isKnownNonZeroFromAssume()` that returns boolean
when there is an applicable assumption.
While there, we also deal with other predicates,
mainly if the comparison is with constant.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43267 | PR43267 ]].
Differential Revision: https://reviews.llvm.org/D71660
This is a pretty rare case, when CxtI and assume are
in the same basic block, with assume being located later.
We were already checking that assumption was guaranteed to be
executed, but we omitted CxtI itself from consideration,
and as the test (miscompile) shows, that is incorrect.
As noted in D71660 review by @nikic.
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.
This fixes the buildbot failures.
Differential Revision: https://reviews.llvm.org/D68328
Patch by Joseph Faulls!
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.
Differential Revision: https://reviews.llvm.org/D68328
Patch by Joseph Faulls!
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
If the pointer was loaded/stored before the null check, the check
is redundant and can be removed. For now the optimizers do not
remove the nullptr check, see https://gcc.godbolt.org/z/H2r5GG.
The patch allows to use more nonnull constraints. Also, it found
one more optimization in some PowerPC test. This is my first llvm
review, I am free to any comments.
Differential Revision: https://reviews.llvm.org/D71177
Summary:
Same as D60846 and D69571 but with a fix for the problem encountered
after them. Both times it was a missing context adjustment in the
handling of PHI nodes.
The reproducers created from the bugs that caused the old commits to be
reverted are included.
Reviewers: nikic, nlopes, mkazantsev, spatel, dlrobertson, uabelho, hakzsam, hans
Subscribers: hiraditya, bollu, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71181
As it can be seen from accompanying cleanup, it is not unheard of
to write `~Known.Zero` meaning "what maximal value can this KnownBits
produce". But i think `~Known.Zero` isn't *that* self-explanatory,
as compared to a method with a name.
Note that not all `~Known.Zero` places were cleaned up,
only those where this arguably improves things.
This caused miscompiles of Chromium (https://crbug.com/1023818). The reduced
repro is small enough to fit here:
$ cat /tmp/a.c
unsigned char f(unsigned char *p) {
unsigned char result = 0;
for (int shift = 0; shift < 1; ++shift)
result |= p[0] << (shift * 8);
return result;
}
$ bin/clang -O2 -S -o - /tmp/a.c | grep -A4 f:
f: # @f
.cfi_startproc
# %bb.0: # %entry
xorl %eax, %eax
retq
That's nicely optimized, but I don't think it's the right result :-)
> Same as D60846 but with a fix for the problem encountered there which
> was a missing context adjustment in the handling of PHI nodes.
>
> The test that caused D60846 to be reverted was added in e15ab8f277.
>
> Reviewers: nikic, nlopes, mkazantsev,spatel, dlrobertson, uabelho, hakzsam
>
> Subscribers: hiraditya, bollu, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D69571
This reverts commit 57dd4b03e4.
Same as D60846 but with a fix for the problem encountered there which
was a missing context adjustment in the handling of PHI nodes.
The test that caused D60846 to be reverted was added in e15ab8f277.
Reviewers: nikic, nlopes, mkazantsev,spatel, dlrobertson, uabelho, hakzsam
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69571
This patch improves the handling of pointer offset in GEP expressions where
one argument is the base pointer. isPointerOffset() is being used by memcpyopt
where current code synthesizes consecutive 32 bytes stores to one store and
two memset intrinsic calls. With this patch, we convert the stores to one
memset intrinsic.
Differential Revision: https://reviews.llvm.org/D67989
llvm-svn: 374454
The static analyzer is warning about a potential null dereferences, but since the pointer is only used in a switch statement for Operator::getOpcode() (with an empty default) then its easiest just to wrap this in a null test as the dyn_cast might return null here.
llvm-svn: 372962
Static analyzer complains about const_cast uninitialized variables, we should explicitly set these to null.
Ideally that const wrapper would go away though.......
llvm-svn: 372603
Expose a utility function so that all places which want to suppress speculation (when otherwise legal) due to ordering and/or sanitizer interaction can do so.
llvm-svn: 371556
Summary:
Simplify the API using Optional<> and address comments in
https://reviews.llvm.org/D66165
Reviewers: vitalybuka
Subscribers: hiraditya, llvm-commits, ostannard, pcc
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66317
llvm-svn: 369300
Summary:
Currently we fail to compute known bits for recurrences where the
first incoming value is the start value of the recurrence.
Instead of exiting the loop when the first incoming value is not
the step of the recurrence, continue to check the second incoming
value.
The original code uses a loop to handle both cases, but incorrectly
exits instead of continuing.
Reviewers: lebedev.ri, spatel, nikic
Reviewed By: lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66216
llvm-svn: 369088
Some uses of getArgumentAliasingToReturnedPointer and
isIntrinsicReturningPointerAliasingArgumentWithoutCapturing require the
calls/intrinsics to preserve the nullness of the argument.
For alias analysis, the nullness property does not really come into
play.
This patch explicitly sets it to true. In D61669, the alias analysis
uses will be switched to not require preserving nullness.
Reviewers: nlopes, efriedma, hfinkel, sanjoy, aqjune, jdoerfert
Reviewed By: jdoerfert
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64150
llvm-svn: 368993
Use isGuaranteedToTransferExecutionToSuccessor() instead of
isSafeToSpeculativelyExecute() when seeing whether we can propagate
the information in an assume backwards in isValidAssumeForContext().
The latter is more general - it also allows arbitrary loads/stores -
and is also the condition we want: if our assume is guaranteed to
execute, its condition not holding would be UB.
Original patch by arielb1.
Differential Revision: https://reviews.llvm.org/D37215
llvm-svn: 368723
The matchSelectPattern code can match patterns like (x >= 0) ? x : -x
for absolute value. But it can also match ((x-y) >= 0) ? (x-y) : (y-x).
If the latter form was matched we can only use the nsw flag if its
set on both subtracts.
This match makes sure we're looking at the former case only.
Differential Revision: https://reviews.llvm.org/D65692
llvm-svn: 368195
Summary:
In D37215, AssumeLikeInstruction are regarded as `willreturn`. In this patch, annotation is added to those which don't have `willreturn` now(`sideeffect, object_size, experimental_widenable_condition`).
Reviewers: jdoerfert, nikic, sstefan1
Reviewed By: nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65455
llvm-svn: 367342
Summary: As clarified in D53184, volatile load and store do not trap. Therefore, we should remove volatile checks for instructions in `isGuaranteedToTransferExecutionToSuccessor`.
Reviewers: jdoerfert, efriedma, nikic
Reviewed By: nikic
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65375
llvm-svn: 367226
Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.
Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.
Differential Revision: https://reviews.llvm.org/D64172
llvm-svn: 366360
Summary: Vector of the same value with few undefs will sill be considered "Bytewise"
Reviewers: eugenis, pcc, jfb
Reviewed By: jfb
Subscribers: dexonsmith, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64031
llvm-svn: 365971
Summary:
This helps with more efficient use of memset for pattern initialization
From @pcc prototype for -ftrivial-auto-var-init=pattern optimizations
Binary size change on CTMark, (with -fuse-ld=lld -Wl,--icf=all, similar results with default linker options)
```
master patch diff
Os 8.238864e+05 8.238864e+05 0.0
O3 1.054797e+06 1.054797e+06 0.0
Os zero 8.292384e+05 8.292384e+05 0.0
O3 zero 1.062626e+06 1.062626e+06 0.0
Os pattern 8.579712e+05 8.338048e+05 -0.030299
O3 pattern 1.090502e+06 1.067574e+06 -0.020481
```
Zero vs Pattern on master
```
zero pattern diff
Os 8.292384e+05 8.579712e+05 0.036578
O3 1.062626e+06 1.090502e+06 0.025124
```
Zero vs Pattern with the patch
```
zero pattern diff
Os 8.292384e+05 8.338048e+05 0.003333
O3 1.062626e+06 1.067574e+06 0.003193
```
Reviewers: pcc, eugenis
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D63967
llvm-svn: 365858
This patch replaces the three almost identical "strip & accumulate"
implementations for constant pointer offsets with a single one,
combining the respective functionalities. The old interfaces are kept
for now.
Differential Revision: https://reviews.llvm.org/D64468
llvm-svn: 365723
Summary:
We will need to handle IntToPtr which I will submit in a separate patch as it's
not going to be NFC.
Reviewers: eugenis, pcc
Reviewed By: eugenis
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D63940
llvm-svn: 365709
This makes the functions in Loads.h require a type to be specified
independently of the pointer Value so that when pointers have no structure
other than address-space, it can still do its job.
Most callers had an obvious memory operation handy to provide this type, but a
SROA and ArgumentPromotion were doing more complicated analysis. They get
updated to merge the properties of the various instructions they were
considering.
llvm-svn: 365468
The `willreturn` function attribute guarantees that a function call will
come back to the call site if the call is also known not to throw.
Therefore, this attribute can be used in
`isGuaranteedToTransferExecutionToSuccessor`.
Patch by Hideto Ueno (@uenoku)
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63372
llvm-svn: 364580
I find the current documentation of poison somewhat confusing,
mainly because its use of "undefined behavior" doesn't seem to
align with our usual interpretation (of immediate UB). Especially
the sentence "any instruction that has a dependence on a poison
value has undefined behavior" is very confusing.
Clarify poison semantics by:
* Replacing the introductory paragraph with the standard rationale
for having poison values.
* Spelling out that instructions depending on poison return poison.
* Spelling out how we go from a poison value to immediate undefined
behavior and give the two examples we currently use in ValueTracking.
* Spelling out that side effects depending on poison are UB.
Differential Revision: https://reviews.llvm.org/D63044
llvm-svn: 363320
I recently got this wrong (again), and I'm sure I'm not the only one. Put a comment in the logical place someone would look to "fix" the obvious "missed optimization" which arrises based on the common misunderstanding. Hopefully, this will save others time. :)
llvm-svn: 363318
Summary:
The logic in EarlyCSE that looks through 'not' operations in the
predicate recognizes e.g. that `select (not (cmp sgt X, Y)), X, Y` is
equivalent to `select (cmp sgt X, Y), Y, X`. Without this change,
however, only the latter is recognized as a form of `smin X, Y`, so the
two expressions receive different hash codes. This leads to missed
optimization opportunities when the quadratic probing for the two hashes
doesn't happen to collide, and assertion failures when probing doesn't
collide on insertion but does collide on a subsequent table grow
operation.
This change inverts the order of some of the pattern matching, checking
first for the optional `not` and then for the min/max/abs patterns, so
that e.g. both expressions above are recognized as a form of `smin X, Y`.
It also adds an assertion to isEqual verifying that it implies equal
hash codes; this fires when there's a collision during insertion, not
just grow, and so will make it easier to notice if these functions fall
out of sync again. A new flag --earlycse-debug-hash is added which can
be used when changing the hash function; it forces hash collisions so
that any pair of values inserted which compare as equal but hash
differently will be caught by the isEqual assertion.
Reviewers: spatel, nikic
Reviewed By: spatel, nikic
Subscribers: lebedev.ri, arsenm, craig.topper, efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62644
llvm-svn: 363274
In order to fold an always overflowing signed saturating add/sub,
we need to know in which direction the always overflow occurs.
This patch splits up AlwaysOverflows into AlwaysOverflowsLow and
AlwaysOverflowsHigh to pass through this information (but it is
not used yet).
Differential Revision: https://reviews.llvm.org/D62463
llvm-svn: 361858
The implementation in ValueTracking and ConstantRange are equally
powerful, reuse the one in ConstantRange, which will make this easier
to extend.
llvm-svn: 361723
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.
It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.
Reviewers: hfinkel, materi, jkorous
Reviewed By: jkorous
Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61038
llvm-svn: 359072
ConstantRanges have an annoying special case: If upper and lower are
the same, it can be either an empty or a full set. When constructing
constant ranges nearly always a full set is intended, but this still
requires an explicit check in many places.
This revision adds a getNonEmpty() constructor that disambiguates this
case: If upper and lower are the same, a full set is created.
Differential Revision: https://reviews.llvm.org/D60947
llvm-svn: 358854
This adds a WithOverflowInst class with a few helper methods to get
the underlying binop, signedness and nowrap type and makes use of it
where sensible. There will be two more uses in D60650/D60656.
The refactorings are all NFC, though I left some TODOs where things
could be improved. In particular we have two places where add/sub are
handled but mul isn't.
Differential Revision: https://reviews.llvm.org/D60668
llvm-svn: 358512
This is a follow-up patch to D60504 to further improve
performance issues in computeKnownBitsFromAssume.
The patch is NFC, but may improve compile-time performance
if the compiler isn't clever enough to do the optimization
itself.
llvm-svn: 358163
This patch changes the order of pattern matching by first testing
a compare instruction's predicate, before doing the pattern
match for the whole expression tree.
Patch by Paul Walker.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D60504
llvm-svn: 358097
This is the same change as D60420 but for signed sub rather than
signed add: Range information is intersected into the known bits
result, allows to detect more no/always overflow conditions.
Differential Revision: https://reviews.llvm.org/D60469
llvm-svn: 358020
This is D59386 for the signed add case. The computeConstantRange()
result is now intersected into the existing known bits information,
allowing to detect additional no-overflow/always-overflow conditions
(though the latter isn't used yet).
This (finally...) covers the motivating case from D59071.
Differential Revision: https://reviews.llvm.org/D60420
llvm-svn: 358014
Switch part of the computeOverflowForSignedAdd() implementation to
use Range.isAllNegative() rather than KnownBits.isNegative() and
similar. They do the same thing, but using the ConstantRange methods
allows dropping the KnownBits variables more easily in D60420.
llvm-svn: 357969
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.
Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.
Differential Revision: https://reviews.llvm.org/D59506
llvm-svn: 357870
This adds ConstantRange::getFull(BitWidth) and
ConstantRange::getEmpty(BitWidth) named constructors as more readable
alternatives to the current ConstantRange(BitWidth, /* full */ false)
and similar. Additionally private getFull() and getEmpty() member
functions are added which return a full/empty range with the same bit
width -- these are commonly needed inside ConstantRange.cpp.
The IsFullSet argument in the ConstantRange(BitWidth, IsFullSet)
constructor is now mandatory for the few usages that still make use of it.
Differential Revision: https://reviews.llvm.org/D59716
llvm-svn: 356852
We're already computing the known bits of the operands here. If the
known bits of the operands can determine the sign bit of the result,
we'll already catch this in signedAddMayOverflow(). The only other
way (and as the comment already indicates) we'll get new information
from computing known bits on the whole add, is if there's an assumption
on it.
As such, we change the code to only compute known bits from assumptions,
instead of computing full known bits on the add (which would unnecessarily
recompute the known bits of the operands as well).
Differential Revision: https://reviews.llvm.org/D59473
llvm-svn: 356785
This is D59450, but for signed sub. This case is not NFC, because
the overflow logic in ConstantRange is more powerful than the existing
check. This resolves the TODO in the function.
I've added two tests to show that this indeed catches more cases than
the previous logic, but the main correctness test coverage here is in
the existing ConstantRange unit tests.
Differential Revision: https://reviews.llvm.org/D59617
llvm-svn: 356685
This is a small followup to D59511. The code that was moved into
computeConstantRange() there is a bit overly conversative: If the
abs is not nsw, it does not compute any range. However, abs without
nsw still has a well-defined contiguous unsigned range from 0 to
SIGNED_MIN. This is a lot less useful than the usual 0 to SIGNED_MAX
range, but if we're already here we might as well specify it...
Differential Revision: https://reviews.llvm.org/D59563
llvm-svn: 356586
Improve computeOverflowForUnsignedAdd/Sub in ValueTracking by
intersecting the computeConstantRange() result into the ConstantRange
created from computeKnownBits(). This allows us to detect some
additional never/always overflows conditions that can't be determined
from known bits.
This revision also adds basic handling for constants to
computeConstantRange(). Non-splat vectors will be handled in a followup.
The signed case will also be handled in a followup, as it needs some
more groundwork.
Differential Revision: https://reviews.llvm.org/D59386
llvm-svn: 356489
These changes are related to PR37743 and include:
SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.
Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.
Add promoting the integer ABS node in the LegalizeIntegerType.
Expand-based legalization of integer result for the ABS nodes.
Expand-based legalization of ABS vector operations.
Add some integer abs testcases for different typesizes for Thumb arch
Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to:
tmp = (SRA, Hi, 31)
Lo = (UADDO tmp, Lo)
Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
Lo = (XOR tmp, Lo)
The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern:
(ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).
Change integer abs testcases for codegen with the ABS node support for AArch64.
Indicate that the ABS is legal for the i64 type when the NEON is supported.
Change the integer abs testcases to show changing of codegen.
Add combine and legalization of ABS nodes for Thumb arch.
Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.
For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743
Patch by: @ikulagin (Ivan Kulagin)
Differential Revision: https://reviews.llvm.org/D49837
llvm-svn: 356468
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This was suggested by spatel as an alternative
approach to D59378. I've also added the infinite looping test from
that revision here.
Differential Revision: https://reviews.llvm.org/D59506
llvm-svn: 356415
This is preparation for D59506. The InstructionSimplify abs handling
is moved into computeConstantRange(), which is the general place for
such calculations. This is NFC and doesn't affect the existing tests
in test/Transforms/InstSimplify/icmp-abs-nabs.ll.
Differential Revision: https://reviews.llvm.org/D59511
llvm-svn: 356409
This is the same change as rL356290, but for signed add. It replaces
the existing ripple logic with the overflow logic in ConstantRange.
This is NFC in that it should return NeverOverflow in exactly the
same cases as the previous implementation. However, it does make
computeOverflowForSignedAdd() more powerful by now also determining
AlwaysOverflows conditions. As none of its consumers handle this yet,
this has no impact on optimization. Making use of AlwaysOverflows
in with.overflow folding will be handled as a followup.
Differential Revision: https://reviews.llvm.org/D59450
llvm-svn: 356345
Following the suggestion in D59450, I'm moving the code for constructing
a ConstantRange from KnownBits out of ValueTracking, which also allows us
to test this code independently.
I'm adding this method to ConstantRange rather than KnownBits (which
would have been a bit nicer API wise) to avoid creating a dependency
from Support to IR, where ConstantRange lives.
Differential Revision: https://reviews.llvm.org/D59475
llvm-svn: 356339
Use the methods introduced in rL356276 to implement the
computeOverflowForUnsigned(Add|Sub) functions in ValueTracking, by
converting the KnownBits into a ConstantRange.
This is NFC: The existing KnownBits based implementation uses the same
logic as the the ConstantRange based one. This is not the case for the
signed equivalents, so I'm only changing unsigned here.
This is in preparation for D59386, which will also intersect the
computeConstantRange() result into the range determined from KnownBits.
llvm-svn: 356290
InstructionSimplify currently has some code to determine the constant
range of integer instructions for some simple cases. It is used to
simplify icmps.
This change moves the relevant code into ValueTracking as
llvm::computeConstantRange(), so it can also be reused for other
purposes.
In particular this is with the optimization of overflow checks in
mind (ref D59071), where constant ranges cover some cases that
known bits don't.
llvm-svn: 355781
There are no tests for this case, and I'm not sure how it could ever work,
so I'm just removing this option from the matcher. This should fix PR40940:
https://bugs.llvm.org/show_bug.cgi?id=40940
llvm-svn: 355292
We have two sources of known bits:
1. For adds leading ones of either operand are preserved. For sub
leading zeros of LHS and leading ones of RHS become leading zeros in
the result.
2. The saturating math is a select between add/sub and an all-ones/
zero value. As such we can carry out the add/sub known bits
calculation, and only preseve the known one/zero bits respectively.
Differential Revision: https://reviews.llvm.org/D58329
llvm-svn: 355223
Second part of D58593.
Compute precise overflow conditions based on all known bits, rather
than just the sign bits. Unsigned a - b overflows iff a < b, and we
can determine whether this always/never happens based on the minimal
and maximal values achievable for a and b subject to the known bits
constraint.
llvm-svn: 355109
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly been telling
that the operation is equivalent to zero extending the
value we're tracking. That has not been true, instead
the user has been forced to explicitly set the extended
bits as known zero afterwards.
This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.
Reviewers: craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58650
llvm-svn: 355099
Part of D58593.
Compute precise overflow conditions based on all known bits, rather
than just the sign bits. Unsigned a + b overflows iff a > ~b, and we
can determine whether this always/never happens based on the minimal
and maximal values achievable for a and ~b subject to the known bits
constraint.
llvm-svn: 355072
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
Differential Revision: https://reviews.llvm.org/D53765
llvm-svn: 353563
Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the
value of their operand and can therefore be looked through when we
determine non-nullness.
Differential Revision: https://reviews.llvm.org/D54956
llvm-svn: 352293
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
minted `CallBase` class instead of the `CallSite` wrapper.
This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.
Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.
Thanks for the review from Saleem!
Differential Revision: https://reviews.llvm.org/D55641
llvm-svn: 350503
GetPointerBaseWithConstantOffset include this code, where ByteOffset
and GEPOffset are both of type llvm::APInt :
ByteOffset += GEPOffset.getSExtValue();
The problem with this line is that getSExtValue() returns an int64_t, but
the += matches an overload for uint64_t. The problem is that the resulting
APInt is no longer considered to be signed. That in turn causes assertion
failures later on if the relevant pointer type is > 64 bits in width and
the GEPOffset was negative.
Changing it to
ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());
resolves the issue and explicitly performs the sign-extending
or truncation. Additionally, instead of asserting later if the result
is > 64 bits, it breaks out of the loop in that case.
See also
https://reviews.llvm.org/D24729https://reviews.llvm.org/D24772
This commit must be merged after D38662 in order for the test to pass.
Patch by Michael Ferguson <mpfergu@gmail.com>.
Reviewers: reames, sanjoy, hfinkel
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D38501
llvm-svn: 350395
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.
This was suggested as a cleanup in D55967.
Differential Revision: https://reviews.llvm.org/D56019
llvm-svn: 349964
If the shift amount is known, we can determine the known bits of the
output based on the known bits of two inputs.
This is essentially the same functionality as implemented in D54869,
but for ValueTracking rather than InstCombine SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D55140
llvm-svn: 348091
We were duplicating code around the existing isImpliedCondition() that
checks for a predecessor block/dominating condition, so make that a
wrapper call.
llvm-svn: 348088
Always-overflow was already determined for unsigned addition, but
not subtraction. This patch establishes parity.
This allows us to perform some additional simplifications for
signed saturating subtractions.
This change is part of https://reviews.llvm.org/D54534.
llvm-svn: 347771
In PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475
..we may fail to recognize/simplify fabs() in some cases because we do not
canonicalize fcmp with a -0.0 operand.
Adding that canonicalization can cause regressions on min/max FP tests, so
that's this patch: for the purpose of determining whether something is min/max,
let the value returned by the select determine how we treat a 0.0 operand in the fcmp.
This patch doesn't actually change the -0.0 to +0.0. It just changes the analysis, so
we don't fail to recognize equivalent min/max patterns that only differ in the
signbit of 0.0.
Differential Revision: https://reviews.llvm.org/D54001
llvm-svn: 346097
This patch gives the IR ComputeNumSignBits the same functionality as the
DAG version (the code is derived from the existing code).
This an extension of the single input shuffle analysis added with D53659.
Differential Revision: https://reviews.llvm.org/D53987
llvm-svn: 346071
The motivating case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549
The analysis improvement allows us to form a vector 'select' out of
bitwise logic (the use of ComputeNumSignBits was added at rL345149).
The smaller test shows another InstCombine improvement - we use
ComputeNumSignBits to add 'nsw' to shift-left. But the negative
test shows an example where we must not add 'nsw' - when the shuffle
mask contains undef elements.
Differential Revision: https://reviews.llvm.org/D53659
llvm-svn: 345429
Summary:
This CL allows constant vectors of floats to be recognized as non-NaN
and non-zero in select patterns. This change makes
`matchSelectPattern` more powerful generally, but was motivated
specifically because I wanted fminnan and fmaxnan to be created for
vector versions of the scalar patterns they are created for.
Tested with check-all on all targets. A testcase in the WebAssembly
backend that tests the non-nan codepath is in an upcoming CL.
Reviewers: aheejin, dschuff
Subscribers: sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52324
llvm-svn: 343364
Summary:
his code was in CGDecl.cpp and really belongs in LLVM's isBytewiseValue. Teach isBytewiseValue the tricks clang's isRepeatedBytePattern had, including merging undef properly, and recursing on more types.
clang part of this patch: D51752
Subscribers: dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51751
llvm-svn: 342709
rL340921 has been reverted by rL340923 due to linkage dependency
from Transform/Utils to Analysis which is not allowed. In this patch
this has been fixed, a new utility function moved to Analysis.
Differential Revision: https://reviews.llvm.org/D51152
llvm-svn: 341014
We have multiple places in code where we try to identify whether or not
some instruction is a guard. This patch factors out this logic into a separate
utility function which works uniformly in all places.
Differential Revision: https://reviews.llvm.org/D51152
Reviewed By: fedor.sergeev
llvm-svn: 340921
We need to allow ConstantExpr Selects in addition to SelectInst.
I'll try to put together a test case, but I wanted to fix the issues being reported.
Fixes PR38677
llvm-svn: 340546
If we have a min/max pair we can do a better job of counting sign bits if we look at them together. This is similar to what is done in the SelectionDAG version of computeNumSignBits for ISD::SMAX/SMIN.
Differential Revision: https://reviews.llvm.org/D51112
llvm-svn: 340480
NewGVN uses InstructionSimplify for simplifications of leaders of
congruence classes. It is not guaranteed that the metadata or other
flags/keywords (like nsw or exact) of the leader is available for all members
in a congruence class, so we cannot use it for simplification.
This patch adds a InstrInfoQuery struct with a boolean field
UseInstrInfo (which defaults to true to keep the current behavior as
default) and a set of helper methods to get metadata/keywords for a
given instruction, if UseInstrInfo is true. The whole thing might need a
better name, to avoid confusion with TargetInstrInfo but I am not sure
what a better name would be.
The current patch threads through InstrInfoQuery to the required
places, which is messier then it would need to be, if
InstructionSimplify and ValueTracking would share the same Query struct.
The reason I added it as a separate struct is that it can be shared
between InstructionSimplify and ValueTracking's query objects. Also,
some places do not need a full query object, just the InstrInfoQuery.
It also updates some interfaces that do not take a Query object, but a
set of optional parameters to take an additional boolean UseInstrInfo.
See https://bugs.llvm.org/show_bug.cgi?id=37540.
Reviewers: dberlin, davide, efriedma, sebpop, hiraditya
Reviewed By: hiraditya
Differential Revision: https://reviews.llvm.org/D47143
llvm-svn: 340031
The patch was reverted because of bug detected by sanitizer. The bug is fixed,
respective tests added.
Differential Revision: https://reviews.llvm.org/D50172
llvm-svn: 339005
Multiple failues reported by sanitizer-x86_64-linux, seem to be caused by this
patch. Reverting to see if they sustain without it.
Differential Revision: https://reviews.llvm.org/D50172
llvm-svn: 338994
`isKnownNonNullFromDominatingCondition` is able to prove non-null basing on `br` or `guard`
by `%p != null` condition, but is unable to do so basing on `(%p != null) && %other_cond`.
This patch allows it to do so.
Differential Revision: https://reviews.llvm.org/D50172
Reviewed By: reames
llvm-svn: 338990
This adds the NAN checks suggested in PR37776:
https://bugs.llvm.org/show_bug.cgi?id=37776
If both operands to maxnum are NAN, that should get constant folded, so we don't
have to handle that case. This is the same assumption as other FP ops in this
function. Returning 'false' is always conservatively correct.
Copying from the bug report:
Currently, we have this for "when is cannotBeOrderedLessThanZero
(mustBePositiveOrNaN) true for maxnum":
L
-------------------
| Pos | Neg | NaN |
------------------------
|Pos | x | x | x |
------------------------
R |Neg | x | | x |
------------------------
|NaN | x | x | x |
------------------------
The cases with (Neg & NaN) are wrong. We should have:
L
-------------------
| Pos | Neg | NaN |
------------------------
|Pos | x | x | x |
------------------------
R |Neg | x | | |
------------------------
|NaN | x | | x |
------------------------
Differential Revision: https://reviews.llvm.org/D50081
llvm-svn: 338716
Currently ComputeNumSignBits does early exit while processing some
of the operations (add, sub, mul, and select). This prevents the
function from using AssumptionCacheTracker if passed.
Differential Revision: https://reviews.llvm.org/D49759
llvm-svn: 337936
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: efriedma, george.burgess.iv
Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47895
llvm-svn: 336613