Our current strategy of computing ranges of SCEVUnknown Phis was to simply
compute the union of ranges of all its inputs. In order to avoid infinite recursion,
we mark Phis as pending and conservatively return full set for them. As result,
even simplest patterns of cycled phis always have a range of full set.
This patch makes this logic a bit smarter. We basically do the same, but instead
of taking inputs of single Phi we find its strongly connected component (SCC)
and compute the union of all inputs that come into this SCC from outside.
Processing entire SCC together has one more advantage: we can set range for all
of them at once, because the only thing that happens to them is the same value is
being passed between those Phis. So, despite we spend more time analyzing a
single Phi, overall we may save time by not processing other SCC members, so
amortized compile time spent should be approximately the same.
Differential Revision: https://reviews.llvm.org/D110620
Reviewed By: reames
Atomic store with Release semantic allows re-ordering of unordered load/store before the store.
Implement it.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D119844
zext(umin(x,y)) == umin(zext(x),zext(y))
zext(x) == 0 -> x == 0
While it is not a very likely scenario, we probably should not expect
that instcombine already dropped such a redundant zext,
but handle directly. Moreover, perhaps there was no ZExtInst,
and SCEV somehow managed to pull out said zext out of the SCEV expression.
zext(umin(x,y)) == umin(zext(x),zext(y))
zext(x) == 0 -> x == 0
Extra leading zeros do not affect the result of comparison with zero,
nor do they matter for the unsigned min/max,
so we should not be dissuaded when we find a zero-extensions,
but instead we should just skip it.
In a prior review I was asked to move the helper function canIgnoreSNaN()
out to FPEnv.h. This wasn't possible at the time because that function
needs the fast math flags, and including them includes lots of other stuff
that isn't needed.
This patch moves the fast math flags out into a new FMF.h file unchanged,
and moves the helper function out to FPEnv.h also unchanged. This ticket
only moves code around.
Differential Revision: https://reviews.llvm.org/D119752
Currently the fsub optimizations in InstSimplify don't know how to fold
X - -0.0 to X when we know X is not zero and the constrained intrinsics
are used. This adds the support.
This review is split out from D107285.
Differential Revision: https://reviews.llvm.org/D119746
This one tries to fix:
https://github.com/llvm/llvm-project/issues/53357.
Simply, this one would check (x & y) and ~(x | y) in
haveNoCommonBitsSet. Since they shouldn't have common bits (we could
traverse the case by enumerating), and we could convert this one to (x &
y) | ~(x | y) . Then the compiler could handle it in
InstCombineAndOrXor.
Further more, since ((x & y) + (~x & ~y)) would be converted to ((x & y)
+ ~(x | y)), this patch would fix it too.
https://alive2.llvm.org/ce/z/qsKzRS
Reviewed By: spatel, xbolva00, RKSimon, lebedev.ri
Differential Revision: https://reviews.llvm.org/D118094
Volatile store does not provide any special rules for reordering with
atomics. Usual must alias anaylsis is enough here.
This makes the bahavior similar to how volatile load is handled.
Reviewers: reames, nikic
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D119818
A more general enhancement needs to add tests and make sure
that intrinsics that return structs are correct. There are also
target-specific intrinsics, and I'm not sure what behavior is
expected for those.
A more general enhancement needs to add tests and make sure
that intrinsics that return structs are correct. There are also
target-specific intrinsics, and I'm not sure what behavior is
expected for those.
Instead of doing an inbounds strip first and another non-inbounds
strip afterward for equality comparisons, directly do a single
inbounds or non-inbounds strip based on whether we have an equality
predicate or not.
This is NFC-ish in that the alloca equality codepath is the only
part that sees additional non-inbounds offsets now, and for that
codepath it doesn't matter whether or not the GEP is inbounds, as
it does a stronger check itself. InstCombine would infer inbounds
for such GEPs.
Currently the fsub optimizations in InstSimplify don't know how to fold X
- +0.0 to X when using the constrained intrinsics. This adds the support.
This review is split out from D107285.
Differential Revision: https://reviews.llvm.org/D118928
Fixes a MemCpyOpt miscompile with opaque pointers.
This function can be further cleaned up, but let's just fix the miscompile first.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D119652
This mechanism was used for a couple of purposes, but the primary one was keeping track of which predicates in a union might apply to an expression. As these sets are small and agressively deduped, this has little value.
Even if the search is marked as terminated after only looking at
the first operand, we'd still look at the remaining operands
before actually ending the search.
This seems pointless and wasteful, let's not do that.
Since we don't greedily flatten `umin_seq(a, umin(b, c))` into `umin_seq(a, b, c)`,
just looking at the operands of the outer-level `umin` is not sufficient,
and we need to recurse into all same-typed `umin`'s.
This lets us avoid redundant implication work in the constructor of SCEVUnionPredicate which simplifies an upcoming change. If we're actually building a predicate via PSE, that goes through addPredicate which does include the implication check.
SCCP requires that the load/store type and global type are the
same (it does not support bitcasts of tracked globals). With
typed pointers this was implicitly enforced.
Note that this doesn't actually cause the top level predicate to become a non-union just yet.
The * above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization due to a change from checking if predicates exists to whether the predicate is possibly false.
We'd catch the tautological select pattern later anyways
due to constant folding, so that leaves PHI-like select,
but it does not appear to fire there.
Currently `createNodeForSelectOrPHI()` takes an Instruction,
and only works on the Cond that is an ICmpInst,
but that can be relaxed somewhat.
For now, simply rename the existing function,
and add a thin wrapper ontop that still does
the same thing as it used to.
https://alive2.llvm.org/ce/z/ULuZxB
We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `add`
bit/element -wise, the expression will be rather large,
so let's not do that for now.
https://alive2.llvm.org/ce/z/aKAr94
We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umin`
bit/element -wise, the expression will be rather large,
so let's not do that for now.
https://alive2.llvm.org/ce/z/SMEaoc
We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umax`
bit/element -wise, the expression will be rather large,
so let's not do that for now.
The code was relying upon the implicit conversion of TypeSize to
uint64_t and assuming the type in question was always fixed. However,
I discovered an issue when running the canon-freeze pass with some
IR loops that contains scalable vector types. I've changed the code
to bail out if the size is unknown at compile time, since we cannot
compute whether the step is a multiple of the type size or not.
I added a test here:
Transforms/CanonicalizeFreezeInLoops/phis.ll
Differential Revision: https://reviews.llvm.org/D118696
This is the last major stepping stone before being able to allocate the node via the folding set allocator. That will in turn allow more general SCEV predicate expression trees.
For those curious, the whole reason for tracking the predicate set seperately as opposed to just immediately registering the dependencies appears to be allowing the printing code to print a result without changing the PSE state. It's slightly questionable if this justifies the complexity, but since we can preserve it with local ugliness, I did so.
Previously we relied on the pointee type to determine what type we need
to do runtime pointer access checks.
With opaque pointers, we can access a pointer with more than one type,
so now we keep track of all the types we're accessing a pointer's
memory with.
Also some other minor getPointerElementType() removals.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D119047