This boils down to how we deal with early-increment iterator
over function's basic blocks: not only we need to early-increment,
after that we also need to skip all the blocks
that are scheduled for removal, as per DomTreeUpdater.
Thus supporting lazy DomTreeUpdater mode,
where the domtree updates (and thus block removals)
aren't applied immediately, but are delayed
until last possible moment.
When DomTreeUpdater is in lazy update mode, the blocks
that were scheduled to be removed, won't be removed
until the updates are flushed, e.g. by asking
DomTreeUpdater for a up-to-date DomTree.
From the function's current code, it is pretty evident
that the support for the lazy mode is an afterthought,
see e.g. how we roll-back NumRemoved statistic..
So instead of considering all the unreachable blocks
as the blocks-to-be-removed, simply additionally skip
all the blocks that are already scheduled to be removed
When we are adding edges to the terminator and potentially turning it
into a switch (if it wasn't already), it is possible that the
case we're adding will share it's destination with one of the
preexisting cases, in which case there is no domtree edge to add.
Indeed, this change does not have a test coverage change.
This failure has been exposed in an existing test coverage
by a follow-up patch that switches to lazy domtreeupdater mode,
and removes domtree verification from
SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(),
IOW it does not appear feasible to add dedicated test coverage here.
BB was already always branching to EdgeBB, there is no edge to add.
Indeed, this change does not have a test coverage change.
This failure has been exposed in an existing test coverage
by a follow-up patch that switches to lazy domtreeupdater mode,
and removes domtree verification from
SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(),
IOW it does not appear feasible to add dedicated test coverage here.
SI is the terminator of BB, so the edge we are adding obviously already existed.
Indeed, this change does not have a test coverage change.
This failure has been exposed in an existing test coverage
by a follow-up patch that switches to lazy domtreeupdater mode,
and removes domtree verification from
SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(),
IOW it does not appear feasible to add dedicated test coverage here.
Functions that are renamed under -funique-internal-linkage-names have their debug linkage name updated as well.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D93747
This is a more basic pattern that we should handle before trying to solve:
https://llvm.org/PR48640
There might be a better way to think about this because the pre-condition
that I came up with (number of sign bits in the compare constant) misses a
potential transform for each of ugt and ult as commented on in the test file.
Tried to model this is in Alive:
https://rise4fun.com/Alive/juX1
...but I couldn't get the ComputeNumSignBits() pre-condition to work as
expected, so replaced with leading 0/1 preconditions instead.
Name: ugt
Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
%a = ashr %x, C1
%r = icmp ugt i8 %a, C2
=>
%r = icmp slt i8 %x, 0
Name: ult
Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
%a = ashr %x, C1
%r = icmp ult i4 %a, C2
=>
%r = icmp sgt i4 %x, -1
Also approximated in Alive2:
https://alive2.llvm.org/ce/z/u5hCczhttps://alive2.llvm.org/ce/z/__szVL
Differential Revision: https://reviews.llvm.org/D94014
Please see D93747 for more context which tries to make linkage names of internal
linkage functions to be the uniqueified names. This causes a problem with gdb
because breaking using the demangled function name will not work if the new
uniqueified name cannot be demangled. The problem is the generated suffix which
is a mix of integers and letters which do not demangle. The demangler accepts
either all numbers or all letters. This patch simply converts the hash to decimal.
There is no loss of uniqueness by doing this as the precision is maintained.
The symbol names get longer by a few characters though.
Differential Revision: https://reviews.llvm.org/D94154
The existing implementation of parallel region merging applies only to
consecutive parallel regions that have speculatable sequential
instructions in-between. This patch lifts this limitation to expand
merging with any sequential instructions in-between, except calls to
unmergable OpenMP runtime functions. In-between sequential instructions
in the merged region are sequentialized in a "master" region and any
output values are broadcasted to the following parallel regions and the
sequential region continuation of the merged region.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90909
This patch unifies the way recipes and VPValues are printed after the
transition to VPDef.
VPSlotTracker has been updated to iterate over all recipes and all
their defined values to number those. There is no need to number
values in Value2VPValue.
It also updates a few places that only used slot numbers for
VPInstruction. All recipes now can produce numbered VPValues.
This patch fixes a bug that could result in miscompiles (at least
in an OOT target). The problem could be seen by adding checks that
the DominatorTree used in BasicAliasAnalysis and ValueTracking was
valid (e.g. by adding DT->verify() call before every DT dereference
and then running all tests in test/CodeGen).
Problem was that the LegacyPassManager calculated "last user"
incorrectly for passes such as the DominatorTree when not telling
the pass manager that there was a transitive dependency between
the different analyses. And then it could happen that an incorrect
dominator tree was used when doing alias analysis (which was a pretty
serious bug as the alias analysis result could be invalid).
Fixes: https://bugs.llvm.org/show_bug.cgi?id=48709
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D94138
This patch is part of a series of patches that migrate integer
instruction costs to use InstructionCost. In the function
selectVectorizationFactor I have simply asserted that the cost
is valid and extracted the value as is. In future we expect
to encounter invalid costs, but we should filter out those
vectorization factors that lead to such invalid costs.
See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Differential Revision: https://reviews.llvm.org/D92178
Loop peeling as a last step triggers loop simplification and this
can change the loop structure. As a result all cashed values like
latch branch becomes invalid.
Patch re-structure the code to take into account the possible
changes caused by peeling.
Reviewers: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour
Reviewed By: Meinersbur, fhahn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D93686
This is a resubmit of dd6bb367 (which was reverted due to stage2 build failures in 7c63aac), with the additional restriction added to the transform to only consider outer most loops.
As shown in the added test case, ensuring LCSSA is up to date when deleting an inner loop is tricky as we may actually need to remove blocks from any outer loops, thus changing the exit block set. For the moment, just avoid transforming this case. I plan to return to this case in a follow up patch and see if we can do better.
Original commit message follows...
The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects.
Differential Revision: https://reviews.llvm.org/D93906
A severe compile-time slowdown from this call is noted in:
https://llvm.org/PR48689
My naive fix was to put it under LLVM_DEBUG ( 267ff79 ),
but that's not limiting in the way we want.
This is a quick fix (or we could just remove the call completely
and rely on some later pass to discover potentially wrong IR?).
A bigger/better fix would be to improve/limit verifyFunction()
as noted in:
https://llvm.org/PR47712
Differential Revision: https://reviews.llvm.org/D94328
Currently make_early_inc_range cannot be used with iterators with
operator* implementations that do not return a reference.
Most notably in the LLVM codebase, this means the User iterator ranges
cannot be used with make_early_inc_range, which slightly simplifies
iterating over ranges while elements are removed.
Instead of directly using BaseT::reference as return type of operator*,
this patch uses decltype to get the actual return type of the operator*
implementation in WrappedIteratorT.
This patch also updates a few places to use make use of
make_early_inc_range.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D93992
Currently SimplifyCFG drops the debug locations of 'bonus' instructions.
Such instructions are moved before the first branch. The reason for the
current behavior is that this could lead to surprising debug stepping,
if the block that's folded is dead.
In case the first branch and the instructions to be folded have the same
debug location, this shouldn't be an issue and we can keep the debug
location.
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D93662
Similar to D92129, update VPWidenPHIRecipe to manage the start value as
VPValue. This allows adjusting the start value as a VPlan transform,
which will be used in a follow-up patch to support reductions during
epilogue vectorization.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D93975
Match the legacy PM in running various ObjC ARC passes.
This requires making some module passes into function passes. These were
initially ported as module passes since they add function declarations
(e.g. https://reviews.llvm.org/D86178), but that's still up for debate
and other passes do so.
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D93743
This was suggested to prepare for D93975.
By moving the start value creation to widenPHInstruction, we set the
stage to manage the start value directly in VPWidenPHIRecipe, which be
used subsequently to set the 'resume' value for reductions during
epilogue vectorization.
It also moves RdxDesc to the recipe, so we do not have to rely on Legal
to look it up later.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D94175
As noted in PR48689, the verifier may have some kind
of exponential behavior that should be addressed
separately. For now, only run it in debug mode to
prevent problems for release+asserts.
That limit is what we had before D80401, and I'm
not sure if there was a reason to change it in that
patch.
In the following loop:
void foo(int *a, int *b, int N) {
for (int i=0; i<N; ++i)
a[i + 4] = a[i] + b[i];
}
The loop dependence constrains the VF to a maximum of (4, fixed), which
would mean using <4 x i32> as the vector type in vectorization.
Extending this to scalable vectorization, a VF of (4, scalable) implies
a vector type of <vscale x 4 x i32>. To determine if this is legal
vscale must be taken into account. For this example, unless
max(vscale)=1, it's unsafe to vectorize.
For SVE, the number of bits in an SVE register is architecturally
defined to be a multiple of 128 bits with a maximum of 2048 bits, thus
the maximum vscale is 16. In the loop above it is therefore unfeasible
to vectorize with SVE. However, in this loop:
void foo(int *a, int *b, int N) {
#pragma clang loop vectorize_width(X, scalable)
for (int i=0; i<N; ++i)
a[i + 32] = a[i] + b[i];
}
As long as max(vscale) multiplied by the number of lanes 'X' doesn't
exceed the dependence distance, it is safe to vectorize. For SVE a VF of
(2, scalable) is within this constraint, since a vector of <16 x 2 x 32>
will have no dependencies between lanes. For any number of lanes larger
than this it would be unsafe to vectorize.
This patch extends 'computeFeasibleMaxVF' to legalize scalable VFs
specified as loop hints, implementing the following behaviour:
* If the backend does not support scalable vectors, ignore the hint.
* If scalable vectorization is unfeasible given the loop
dependence, like in the first example above for SVE, then use a
fixed VF.
* Accept scalable VFs if it's safe to do so.
* Otherwise, clamp scalable VFs that exceed the maximum safe VF.
Reviewed By: sdesmalen, fhahn, david-arm
Differential Revision: https://reviews.llvm.org/D91718
The new test case here contains a first order recurrences and an
instruction that is replicated. The first order recurrence forces an
instruction to be sunk _into_, as opposed to after the replication
region. That causes several things to go wrong including registering
vector instructions multiple times and failing to create dominance
relations correctly.
Instead we should be sinking to after the replication region, which is
what this patch makes sure happens.
Differential Revision: https://reviews.llvm.org/D93629
We have modules with metadata on declarations, and out-of-tree passes
use that metadata, and we need to clone those modules. We really expect
such metadata is kept during the clone operation.
Reviewed by: arsenm, aprantl
Differential Revision: https://reviews.llvm.org/D93451
We need to handle this case before dealing with the case of constant
branch condition, because if the destinations match, latter fold
would try to remove the DomTree edge that would still be present.
This allows to make that particular DomTree update non-permissive
Apparently there can be no clones, as happens in
coro-retcon-unreachable.ll.
The alternative is to allow no split functions in
addSplitRefRecursiveFunctions(), but it seems better to have the caller
make sure it's not accidentally splitting no functions out.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D94258
I have added it in d15d81c because it *seemed* correct, was holding
for all the tests so far, and was validating the fix added in the same
commit, but as David Major is pointing out (with a reproducer),
the assertion isn't really correct after all. So remove it.
Note that the d15d81c still fine.
Summary:
Currently SplitEdge does not support passing in parameter which allows you to
name the newly created BasicBlock.
This patch updates the function such that the name of the block can be passed
in, if users of this utility decide to do so.
Reviewed By: Whitney, bmahjour, asbirlea, jamieschmeiser
Differential Revision: https://reviews.llvm.org/D94176
After merging the shuffles, we cannot rely on the previous shuffle
anymore and need to shrink the final shuffle, if it is required.
Reported in D92668
Differential Revision: https://reviews.llvm.org/D93967
Add support for mixed pre/post CFG views.
Update usages of the MemorySSAUpdater to use the new DT API by
requesting the DT updates to be done by the MSSAUpdater.
Differential Revision: https://reviews.llvm.org/D93371
Similar to 5a1d31a28 -
This should be no-functional-change because the reduction kind
opcodes are 1-for-1 mappings to the instructions we are matching
as reductions. But we want to remove the need for the
`OperationData` opcode field because that does not work when
we start matching intrinsics (eg, maxnum) as reduction candidates.
Previously when trying to support CoroSplit's function splitting, we
added in a hack that simply added the new function's node into the
original function's SCC (https://reviews.llvm.org/D87798). This is
incorrect since it might be in its own SCC.
Now, more similar to the previous design, we have callers explicitly
notify the LazyCallGraph that a function has been split out from another
one.
In order to properly support CoroSplit, there are two ways functions can
be split out.
One is the normal expected "outlining" of one function into a new one.
The new function may only contain references to other functions that the
original did. The original function must reference the new function. The
new function may reference the original function, which can result in
the new function being in the same SCC as the original function. The
weird case is when the original function indirectly references the new
function, but the new function directly calls the original function,
resulting in the new SCC being a parent of the original function's SCC.
This form of function splitting works with CoroSplit's Switch ABI.
The second way of splitting is more specific to CoroSplit. CoroSplit's
Retcon and Async ABIs split the original function into multiple
functions that all reference each other and are referenced by the
original function. In order to keep the LazyCallGraph in a valid state,
all new functions must be processed together, else some nodes won't be
populated. To keep things simple, this only supports the case where all
new edges are ref edges, and every new function references every other
new function. There can be a reference back from any new function to the
original function, putting all functions in the same RefSCC.
This also adds asserts that all nodes in a (Ref)SCC can reach all other
nodes to prevent future incorrect hacks.
The original hacks in https://reviews.llvm.org/D87798 are no longer
necessary since all new functions should have been registered before
calling updateCGAndAnalysisManagerForPass.
This fixes all coroutine tests when opt's -enable-new-pm is true by
default. This also fixes PR48190, which was likely due to the previous
hack breaking SCC invariants.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D93828
* Update valueCoversEntireFragment to use TypeSize.
* Add a regression test.
* Assertions have been added to protect untested codepaths.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D91806
Currently, LoopDeletion does skip loops that have sub-loops, but this
means we currently fail to remove some no-op loops.
One example are inner loops with live-out values. Those cannot be
removed by itself. But the containing loop may itself be a no-op and the
whole loop-nest can be deleted.
The legality checks do not seem to rely on analyzing inner-loops only
for correctness.
With LoopDeletion being a LoopPass, the change means that we now
unfortunately need to do some extra work in parent loops, by checking
some conditions we already checked. But there appears to be no
noticeable compile time impact:
http://llvm-compile-time-tracker.com/compare.php?from=02d11f3cda2ab5b8bf4fc02639fd1f4b8c45963e&to=843201e9cf3b6871e18c52aede5897a22994c36c&stat=instructions
This changes patch leads to ~10 more loops being deleted on
MultiSource, SPEC2000, SPEC2006 with -O3 & LTO
This patch is also required (together with a few others) to eliminate a
no-op loop in omnetpp as discussed on llvm-dev 'LoopDeletion / removal of
empty loops.' (http://lists.llvm.org/pipermail/llvm-dev/2020-December/147462.html)
This change becomes relevant after removing potentially infinite loops
is made possible in 'must-progress' loops (D86844).
Note that I added a function call with side-effects to an outer loop in
`llvm/test/Transforms/LoopDeletion/update-scev.ll` to preserve the
original spirit of the test.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D93716
This patch updates VPWidenIntOrFpInductionRecipe to hold the start value
for the induction variable. This makes the start value explicit and
allows for adjusting the start value for a VPlan.
The flexibility will be used in further patches.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D92129
This patch adds a new getLiveInIRValue accessor to VPValue, which
returns the underlying value, if the VPValue is defined outside of
VPlan. This is required to handle scalars in VPTransformState, which
requires dealing with scalars defined outside of VPlan.
We can simply check VPValue::Def to determine if the value is defined
inside a VPlan.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D92281
This patch
- Adds containsPoisonElement that checks existence of poison in constant vector elements,
- Renames containsUndefElement to containsUndefOrPoisonElement to clarify its behavior & updates its uses properly
With this patch, isGuaranteedNotToBeUndefOrPoison's tests w.r.t constant vectors are added because its analysis is improved.
Thanks!
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D94053
This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef.
This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector.
The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ).
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D94061
If the predecessor is a switch, and BB is not the default destination,
multiple cases could have the same destination. and it doesn't
make sense to re-process the predecessor, because we won't make any changes,
once is enough.
I'm not sure this can be really tested, other than via the assertion
being added here, which fires without the fix.
One would hope that it would have been already canonicalized into an
unconditional branch, but that isn't really guaranteed to happen
with SimplifyCFG's visitation order.
... which requires not removing a DomTree edge if the switch's default
still points at that destination, because it can't be removed;
... and not processing the same predecessor more than once.
A function is noreturn if all blocks terminating with a ReturnInst
contain a call to a noreturn function. Skip looking at naked functions
since there may be asm that returns.
This can be further refined in the future by checking unreachable blocks
and taking into account recursion. It looks like the attributor pass
does this, but that is not yet enabled by default.
This seems to help with code size under the new PM since PruneEH does
not run under the new PM, missing opportunities to mark some functions
noreturn, which in turn doesn't allow simplifycfg to clean up dead code.
https://bugs.llvm.org/show_bug.cgi?id=46858.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D93946
This should be no-functional-change because the reduction kind
opcodes are 1-for-1 mappings to the instructions we are matching
as reductions. But we want to remove the need for the
`OperationData` opcode field because that does not work when
we start matching intrinsics (eg, maxnum) as reduction candidates.
From C11 and C++11 onwards, a forward-progress requirement has been
introduced for both languages. In the case of C, loops with non-constant
conditionals that do not have any observable side-effects (as defined by
6.8.5p6) can be assumed by the implementation to terminate, and in the
case of C++, this assumption extends to all functions. The clang
frontend will emit the `mustprogress` function attribute for C++
functions (D86233, D85393, D86841) and emit the loop metadata
`llvm.loop.mustprogress` for every loop in C11 or later that has a
non-constant conditional.
This patch modifies LoopDeletion so that only loops with
the `llvm.loop.mustprogress` metadata or loops contained in functions
that are required to make progress (`mustprogress` or `willreturn`) are
checked for observable side-effects. If these loops do not have an
observable side-effect, then we delete them.
Loops without observable side-effects that do not satisfy the above
conditions will not be deleted.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86844
SLP tries to model 2 forms of vector reductions: pairwise and splitting.
From the cost model code comments, those are defined using an example as:
/// Pairwise:
/// (v0, v1, v2, v3)
/// ((v0+v1), (v2+v3), undef, undef)
/// Split:
/// (v0, v1, v2, v3)
/// ((v0+v2), (v1+v3), undef, undef)
I don't know the full history of this functionality, but it was partly
added back in D29402. There are apparently no users at this point (no
regression tests change). X86 might have managed to work-around the need
for this through cost model and codegen improvements.
Removing this code makes it easier to continue the work that was started
in D87416 / D88193. The alternative -- if there is some target that is
silently using this option -- is to move this logic into LoopUtils. We
have related/duplicate functionality there via llvm::createTargetReduction().
Differential Revision: https://reviews.llvm.org/D93860
Creating in-loop reductions relies on IR references to map
IR values to VPValues after interleave group creation.
Make sure we re-add the updated member to the plan, so the look-ups
still work as expected
This fixes a crash reported after D90562.
We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null.
Fixes static analyzer warning.
Loop strength reduction tries to recover debug variable values by looking
for simple offsets from PHI values. In really extreme conditions there may
be an offset used that won't fit in an int64_t, hitting an APInt assertion.
This patch adds a regression test and adjusts the equivalent value
collecting code to filter out any values where the offset can't be
represented by an int64_t. This means that for very large integers with
very large offsets, the variable location will become undef, which is the
same behaviour as before 2a6782bb9f / D87494.
Differential Revision: https://reviews.llvm.org/D94016
We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null.
Fixes static analyzer warning.
... which requires not deleting an edge that just got deleted,
because we could be dealing with a block that didn't go through
ConstantFoldTerminator() yet, and thus has a degenerate cond br
with matching true/false destinations.
Notably, this doesn't switch *every* case, remaining cases
don't actually pass sanity checks in non-permissve mode,
and therefore require further analysis.
Note that SimplifyCFG still defaults to not preserving DomTree by default,
so this is effectively a NFC change.
While here, rename the inaccurate getRecurrenceBinOp()
because that was also used to get CmpInst opcodes.
The recurrence/reduction kind should always refer to the
expected opcode for a reduction. SLP appears to be the
only direct caller of createSimpleTargetReduction(), and
that calling code ideally should not be carrying around
both an opcode and a reduction kind.
This should allow us to generalize reduction matching to
use intrinsics instead of only binops.
Allow loop nests with empty basic blocks without loops in different
levels as perfect.
Reviewers: Meinersbur
Differential Revision: https://reviews.llvm.org/D93665
This reverts commit dd6bb367d1.
Multi-stage builders are showing an assertion failure w/LCSSA not being preserved on entry to IndVars. Reason isn't clear, reverting while investigating.
The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects.
Differential Revision: https://reviews.llvm.org/D93906
bb7d3af113 disabled hoisting in SimplifyCFG by default, but enabled it
late in the pipeline. But it appears as if the LTO pipelines got missed.
This patch adjusts the LTO pipelines to also enable hoisting in the
later stages.
Unfortunately there's no easy way to add a test for the change I think.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D93684
Currently ArgPromotion removes dead GEPs as part of the legality check
in isSafeToPromoteArgument. If no promotion happens, this means the pass
claims no modifications happened, even though GEPs were removed.
This patch fixes the issue by delaying removal of dead GEPs until
doPromotion: isSafeToPromoteArgument can simply skips dead GEPs and
the code in doPromotion dealing with GEPs is updated to account for
dead GEPs. Once we committed to promotion, it should be safe to
remove dead GEPs.
Alternatively isSafeToPromoteArgument could return an additional boolean
to indicate whether it made changes, but this is quite cumbersome and
there should be no real benefit of weeding out some dead GEPs here if we
do not perform promotion.
I added a test for the case where dead GEPs need to be removed when
promotion happens in 578c5a0c6e.
Fixes PR47477.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93991
This is NFC since SimplifyCFG still currently defaults to not preserving DomTree.
SimplifyCFGOpt::simplifyOnce() is only be called from SimplifyCFGOpt::run(),
and can not be called externally, since SimplifyCFGOpt is defined in .cpp
This avoids some needless verifications, and is thus a bit faster
without sacrificing precision.
We only need to remove non-TrueBB/non-FalseBB successors,
and we only need to do that once. We don't need to insert
any new edges, because no new successors will be added.
This patch makes Scalarizer to use poison as insertelement's placeholder.
It contains two changes in Scalarizer.cpp, and the both changes does not change the semantics of the optimized program.
It is because the placeholder value (poison) is already completely hidden by following insertelement instructions.
The first change at visitBitCastInst() creates poison vector of MidTy and consecutively inserts FanIn times,
which is # of elems of MidTy.
The second change at ScalarizerVisitor::finish() creates poison with Op->getType(), and it is filled with
Count insertelements.
The test diffs show that the poison value is never exposed after insertelements.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93989
There is a number of transforms in SimplifyCFG that take DomTree out of
DomTreeUpdater, and do updates manually. Until they are fixed,
user passes are unable to claim that PDT is preserved.
Note that the default for SimplifyCFG is still not to preserve DomTree,
so this is still effectively NFC.
This is almost all mechanical search-and-replace and
no-functional-change-intended (NFC). Having a single
enum makes it easier to match/reason about the
reduction cases.
The goal is to remove `Opcode` from reduction matching
code in the vectorizers because that makes it harder to
adapt the code to handle intrinsics.
The code in RecurrenceDescriptor::AddReductionVar() is
the only place that required closer inspection. It uses
a RecurrenceDescriptor and a second InstDesc to sometimes
overwrite part of the struct. It seem like we should be
able to simplify that logic, but it's not clear exactly
which cmp+sel patterns that we are trying to handle/avoid.
If DoExtraAnalysis is true (e.g. because remarks are enabled), we
continue with the analysis rather than exiting. Update code to
conditionally check if the ExitBB has phis or not a single predecessor.
Otherwise a nullptr is dereferenced with DoExtraAnalysis.
This pretty much concludes patch series for updating SimplifyCFG
to preserve DomTree. All 318 dedicated `-simplifycfg` tests now pass
with `-simplifycfg-require-and-preserve-domtree=1`.
There are a few leftovers that apparently don't have good test coverage.
I do not yet know what gaps in test coverage will the wider-scale testing
reveal, but the default flip might be close.
When combining extracted functions, they may have different function
attributes. We want to make sure that we do not make any assumptions,
or lose any information. This attempts to make sure that we consolidate
function attributes to their most general case.
Tests:
llvm/test/Transforms/IROutliner/outlining-compatible-and-attribute-transfer.ll
llvm/test/Transforms/IROutliner/outlining-compatible-or-attribute-transfer.ll
Reviewers: jdoefert, paquette
Differential Revision: https://reviews.llvm.org/D87301
The default value is dependent on `-DLLVM_ENABLE_ASSERTIONS={off,on}` (D22167), which is
error-prone. The few tests checking `!thinlto_src_module` can specify -enable-import-metadata explicitly.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D93959
Test clang/test/Misc/loop-opt-setup.c fails when executed in Release.
This reverts commit 6f1503d598.
Reviewed By: SureYeaah
Differential Revision: https://reviews.llvm.org/D93956
From C11 and C++11 onwards, a forward-progress requirement has been
introduced for both languages. In the case of C, loops with non-constant
conditionals that do not have any observable side-effects (as defined by
6.8.5p6) can be assumed by the implementation to terminate, and in the
case of C++, this assumption extends to all functions. The clang
frontend will emit the `mustprogress` function attribute for C++
functions (D86233, D85393, D86841) and emit the loop metadata
`llvm.loop.mustprogress` for every loop in C11 or later that has a
non-constant conditional.
This patch modifies LoopDeletion so that only loops with
the `llvm.loop.mustprogress` metadata or loops contained in functions
that are required to make progress (`mustprogress` or `willreturn`) are
checked for observable side-effects. If these loops do not have an
observable side-effect, then we delete them.
Loops without observable side-effects that do not satisfy the above
conditions will not be deleted.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86844
I don't know if there's some way this changes what the vectorizers
may produce for reductions, but I have added test coverage with
3567908 and 5ced712 to show that both passes already have bugs in
this area. Hopefully this does not make things worse before we can
really fix it.
There are functions that the linker is able to automatically
deduplicate, we do not outline from these functions by default. This
allows for outlining from those functions.
Tests:
llvm/test/Transforms/IROutliner/outlining-odr.ll
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D87309
This is no-functional-change-intended (AFAIK, we can't
isolate this difference in a regression test).
That's because the callers should be setting the IRBuilder's
FMF field when creating the reduction and/or setting those
flags after creating. It doesn't make sense to override this
one flag alone.
This is part of a multi-step process to clean up the FMF
setting/propagation. See PR35538 for an example.
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.
Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)
The order is swapped, but in terms of correctness it is still fine.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93923
This patch adds support for select form of and/or.
Currently there is an ongoing effort for moving towards using `select a, b, false` instead of `and i1 a, b` and
`select a, true, b` instead of `or i1 a, b` as well.
D93065 has links to relevant changes.
Alive2 proof: (undef input was disabled due to timeout :( )
- and: https://alive2.llvm.org/ce/z/AgvFbQ
- or: https://alive2.llvm.org/ce/z/KjLJyb
Differential Revision: https://reviews.llvm.org/D93935
Since some values can be swift errors, we need to make sure that we
correctly propagate the parameter attributes.
Tests found at:
llvm/test/Transforms/IROutliner/outlining-swift-error.ll
Reviewers: jroelofs, paquette
Recommit of: 71867ed5e6
Differential Revision: https://reviews.llvm.org/D87742
The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when
it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it
is used by load/store instruction. So amx intrinsics only operate on type x86_amx.
It can help to separate amx intrinsics from llvm IR instructions (+-*/).
Thank Craig for the idea. This patch depend on https://reviews.llvm.org/D87981.
Differential Revision: https://reviews.llvm.org/D91927
This prints OptRemarks at each location where a decision is made to not
outline, or to outline a specific section for the IROutliner pass.
Test:
llvm/test/Transforms/IROutliner/opt-remarks.ll
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D87300
The switch duplicated the translation in getRecurrenceBinOp().
This code is still weird because it translates to the TTI
ReductionFlags for min/max, but then createSimpleTargetReduction()
converts that back to RecurrenceDescriptor::MinMaxRecurrenceKind.
I'm not sure if the SLP enum was created before the IVDescriptor
RecurrenceDescriptor / RecurrenceKind existed, but the code in
SLP is now redundant with that class, so it just makes things
more complicated to have both. We eventually call LoopUtils
createSimpleTargetReduction() to create reduction ops, so we
might as well standardize on those enum names.
There's still a question of whether we need to use TTI::ReductionFlags
vs. MinMaxRecurrenceKind, but that can be another clean-up step.
Another option would just be to flatten the enums in RecurrenceDescriptor
into a single enum. There isn't much benefit (smaller switches?) to
having a min/max subset.
This adds a cost model that takes into account the total number of
machine instructions to be removed from each region, the number of
instructions added by adding a new function with a set of instructions,
and the instructions added by handling arguments.
Tests not adding flags:
llvm/test/Transforms/IROutliner/outlining-cost-model.ll
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D87299
As Mikael Holmén is noting in the post-commit review for the first fix
https://reviews.llvm.org/rGd4ccef38d0bb#967466
not hoisting constantexprs is not enough,
because if the xor originally was a constantexpr (i.e. X is a constantexpr).
`SimplifyAssociativeOrCommutative()` in `visitXor()` will immediately
undo this transform, thus again causing an infinite combine loop.
This transform has resulted in a surprising number of constantexpr failures.
Many of the sets of output stores will be the same. When a block is
created, we check if there is an output block with the same set of store
instructions. If there is, we map the output block of the region back
to the block, so that the extra argument controlling the switch
statement can be set to the appropriate block value.
Tests:
- llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D87298
Certain regions can have values introduced inside the region that are
used outside of the region. These may not be the same for each similar
region, so we must create one over arching set of arguments for the
consolidated function.
We do this by iterating over the outputs for each extracted function,
and creating as many different arguments to encapsulate the different
outputs sets. For each output set, we create a different block with the
necessary stores from the value to the output register. There is then
one switch statement, controlled by an argument to the function, to
differentiate which block to use.
Changed Tests for consistency:
llvm/test/Transforms/IROutliner/extraction.ll
llvm/test/Transforms/IROutliner/illegal-assumes.ll
llvm/test/Transforms/IROutliner/illegal-memcpy.ll
llvm/test/Transforms/IROutliner/illegal-memmove.ll
llvm/test/Transforms/IROutliner/illegal-vaarg.ll
Tests to test new functionality:
llvm/test/Transforms/IROutliner/outlining-different-output-blocks.ll
llvm/test/Transforms/IROutliner/outlining-remapped-outputs.ll
llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D87296
This disables the poison-unsafe select -> and/or transform behind
a flag (we continue to perform the fold by default). This is intended
to simplify evaluation and testing while we teach various passes
to directly recognize the select pattern.
This only disables the main select -> and/or transform. A number of
related ones are instead changed to canonicalize to the a ? b : false
and a ? true : b forms which represent and/or respectively. This
requires a bit of care to avoid infinite loops, as we do not want
!a ? b : false to be converted into a ? false : b.
The basic idea here is the same as D93065, but keeps the change
behind a flag for now.
Differential Revision: https://reviews.llvm.org/D93840
We might be dealing with an unreachable code,
so the bonus instruction we clone might be self-referencing.
There is a sanity check that all uses of bonus instructions
that are not in the original block with said bonus instructions
are PHI nodes, and that is obviously not the case
for self-referencing instructions..
So if we find such an use, just rewrite it.
Thanks to Mikael Holmén for the reproducer!
Fixes https://bugs.llvm.org/show_bug.cgi?id=48450#c8
This reverts commit 4ffcd4fe9a thus restoring e4df6a40da.
The only change from the original patch is to add "llvm::" before the call to empty(iterator_range). This is a speculative fix for the ambiguity reported on some builders.
This patch is a major step towards supporting multiple exit loops in the vectorizer. This patch on it's own extends the loop forms allowed in two ways:
single exit loops which are not bottom tested
multiple exit loops w/ a single exit block reached from all exits and no phis in the exit block (because of LCSSA this implies no values defined in the loop used later)
The restrictions on multiple exit loop structures will be removed in follow up patches; disallowing cases for now makes the code changes smaller and more obvious. As before, we can only handle loops with entirely analyzable exits. Removing that restriction is much harder, and is not part of currently planned efforts.
The basic idea here is that we can force the last iteration to run in the scalar epilogue loop (if we have one). From the definition of SCEV's backedge taken count, we know that no earlier iteration can exit the vector body. As such, we can leave the decision on which exit to be taken to the scalar code and generate a bottom tested vector loop which runs all but the last iteration.
The existing code already had the notion of requiring one iteration in the scalar epilogue, this patch is mainly about generalizing that support slightly, making sure we don't try to use this mechanism when tail folding, and updating the code to reflect the difference between a single exit block and a unique exit block (very mechanical).
Differential Revision: https://reviews.llvm.org/D93317
As it is being reported (in post-commit review) in
https://reviews.llvm.org/D93857
this fold (as i expected, but failed to come up with test coverage
despite trying) has issues with constant expressions.
Since we only care about true constants, which constantexprs are not,
don't perform such hoisting for constant expressions.
The function FoldSingleEntryPHINodes() is changed to return if
it has changed IR or not. This return value is used by RS4GC to
set the MadeChange flag respectively.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D93810
Currently undef is used as a don’t-care vector when constructing a vector using a series of insertelement.
However, this is problematic because undef isn’t undefined enough.
Especially, a sequence of insertelement can be optimized to shufflevector, but using undef as its placeholder makes shufflevector a poison-blocking instruction because undef cannot be optimized to poison.
This makes a few straightforward optimizations incorrect, such as:
```
; https://bugs.llvm.org/show_bug.cgi?id=44185
define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
%xv = insertelement <4 x float> %q, float %x, i32 2
%r = shufflevector <4 x float> %y, <4 x float> %xv, <4 x i32> { 0, 6, 2, undef }
ret <4 x float> %r ; %r[3] is undef
}
=>
define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
%r = insertelement <4 x float> %y, float %x, i32 1
ret <4 x float> %r ; %r[3] = %y[3], incorrect if %y[3] = poison
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source
```
I’d like to suggest
1. Using poison as insertelement’s placeholder value (IRBuilder::CreateVectorSplat should be patched too)
2. Updating shufflevector’s semantics to return poison element if mask is undef
Note that poison is currently lowered into UNDEF in SelDag, so codegen part is okay.
m_Undef() matches PoisonValue as well, so existing optimizations will still fire.
The only concern is hidden miscompilations that will go incorrect when poison constant is given.
A conservative way is copying all tests having `insertelement undef` & replacing it with `insertelement poison` & run Alive2 on it, but it will create many tests and people won’t like it. :(
Instead, I’ll simply locally maintain the tests and run Alive2.
If there is any bug found, I’ll report it.
Relevant links: https://bugs.llvm.org/show_bug.cgi?id=43958 , http://lists.llvm.org/pipermail/llvm-dev/2019-November/137242.html
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93586
This patch updates GVN to correctly return the modified status, if PRE
is performed on indices. It fixes a crash when building the test-suite
with EXPENSIVE_CHECKS and LTO.
EarlyCSE's handleBranchCondition says:
```
// If the condition is AND operation, we can propagate its operands into the
// true branch. If it is OR operation, we can propagate them into the false
// branch.
```
This holds for the corresponding select patterns as well.
This is a part of an ongoing work for disabling buggy select->and/or transformations.
See llvm.org/pr48353 and D93065 for more context
Proof:
and: https://alive2.llvm.org/ce/z/MQWodU
or: https://alive2.llvm.org/ce/z/9GLbB_
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93842
This patch makes GVN recognize `select c1, c2, false` as well as `select c1, true, c2`
branch condition and propagate equality from these.
See llvm.org/pr48353, D93065
Differential Revision: https://reviews.llvm.org/D93841
Previously the branch from the middle block to the scalar preheader & exit
was being set-up at the end of skeleton creation in completeLoopSkeleton.
Inserting SCEV or runtime checks may result in LCSSA phis being created,
if they are required. Adjusting branches afterwards may break those
PHIs.
To avoid this, we can instead create the branch from the middle block
to the exit after we created the middle block, so we have the final CFG
before potentially adjusting/creating PHIs.
This fixes a crash for the included test case. For the non-crashing
case, this is almost a NFC with respect to the generated code. The
only change is the order of the predecessors of the involved branch
targets.
Note an assertion was moved from LoopVersioning() to
LoopVersioning::versionLoop. Adjusting the branches means loop-simplify
form may be broken before constructing LoopVersioning. But LV only uses
LoopVersioning to annotate the loop instructions with !noalias metadata,
which does not require loop-simplify form.
This is a fix for an existing issue uncovered by D93317.
I am hoping to extend the reduction matching code, and it is
hard to distinguish "ReductionData" from "ReducedValueData".
So extend the tree/root metaphor to include leaves.
Another problem is that the name "OperationData" does not
provide insight into its purpose. I'm not sure if we can alter
that underlying data structure to make the code clearer.
While `%x.curr` is always safe to compute, because `LoopBackedgeTakenCount`
will always be smaller than `bitwidth(X)`, i.e. we never get poison,
rewriting `%x.next` is more complicated, however, because `X << LoopTripCount`
will be poison iff `LoopTripCount == bitwidth(X)` (which will happen
iff `BitPos` is `bitwidth(x) - 1` and `X` is `1`).
So unless we know that isn't the case (as alive2 notes, we know it's safe
to do iff shift had no-wrap flags, or bitpos does not indicate signbit,
or we know that %x is never `1`), we'll need to emit an alternative,
safe IR, by either just shifting the `%x.curr`, or conditionally selecting
between the computed `%x.next` and `0`..
Former IR looks better so let's do that.
While there, ensure that we don't drop no-wrap flags from said shift.
This is one of the deficiencies that can be observed in
https://godbolt.org/z/YPczsG after D91038 patch set.
This exposed two missing folds, one was fixed by the previous commit,
another one is `(A ^ B) | ~(A ^ B) --> -1` / `(A ^ B) & ~(A ^ B) --> 0`.
`-early-cse` will catch it: https://godbolt.org/z/4n1T1v,
but isn't meaningful to fix it in InstCombine,
because we'd need to essentially do our own CSE,
and we can't even rely on `Instruction::isIdenticalTo()`,
because there are no guarantees that the order of operands matches.
So let's just accept it as a loss.
This reverts commit 899faa50f2.
Upon further consideration, this does not fix the right issue.
Doing this fold for non-inbounds GEPs is legal, because the
resulting pointer is still based-on null, which has no associated
address range, and as such and access to it is UB.
https://bugs.llvm.org/show_bug.cgi?id=48577#c3
The source pointer type is not necessarily the same as the result
pointer type, so we can't simply return the original null pointer,
it might be a different one.