I have added it in d15d81c because it *seemed* correct, was holding
for all the tests so far, and was validating the fix added in the same
commit, but as David Major is pointing out (with a reproducer),
the assertion isn't really correct after all. So remove it.
Note that the d15d81c still fine.
Summary:
Currently SplitEdge does not support passing in parameter which allows you to
name the newly created BasicBlock.
This patch updates the function such that the name of the block can be passed
in, if users of this utility decide to do so.
Reviewed By: Whitney, bmahjour, asbirlea, jamieschmeiser
Differential Revision: https://reviews.llvm.org/D94176
After merging the shuffles, we cannot rely on the previous shuffle
anymore and need to shrink the final shuffle, if it is required.
Reported in D92668
Differential Revision: https://reviews.llvm.org/D93967
Add support for mixed pre/post CFG views.
Update usages of the MemorySSAUpdater to use the new DT API by
requesting the DT updates to be done by the MSSAUpdater.
Differential Revision: https://reviews.llvm.org/D93371
Similar to 5a1d31a28 -
This should be no-functional-change because the reduction kind
opcodes are 1-for-1 mappings to the instructions we are matching
as reductions. But we want to remove the need for the
`OperationData` opcode field because that does not work when
we start matching intrinsics (eg, maxnum) as reduction candidates.
Previously when trying to support CoroSplit's function splitting, we
added in a hack that simply added the new function's node into the
original function's SCC (https://reviews.llvm.org/D87798). This is
incorrect since it might be in its own SCC.
Now, more similar to the previous design, we have callers explicitly
notify the LazyCallGraph that a function has been split out from another
one.
In order to properly support CoroSplit, there are two ways functions can
be split out.
One is the normal expected "outlining" of one function into a new one.
The new function may only contain references to other functions that the
original did. The original function must reference the new function. The
new function may reference the original function, which can result in
the new function being in the same SCC as the original function. The
weird case is when the original function indirectly references the new
function, but the new function directly calls the original function,
resulting in the new SCC being a parent of the original function's SCC.
This form of function splitting works with CoroSplit's Switch ABI.
The second way of splitting is more specific to CoroSplit. CoroSplit's
Retcon and Async ABIs split the original function into multiple
functions that all reference each other and are referenced by the
original function. In order to keep the LazyCallGraph in a valid state,
all new functions must be processed together, else some nodes won't be
populated. To keep things simple, this only supports the case where all
new edges are ref edges, and every new function references every other
new function. There can be a reference back from any new function to the
original function, putting all functions in the same RefSCC.
This also adds asserts that all nodes in a (Ref)SCC can reach all other
nodes to prevent future incorrect hacks.
The original hacks in https://reviews.llvm.org/D87798 are no longer
necessary since all new functions should have been registered before
calling updateCGAndAnalysisManagerForPass.
This fixes all coroutine tests when opt's -enable-new-pm is true by
default. This also fixes PR48190, which was likely due to the previous
hack breaking SCC invariants.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D93828
* Update valueCoversEntireFragment to use TypeSize.
* Add a regression test.
* Assertions have been added to protect untested codepaths.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D91806
Currently, LoopDeletion does skip loops that have sub-loops, but this
means we currently fail to remove some no-op loops.
One example are inner loops with live-out values. Those cannot be
removed by itself. But the containing loop may itself be a no-op and the
whole loop-nest can be deleted.
The legality checks do not seem to rely on analyzing inner-loops only
for correctness.
With LoopDeletion being a LoopPass, the change means that we now
unfortunately need to do some extra work in parent loops, by checking
some conditions we already checked. But there appears to be no
noticeable compile time impact:
http://llvm-compile-time-tracker.com/compare.php?from=02d11f3cda2ab5b8bf4fc02639fd1f4b8c45963e&to=843201e9cf3b6871e18c52aede5897a22994c36c&stat=instructions
This changes patch leads to ~10 more loops being deleted on
MultiSource, SPEC2000, SPEC2006 with -O3 & LTO
This patch is also required (together with a few others) to eliminate a
no-op loop in omnetpp as discussed on llvm-dev 'LoopDeletion / removal of
empty loops.' (http://lists.llvm.org/pipermail/llvm-dev/2020-December/147462.html)
This change becomes relevant after removing potentially infinite loops
is made possible in 'must-progress' loops (D86844).
Note that I added a function call with side-effects to an outer loop in
`llvm/test/Transforms/LoopDeletion/update-scev.ll` to preserve the
original spirit of the test.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D93716
This patch updates VPWidenIntOrFpInductionRecipe to hold the start value
for the induction variable. This makes the start value explicit and
allows for adjusting the start value for a VPlan.
The flexibility will be used in further patches.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D92129
This patch adds a new getLiveInIRValue accessor to VPValue, which
returns the underlying value, if the VPValue is defined outside of
VPlan. This is required to handle scalars in VPTransformState, which
requires dealing with scalars defined outside of VPlan.
We can simply check VPValue::Def to determine if the value is defined
inside a VPlan.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D92281
This patch
- Adds containsPoisonElement that checks existence of poison in constant vector elements,
- Renames containsUndefElement to containsUndefOrPoisonElement to clarify its behavior & updates its uses properly
With this patch, isGuaranteedNotToBeUndefOrPoison's tests w.r.t constant vectors are added because its analysis is improved.
Thanks!
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D94053
This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef.
This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector.
The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ).
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D94061
If the predecessor is a switch, and BB is not the default destination,
multiple cases could have the same destination. and it doesn't
make sense to re-process the predecessor, because we won't make any changes,
once is enough.
I'm not sure this can be really tested, other than via the assertion
being added here, which fires without the fix.
One would hope that it would have been already canonicalized into an
unconditional branch, but that isn't really guaranteed to happen
with SimplifyCFG's visitation order.
... which requires not removing a DomTree edge if the switch's default
still points at that destination, because it can't be removed;
... and not processing the same predecessor more than once.
A function is noreturn if all blocks terminating with a ReturnInst
contain a call to a noreturn function. Skip looking at naked functions
since there may be asm that returns.
This can be further refined in the future by checking unreachable blocks
and taking into account recursion. It looks like the attributor pass
does this, but that is not yet enabled by default.
This seems to help with code size under the new PM since PruneEH does
not run under the new PM, missing opportunities to mark some functions
noreturn, which in turn doesn't allow simplifycfg to clean up dead code.
https://bugs.llvm.org/show_bug.cgi?id=46858.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D93946
This should be no-functional-change because the reduction kind
opcodes are 1-for-1 mappings to the instructions we are matching
as reductions. But we want to remove the need for the
`OperationData` opcode field because that does not work when
we start matching intrinsics (eg, maxnum) as reduction candidates.
From C11 and C++11 onwards, a forward-progress requirement has been
introduced for both languages. In the case of C, loops with non-constant
conditionals that do not have any observable side-effects (as defined by
6.8.5p6) can be assumed by the implementation to terminate, and in the
case of C++, this assumption extends to all functions. The clang
frontend will emit the `mustprogress` function attribute for C++
functions (D86233, D85393, D86841) and emit the loop metadata
`llvm.loop.mustprogress` for every loop in C11 or later that has a
non-constant conditional.
This patch modifies LoopDeletion so that only loops with
the `llvm.loop.mustprogress` metadata or loops contained in functions
that are required to make progress (`mustprogress` or `willreturn`) are
checked for observable side-effects. If these loops do not have an
observable side-effect, then we delete them.
Loops without observable side-effects that do not satisfy the above
conditions will not be deleted.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86844
SLP tries to model 2 forms of vector reductions: pairwise and splitting.
From the cost model code comments, those are defined using an example as:
/// Pairwise:
/// (v0, v1, v2, v3)
/// ((v0+v1), (v2+v3), undef, undef)
/// Split:
/// (v0, v1, v2, v3)
/// ((v0+v2), (v1+v3), undef, undef)
I don't know the full history of this functionality, but it was partly
added back in D29402. There are apparently no users at this point (no
regression tests change). X86 might have managed to work-around the need
for this through cost model and codegen improvements.
Removing this code makes it easier to continue the work that was started
in D87416 / D88193. The alternative -- if there is some target that is
silently using this option -- is to move this logic into LoopUtils. We
have related/duplicate functionality there via llvm::createTargetReduction().
Differential Revision: https://reviews.llvm.org/D93860
Creating in-loop reductions relies on IR references to map
IR values to VPValues after interleave group creation.
Make sure we re-add the updated member to the plan, so the look-ups
still work as expected
This fixes a crash reported after D90562.
We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null.
Fixes static analyzer warning.
Loop strength reduction tries to recover debug variable values by looking
for simple offsets from PHI values. In really extreme conditions there may
be an offset used that won't fit in an int64_t, hitting an APInt assertion.
This patch adds a regression test and adjusts the equivalent value
collecting code to filter out any values where the offset can't be
represented by an int64_t. This means that for very large integers with
very large offsets, the variable location will become undef, which is the
same behaviour as before 2a6782bb9f / D87494.
Differential Revision: https://reviews.llvm.org/D94016
We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null.
Fixes static analyzer warning.
... which requires not deleting an edge that just got deleted,
because we could be dealing with a block that didn't go through
ConstantFoldTerminator() yet, and thus has a degenerate cond br
with matching true/false destinations.
Notably, this doesn't switch *every* case, remaining cases
don't actually pass sanity checks in non-permissve mode,
and therefore require further analysis.
Note that SimplifyCFG still defaults to not preserving DomTree by default,
so this is effectively a NFC change.
While here, rename the inaccurate getRecurrenceBinOp()
because that was also used to get CmpInst opcodes.
The recurrence/reduction kind should always refer to the
expected opcode for a reduction. SLP appears to be the
only direct caller of createSimpleTargetReduction(), and
that calling code ideally should not be carrying around
both an opcode and a reduction kind.
This should allow us to generalize reduction matching to
use intrinsics instead of only binops.
Allow loop nests with empty basic blocks without loops in different
levels as perfect.
Reviewers: Meinersbur
Differential Revision: https://reviews.llvm.org/D93665
This reverts commit dd6bb367d1.
Multi-stage builders are showing an assertion failure w/LCSSA not being preserved on entry to IndVars. Reason isn't clear, reverting while investigating.
The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects.
Differential Revision: https://reviews.llvm.org/D93906
bb7d3af113 disabled hoisting in SimplifyCFG by default, but enabled it
late in the pipeline. But it appears as if the LTO pipelines got missed.
This patch adjusts the LTO pipelines to also enable hoisting in the
later stages.
Unfortunately there's no easy way to add a test for the change I think.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D93684
Currently ArgPromotion removes dead GEPs as part of the legality check
in isSafeToPromoteArgument. If no promotion happens, this means the pass
claims no modifications happened, even though GEPs were removed.
This patch fixes the issue by delaying removal of dead GEPs until
doPromotion: isSafeToPromoteArgument can simply skips dead GEPs and
the code in doPromotion dealing with GEPs is updated to account for
dead GEPs. Once we committed to promotion, it should be safe to
remove dead GEPs.
Alternatively isSafeToPromoteArgument could return an additional boolean
to indicate whether it made changes, but this is quite cumbersome and
there should be no real benefit of weeding out some dead GEPs here if we
do not perform promotion.
I added a test for the case where dead GEPs need to be removed when
promotion happens in 578c5a0c6e.
Fixes PR47477.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93991
This is NFC since SimplifyCFG still currently defaults to not preserving DomTree.
SimplifyCFGOpt::simplifyOnce() is only be called from SimplifyCFGOpt::run(),
and can not be called externally, since SimplifyCFGOpt is defined in .cpp
This avoids some needless verifications, and is thus a bit faster
without sacrificing precision.
We only need to remove non-TrueBB/non-FalseBB successors,
and we only need to do that once. We don't need to insert
any new edges, because no new successors will be added.
This patch makes Scalarizer to use poison as insertelement's placeholder.
It contains two changes in Scalarizer.cpp, and the both changes does not change the semantics of the optimized program.
It is because the placeholder value (poison) is already completely hidden by following insertelement instructions.
The first change at visitBitCastInst() creates poison vector of MidTy and consecutively inserts FanIn times,
which is # of elems of MidTy.
The second change at ScalarizerVisitor::finish() creates poison with Op->getType(), and it is filled with
Count insertelements.
The test diffs show that the poison value is never exposed after insertelements.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93989
There is a number of transforms in SimplifyCFG that take DomTree out of
DomTreeUpdater, and do updates manually. Until they are fixed,
user passes are unable to claim that PDT is preserved.
Note that the default for SimplifyCFG is still not to preserve DomTree,
so this is still effectively NFC.
This is almost all mechanical search-and-replace and
no-functional-change-intended (NFC). Having a single
enum makes it easier to match/reason about the
reduction cases.
The goal is to remove `Opcode` from reduction matching
code in the vectorizers because that makes it harder to
adapt the code to handle intrinsics.
The code in RecurrenceDescriptor::AddReductionVar() is
the only place that required closer inspection. It uses
a RecurrenceDescriptor and a second InstDesc to sometimes
overwrite part of the struct. It seem like we should be
able to simplify that logic, but it's not clear exactly
which cmp+sel patterns that we are trying to handle/avoid.
If DoExtraAnalysis is true (e.g. because remarks are enabled), we
continue with the analysis rather than exiting. Update code to
conditionally check if the ExitBB has phis or not a single predecessor.
Otherwise a nullptr is dereferenced with DoExtraAnalysis.
This pretty much concludes patch series for updating SimplifyCFG
to preserve DomTree. All 318 dedicated `-simplifycfg` tests now pass
with `-simplifycfg-require-and-preserve-domtree=1`.
There are a few leftovers that apparently don't have good test coverage.
I do not yet know what gaps in test coverage will the wider-scale testing
reveal, but the default flip might be close.
When combining extracted functions, they may have different function
attributes. We want to make sure that we do not make any assumptions,
or lose any information. This attempts to make sure that we consolidate
function attributes to their most general case.
Tests:
llvm/test/Transforms/IROutliner/outlining-compatible-and-attribute-transfer.ll
llvm/test/Transforms/IROutliner/outlining-compatible-or-attribute-transfer.ll
Reviewers: jdoefert, paquette
Differential Revision: https://reviews.llvm.org/D87301
The default value is dependent on `-DLLVM_ENABLE_ASSERTIONS={off,on}` (D22167), which is
error-prone. The few tests checking `!thinlto_src_module` can specify -enable-import-metadata explicitly.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D93959
Test clang/test/Misc/loop-opt-setup.c fails when executed in Release.
This reverts commit 6f1503d598.
Reviewed By: SureYeaah
Differential Revision: https://reviews.llvm.org/D93956
From C11 and C++11 onwards, a forward-progress requirement has been
introduced for both languages. In the case of C, loops with non-constant
conditionals that do not have any observable side-effects (as defined by
6.8.5p6) can be assumed by the implementation to terminate, and in the
case of C++, this assumption extends to all functions. The clang
frontend will emit the `mustprogress` function attribute for C++
functions (D86233, D85393, D86841) and emit the loop metadata
`llvm.loop.mustprogress` for every loop in C11 or later that has a
non-constant conditional.
This patch modifies LoopDeletion so that only loops with
the `llvm.loop.mustprogress` metadata or loops contained in functions
that are required to make progress (`mustprogress` or `willreturn`) are
checked for observable side-effects. If these loops do not have an
observable side-effect, then we delete them.
Loops without observable side-effects that do not satisfy the above
conditions will not be deleted.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86844
I don't know if there's some way this changes what the vectorizers
may produce for reductions, but I have added test coverage with
3567908 and 5ced712 to show that both passes already have bugs in
this area. Hopefully this does not make things worse before we can
really fix it.
There are functions that the linker is able to automatically
deduplicate, we do not outline from these functions by default. This
allows for outlining from those functions.
Tests:
llvm/test/Transforms/IROutliner/outlining-odr.ll
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D87309
This is no-functional-change-intended (AFAIK, we can't
isolate this difference in a regression test).
That's because the callers should be setting the IRBuilder's
FMF field when creating the reduction and/or setting those
flags after creating. It doesn't make sense to override this
one flag alone.
This is part of a multi-step process to clean up the FMF
setting/propagation. See PR35538 for an example.
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.
Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)
The order is swapped, but in terms of correctness it is still fine.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93923
This patch adds support for select form of and/or.
Currently there is an ongoing effort for moving towards using `select a, b, false` instead of `and i1 a, b` and
`select a, true, b` instead of `or i1 a, b` as well.
D93065 has links to relevant changes.
Alive2 proof: (undef input was disabled due to timeout :( )
- and: https://alive2.llvm.org/ce/z/AgvFbQ
- or: https://alive2.llvm.org/ce/z/KjLJyb
Differential Revision: https://reviews.llvm.org/D93935
Since some values can be swift errors, we need to make sure that we
correctly propagate the parameter attributes.
Tests found at:
llvm/test/Transforms/IROutliner/outlining-swift-error.ll
Reviewers: jroelofs, paquette
Recommit of: 71867ed5e6
Differential Revision: https://reviews.llvm.org/D87742
The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when
it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it
is used by load/store instruction. So amx intrinsics only operate on type x86_amx.
It can help to separate amx intrinsics from llvm IR instructions (+-*/).
Thank Craig for the idea. This patch depend on https://reviews.llvm.org/D87981.
Differential Revision: https://reviews.llvm.org/D91927
This prints OptRemarks at each location where a decision is made to not
outline, or to outline a specific section for the IROutliner pass.
Test:
llvm/test/Transforms/IROutliner/opt-remarks.ll
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D87300
The switch duplicated the translation in getRecurrenceBinOp().
This code is still weird because it translates to the TTI
ReductionFlags for min/max, but then createSimpleTargetReduction()
converts that back to RecurrenceDescriptor::MinMaxRecurrenceKind.
I'm not sure if the SLP enum was created before the IVDescriptor
RecurrenceDescriptor / RecurrenceKind existed, but the code in
SLP is now redundant with that class, so it just makes things
more complicated to have both. We eventually call LoopUtils
createSimpleTargetReduction() to create reduction ops, so we
might as well standardize on those enum names.
There's still a question of whether we need to use TTI::ReductionFlags
vs. MinMaxRecurrenceKind, but that can be another clean-up step.
Another option would just be to flatten the enums in RecurrenceDescriptor
into a single enum. There isn't much benefit (smaller switches?) to
having a min/max subset.
This adds a cost model that takes into account the total number of
machine instructions to be removed from each region, the number of
instructions added by adding a new function with a set of instructions,
and the instructions added by handling arguments.
Tests not adding flags:
llvm/test/Transforms/IROutliner/outlining-cost-model.ll
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D87299
As Mikael Holmén is noting in the post-commit review for the first fix
https://reviews.llvm.org/rGd4ccef38d0bb#967466
not hoisting constantexprs is not enough,
because if the xor originally was a constantexpr (i.e. X is a constantexpr).
`SimplifyAssociativeOrCommutative()` in `visitXor()` will immediately
undo this transform, thus again causing an infinite combine loop.
This transform has resulted in a surprising number of constantexpr failures.
Many of the sets of output stores will be the same. When a block is
created, we check if there is an output block with the same set of store
instructions. If there is, we map the output block of the region back
to the block, so that the extra argument controlling the switch
statement can be set to the appropriate block value.
Tests:
- llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D87298
Certain regions can have values introduced inside the region that are
used outside of the region. These may not be the same for each similar
region, so we must create one over arching set of arguments for the
consolidated function.
We do this by iterating over the outputs for each extracted function,
and creating as many different arguments to encapsulate the different
outputs sets. For each output set, we create a different block with the
necessary stores from the value to the output register. There is then
one switch statement, controlled by an argument to the function, to
differentiate which block to use.
Changed Tests for consistency:
llvm/test/Transforms/IROutliner/extraction.ll
llvm/test/Transforms/IROutliner/illegal-assumes.ll
llvm/test/Transforms/IROutliner/illegal-memcpy.ll
llvm/test/Transforms/IROutliner/illegal-memmove.ll
llvm/test/Transforms/IROutliner/illegal-vaarg.ll
Tests to test new functionality:
llvm/test/Transforms/IROutliner/outlining-different-output-blocks.ll
llvm/test/Transforms/IROutliner/outlining-remapped-outputs.ll
llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D87296
This disables the poison-unsafe select -> and/or transform behind
a flag (we continue to perform the fold by default). This is intended
to simplify evaluation and testing while we teach various passes
to directly recognize the select pattern.
This only disables the main select -> and/or transform. A number of
related ones are instead changed to canonicalize to the a ? b : false
and a ? true : b forms which represent and/or respectively. This
requires a bit of care to avoid infinite loops, as we do not want
!a ? b : false to be converted into a ? false : b.
The basic idea here is the same as D93065, but keeps the change
behind a flag for now.
Differential Revision: https://reviews.llvm.org/D93840
We might be dealing with an unreachable code,
so the bonus instruction we clone might be self-referencing.
There is a sanity check that all uses of bonus instructions
that are not in the original block with said bonus instructions
are PHI nodes, and that is obviously not the case
for self-referencing instructions..
So if we find such an use, just rewrite it.
Thanks to Mikael Holmén for the reproducer!
Fixes https://bugs.llvm.org/show_bug.cgi?id=48450#c8
This reverts commit 4ffcd4fe9a thus restoring e4df6a40da.
The only change from the original patch is to add "llvm::" before the call to empty(iterator_range). This is a speculative fix for the ambiguity reported on some builders.
This patch is a major step towards supporting multiple exit loops in the vectorizer. This patch on it's own extends the loop forms allowed in two ways:
single exit loops which are not bottom tested
multiple exit loops w/ a single exit block reached from all exits and no phis in the exit block (because of LCSSA this implies no values defined in the loop used later)
The restrictions on multiple exit loop structures will be removed in follow up patches; disallowing cases for now makes the code changes smaller and more obvious. As before, we can only handle loops with entirely analyzable exits. Removing that restriction is much harder, and is not part of currently planned efforts.
The basic idea here is that we can force the last iteration to run in the scalar epilogue loop (if we have one). From the definition of SCEV's backedge taken count, we know that no earlier iteration can exit the vector body. As such, we can leave the decision on which exit to be taken to the scalar code and generate a bottom tested vector loop which runs all but the last iteration.
The existing code already had the notion of requiring one iteration in the scalar epilogue, this patch is mainly about generalizing that support slightly, making sure we don't try to use this mechanism when tail folding, and updating the code to reflect the difference between a single exit block and a unique exit block (very mechanical).
Differential Revision: https://reviews.llvm.org/D93317
As it is being reported (in post-commit review) in
https://reviews.llvm.org/D93857
this fold (as i expected, but failed to come up with test coverage
despite trying) has issues with constant expressions.
Since we only care about true constants, which constantexprs are not,
don't perform such hoisting for constant expressions.
The function FoldSingleEntryPHINodes() is changed to return if
it has changed IR or not. This return value is used by RS4GC to
set the MadeChange flag respectively.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D93810
Currently undef is used as a don’t-care vector when constructing a vector using a series of insertelement.
However, this is problematic because undef isn’t undefined enough.
Especially, a sequence of insertelement can be optimized to shufflevector, but using undef as its placeholder makes shufflevector a poison-blocking instruction because undef cannot be optimized to poison.
This makes a few straightforward optimizations incorrect, such as:
```
; https://bugs.llvm.org/show_bug.cgi?id=44185
define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
%xv = insertelement <4 x float> %q, float %x, i32 2
%r = shufflevector <4 x float> %y, <4 x float> %xv, <4 x i32> { 0, 6, 2, undef }
ret <4 x float> %r ; %r[3] is undef
}
=>
define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
%r = insertelement <4 x float> %y, float %x, i32 1
ret <4 x float> %r ; %r[3] = %y[3], incorrect if %y[3] = poison
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source
```
I’d like to suggest
1. Using poison as insertelement’s placeholder value (IRBuilder::CreateVectorSplat should be patched too)
2. Updating shufflevector’s semantics to return poison element if mask is undef
Note that poison is currently lowered into UNDEF in SelDag, so codegen part is okay.
m_Undef() matches PoisonValue as well, so existing optimizations will still fire.
The only concern is hidden miscompilations that will go incorrect when poison constant is given.
A conservative way is copying all tests having `insertelement undef` & replacing it with `insertelement poison` & run Alive2 on it, but it will create many tests and people won’t like it. :(
Instead, I’ll simply locally maintain the tests and run Alive2.
If there is any bug found, I’ll report it.
Relevant links: https://bugs.llvm.org/show_bug.cgi?id=43958 , http://lists.llvm.org/pipermail/llvm-dev/2019-November/137242.html
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93586
This patch updates GVN to correctly return the modified status, if PRE
is performed on indices. It fixes a crash when building the test-suite
with EXPENSIVE_CHECKS and LTO.
EarlyCSE's handleBranchCondition says:
```
// If the condition is AND operation, we can propagate its operands into the
// true branch. If it is OR operation, we can propagate them into the false
// branch.
```
This holds for the corresponding select patterns as well.
This is a part of an ongoing work for disabling buggy select->and/or transformations.
See llvm.org/pr48353 and D93065 for more context
Proof:
and: https://alive2.llvm.org/ce/z/MQWodU
or: https://alive2.llvm.org/ce/z/9GLbB_
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93842
This patch makes GVN recognize `select c1, c2, false` as well as `select c1, true, c2`
branch condition and propagate equality from these.
See llvm.org/pr48353, D93065
Differential Revision: https://reviews.llvm.org/D93841
Previously the branch from the middle block to the scalar preheader & exit
was being set-up at the end of skeleton creation in completeLoopSkeleton.
Inserting SCEV or runtime checks may result in LCSSA phis being created,
if they are required. Adjusting branches afterwards may break those
PHIs.
To avoid this, we can instead create the branch from the middle block
to the exit after we created the middle block, so we have the final CFG
before potentially adjusting/creating PHIs.
This fixes a crash for the included test case. For the non-crashing
case, this is almost a NFC with respect to the generated code. The
only change is the order of the predecessors of the involved branch
targets.
Note an assertion was moved from LoopVersioning() to
LoopVersioning::versionLoop. Adjusting the branches means loop-simplify
form may be broken before constructing LoopVersioning. But LV only uses
LoopVersioning to annotate the loop instructions with !noalias metadata,
which does not require loop-simplify form.
This is a fix for an existing issue uncovered by D93317.
I am hoping to extend the reduction matching code, and it is
hard to distinguish "ReductionData" from "ReducedValueData".
So extend the tree/root metaphor to include leaves.
Another problem is that the name "OperationData" does not
provide insight into its purpose. I'm not sure if we can alter
that underlying data structure to make the code clearer.
While `%x.curr` is always safe to compute, because `LoopBackedgeTakenCount`
will always be smaller than `bitwidth(X)`, i.e. we never get poison,
rewriting `%x.next` is more complicated, however, because `X << LoopTripCount`
will be poison iff `LoopTripCount == bitwidth(X)` (which will happen
iff `BitPos` is `bitwidth(x) - 1` and `X` is `1`).
So unless we know that isn't the case (as alive2 notes, we know it's safe
to do iff shift had no-wrap flags, or bitpos does not indicate signbit,
or we know that %x is never `1`), we'll need to emit an alternative,
safe IR, by either just shifting the `%x.curr`, or conditionally selecting
between the computed `%x.next` and `0`..
Former IR looks better so let's do that.
While there, ensure that we don't drop no-wrap flags from said shift.
This is one of the deficiencies that can be observed in
https://godbolt.org/z/YPczsG after D91038 patch set.
This exposed two missing folds, one was fixed by the previous commit,
another one is `(A ^ B) | ~(A ^ B) --> -1` / `(A ^ B) & ~(A ^ B) --> 0`.
`-early-cse` will catch it: https://godbolt.org/z/4n1T1v,
but isn't meaningful to fix it in InstCombine,
because we'd need to essentially do our own CSE,
and we can't even rely on `Instruction::isIdenticalTo()`,
because there are no guarantees that the order of operands matches.
So let's just accept it as a loss.
This reverts commit 899faa50f2.
Upon further consideration, this does not fix the right issue.
Doing this fold for non-inbounds GEPs is legal, because the
resulting pointer is still based-on null, which has no associated
address range, and as such and access to it is UB.
https://bugs.llvm.org/show_bug.cgi?id=48577#c3
The source pointer type is not necessarily the same as the result
pointer type, so we can't simply return the original null pointer,
it might be a different one.
Effectively, this is what we were previously already doing when
the GEP was used in conjunction with a load or store, but this
fold can also be applied more generally:
> The only in bounds address for a null pointer in the default
> address-space is the null pointer itself.
If the GEP isn't inbounds, then accessing a GEP of null location
is generally not UB.
While this is a minimal fix, the GEP of null handling should
probably be its own fold.
The current state of the transform is still not enough to support
my motivational pattern, because it has one more "induction variable".
I have delayed posting this patch, because originally even just rewriting
the loop as countable wasn't enough to nicely transform my motivational pattern,
because i expected that extra IV to be rewritten afterwards,
but it wasn't happening until i fixed that in D91800.
So, this patch allows the 'left-shift until bittest' loop idiom
as long as the inserted ops are cheap,
and lifts any and all extra use checks on the instructions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D92754
If the bitmask is for sign bit, instcombine would have canonicalized
the pattern into a proper sign bit check. Supporting that is still
simple, but requires a bit of a roundtrip - we first have to use
`decomposeBitTestICmp()`, and the rest again just works.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D91726
The handing of the case where the mask is a constant is trivial,
if said constant is a power of two, the bit in question is log2(mask),
rest just works.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D91725
The motivation here is the following inner loop in fp16/fp24 -> fp32 expander,
that runs as part of the floating-point DNG decompression in RawSpeed library:
cd380bb9a2/src/librawspeed/decompressors/DeflateDecompressor.cpp (L112-L115)
```
while (!(fp32_fraction & (1 << 23))) {
fp32_exponent -= 1;
fp32_fraction <<= 1;
}
```
(https://godbolt.org/z/r13YMh)
As one might notice, that loop is currently uncountable, and that whole code stays scalar.
Yet, it is rather trivial to make that loop countable:
https://godbolt.org/z/do8WMz
and we can prove that via alive2:
https://alive2.llvm.org/ce/z/7vQnji (ha nice, isn't it?)
... and that allow for the whole fp16->fp32 code to vectorize:
https://godbolt.org/z/7hYr13
Now, while i'd love to get there, i feel like i should take it in steps.
For now, this introduces support for the most basic case,
where the bit position is known as a variable,
and the loop *will* go away (has no live-outs other than the recurrence,
no extra instructions in the loop).
I have added sufficient (i believe) test coverage,
and alive2 is happy with those transforms.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D91038
When there are constants that have the same structural location, but not
the same value, between different regions, we cannot simply outline the
region. Instead, we find the constants that are not the same in each
location, and promote them to arguments to be passed into the respective
functions. At each call site, we pass the constant in as an argument
regardless of type.
Added/Edited Tests:
llvm/test/Transforms/IROutliner/outlining-constants-vs-registers.ll
llvm/test/Transforms/IROutliner/outlining-different-constants.ll
llvm/test/Transforms/IROutliner/outlining-different-globals.ll
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D87294
Current approach doesn't work well in cases when multiple paths are predicted to be "cold". By "cold" paths I mean those containing "unreachable" instruction, call marked with 'cold' attribute and 'unwind' handler of 'invoke' instruction. The issue is that heuristics are applied one by one until the first match and essentially ignores relative hotness/coldness
of other paths.
New approach unifies processing of "cold" paths by assigning predefined absolute weight to each block estimated to be "cold". Then we propagate these weights up/down IR similarly to existing approach. And finally set up edge probabilities based on estimated block weights.
One important difference is how we propagate weight up. Existing approach propagates the same weight to all blocks that are post-dominated by a block with some "known" weight. This is useless at least because it always gives 50\50 distribution which is assumed by default anyway. Worse, it causes the algorithm to skip further heuristics and can miss setting more accurate probability. New algorithm propagates the weight up only to the blocks that dominates and post-dominated by a block with some "known" weight. In other words, those blocks that are either always executed or not executed together.
In addition new approach processes loops in an uniform way as well. Essentially loop exit edges are estimated as "cold" paths relative to back edges and should be considered uniformly with other coldness/hotness markers.
Reviewed By: yrouban
Differential Revision: https://reviews.llvm.org/D79485
I think this is NFC currently, but the bug would be exposed
when we allow binary intrinsics (maxnum, etc) as candidates
for reductions.
The code in matchAssociativeReduction() is using
OperationData::getNumberOfOperands() when comparing whether
the "EdgeToVisit" iterator is in-bounds, so this code must
use the same (potentially offset) operand value to set
the "EdgeToVisit".
The llvm.coro.end.async intrinsic allows to specify a function that is
to be called as the last action before returning. This function will be
inlined after coroutine splitting.
This function can contain a 'musttail' call to allow for guaranteed tail
calling as the last action.
Differential Revision: https://reviews.llvm.org/D93568
ScalarEvolution should be able to handle both constant and variable trip
counts using getURemExpr, so we do not have to handle them separately.
This is a small simplification of a56280094e.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D93677
This patch turns updates VPInstruction to manage the value it defines
using VPDef. The VPValue is used during VPlan construction and
codegeneration instead of the plain IR reference where possible.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90565
When the trip-count is provably divisible by the maximal/chosen VF, folding the
loop's tail during vectorization is redundant. This commit extends the existing
test for constant trip-counts to any trip-count known to be divisible by
maximal/selected VF by SCEV.
Differential Revision: https://reviews.llvm.org/D93615
This is a follow-up patch of D87045.
The patch implements "loop-nest mode" for `LPMUpdater` and `FunctionToLoopPassAdaptor` in which only top-level loops are operated.
`createFunctionToLoopPassAdaptor` decides whether the returned adaptor is in loop-nest mode or not based on the given pass. If the pass is a loop-nest pass or the pass is a `LoopPassManager` which contains only loop-nest passes, the loop-nest version of adaptor is returned; otherwise, the normal (loop) version of adaptor is returned.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D87531
When doing select-to-zext/sext transformations, we should
not handle TrueVal and FalseVal of i1 type otherwise it
would result in zext/sext i1 to i1.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93272
The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns.
This should allow us to begin resolving PR46896 et al.
Ensure we block poison in a funnel shift value - similar to rG0fe91ad463fea9d08cbcd640a62aa9ca2d8d05e0
Reapplied with fix for PR48068 - we weren't checking that the shift values could be hoisted from their basicblocks.
Differential Revision: https://reviews.llvm.org/D90625
This patch makes VPRecipeBase a direct subclass of VPDef, moving the
SubclassID to VPDef.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90564
This patch turns updates VPInterleaveRecipe to manage the values it defines
using VPDef. The VPValue is used during VPlan construction and
codegeneration instead of the plain IR reference where possible.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90562
An earlier patch introduced asserts that the InstructionCost is
valid because at that time the ReuseShuffleCost variable was an
unsigned. However, now that the variable is an InstructionCost
instance the asserts can be removed.
See this thread for context:
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
See this patch for the introduction of the type:
https://reviews.llvm.org/D91174
This patch removes InstrumentFuncEntry as it is dead.
The constructor of FuncPGOInstrumentation passes InstrumentFuncEntry
to MST, but it doesn't make a local copy as a member variable.
Extracted regions can have both inputs and outputs. In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics). We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch deduplicates
extracted functions that only have inputs and non of the special cases.
We test that correctly deduplicate in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D86978
Extracted regions can have both inputs and outputs. In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics). We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch deduplicates
extracted functions that only have inputs and non of the special cases.
We test that correctly deduplicate in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
... so just ensure that we pass DomTreeUpdater it into it.
Apparently, there were no dedicated tests just for that functionality,
so i'm adding one here.
And that exposes that a number of tests don't *actually* manage to
maintain DomTree validity, which is inline with my observations.
Once again, SimlifyCFG pass currently does not require/preserve DomTree
by default, so this is effectively NFC.
inputs.
Extracted regions can have both inputs and outputs. In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics). We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch limits the
extracted regions to those that only require inputs, and do not have any
other special cases.
We test that we do not outline the wrong constants in:
test/Transforms/IROutliner/outliner-different-constants.ll
test/Transforms/IROutliner/outliner-different-globals.ll
test/Transforms/IROutliner/outliner-constant-vs-registers.ll
We test that correctly outline in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
Reviewers: paquette, plofti
Differential Revision: https://reviews.llvm.org/D86977
Make the penalty for splitting a region more accurately reflect the cost
of materializing all of the inputs/outputs to/from the region.
This almost entirely eliminates code growth within functions which
undergo splitting in key internal frameworks, and reduces the size of
those frameworks between 2.6% to 3%.
rdar://49167240
Patch by: Vedant Kumar(@vsk)
Reviewers: hiraditya,rjf,t.p.northover
Reviewed By: hiraditya,rjf
Differential Revision: https://reviews.llvm.org/D59715
inserted when the original call had a 'returned' argument
The code is testing whether the instruction BBI points to is the call
that is paired up with the retainRV/claimRV call, but it doesn't work
when the call has a 'returned' argument since GetArgRCIdentityRoot looks
through 'returned' arguments.
rdar://72485383
MSSA DSE starts at a killing store, finds an earlier store and
then checks that the earlier store is not read along any paths
(without being killed first). However, it uses the memory location
of the killing store for that, not the earlier store that we're
attempting to eliminate.
This has a number of problems:
* Mismatches between what BasicAA considers aliasing and what DSE
considers an overwrite (even though both are correct in isolation)
can result in miscompiles. This is PR48279, which D92045 tries to
fix in a different way. The problem is that we're using a location
from a store that is potentially not executed and thus may be UB,
in which case analysis results can be arbitrary.
* Metadata on the killing store may be used to determine aliasing,
but there is no guarantee that the metadata is valid, as the specific
killing store may not be executed. Using the metadata on the earlier
store is valid (it is the store we're removing, so on any execution
where its removal may be observed, it must be executed).
* The location is imprecise. For full overwrites the killing store
will always have a location that is larger or equal than the earlier
access location, so it's beneficial to use the earlier access
location. This is not the case for partial overwrites, in which
case either location might be smaller. There is some room for
improvement here.
Using the earlier access location means that we can no longer cache
which accesses are read for a given killing store, as we may be
querying different locations. However, it turns out that simply
dropping the cache has no notable impact on compile-time.
Differential Revision: https://reviews.llvm.org/D93523
This patch enables canonicalization of SPF_ABS and SPF_ABS
to the abs intrinsic.
This is a recommit, the original try was
05d4c4ebc2,
but it was reverted due to an apparent miscompile,
which since then has just been fixed by the previous commit.
Differential Revision: https://reviews.llvm.org/D87188
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.
Below is an example of splitting the block bb1 at its first instruction.
/// Original IR
bb0:
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
br bb1
bb1:
br bb1.split
bb1.split:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
br bb1.split
bb1.split
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
Differential Revision: https://reviews.llvm.org/D92200
The SROA pass tries to be lazy for removing dead instructions that are collected during iterative run of the pass in the DeadInsts list. However it does not remove instructions from the dead list while running eraseFromParent() on those instructions.
This causes (rare) null pointer dereferences. For example, in the speculatePHINodeLoads() instruction, in the following code snippet:
```
while (!PN.use_empty()) {
LoadInst *LI = cast<LoadInst>(PN.user_back());
LI->replaceAllUsesWith(NewPN);
LI->eraseFromParent();
}
```
If the Load instruction LI belongs to the DeadInsts list, it should be removed when eraseFromParent() is called. However, the bug does not show up in most cases, because immediately in the same function, a new LoadInst is created in the following line:
```
LoadInst *Load = PredBuilder.CreateAlignedLoad(
LoadTy, InVal, Alignment,
(PN.getName() + ".sroa.speculate.load." + Pred->getName()));
```
This new LoadInst object takes the same memory address of the just deleted LI using eraseFromParent(), therefore the bug does not materialize. In very rare cases, the addresses differ and therefore, a dangling pointer is created, causing a crash.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D92431
This is an enhancement motivated by https://llvm.org/PR16739
(see D92858 for another).
We can look through a GEP to find a base pointer that may be
safe to use for a vector load. If so, then we shuffle (shift)
the necessary vector element over to index 0.
Alive2 proof based on 1 of the regression tests:
https://alive2.llvm.org/ce/z/yPJLkh
The vector translation is independent of endian (verify by
changing to leading 'E' in the datalayout string).
Differential Revision: https://reviews.llvm.org/D93229
Clang FE currently has hot/cold function attribute. But we only have
cold function attribute in LLVM IR.
This patch adds support of hot function attribute to LLVM IR. This
attribute will be used in setting function section prefix/suffix.
Currently .hot and .unlikely suffix only are added in PGO (Sample PGO)
compilation (through isFunctionHotInCallGraph and
isFunctionColdInCallGraph).
This patch changes the behavior. The new behavior is:
(1) If the user annotates a function as hot or isFunctionHotInCallGraph
is true, this function will be marked as hot. Otherwise,
(2) If the user annotates a function as cold or
isFunctionColdInCallGraph is true, this function will be marked as
cold.
The changes are:
(1) user annotated function attribute will used in setting function
section prefix/suffix.
(2) hot attribute overwrites profile count based hotness.
(3) profile count based hotness overwrite user annotated cold attribute.
The intention for these changes is to provide the user a way to mark
certain function as hot in cases where training input is hard to cover
all the hot functions.
Differential Revision: https://reviews.llvm.org/D92493
This adds a custom InstVisitor to return false on instructions that
should not be allowed to be outlined. These match the illegal
instructions in the IRInstructionMapper with exception of the addition
of the llvm.assume intrinsic.
Tests all the tests marked: illegal-*-.ll with a test for each kind of
instruction that has been marked as illegal.
Reviewers: jroelofs, paquette
Differential Revisions: https://reviews.llvm.org/D86976
Pretty boring, removeUnwindEdge() already known how to update DomTree,
so if we are to call it, we must first flush our own pending updates;
otherwise, we just stop predecessors from branching to us,
and for certain predecessors, stop their predecessors from
branching to them also.
... so just ensure that we pass DomTreeUpdater it into it.
Fixes DomTree preservation for a number of tests,
all of which are marked as such so that they do not regress.
... so just ensure that we pass DomTreeUpdater it into it.
Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
If a GPU function is externally reachable we give up trying to find the
(unique) kernel it is called from. This can hinder optimizations. Emit a
remark and explain mitigation strategies.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D93439
Extracting the similar regions is the first step in the IROutliner.
Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed. Each
IRSimilarityCandidate is used to define an OutlinableRegion. Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.
Each region is then extracted with the CodeExtractor into its own
function.
We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
Recommit of bf899e8913 fixing memory
leaks.
Reviewers: paquette, jroelofs, yroux
Differential Revision: https://reviews.llvm.org/D86975
The LCSSA pass makes use of a function insertDebugValuesForPHIs() to
propogate dbg.value() intrinsics to newly inserted PHI instructions. Faulty
behaviour occurs when the parent PHI of a newly inserted PHI is not the
most recent assignment to a source variable. insertDebugValuesForPHIs ends
up propagating a value that isn't the most recent assignemnt.
This change removes the call to insertDebugValuesForPHIs() from LCSSA,
preventing incorrect dbg.value intrinsics from being propagated.
Propagating variable locations between blocks will occur later, during
LiveDebugValues.
Differential Revision: https://reviews.llvm.org/D92576
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.
Below is an example of splitting the block bb1 at its first instruction.
/// Original IR
bb0:
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
br bb1
bb1:
br bb1.split
bb1.split:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
br bb1.split
bb1.split
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
Differential Revision: https://reviews.llvm.org/D92200
If the source instruction has !annotation metadata, all instructions
created during combining should also have it. Tell the builder to
add it.
The !annotation system was discussed on llvm-dev as part of
'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
This patch is based on an earlier patch by Francis Visoiu Mistrih.
Reviewed By: thegameg, lebedev.ri
Differential Revision: https://reviews.llvm.org/D91444
When folding a branch to a common destination, preserve !annotation on
the created instruction, if the terminator of the BB that is going to be
removed has !annotation. This should ensure that !annotation is attached
to the instructions that 'replace' the original terminator.
Reviewed By: jdoerfert, lebedev.ri
Differential Revision: https://reviews.llvm.org/D93410
This patch extends IRBuilder to allow adding/preserving arbitrary
metadata on created instructions.
Instead of using references to specific metadata nodes (like DebugLoc),
IRbuilder now keeps a vector of (metadata kind, MDNode *) pairs, which
are added to each created instruction.
The patch itself is a NFC and only moves the existing debug location
handling over to the new system. In a follow-up patch it will be used to
preserve !annotation metadata besides !dbg.
The current approach requires iterating over MetadataToCopy to avoid
adding duplicates, but given that the number of metadata kinds to
copy/preserve is going to be very small initially (0, 1 (for !dbg) or 2
(!dbg and !annotation)) that should not matter.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D93400
Part of the <=> changes in C++20 make certain patterns of writing equality
operators ambiguous with themselves (sorry!).
This patch goes through and adjusts all the comparison operators such that
they should work in both C++17 and C++20 modes. It also makes two other small
C++20-specific changes (adding a constructor to a type that cases to be an
aggregate, and adding casts from u8 literals which no longer have type
const char*).
There were four categories of errors that this review fixes.
Here are canonical examples of them, ordered from most to least common:
// 1) Missing const
namespace missing_const {
struct A {
#ifndef FIXED
bool operator==(A const&);
#else
bool operator==(A const&) const;
#endif
};
bool a = A{} == A{}; // error
}
// 2) Type mismatch on CRTP
namespace crtp_mismatch {
template <typename Derived>
struct Base {
#ifndef FIXED
bool operator==(Derived const&) const;
#else
// in one case changed to taking Base const&
friend bool operator==(Derived const&, Derived const&);
#endif
};
struct D : Base<D> { };
bool b = D{} == D{}; // error
}
// 3) iterator/const_iterator with only mixed comparison
namespace iter_const_iter {
template <bool Const>
struct iterator {
using const_iterator = iterator<true>;
iterator();
template <bool B, std::enable_if_t<(Const && !B), int> = 0>
iterator(iterator<B> const&);
#ifndef FIXED
bool operator==(const_iterator const&) const;
#else
friend bool operator==(iterator const&, iterator const&);
#endif
};
bool c = iterator<false>{} == iterator<false>{} // error
|| iterator<false>{} == iterator<true>{}
|| iterator<true>{} == iterator<false>{}
|| iterator<true>{} == iterator<true>{};
}
// 4) Same-type comparison but only have mixed-type operator
namespace ambiguous_choice {
enum Color { Red };
struct C {
C();
C(Color);
operator Color() const;
bool operator==(Color) const;
friend bool operator==(C, C);
};
bool c = C{} == C{}; // error
bool d = C{} == Red;
}
Differential revision: https://reviews.llvm.org/D78938
When replacing an instruction with !annotation with a newly created
replacement, add the !annotation metadata to the replacement.
This mostly covers cases where the new instructions are created using
the ::Create helpers. Instructions created by IRBuilder will be handled
by D91444.
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D93399
This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be slipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`.
Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites).
**Adjustment to profile format **
A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like
```
!CFGChecksum: 12345
```
is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums.
Differential Revision: https://reviews.llvm.org/D92347
Details: Jump Threading does not make sense for the targets with divergent CF
since they do not use branch prediction for speculative execution.
Also in the high level IR there is no enough information to conclude that the branch is divergent or uniform.
This may cause errors in further CF lowering.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93302
A first real transformation that didn't already knew how to do that,
but it's pretty tame - either change successor of all the predecessors
of a block and carefully delay deletion of the block until afterwards
the DomTree updates are appled, or add a successor to the block.
There wasn't a great test coverage for this, so i added extra, to be sure.
... so just ensure that we pass DomTreeUpdater it into it.
Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
... so just ensure that we pass DomTreeUpdater it into it.
Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
... so just ensure that we pass DomTreeUpdater it into it.
Apparently, there were no dedicated tests just for that functionality,
so i'm adding one here.
Raw profile count values for each BB are not kept after profile
annotation. We record function entry count and branch weights
and use them to compute the count when needed. This mechanism
works well in a perfect world, but often breaks in real programs,
because of number prevision, inconsistent profile, or bugs in
BFI). This patch uses sum of profile count values to fix
function entry count to make the BFI count close to real profile
counts.
Differential Revision: https://reviews.llvm.org/D61540
Here's another minimal step suggested by D93229 / D93397 .
(I'm trying to be extra careful in these changes because
load transforms are easy to get wrong.)
We can optimistically choose the greater alignment of a
load and its pointer operand. As the test diffs show, this
can improve what would have been unaligned vector loads
into aligned loads.
When we enhance with gep offsets, we will need to adjust
the alignment calculation to include that offset.
Differential Revision: https://reviews.llvm.org/D93406
As discussed in D93229, we only need a minimal alignment constraint
when querying whether a hypothetical vector load is safe. We still
pass/use the potentially stronger alignment attribute when checking
costs and creating the new load.
There's already a test that changes with the minimum code change,
so splitting this off as a preliminary commit independent of any
gep/offset enhancements.
Differential Revision: https://reviews.llvm.org/D93397
Per http://llvm.org/OpenProjects.html#llvm_loopnest, the goal of this
patch (and other following patches) is to create facilities that allow
implementing loop nest passes that run on top-level loop nests for the
New Pass Manager.
This patch extends the functionality of LoopPassManager to handle
loop-nest passes by specializing the definition of LoopPassManager that
accepts both kinds of passes in addPass.
Only loop passes are executed if L is not a top-level one, and both
kinds of passes are executed if L is top-level. Currently, loop nest
passes should have the following run method:
PreservedAnalyses run(LoopNest &, LoopAnalysisManager &,
LoopStandardAnalysisResults &, LPMUpdater &);
Reviewed By: Whitney, ychen
Differential Revision: https://reviews.llvm.org/D87045
This patch changes the type of cost variables (for instance: Cost, ExtractCost,
SpillCost) to use InstructionCost.
This patch also changes the type of cost variables to InstructionCost in other
functions that use the result of getTreeCost()
This patch is part of a series of patches to use InstructionCost instead of
unsigned/int for the cost model functions.
See this thread for context:
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Depends on D91174
Differential Revision: https://reviews.llvm.org/D93049
Given we haven't yet enabled multiple exiting blocks, this is currently non functional, but it's an obvious extension which cleans up a later patch.
I don't think this is worth review (as it's pretty obvious), if anyone disagrees, feel feel to revert or comment and I will.
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.
Below is an example of splitting the block bb1 at its first instruction.
/// Original IR
bb0:
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
br bb1
bb1:
br bb1.split
bb1.split:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
br bb1.split
bb1.split
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
Differential Revision: https://reviews.llvm.org/D92200
The OpenMP 5.1 assumptions `no_openmp` and `no_openmp_routines` allow us
to ignore calls that would otherwise prevent ICV tracking.
Once we track more ICVs we might need to distinguish the ones that could
be impacted even with `no_openmp_routines`.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D92050
Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transform in certain cases,
where the successor is dominated by predecessor,
because we then don't have PHI's, and can't recreate them,
well, without handrolling 'is dominated by' check,
which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
decrease the number of `Dominator Tree Construction` by 5
(from 28 now, i.e. -18%) in `-O3` old-pm pipeline
(as per `llvm/test/Other/opt-O3-pipeline.ll`)
This might or might not be beneficial for compile time.
So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.
Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.
But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.
Then, eventually, the default should be flipped,
and eventually^2 the flag removed.
And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
This should be purely non-functional. When touching this code for another reason, I found the handling of the PredicateOrDontVectorize piece here very confusing. Let's make it an explicit state (instead of an implicit combination of two variables), and use early return for options/hint processing.
ResultPtr is guaranteed to be non-null - and using dyn_cast_or_null causes unnecessary static analyzer warnings.
We can't say the same for FirstResult AFAICT, so keep dyn_cast_or_null for that.
The AnnotationRemarks pass is already run at the end of the module
pipeline. This patch also adds it before bailing out for -O0, so remarks
are also generated with -O0.
This patch turns updates VPWidenSelectRecipe to manage the value
it defines using VPDef.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90560
This patch turns updates VPWidenGEPRecipe to manage the value it defines
using VPDef. The VPValue is used during VPlan construction and
codegeneration instead of the plain IR reference where possible.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90561
This patch turns updates VPWidenREcipe to manage the value it defines
using VPDef.
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D90559