Updated LowerGuardIntrinsic and LowerWidenableCondition to check for
users of the respective intrinsic, instead of checking for guards and
widenable conditions by traversing the entire function.
This is an NFC. Should save some compile time.
This reverts the revert commit 1ddc719680.
This version of the patch sets the initial available value to poison,
which resolves an issue with the SSAUpdater breaking LCSSA form.
IMO when the user provides an unroll pragma, the compiler should always
respect it. It is not clear to me why the loop unroll pass currently
ensures that the unrolled loop size is limited by PragmaUnrollThreshold.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D119148
When only a store is sunk, there is no need to create a load in the
pre-header, as the result of the load will never get used.
The dead load can introduce UB, if the function is marked as
writeonly.
Fixes #51248.
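A minimal sketch of the situation (hypothetical IR, not taken from the patch): @g is only stored to inside the loop, so the store can be sunk into the exit block, and the pre-header load that used to be created would have no users:
```
@g = global i32 0

define void @f(i32 %n) writeonly {
entry:
  br label %loop

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
  store i32 %iv, ptr @g          ; only stored, never reloaded in the loop
  %iv.next = add i32 %iv, 1
  %cmp = icmp ult i32 %iv.next, %n
  br i1 %cmp, label %loop, label %exit

exit:
  ; after sinking, a single store of the final %iv value lands here; no
  ; load of @g is created in the pre-header, so no UB in this writeonly
  ; function
  ret void
}
```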
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D123473
And thread DSE's ephemeral values to EarliestEscapeInfo.
This allows more precise analysis in DSEState::isReadClobber() via BatchAA.
Followup to D123162.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D123342
Loop Strength Reduce sometimes optimizes away all uses of an induction variable
from a loop but leaves the IV increments. When the only remaining use of the IV
is the PHI in the exit block, this patch will call rewriteLoopExitValues to
replace the exit block PHI with the final value of the IV to skip the updates
in each loop iteration.
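A minimal sketch of the pattern (hypothetical IR, simplified): after LSR has removed all in-loop uses of the IV, only the increment and the exit-block PHI remain, and the PHI can be rewritten to the IV's final value:
```
define i32 @f(i32 %n) {
entry:
  br label %loop

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
  %iv.next = add nuw nsw i32 %iv, 1
  %cmp = icmp ult i32 %iv.next, %n
  br i1 %cmp, label %loop, label %exit

exit:
  ; rewriteLoopExitValues can replace this PHI with the final IV value
  ; computed by SCEV, so the in-loop updates can be skipped
  %res = phi i32 [ %iv.next, %loop ]
  ret i32 %res
}
```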
Differential Revision: https://reviews.llvm.org/D118808
This makes MemorySSA in LoopSink required, and removes the AST-based
implementation, as well as the related support code in LICM.
Differential Revision: https://reviews.llvm.org/D123288
Similar to the problem in 0bb25b4603, bitcasts that are inserted must
dominate all uses. When rewriting "values" with "new values" that have
the updated address space, we may replace the "new value" with a bitcast
if one of the original users is an addrspacecast. This bitcast must
be inserted before ALL users, not only before the addrspacecast.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122964
LoopSink with the legacy pass manager still uses AST, because we
can't compute MemorySSA conditionally. I think now that the legacy
pass manager will be removed soon(TM) we don't need to care about
compile-time impact here anymore. Additionally, since MemorySSA is
no longer eagerly optimized, the impact is actually not that high
anymore (~0.2% geomean regression on CTMark).
This just makes legacy PM and new PM behavior line up -- as a
followup I'll drop these options entirely and make MemorySSA use
mandatory.
Differential Revision: https://reviews.llvm.org/D123216
Motivated by pr43326 (https://bugs.llvm.org/show_bug.cgi?id=43326), where a slightly
modified case is as follows.
void f(int e[10][10][10], int f[10][10][10]) {
  for (int a = 0; a < 10; a++)
    for (int b = 0; b < 10; b++)
      for (int c = 0; c < 10; c++)
        f[c][b][a] = e[c][b][a];
}
The ideal access pattern after running interchange is supposed to be the following:
void f(int e[10][10][10], int f[10][10][10]) {
  for (int c = 0; c < 10; c++)
    for (int b = 0; b < 10; b++)
      for (int a = 0; a < 10; a++)
        f[c][b][a] = e[c][b][a];
}
Currently loop interchange is limited to picking the innermost loop and finding an order
that is locally optimal for it. However, the pass fails to produce the globally optimal
loop access order; for more complex examples, what we get can be quite far from the
globally optimal ordering.
What is proposed in this patch is to perform interchange in a "bubble-sort" fashion.
By comparing neighbors in `LoopList` in each iteration, we are able to move each loop
to its most appropriate place, so this approach tries to achieve the
globally optimal ordering.
The motivating example above is added as a test case.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D120386
Add void casts to mark the variables used, next to the places where
they are used in assert or `LLVM_DEBUG()` expressions.
Differential Revision: https://reviews.llvm.org/D123117
Partially inlining a libcall that has the musttail attribute
leads to broken LLVM IR, triggering an assertion in the IR verifier.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D123116
As discussed on https://github.com/llvm/llvm-project/issues/54682,
MemorySSA currently has a bug when computing the clobber of calls
that access loop-varying locations. I think a "proper" fix for this
on the MemorySSA side might be non-trivial, but we can easily work
around this in MemCpyOpt:
Currently, MemCpyOpt uses a location-less getClobberingMemoryAccess()
call to find a clobber on either the src or dest location, and then
refines it for the src and dest clobber. This was intended as an
optimization, as the location-less API is cached, while the
location-affected APIs are not.
However, I don't think this really makes a difference in practice,
because I don't think anything will use the cached clobbers on
those calls later anyway. On CTMark, this patch seems to be very
mildly positive actually.
So I think this is a reasonable way to avoid the problem for now,
though MemorySSA should also get a fix.
Differential Revision: https://reviews.llvm.org/D122911
The range calculation in walkForwards() assumes that the ranges of
the operands have already been calculated. With the used visit
order, this is not necessarily the case when there are multiple
roots. (There is nothing guaranteeing that instructions are visited
in topological order.)
Fix this by queuing instructions for reprocessing if the operand
ranges haven't been calculated yet.
Fixes https://github.com/llvm/llvm-project/issues/54669.
Differential Revision: https://reviews.llvm.org/D122817
The search for the clobbering call is fairly expensive if uses are not optimized at construction. Defer the clobber walk to the point in the implementation we need it; there are a bunch of bailouts before that point. (e.g. If the source pointer is not an alloca, we can't do callslotopt.)
On a test case which involves a bunch of copies from argument pointers, this switches memcpyopt from > 1/2 second to < 10ms.
This refactor makes it easier to extend the logic to collect information
from blocks in the future, without further increasing the size of
eliminateConstraints.
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) For IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D115907
Instead of first creating a lambda for calculating the range,
then collecting the ranges for the operands, and then calling the
lambda on those ranges, we can first calculate the operand ranges
and then calculate the result directly in the switch.
According to the LLVM debug info update guide: https://llvm.org/docs/HowToUpdateDebugInfo.html,
"Hoisting identical instructions which appear in several successor
blocks into a predecessor block. In this case there is no single
merged instruction. The rule for dropping locations applies".
Thanks to Yuanbo Li for reporting this.
Reviewed By: dblaikie
Reviewers: sebpop, tejohnson, dblaikie
Differential Revision: https://reviews.llvm.org/D122730
Factor in the TBAA of adjacent stores instead of just the head store
when merging stores into a memset. We were seeing GVN remove a load that
had a TBAA that matched the 2nd store because GVN determined it didn't
match the TBAA of the memset. The memset had the TBAA of only the first
store.
i.e. Loading the field pi_ of shared_count after memset to create an
array of shared_ptr
template<class T>
class shared_ptr {
  T *p;
  shared_count refcount;
};
class shared_count {
  sp_counted_base *pi_;
};
Differential Revision: https://reviews.llvm.org/D122205
Avoids merge errors when opaque pointers are loaded into different types.
Reviewed by: jcranmer-intel, hiraditya
Differential Revision: https://reviews.llvm.org/D122521
According to the definition of canonical form, a formula is canonical
if, when the scale reg does not contain an addrec for loop L, none of
the bases contain an addrec for this loop.
The critical word here is "contains".
The current canonical form checker does not check the "containing"
property but the "is" property: it checks whether a reg is an addrec,
not whether it contains one.
Fix the checker and the canonicalizing utility to follow the definition.
Without this fix, in the attached test the base formula
reg((-1 * {0,+,8}<nuw><nsw><%bb2>)<nsw>) + 1*reg((8 * (%arg /u 8))<nuw>)
is considered canonical even though a base contains an addrec,
while the modified formula we want to insert,
reg({0,+,8}<nuw><nsw><%bb2>) + 1*reg((-8 * (%arg /u 8))),
is considered not canonical.
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D122457
We have the same code repeated in both callers; sink it into the callee.
The motivation here isn't just code style: we can also defer the relatively expensive aliasing checks until the cheap structural preconditions have been validated. (e.g. Don't bother with aliasing checks if src is not an alloca.) This helps compile time significantly.
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) For IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D115907
EarlyCSE currently optimizes all MemoryUses upfront. However,
EarlyCSE only actually queries the clobbering memory access for
a subset of uses, namely those where a CSE candidate has already
been identified. Delaying use optimization to the clobber query
improves compile-time in practice.
This change is not NFC because EarlyCSE has a limit on the number
of clobber queries (EarlyCSEMssaOptCap), in which case it falls
back to the defining access. The defining access for uses will now
no longer coincide with the optimized access.
If there are performance regressions from this change, we should
be able to address them by raising this limit.
Differential Revision: https://reviews.llvm.org/D121987
Rather than iterating over users and comparing operands, iterate
over uses and check operand number. Otherwise, we'll end up
promoting a store twice if it has two equal operands.
This can only happen with opaque pointers, as otherwise both
operands differ by a level of indirection, so a bitcast would have
to be involved.
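A hypothetical example of the affected case: with opaque pointers, the value operand and the pointer operand of a store can be the same value, so iterating over users would process the store once per equal operand:
```
define void @f() {
  %a = alloca ptr
  store ptr %a, ptr %a   ; value operand == pointer operand
  ret void
}
```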
Fixes https://github.com/llvm/llvm-project/issues/54495.
There are a bunch of code improvements in the patch: marking everything
that can be const as const, and fixing some typos in comments.
The patch also removes the shadowing parameter TTI from the
rewriteWithNewAddressSpaces method; the TTI parameter is not required
because the same field is available in the class.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D121671
[SCCP] do not clean up dead blocks that have their address taken
Fixes a crash observed in IPSCCP.
Because the SCCPSolver has already internalized BlockAddresses as
Constants or ConstantExprs, we don't want to try to update their Values
in the ValueLatticeElement. Instead, continue to propagate these
BlockAddress Constants, continue converting BasicBlocks to unreachable,
but don't delete the "dead" BasicBlocks which happen to have their
address taken. Leave replacing the BlockAddresses to another pass.
Fixes: https://github.com/llvm/llvm-project/issues/54238
Fixes: https://github.com/llvm/llvm-project/issues/54251
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D121744
This adds a new option to control AllowSpeculation added in D119965 when
using `-passes=...`.
This allows reproducing #54023 using opt.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D121944
The way the pass is actually used in the optimization pipeline,
TLI will be available, but this is not the case when running just
-lower-constant-intrinsics in tests, which ends up being quite
confusing.
Require TLI unconditionally, as we usually do.
This changes MemorySSA to be constructed in unoptimized form.
MemorySSA::ensureOptimizedUses() can be called to optimize all
uses (once). This should be done by passes where having optimized
uses is beneficial, either because we're going to query all uses
anyway, or because we're doing def-use walks.
This should help reduce the compile-time impact of MemorySSA for
some use cases (the reason why I started looking into this is
D117926), which can avoid optimizing all uses upfront, and instead
only optimize those that are actually queried.
Actually, we have an existing use-case for this, which is EarlyCSE.
Disabling eager use optimization there gives a significant
compile-time improvement, because EarlyCSE will generally only query
clobbers for a subset of all uses (this change is not included in
this patch).
Differential Revision: https://reviews.llvm.org/D121381
LoopSimplifyCFG may process loops that are not in
loop-simplify/canonical form. For loops not in canonical form, exit
blocks may be reachable from non-loop blocks and we cannot consider them
dead merely because they are not reachable from the loop itself.
Unfortunately the smallest test I could come up with requires running
multiple passes:
-passes='loop-mssa(loop-instsimplify,loop-simplifycfg,simple-loop-unswitch)'
The reason is that loops are canonicalized at the beginning of loop
pipelines, so a later transform has to break canonical form in a way
that breaks LoopSimplifyCFG's dead-exit analysis.
Alternatively we could try to require all loop passes to maintain
canonical form. That in turn would also require additional verification.
Fixes #54023, #49931.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D121925
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) For IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Differential Revision: https://reviews.llvm.org/D115907
When a load extends past the extent of the alloca, SROA will
restrict the slice size to extend to the end of the alloca only.
However, presplitting was asserting that the load size and the
slice size match exactly, which does not hold in this case.
Relax the assertion to only require that the load size is greater
than or equal to the slice size.
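A hypothetical illustration of the case: the load reads past the end of the alloca, so the slice is restricted to the 4 bytes of the alloca, while the load size stays 8:
```
define i64 @f() {
  %a = alloca i32        ; 4-byte object
  %v = load i64, ptr %a  ; 8-byte load extends past the alloca's extent
  ret i64 %v
}
```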
Context: I needed this for https://github.com/google/iree/pull/8474 .
I found that TSan instrumentation expects vector sizes to be <= 16,
and in my project (IREE) we have tests with higher vector sizes.
That left some test functions uninstrumented, resulting in crashes as
instrumented code called into them.
Differential Revision: https://reviews.llvm.org/D121182
This patch is motivated by pr48057, where an output dependency is not detected
because loop interchange did not check a store instruction against itself.
This patch fixes that deficiency.
Reviewed By: bmahjour, Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D118102
This patch adds PrettyStackEntries before running passes. The entries
include the pass name and the IR unit the pass runs on.
This information is used to print additional details when a pass
crashes, including the pass name and a reference to the IR unit on which it
crashed. This is similar to the behavior of the legacy pass manager.
The improved stack trace now includes:
Stack dump:
0. Program arguments: bin/opt -loop-vectorize -force-vector-width=4 crash.ll
1. Running pass 'ModuleToFunctionPassAdaptor' on module 'crash.ll'
2. Running pass 'LoopVectorizePass' on function '@a'
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D120993
After moving the CanAdd check in c60cdb44f7 and using it for
the assume cases as well, the passed-in block may not have a branch
instruction as its terminator. This can trigger the assertion. Given the new
use case, the assertion doesn't add value any longer and can be removed.
Fixes https://github.com/llvm/llvm-project/issues/54281
When decomposing constraints for unsigned conditions, we can use
negative values by zero-extending them, as long as they are less than
the maximum constraint value.
Fixes https://github.com/llvm/llvm-project/issues/54224
Add missing CanAdd check before adding a condition from an assume
to the successor blocks. When adding information from an assume to
successor blocks, we need to perform the same CanAdd check as we do when adding
a condition from a branch.
Fixes https://github.com/llvm/llvm-project/issues/54217
This patch extends ConstraintElimination to also remove dead variables
when removing a constraint. When a constraint is removed because it is
out of scope, all new variables added for this constraint can also be
removed.
This keeps the total size of the systems much smaller, because it
reduces the number of variables drastically.
It also fixes a bug where variables were removed incorrectly.
Fixes https://github.com/llvm/llvm-project/issues/54228
Currently, we hardly ever actually run SCEV verification, even in
tests with -verify-scev. This is because the NewPM LPM does not
verify SCEV. The reason for this is that SCEV verification can
actually change the result of subsequent SCEV queries, which means
that you see different transformations depending on whether
verification is enabled or not.
To allow verification in the LPM, this limits verification to
BECounts that have actually been cached. It will not calculate
new BECounts.
BackedgeTakenInfo::getExact() is still not entirely readonly,
it still calls getUMinFromMismatchedTypes(). But I hope that this
is not problematic in the same way. (This could be avoided by
performing the umin in the other SCEV instance, but this would
require duplicating some of the code.)
Differential Revision: https://reviews.llvm.org/D120551
Skip phi nodes in the preheader. They may not be considered loop
invariant by the assertion below.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D121010
There seems to be one more uncaught problem: SROA may now end up trying
to re-re-repromote the just-promoted shadow alloca, and do that endlessly.
This reverts commit adc0984d81.
This is inspired by the original variant of D109749 by Graham Hunter,
but is a more general version.
Roughly, instead of promoting the alloca, we call it
a shadow/backing alloca, go through all its slices,
clone(!) instructions that operated on it,
but make them operate on the cloned alloca,
and promote cloned alloca instead.
This keeps the shadow/backing alloca, and all the original instructions
around, which results in said shadow/backing alloca being
a perfect mirror/representation of the promoted alloca's content,
so calls that take the alloca as arguments (non-capturingly!)
can be supported.
For now, we require that the calls also don't modify the alloca's content,
but that is only to simplify the initial implementation,
and that will be supported in a follow-up.
Overall, this leads to *smaller* codesize:
https://llvm-compile-time-tracker.com/compare.php?from=a8b4f5bbab62091835205f3d648902432a4a5b58&to=aeae054055b125b011c1122f82c86457e159436f&stat=size-total
and is roughly neutral compile-time wise:
https://llvm-compile-time-tracker.com/compare.php?from=a8b4f5bbab62091835205f3d648902432a4a5b58&to=aeae054055b125b011c1122f82c86457e159436f&stat=instructions
This relands commit 703240c71f,
that was reverted by commit 7405581f7c,
because the assertion `isa<LoadInst>(OrigInstr)` didn't hold in practice,
as the newly added test `@select_of_ptrs` shows:
If the pointers into alloca are used by select's/PHI's, then even if
we manage to fracture the alloca, some sub-alloca's will likely remain.
And if there are any non-capturing calls, then we will also decide to
keep the original backing alloca around, and we suddenly ~doubled
the alloca size, and the amount of memory traffic.
I'm not sure if this is a problem or we could live with it,
but let's leave that for later...
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D113520
Recommit without changes over 53abe3ff66,
which addressed the cause of the reported crash.
-----
With opaque pointers, the zero-offset load will generally not use
a GEP. Allow a direct load without GEP, which is treated the same
way as a zero-offset GEP.
Use the overload that supports moving into an empty block. I don't
think that this situation can occur right now, but it can happen
with the change from e7fb1c15cb,
and the test is derived from the issue reported there.
This builds on @fhahn's D112313, and caches the liveOnEntry node as an optimized access. D112313 only cached a known clobber. This change adds caching the fact that no clobber exists. It still does not cache may-clobber results.
Differential Revision: https://reviews.llvm.org/D120842
There is a general WalkerStepLimit adjustment higher up in the
loop, and I don't see any reason why this particular case would
need additional adjustment. Furthermore, this could underflow.
With opaque pointers, the zero-offset load will generally not use
a GEP. Allow a direct load without GEP, which is treated the same
way as a zero-offset GEP.
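For illustration (hypothetical sketch): with typed pointers, the zero-offset access would typically go through a GEP, while with opaque pointers it is simply a direct load on the pointer, which is now treated like a zero-offset GEP:
```
define i32 @f(ptr %p) {
  ; no "getelementptr ..., i64 0" is emitted under opaque pointers;
  ; the direct load is equivalent to loading through a zero-offset GEP
  %v = load i32, ptr %p
  ret i32 %v
}
```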
The "Correlated Value Propagation" pass was missing a case when handling select instructions. It was only handling the "false" constant value, while in NVPTX the select may have the condition (and thus the branches) inverted, for example:
```
loop:
%phi = phi i32* [ %sel, %loop ], [ %x, %entry ]
%f = tail call i32* @f(i32* %phi)
%cmp1 = icmp ne i32* %f, %y
%sel = select i1 %cmp1, i32* %f, i32* null
%cmp2 = icmp eq i32* %sel, null
br i1 %cmp2, label %return, label %loop
```
But the select condition can be inverted:
```
%cmp1 = icmp eq i32* %f, %y
%sel = select i1 %cmp1, i32* null, i32* %f
```
The fix is to enhance "Correlated Value Propagation" to handle both branches of the select instruction.
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D119643
D118623 added code to fold not-of-compare into a compare
with the inverted predicate, if the compare had no other
uses. This relies on accurate use lists in the IR but it
was run before setPhiValues, when some phi inputs are still
stored in a data structure on the side, instead of being
real uses in the IR. The effect was that a phi that should
be using the original compare result would now get an
inverted result instead.
Fix this by moving simplifyConditions after setPhiValues.
Differential Revision: https://reviews.llvm.org/D120312
Currently writtenBetween can miss clobbers of Loc between End and Start,
if End is a MemoryUse.
To guarantee we see all write clobbers of Loc between Start and End
for MemoryUses, restrict to Start and End being in the same block
and check all accesses between them.
This fixes two miscompiles illustrated in
llvm/test/Transforms/MemCpyOpt/memcpy-byval-forwarding-clobbers.ll
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D119929
This patch simplifies constraint handling by removing the
ConstraintListTy wrapper struct and moving the Preconditions directly
into ConstraintTy. This reduces the amount of memory needed for managing
constraints.
The only use case for ConstraintListTy was adding 2 constraints to model
ICMP_EQ conditions. But this can be handled by adding an IsEq flag. When
adding an equality constraint, we need to add the constraint and the
inverted constraint.
LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249, LICM would only be run after LoopRotate. Running LoopRotate prior to LICM prevents an instruction hoist from being speculative if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This discards the additional information, resulting in performance losses.
This PR modifies LICM to accept a "speculative" parameter which allows LICM to be set to perform information-losing speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after LoopRotate is performed, preserving this additional information.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D119965
This will bail out on target specific intrinsics. If those are deemed
important enough for EarlyCSE to handle, we can augment MemIntrinsicInfo
with an access type for TargetTransformInfo::getTgtMemIntrinsic() to
handle.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D120077
The assertion verifying that a newly computed value matches what is
already cached used stripPointerCasts() to strip bitcasts, however the
values can be not only pointers, but also vectors of pointers. That is
problematic because stripPointerCasts() doesn't handle vectors of
pointers. This patch introduces an ad-hoc utility function to strip all
bitcasts regardless of the value type.
Reviewed By: skatkov, reames
Differential Revision: https://reviews.llvm.org/D119994
This introduces a new "ptrauth" operand bundle to be used in
call/invoke. At the IR level, it's semantically equivalent to an
@llvm.ptrauth.auth followed by an indirect call, but it additionally
provides hardening, by preventing the intermediate raw
pointer from being exposed.
This mostly adds the IR definition, verifier checks, and support in
a couple of general helper functions. Clang IRGen and backend support
will come separately.
Note that we'll eventually want to support this bundle in indirectbr as
well, for similar reasons. indirectbr currently doesn't support bundles
at all, and the IR data structures need to be updated to allow that.
Differential Revision: https://reviews.llvm.org/D113685
If a cast is needed when replacing uses with newly created values, the
cast must be inserted after the instruction that defines the new value.
Fixes: SWDEV-321215
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D119524
If we had some source value from which we could infer an address space,
and it went through a ptrtoint/inttoptr pair, this would fail, since bitcast
can't change the address space.
Fixes issue 53665.
To avoid incorrectly merging GEPs with different source types
under opaque pointers.
To avoid increasing the Expression structure size, this reuses the
existing type member. The code does not rely on this to be the
expression result type, it's only used as a disambiguator.
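A hypothetical example of the GEPs that must not be merged: under opaque pointers both GEPs have identical operands, but the different source element types yield different offsets:
```
define void @f(ptr %p) {
  %g1 = getelementptr i32, ptr %p, i64 1   ; %p + 4 bytes
  %g2 = getelementptr i64, ptr %p, i64 1   ; %p + 8 bytes
  ret void
}
```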
Note that this doesn't actually cause the top level predicate to become a non-union just yet.
The * above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization due to a change from checking if predicates exists to whether the predicate is possibly false.
With typed pointers the pointer operand type checks the address space
and the load/store type. With opaque pointers we have to check the
load/store type separately.
Enabled loop interchange support for floating point reductions
if it is allowed to reorder floating point operations.
Previously, when we encountered a floating point PHI node in the
outer loop exit block, we bailed out, since we could not detect
floating point reductions in the early days. Now we remove this
limitation, since we are able to detect floating point reductions.
Reviewed By: #loopoptwg, Meinersbur
Differential Revision: https://reviews.llvm.org/D117450
This patch adds initial support for signed conditions. To do so,
ConstraintElimination maintains two separate systems, one with facts
from signed and one for unsigned conditions.
To start with this means information from signed and unsigned conditions
is kept completely separate. When it is safe to do so, information from
signed conditions may be also transferred to the unsigned system and
vice versa. That's left for follow-ups.
In the initial version, decomposition of signed values just handles
constants and otherwise just uses the value, without trying to
decompose the operation. Again this can be extended in follow-up
changes.
The main benefit of this limited signed support is proving >=s 0
pre-conditions added in D118799. But even this initial version also
fixes PR53273.
Depends on D118799.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D118806
With this patch pre-conditions can be added to a list of constraints.
Constraints with pre-conditions can only be used if all pre-conditions
are satisfied when the constraint is used.
The pre-conditions at the moment are specified as a list of
(Predicate, Value *, Value *) tuples. This allows checking them easily,
like any other condition, using the existing infrastructure.
This then is used to limit GEP decomposition to cases where we can
prove that offsets are signed positive.
This fixes a couple of incorrect transforms where GEP offsets were
assumed to be signed positive, but they were not.
Note that this effectively disables GEP decomposition, as there's no
support for reasoning about signed predicates. D118806 adds initial
signed support.
Fixes PR49624.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D118799
This header is very large (3M lines once expanded) and was included in locations
where DWARF-specific information was not needed.
More specifically, this commit suppresses the dependencies on
llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and
llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used,
this has a decent impact on number of preprocessed lines generated during
compilation of LLVM, as showcased below.
This is achieved by moving some definitions back to the .cpp file, no
performance impact implied[0].
As a consequence of that patch, downstream users may need to manually include some extra
files:
llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h
llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h
In some situations, code may be relying on the fact that
llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h; this hidden
dependency now needs to be made explicit.
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after: 10978519
before: 11245451
Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
[0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions
Differential Revision: https://reviews.llvm.org/D118781
This makes the statepoint methods in IRBuilder accept a
FunctionCallee, which carries both the callee and function type.
This is used to add the elementtype attribute to the statepoint call.
RS4GC requires an additional tweak to actually preserve that attribute
-- previously the attributes on the call were completely overwritten.
Differential Revision: https://reviews.llvm.org/D118886
Finding the re-materialization chain for a derived pointer does not depend on
the call site. To avoid repeating this search for each call site, it can be extracted into
a separate routine.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D118676
Assertion added in f50821cff0 confirms that the DT is indeed nonnull.
Change it to a reference instead of a pointer to make this explicit in
FusionCandidate.
Suggested in D118472.
After adding another value kind in 8a12cae862, Value * pointers do not
have enough available empty bits to store the kind (e.g. on ARM).
To address this, the patch replaces the PointerIntPair with separate
value and kind fields.
This patch extends the available-value logic to detect loads
of pointer-selects that can be replaced by a value select.
For example, consider the code below:
loop:
  %sel.phi = phi i32* [ %start, %ph ], [ %sel, %loop ]
  %l = load %ptr
  %l.sel = load %sel.phi
  %sel = select cond, %ptr, %sel.phi
  ...
exit:
  %res = load %sel
  use(%res)
The load of the pointer phi can be replaced by a load of the start value
outside the loop and a new phi/select chain based on the loaded values,
as illustrated below
%l.start = load %start
loop:
  %sel.phi.prom = phi i32 [ %l.start, %ph ], [ %sel.prom, %loop ]
  %l = load %ptr
  %sel.prom = select cond, %l, %sel.phi.prom
  ...
exit:
  use(%sel.prom)
This is a first step towards allowing vectorization of loops using common libc++
library functions, like std::min_element (https://clang.godbolt.org/z/6czGzzqbs):
#include <vector>
#include <algorithm>
int foo(const std::vector<int> &V) {
  return *std::min_element(V.begin(), V.end());
}
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D118143
Based on the output of include-what-you-use.
This is a big chunk of changes. It is very likely to break downstream code
unless it took a lot of care in avoiding hidden header dependencies, something
the LLVM codebase doesn't do that well :-/
I've tried to summarize the biggest change below:
- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h
And the usual count of preprocessed lines:
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after: 6189948
200k fewer lines to process is not that bad ;-)
Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D118652
The code paths analyzed (all constructor invocations of fusion
candidate) pass in a non-null DT.
Adding this assert as requested in D118472 before converting this to a
reference argument.
Clean up code in the peelLoop API. We already have usage of the DT without guarding
against a null DT, so this change constant-folds the remaining null DT
checks.
Also make the argument a reference so that it is clear the argument is
a non-null DT.
Extracted from D118472.
We tracked down some non-determinism in compilation output to the
DFAJumpThreading pass. These changes fixed our issue:
* Make the DefMap type a MapVector to make its iteration order depend on
insertion order.
* Sort the values to be inserted into NewDefs by instruction order to
make the insertion order deterministic. Since these values come from
iterating over a ValueMap, which doesn't have deterministic iteration
order, I couldn't fix this at its source.
Reviewed By: alexey.zhikhar
Differential Revision: https://reviews.llvm.org/D118590
In some cases StructurizeCFG inserts i1 xor instructions to invert
predicates. Add a quick loop to clean these up afterwards if we can get
away with modifying an existing compare instruction instead.
(StructurizeCFG is generally run late in the pipeline so instcombine
does not clean them up for us.)
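A sketch of the cleanup on hypothetical IR, assuming the compare has no other uses:
```
define i1 @before(i32 %a, i32 %b) {
  %cmp = icmp eq i32 %a, %b
  %inv = xor i1 %cmp, true   ; i1 xor inserted by StructurizeCFG
  ret i1 %inv
}

define i1 @after(i32 %a, i32 %b) {
  %inv = icmp ne i32 %a, %b  ; existing compare modified in place instead
  ret i1 %inv
}
```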
Differential Revision: https://reviews.llvm.org/D118623
PointerToBase is a mapping from a potentially derived pointer to its base.
Since we are in SSA form, if the base of a derived pointer is
available at the def of the derived pointer, the same base will be available at any
point where the derived pointer is alive.
So the mapping of derived pointer to base pointer is not a property
of a call site but is the same at the function level.
Reviewers: reames, yrouban
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D118604
When upgrading a loop of loads/stores to a memcpy, the existing pass does not keep the existing aliasing information. This patch allows the existing aliasing information to be kept.
Reviewed By: jeroen.dobbelaere
Differential Revision: https://reviews.llvm.org/D108221
phi([undef, A], [x, B]) -> x is only correct if x is guaranteed to be
a non-poison value.
Otherwise we would be changing an undef to poison on the branch through A.
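A hypothetical illustration: if %x may be poison, folding the phi to %x turns the undef observed on the path through A into poison, a strictly worse value:
```
define i32 @f(i1 %c, i32 %x) {
entry:
  br i1 %c, label %A, label %B

A:
  br label %merge

B:
  br label %merge

merge:
  ; fold to %x only if %x is known non-poison
  %phi = phi i32 [ undef, %A ], [ %x, %B ]
  ret i32 %phi
}
```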
Differential Revision: https://reviews.llvm.org/D117907
When creating an alloca to copy a matrix due to memory conflicts, those
allocas used to use VectorTypes, which forced them to have huge
alignments for large vectors.
This patch updates LowerMatrixIntrinsics to use a corresponding array
type, like Clang already does, to get more manageable alignments.
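A sketch of the effect (hypothetical values; the exact natural alignment is target-dependent):
```
define void @copy_tmp() {
  ; before: a vector-typed alloca gets the vector's large natural alignment
  %t.vec = alloca <256 x double>            ; e.g. align 2048
  ; after: the equivalent array type keeps the alignment manageable
  %t.arr = alloca [256 x double], align 8
  ret void
}
```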
Reviewed By: anemet, thegameg
Differential Revision: https://reviews.llvm.org/D118239
This patch adds a struct to manage a list of constraints. It simplifies
a follow-up change, that adds pre-conditions that must hold before a
list of constraints can be used.
This reverts commit ef82063207.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
This should be no functional change, as the cases supported by the
helper and the cases supported by DSE are currently the same, the
code structure is just slightly different.
The phi-of-ops functionality has a function OpIsSafeForPHIOfOps
to determine when it's safe to create the new phi.
But this function only checks for the obvious dominator conditions
and ignores memory.
This patch takes the conservative approach and disables phi-of-ops
whenever there's a load that doesn't dominate the phi, as its
value may be affected by a store inside the loop.
This can be improved later to check aliasing between the
load/stores.
Fixes https://llvm.org/PR53277
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D117999
This extract a common isNotVisibleOnUnwind() helper into
AliasAnalysis, which handles allocas, byval arguments and noalias
calls. After D116998 this could also handle sret arguments. We
have similar logic in DSE and MemCpyOpt, which will be switched
to use this helper as well.
The noalias call case is a bit different from the others, because
it also requires that the object is not captured. The caller is
responsible for doing the appropriate check.
Differential Revision: https://reviews.llvm.org/D117000
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.
Together with the previous commit, which mainly documents LoopFlatten's
overall strategy better, this addresses a concern raised as a FIXME comment in D110587;
the code refactoring (NFC) introduces functions (also for the SCEV usage) to
make this clearer.
This method is intended for use in places that cannot be reached
with opaque pointers, or part of deprecated methods. This makes
it easier to see that some uses of getPointerElementType() don't
need further action.
Differential Revision: https://reviews.llvm.org/D117870
This is a recommit of the patch without changes. The reason for
the revert has been addressed in D117679.
-----
The user scanning loop above looks through pointer casts, so we
also need to strip pointer casts in the capture check. Previously
the source was incorrectly considered not captured if a bitcast
was passed to the call.
This is a recommit of the patch without changes. The reason for
the revert has been addressed in D117679.
-----
Call slot optimization is currently supposed to be prevented if
the call can capture the source pointer. Due to an implementation
bug, this check currently doesn't trigger if a bitcast of the source
pointer is passed instead. I'm somewhat afraid of the fallout of
fixing this bug (due to heavy reliance on call slot optimization
in rust), so I'd like to strengthen the capture reasoning a bit first.
In particular, I believe that the capture is fine as long as a)
the call itself cannot depend on the pointer identity, because
neither dest has been captured before/at nor src before the
call and b) there is no potential use of the captured pointer
before the lifetime of the source alloca ends, either due to
lifetime.end or a return from a function. At that point the
potentially captured pointer becomes dangling.
Differential Revision: https://reviews.llvm.org/D115615
Call slot optimization currently merges the metadata between the
call and the load. However, we also need to merge in the metadata
of the store.
Part of the reason why we might have gotten away with this
previously is that usually the load and the store are the same
instruction (a memcpy); the issue can only show up if call slot
optimization occurs on an actual load/store pair.
This addresses the issue reported in
https://reviews.llvm.org/D115615#3251386.
Differential Revision: https://reviews.llvm.org/D117679
Change the DSE calloc handling to assume that it is
inaccessiblememonly, i.e. the defining access is liveOnEntry.
Differential Revision: https://reviews.llvm.org/D117543
I would like to move LoopFlatten from loop pass manager LPM2 to LPM1 (D116612),
but that is an LPM that uses MemorySSA, so LoopFlatten needs to preserve
MemorySSA; this patch adds that. More specifically, LoopFlatten restructures the
CFG and with this change the MSSA state is updated accordingly, where we also
update the DomTree. LoopFlatten doesn't rewrite/optimise/delete load or store
instructions, so I have not added any MSSA updates for that.
Differential Revision: https://reviews.llvm.org/D116660
ConstantHoist currently only hoists GEPs if there is no notional
overindexing. As this transform only hoists address arithmetic,
it shouldn't care about whether any overindexing occurs or not.
There is one caveat: If the hoisted base GEP is inbounds, and a
later non-inbounds GEP is rewritten in terms of it, the value
may be incorrectly poisoned. To avoid this, restrict the transform
to inbounds GEPs for now, as the notional overindexing check
effectively did that as well. The inbounds restriction could be
dropped by dropping inbounds from the base GEP expression.
Differential Revision: https://reviews.llvm.org/D117201
This caused a miscompile due to call slot optimization replacing a call
argument without considering the call's !noalias metadata, see discussion on
the code review.
> Call slot optimization is currently supposed to be prevented if
> the call can capture the source pointer. Due to an implementation
> bug, this check currently doesn't trigger if a bitcast of the source
> pointer is passed instead. I'm somewhat afraid of the fallout of
> fixing this bug (due to heavy reliance on call slot optimization
> in rust), so I'd like to strengthen the capture reasoning a bit first.
>
> In particular, I believe that the capture is fine as long as a)
> the call itself cannot depend on the pointer identity, because
> neither dest has been captured before/at nor src before the
> call and b) there is no potential use of the captured pointer
> before the lifetime of the source alloca ends, either due to
> lifetime.end or a return from a function. At that point the
> potentially captured pointer becomes dangling.
>
> Differential Revision: https://reviews.llvm.org/D115615
Also reverting the dependent commit:
> [MemCpyOpt] Look through pointer casts when checking capture
>
> The user scanning loop above looks through pointer casts, so we
> also need to strip pointer casts in the capture check. Previously
> the source was incorrectly considered not captured if a bitcast
> was passed to the call.
This reverts commit 487a34ed9d
and 00e6869463.
canSkipDef() currently skips inaccessiblememonly calls, but not
if they are allocation functions. This check was added in D103009,
but actually seems to be a leftover from a previous implementation
in D101440. canSkipDef() is not used on the storeIsNoop() path,
where the relevant transform ended up being implemented.
Differential Revision: https://reviews.llvm.org/D117005
This fixes a crash I observed in issue #48708 where the LSR
pass tries to insert an instruction in a basic block with only a
catchswitch statement in there. This happens because the Phi node
being evaluated assumes the same value for different basic blocks.
If the basic block associated with the incoming value of the operand
being evaluated has an EHPad terminator LSR skips optimizing it.
But if that incoming value can come from multiple different blocks
there can be some incoming basic blocks which are terminated in
an EHPad. If these are then rewritten in RewriteForPhi the ones
containing an EHPad terminator will hit the "Insertion point must
be a normal instruction" assert in AdjustInsertPositionForExpand.
This fix makes CollectLoopInvariantFixupsAndFormulae also ignore
cases where the same value has another incoming basic block with an
EHPad, same as it already does in case the primary value has one.
Patch by Lorenz Brun <lorenz@brun.one>
Differential Revision: https://reviews.llvm.org/D98378
In the process of rewriting `alloca`s and `phi`s that use them, the SROA
pass can try to insert a non-PHI instruction by calling
`getFirstInsertionPt()`, which is not possible in a catchswitch BB. This
CL makes us bail out in these cases.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D117168
Currently loop interchange only supports loops with one inner loop
induction variable. This patch adds support for transformation with
more than one inner loop induction variables. The induction PHIs and
induction increment instructions are moved/duplicated properly to the
new outer header and the new outer latch, respectively.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D114917
After e734e8286b, it is possible to end up in
a situation where an `indirectbr` is fed by a cast, which is in turn fed by
an operation which only produces integers.
`indirectbr` expects a block address, however these operations can't produce
that.
There were several asserts in `computeValueKnownInPredecessorsImpl` which check
that we're not looking for a block address if we're walking through something
which can never produce one.
Since it's now possible to hit these asserts, this changes them into actual
checks which return false if `Preference` is not `WantInteger`.
This adds a testcase which verifies that we don't crash anymore in these
situations.
Differential Revision: https://reviews.llvm.org/D99814
This doesn't require callers to put the pointer operand and the indices
in a container like a vector when calling the function. This is not
really an issue with the existing callers. But when using it from
IRBuilder the inputs are available as separate pointer value and indices
ArrayRef.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D117038
This patch enables loop interchange with multiple outer loop
induction variables, and hence removes the limitation that only
a single outer loop induction variable is supported. In fact, it
turns out that the current pass already trivially supports multiple
outer indvars, which is the result of a previous patch
`https://reviews.llvm.org/D102743`. Therefore, this patch removed that
limitation and provides test cases for multiple outer indvars.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D114916
This is required to query the legality more precisely in the LoopVectorizer.
This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.
Differential Revision: https://reviews.llvm.org/D115329
This change removes a direct check for calloc-like allocation functions, and instead handles the generic case where we're storing a constant to constant initialized memory. This is mostly to remove the call to isCallocLike, but if someone downstream happens to have an initialized alloc which initializes to e.g. -1, this will also kick in for them. (I don't know of such an example ftr.)
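A hypothetical example of the generic case: calloc returns zero-initialized memory, so storing zero into it stores the value the memory already holds and can be removed:
```
declare ptr @calloc(i64, i64)

define ptr @f() {
  %p = call ptr @calloc(i64 1, i64 4)
  store i32 0, ptr %p   ; no-op: stores the memory's initial value
  ret ptr %p
}
```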
For these "visible on unwind/ret" checks we only care about the
fact that no other code has access to the pointer (unless it
escapes). A noalias call is sufficient for this, it does not
have to be a known allocation function.
This is basically the same change as D116728, but for DSE rather
than LICM.
getNumberOfRegisters takes a ClassID as its argument. It shouldn't be passed a bool. Assuming the bool meant vector or not, we should call getRegisterClassForType first.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D116903
SROA has 3 data-structures where it stores sets of instructions that should
be deleted:
- DeadUsers -> instructions that are UB or have no users
- DeadOperands -> instructions that are UB or operands of useless phis
- DeadInsts -> "dead" instructions, including loads of uninitialized memory
with users
The first 2 sets can be RAUW'd with poison instead of undef. This is a no-brainer, as UB
can be replaced with poison, and for instructions with no users RAUW is a
NOP.
The 3rd case cannot currently be replaced with poison because the set mixes in
the loads of uninit memory. I leave that alone for now.
Another case where we can use poison is in the construction of vectors from
multiple loads. The base vector for the first insertelement is now poison, as
it doesn't matter: it is fully overwritten by the inserts.
Differential Revision: https://reviews.llvm.org/D116887
There is no need to sort inserted instructions by dominance, as the
deletion loop still requires RAUW with undef before deleting. Removing
instructions in reverse insertion order should still ensure that the
number of use-list updates is kept to a minimum.
This is a recurring pattern; we can consolidate three copies into one. The main motivation is to reduce usages of isMallocLike.
The original commit (which was quickly reverted) didn't account for the fact that the allocation function could be an invoke; test coverage for that case is added in this commit.
There was a limitation in legality that in the original inner loop latch,
no instruction was allowed between the induction variable increment
and the branch instruction. This is because we used to split the
inner latch at the induction variable increment instruction. Since
now we have split at the inner latch branch instruction and have
properly duplicated instructions over to the split block, we remove
this limitation.
Please refer to the test case updates to see how we now interchange
loops where instructions exist between the induction variable
increment and the branch instruction.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D115238
dyn_cast<> can return null - use cast<> instead to assert the cast is valid before dereferencing the casted pointer.
Fixes static-analyzer null dereference warning.
Fix static analysis warning by using cast<> instead of dyn_cast<> as both isa<> and isGuaranteedToExecuteForEveryIteration expect a non-null Instruction pointer.
When determining whether the memory is local to the function (and
we can thus introduce spurious writes without thread-safety issues),
check for a noalias call rather than the hardcoded list of memory
allocation functions. Noalias calls are the more general way to
determine allocation functions, as long as we're only interested
in the property that the returned value is distinct from any other
accessible memory.
Differential Revision: https://reviews.llvm.org/D116728
There was a limitation in legality that in the original inner loop latch,
no instruction was allowed between the induction variable increment
and the branch instruction. This is because we used to split the
inner latch at the induction variable increment instruction. Since
now we have split at the inner latch branch instruction and have
properly duplicated instructions over to the split block, we remove
this limitation.
Please refer to the test case updates to see how we now interchange
loops where instructions exist between the induction variable increment
and the branch instruction.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D115238
This patch delays the updates of the dominator tree to the very end of
the pass instead of applying them in small increments after each basic
block.
This improves the runtime of the pass in particular in pathological
cases because now the updater sees the full extend of the updates and
can decide whether it is faster to apply the changes incrementally or
just recompute the full tree from scratch.
Put differently, thanks to this patch, we can take advantage of the
improvements that Chijun Sima <simachijun@gmail.com> made in the
dominator tree updater a while ago with commit 32fd196cbf4d: "Teach the
DominatorTree fallback to recalculation when applying updates to speedup
JT (PR37929)".
This change is NFC but can improve the runtime of the compiler
dramatically in some pathological cases (where the pass was pushing a
lot (several thousand) of small updates (fewer than 6)).
For instance on the motivating example we went from 300+ sec to less
than a second.
Differential Revision: https://reviews.llvm.org/D116610
The naming has come up as a source of confusion in several recent reviews. onlyWritesMemory is consistent with onlyReadsMemory, which we use for the corresponding readonly case as well.
The user scanning loop above looks through pointer casts, so we
also need to strip pointer casts in the capture check. Previously
the source was incorrectly considered not captured if a bitcast
was passed to the call.
Call slot optimization is currently supposed to be prevented if
the call can capture the source pointer. Due to an implementation
bug, this check currently doesn't trigger if a bitcast of the source
pointer is passed instead. I'm somewhat afraid of the fallout of
fixing this bug (due to heavy reliance on call slot optimization
in rust), so I'd like to strengthen the capture reasoning a bit first.
In particular, I believe that the capture is fine as long as a)
the call itself cannot depend on the pointer identity, because
neither dest (before or at the call) nor src (before the call) has
been captured, and b) there is no potential use of the captured
pointer before the lifetime of the source alloca ends, either through
lifetime.end or a return from the function. At that point the
potentially captured pointer becomes dangling.
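A hedged C-level illustration of condition b) (all names are made up):

  struct S { int x[4]; };
  void init(struct S *);
  void f(struct S *); /* may capture its argument */

  void g(struct S *dest) {
    struct S src;
    init(&src);
    f(&src);     /* &src may be captured here */
    *dest = src; /* call slot candidate: copy src into dest */
  } /* src's lifetime ends; any captured pointer is now dangling, so no
       conforming use of it can distinguish a direct write to dest */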
Differential Revision: https://reviews.llvm.org/D115615
This class is solely used as a lightweight and clean way to build a set of
attributes to be removed from an AttrBuilder. Previously AttrBuilder was used
both for building and removing, which introduced odd situations, like
the creation of an Attribute with a dummy value because the only
relevant part was the attribute kind.
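A hedged usage sketch (Ctx and AL are placeholders for an LLVMContext and an AttributeList):

  #include "llvm/IR/Attributes.h"

  // Build a set of attribute kinds to strip; only the kind matters,
  // so no dummy Attribute values are needed.
  llvm::AttributeMask Mask;
  Mask.addAttribute(llvm::Attribute::NoAlias);
  Mask.addAttribute(llvm::Attribute::NonNull);
  AL = AL.removeFnAttributes(Ctx, Mask);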
Differential Revision: https://reviews.llvm.org/D116110
This reverts commit fd4808887e.
This patch causes gcc to issue a lot of warnings like:
warning: base class ‘class llvm::MCParsedAsmOperand’ should be
explicitly initialized in the copy constructor [-Wextra]
If the killing store overwrites the whole object, we know that the
preceding store is dead, regardless of the accessed offset or size.
This case was previously only handled if the size of the dead store
was also known.
This allows us to perform conventional DSE for calls that write to
an argument (but without known size).
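A hedged source-level illustration (assuming s is short enough to fit):

  #include <string.h>

  void sink(char *);

  void h(const char *s) {
    char buf[16];
    strcpy(buf, s);             /* store of unknown size into buf */
    memset(buf, 0, sizeof buf); /* killing store: whole object    */
    sink(buf);                  /* the strcpy above is dead       */
  }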
Differential Revision: https://reviews.llvm.org/D116267
Remove the assertion about the pointer element type and only check
that the stride is one. Ultimately, the actual pointer type here
doesn't matter, because SCEVExpander would insert appropriate
casts if necessary.
Otherwise, it is possible that the state defined in the determinator
block defines the state for the next iteration of the loop, rather than
for the current one.
Fixes llvm-test-suite's
SingleSource/Regression/C/gcc-c-torture/execute/pr80421.c
Differential Revision: https://reviews.llvm.org/D115832
This fixes a typo in 81d69e1bda.
Of course we should only skip the particular store if it isn't
removable, not bail out of the whole loop. Add a test to cover
this case.
At this point the instruction may either have an analyzable
write or be a terminator. For terminators, isRemovable() is not
necessarily well-defined. Move the check until after we have ensured
that it is not a terminator.
The only non-trivial change here is that the isReadClobber()
check for redundant stores is now on the DefLoc, not the
UpperLoc. This is semantically the right location to use, though
in practice it makes no difference (the locations are either the
same, or the def inst does not read).
We have Value::stripInBoundsConstantOffsets(), which does what we
want here, but the inbounds requirement isn't actually necessary.
We should probably add Value::stripConstantOffsets() as well.
Remove the special casing for intrinsics in MemoryLocation::getForDest()
and handle them through the general attribute based code. On the DSE
side, this means that isRemovable() now needs to handle more than a
hardcoded list of intrinsics. We consider everything apart from
volatile memory intrinsics and lifetime markers to be removable.
This allows us to perform DSE on intrinsics that DSE has not been
specially taught about, using a matrix store as an example here.
There is an interesting test change for invariant.start, but I
believe that optimization is correct. It only looks a bit odd
because the code is immediate UB anyway.
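A hedged sketch of the relaxed removability rule (not the exact in-tree code):

  #include "llvm/IR/IntrinsicInst.h"

  static bool isRemovableCall(const llvm::CallBase *CB) {
    // Memory intrinsics are removable unless volatile.
    if (auto *MI = llvm::dyn_cast<llvm::MemIntrinsic>(CB))
      return !MI->isVolatile();
    // Lifetime markers must be kept.
    if (auto *II = llvm::dyn_cast<llvm::IntrinsicInst>(CB))
      if (II->isLifetimeStartOrEnd())
        return false;
    return true; // subject to the usual attribute-based checks
  }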
Differential Revision: https://reviews.llvm.org/D116210
Fix handling of alloc-like instructions in isGuaranteedLoopInvariant().
The check was not valid when the 'KillingDef' was outside of the loop
while the 'CurrentDef' was inside the loop. In that case, the
'KillingDef' only overwrites the definition from the last iteration of
the loop, not the ones from all iterations. Therefore it does not make
the 'CurrentDef' dead, and we must not remove it.
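A hedged C-level shape of the miscompile (save() is a made-up escape):

  #include <stdlib.h>

  void save(int *);

  void k(int n) {
    int *p = 0;
    for (int i = 0; i < n; ++i) {
      p = (int *)malloc(sizeof(int));
      save(p); /* each fresh allocation escapes */
      *p = i;  /* CurrentDef: one store per allocation; not dead */
    }
    if (p)
      *p = 0;  /* KillingDef: only overwrites the last allocation */
  }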
Fixes https://github.com/llvm/llvm-project/issues/52774.
Reviewed By: Florian Hahn
Differential Revision: https://reviews.llvm.org/D115965
It is not necessary to explicitly check which attributes are
present, and only add those to the builder. We can simply list
all attributes that need to be stripped and remove them
unconditionally. This also allows us to use some nicer APIs that
don't require mucking about with attribute list indices.
We can only drop calls if they have an analyzable write, the return
value is not used, they don't throw and they don't diverge. The last
two conditions were previously not checked, because all the libcalls
with analyzable writes already happened to satisfy those conditions
anyway. This may not be true for generalizations (with D115904 in mind).
No test changes because the necessary attributes are already inferred
for currently supported libcalls.
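A hedged sketch of the four conditions (illustrative, not the in-tree implementation):

  #include "llvm/IR/InstrTypes.h"

  static bool canDropWritingCall(const llvm::CallBase *CB) {
    return CB->use_empty()    // the return value is not used
        && CB->doesNotThrow() // the call cannot unwind
        && CB->willReturn();  // the call cannot diverge
    // ...and the write itself must be analyzable.
  }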
Differential Revision: https://reviews.llvm.org/D115962
These are deprecated and should be replaced with getAlign().
Some of these asserts don't do anything, because Load/Store/AllocaInst
never have an align value of 0.
If a loop isn't forced to be unrolled, we want to avoid unrolling it
when there is an explicit unroll-and-jam pragma. This is to prevent
automatic unrolling from interfering with the user-requested
transformation.
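A hedged source-level example (assuming clang's unroll_and_jam loop pragma):

  // The outer loop carries an explicit unroll-and-jam request; the
  // runtime unroller should then leave the inner loop alone.
  void saxpy2d(int n, int m, float **a, const float *b) {
  #pragma clang loop unroll_and_jam(enable)
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < m; ++j)
        a[i][j] += b[j];
  }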
Differential Revision: https://reviews.llvm.org/D114886
Symbolic execution using PredicateInfo is only done for the ssa.copy
intrinsic. It's using two potential sources for building the expression:
1. the Value that the intrinsic is a copy of, and
2. the Value from the constraint in PredicateInfo
It's possible to get into an infinite loop when choosing between these
two, as described in PR31613.
This patch swaps the two values (i.e. chooses the second one for the
expression) if that same second value was chosen before; this breaks
the cycle.
In the testcases provided, where there is a contradiction between the
value from symbolic execution and the assume instruction, NewGVN
reduces the assume to assume(false).
Resolves PR31613.
Differential Revision: https://reviews.llvm.org/D110907
An expression guarded in the loop entry can be folded prior to
comparison. This patch follows up on D107353 and makes LIR able to deal
with nested for-loops.
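A hedged illustration of the kind of nested loop LIR can now flatten (assuming the rows are contiguous):

  // Recognizable as a single memset covering the whole array:
  void zero10x10(int a[10][10]) {
    for (int i = 0; i < 10; ++i)
      for (int j = 0; j < 10; ++j)
        a[i][j] = 0;
    /* => memset(a, 0, 10 * 10 * sizeof(int)) */
  }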
Reviewed By: qianzhen, bmahjour
Differential Revision: https://reviews.llvm.org/D108112
Poison-generating flags can be retained during CSE on the earlier
instruction, *if* the earlier instruction being poison causes UB. For
now, always take the AND of the flags for floating-point instructions.
https://alive2.llvm.org/ce/z/4K3D7P
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D115247
This reverts change 2c391a5/D87551. As noted in the llvm-dev thread "LICM as canonical form" sent earlier today, introducing this was a major design change made without sufficient cause.
A profile-driven LICM is not an unreasonable design; it simply is not what we have. Switching to such a model requires a lot more work than just this patch, and broad agreement that it is the right direction for the optimizer as a whole.
Worth noting is that all the tests included in the reverted change are probably handled if we allow running unconstrained LICM, and later run LoopSink. As such, we have no public examples which motivate a profit-based hoisting approach.
DSE has some extra logic to determine the write location of library
calls like str*cpy and str*cat. This patch moves the logic to a new
MemoryLocation::getForDest variant, which takes a call and TLI.
This patch should be NFC, because no other places take advantage of the
new helper yet.
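A hedged usage sketch (CB and TLI are placeholders for a call site and the function's TargetLibraryInfo):

  #include "llvm/Analysis/MemoryLocation.h"

  // Query the write location of the call through the new helper
  // instead of DSE-local logic.
  if (auto DestLoc = llvm::MemoryLocation::getForDest(CB, TLI)) {
    // DestLoc->Ptr is the pointer the call writes through;
    // DestLoc->Size may be imprecise for the str* routines.
  }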
Suggested by @reames post-commit 7eec832def.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D114872
When doing load/store promotion within LICM, if we cannot prove that
it is safe to sink the store, we won't hoist the load, even though we
can prove the load could be dereferenced and moved outside the loop.
This patch implements the load promotion by moving the load to the
loop preheader and inserting a proper PHI in the loop. The store is
kept as is in the loop. By doing this, we avoid reloading from the
memory location in each iteration.
Please consider this small example:
loop {
  var = *ptr;
  if (var) break;
  *ptr = var + 1;
}
After this patch, it will be:
var0 = *ptr;
loop {
  var1 = phi(var0, var2);
  if (var1) break;
  var2 = var1 + 1;
  *ptr = var2;
}
This addresses some problems from [0].
[0] https://bugs.llvm.org/show_bug.cgi?id=51193
Differential revision: https://reviews.llvm.org/D113289
Commit 5c77aa2b91 ("[unroll] Use early return in shouldFullUnroll [nfc]")
wasn't quite NFC, since !(x <= y) is x > y rather than x >= y.
Credit to Justin Bogner for spotting the bug.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D114894
This allows for better optimization of 'stores-of-existing-values' and
possibly helps passes further down the pipeline.
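A hedged illustration of a store of an existing value (function names are made up):

  void touch(int *p) {
    int v = *p; /* load the current value */
    /* ... code that provably does not modify *p ... */
    *p = v;     /* no-op store of the existing value: removable */
  }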
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D113712