Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) for IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D115907
Instead of first creating a lambda for calculating the range,
then collecting the ranges for the operands, and then calling the
lambda on those ranges, we can first calculate the operand ranges
and then calculate the result directly in the switch.
According to the LLVM debug info update guide: https://llvm.org/docs/HowToUpdateDebugInfo.html,
"Hoisting identical instructions which appear in several successor
blocks into a predecessor block. In this case there is no single
merged instruction. The rule for dropping locations applies".
Thanks to Yuanbo Li for reporting this.
Reviewed By: dblaikie
Reviewers: sebpop, tejohnson, dblaikie
Differential Revision: https://reviews.llvm.org/D122730
Factor in the TBAA of adjacent stores instead of just the head store
when merging stores into a memset. We were seeing GVN remove a load that
had a TBAA that matched the 2nd store because GVN determined it didn't
match the TBAA of the memset. The memset had the TBAA of only the first
store.
i.e. Loading the field pi_ of shared_count after memset to create an
array of shared_ptr
template<class T>
class shared_ptr {
T *p;
shared_count refcount;
};
class shared_count {
sp_counted_base *pi_;
};
Differential Revision: https://reviews.llvm.org/D122205
Avoids merge errors when opaque pointers are loaded into different types.
Reviewed by: jcranmer-intel, hiraditya
Differential Revision: https://reviews.llvm.org/D122521
According to definition of canonical form, it is a canonical
if scale reg does not contain addrec for loop L then none of bases
should contain addrec for this loop.
The critical word here is "contains".
Current checker of canonical form checks not "containing" property
but "is". So it does not check whether it contains but whether it is.
Fix the checker and canonicalizing utility to follow definition.
Without this fix in the test attached the base formula looking as
reg((-1 * {0,+,8}<nuw><nsw><%bb2>)<nsw>) + 1*reg((8 * (%arg /u 8))<nuw>)
is considered as conanocial while base contains an addrec.
And modified formula we want to insert
reg({0,+,8}<nuw><nsw><%bb2>) + 1*reg((-8 * (%arg /u 8)))
is considered as not canonical.
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D122457
We have the same code repeated in both callers, sink it into callee.
The motivation here isn't just code style, we can also defer the relatively expensive aliasing checks until the cheap structural preconditions have been validated. (e.g. Don't bother aliasing if src is not an alloca.) This helps compile time significantly.
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) for IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D115907
EarlyCSE currently optimizes all MemoryUses upfront. However,
EarlyCSE only actually queries the clobbering memory access for
a subset of uses, namely those where a CSE candidate has already
been identified. Delaying use optimization to the clobber query
improves compile-time in practice.
This change is not NFC because EarlyCSE has a limit on the number
of clobber queries (EarlyCSEMssaOptCap), in which case it falls
back to the defining access. The defining access for uses will now
no longer coincide with the optimized access.
If there are performance regressions from this change, we should
be able to address them by raising this limit.
Differential Revision: https://reviews.llvm.org/D121987
Rather than iterating over users and comparing operands, iterate
over uses and check operand number. Otherwise, we'll end up
promoting a store twice if it has two equal operands.
This can only happen with opaque pointers, as otherwise both
operands differ by a level of indirection, so a bitcast would have
to be involved.
Fixes https://github.com/llvm/llvm-project/issues/54495.
There is a bunch of code improvements in the patch: marking as const everything what can be
const and fixing some typos in comments.
Also the patch removes the shadowing parameter TTI from the rewriteWithNewAddressSpaces
method, the TTI parameter is not required because the same field is in the class.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D121671
[SCCP] do not clean up dead blocks that have their address taken
Fixes a crash observed in IPSCCP.
Because the SCCPSolver has already internalized BlockAddresses as
Constants or ConstantExprs, we don't want to try to update their Values
in the ValueLatticeElement. Instead, continue to propagate these
BlockAddress Constants, continue converting BasicBlocks to unreachable,
but don't delete the "dead" BasicBlocks which happen to have their
address taken. Leave replacing the BlockAddresses to another pass.
Fixes: https://github.com/llvm/llvm-project/issues/54238
Fixes: https://github.com/llvm/llvm-project/issues/54251
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D121744
This adds a new option to control AllowSpeculation added in D119965 when
using `-passes=...`.
This allows reproducing #54023 using opt.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D121944
The way the pass is actually used in the optimization pipeline,
TLI will be available, but this is not the case when running just
-lower-constant-intrinsics in tests, which ends up being quite
confusing.
Require TLI unconditionally, as we usually do.
This changes MemorySSA to be constructed in unoptimized form.
MemorySSA::ensureOptimizedUses() can be called to optimize all
uses (once). This should be done by passes where having optimized
uses is beneficial, either because we're going to query all uses
anyway, or because we're doing def-use walks.
This should help reduce the compile-time impact of MemorySSA for
some use cases (the reason why I started looking into this is
D117926), which can avoid optimizing all uses upfront, and instead
only optimize those that are actually queried.
Actually, we have an existing use-case for this, which is EarlyCSE.
Disabling eager use optimization there gives a significant
compile-time improvement, because EarlyCSE will generally only query
clobbers for a subset of all uses (this change is not included in
this patch).
Differential Revision: https://reviews.llvm.org/D121381
LoopSimplifyCFG may process loops that are not in
loop-simplify/canonical form. For loops not in canonical form, exit
blocks may be reachable from non-loop blocks and we cannot consider them
as dead if they only are not reachable from the loop itself.
Unfortunately the smallest test I could come up with requires running
multiple passes:
-passes='loop-mssa(loop-instsimplify,loop-simplifycfg,simple-loop-unswitch)'
The reason is that loops are canonicalized at the beginning of loop
pipelines, so a later transform has to break canonical form in a way
that breaks LoopSimplifyCFG's dead-exit analysis.
Alternatively we could try to require all loop passes to maintain
canonical form. That in turn would also require additional verification.
Fixes#54023, #49931.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D121925
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) for IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Differential Revision: https://reviews.llvm.org/D115907
When a load extends past the extent of the alloca, SROA will
restrict the slice size to extend to the end of the alloca only.
However, presplitting was asserting that the load size and the
slice size match exactly, which does not hold in this case.
Relax the assertion to only require that the load size is greater
or equal than the slice size.
Context: I needed this for https://github.com/google/iree/pull/8474 .
I found that TSan instrumentation expects vector sizes to be <= 16,
and in my project (IREE) we have tests with higher vector sizes.
That left some test functions uninstrumented, resulting in crashes as
instrumented code called into them.
Differential Revision: https://reviews.llvm.org/D121182
This patch is motivated by pr48057 where an output dependency is not detected
since loop interchange did not check a store instruction with itself.
Fixed that deficiency.
Reviewed By: bmahjour, Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D118102
This patch adds PrettyStackEntries before running passes. The entries
include the pass name and the IR unit the pass runs on.
The information is used the print additional information when a pass
crashes, including the name and a reference to the IR unit on which it
crashed. This is similar to the behavior of the legacy pass manager.
The improved stack trace now includes:
Stack dump:
0. Program arguments: bin/opt -loop-vectorize -force-vector-width=4 crash.ll
1. Running pass 'ModuleToFunctionPassAdaptor' on module 'crash.ll'
2. Running pass 'LoopVectorizePass' on function '@a'
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D120993
After moving the CanAdd check in c60cdb44f7 and using it for
the assume cases as well, the passed in block may not have a branch
instruction as terminator. This can trigger the assertion. Given the new
use case, it doesn't add value any longer and can be removed.
Fixes https://github.com/llvm/llvm-project/issues/54281
When decomposing constraints for unsigned conditions, we can use
negative values by zero-extending them, as long as they are less than
the maximum constraint value.
Fixes https://github.com/llvm/llvm-project/issues/54224
Add missing CanAdd check before adding a condition from an assume
to the successor blocks. When adding information from assume to
successor blocks we need to perform the same CanAdd as we do for adding
a condition from a branch.
Fixes https://github.com/llvm/llvm-project/issues/54217
This patch extends ConstraintElimination to also remove dead variables
when removing a constraint. When a constraint is removed because it is
out of scope, all new variables added for this constraint can also be
removed.
This keeps the total size of the systems much smaller, because it
reduces the number of variables drastically.
It also fixes a bug where variables where removed incorrectly.
Fixes https://github.com/llvm/llvm-project/issues/54228
Currently, we hardly ever actually run SCEV verification, even in
tests with -verify-scev. This is because the NewPM LPM does not
verify SCEV. The reason for this is that SCEV verification can
actually change the result of subsequent SCEV queries, which means
that you see different transformations depending on whether
verification is enabled or not.
To allow verification in the LPM, this limits verification to
BECounts that have actually been cached. It will not calculate
new BECounts.
BackedgeTakenInfo::getExact() is still not entirely readonly,
it still calls getUMinFromMismatchedTypes(). But I hope that this
is not problematic in the same way. (This could be avoided by
performing the umin in the other SCEV instance, but this would
require duplicating some of the code.)
Differential Revision: https://reviews.llvm.org/D120551
Skip phi nodes in the preheader. They may not be considered loop
invariant by the assertion below.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D121010
There seems to be one more uncaught problem, SROA may now end up trying
to re-re-repromote the just-promoted shadow alloca, and do that endlessly.
This reverts commit adc0984d81.
This is inspired by the original variant of D109749 by Graham Hunter,
but is a more general version.
Roughly, instead of promoting the alloca, we call it
a shadow/backing alloca, go through all it's slices,
clone(!) instructions that operated on it,
but make them operate on the cloned alloca,
and promote cloned alloca instead.
This keeps the shadow/backing alloca, and all the original instructions
around, which results in said shadow/backing alloca being
a perfect mirror/representation of the promoted alloca's content,
so calls that take the alloca as arguments (non-capturingly!)
can be supported.
For now, we require that the calls also don't modify the alloca's content,
but that is only to simplify the initial implementation,
and that will be supported in a follow-up.
Overall, this leads to *smaller* codesize:
https://llvm-compile-time-tracker.com/compare.php?from=a8b4f5bbab62091835205f3d648902432a4a5b58&to=aeae054055b125b011c1122f82c86457e159436f&stat=size-total
and is roughly neutral compile-time wise:
https://llvm-compile-time-tracker.com/compare.php?from=a8b4f5bbab62091835205f3d648902432a4a5b58&to=aeae054055b125b011c1122f82c86457e159436f&stat=instructions
This relands commit 703240c71f,
that was reverted by commit 7405581f7c,
because the assertion `isa<LoadInst>(OrigInstr)` didn't hold in practice,
as the newly added test `@select_of_ptrs` shows:
If the pointers into alloca are used by select's/PHI's, then even if
we manage to fracture the alloca, some sub-alloca's will likely remain.
And if there are any non-capturing calls, then we will also decide to
keep the original backing alloca around, and we suddenly ~doubled
the alloca size, and the amount of memory traffic.
I'm not sure if this is a problem or we could live with it,
but let's leave that for later...
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D113520