This patch improves the effectiveness of BDCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!
Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.
This reapplies the previous patch with a fix for a use-after-free.
Differential Revision: https://reviews.llvm.org/D110568
After rG452714f8f8037ff37f9358317651d1652e231db2, the Function `F` retrieved in LoopPredication is not used.
Remove this unused variable to stop some buildbots (ASAN, clang-ppc) from failing.
This is analogous to D86156 (which preserves "lossy" BFI in loop
passes). Lossy means that the analysis preserved may not be up to date
with regards to new blocks that are added in loop passes, but BPI will
not contain stale pointers to basic blocks that are deleted by the loop
passes.
This is achieved through BasicBlockCallbackVH in BPI, which calls
eraseBlock that updates the data structures in BPI whenever a basic
block is deleted.
This patch does not have any changes in the upstream pipeline, since
none of the loop passes in the pipeline use BPI currently.
However, since BPI wasn't previously preserved in loop passes, the loop
predication pass was invoking BPI *on the entire
function* every time it ran in an LPM. This caused massive compile time
in our downstream LPM invocation which contained loop predication.
See updated test with an invocation of a loop-pipeline containing loop
predication and -debug-pass turned ON.
Reviewed-By: asbirlea, modimo
Differential Revision: https://reviews.llvm.org/D110438
It can happen that after widening of the IV, flattening may not be possible,
e.g. when it is deemed unprofitable. We were not properly checking this, which
resulted in flattening being applied when it shouldn't, also leading to
incorrect results (miscompilation).
This should fix PR51980 (https://bugs.llvm.org/show_bug.cgi?id=51980)
Differential Revision: https://reviews.llvm.org/D110712
This fixes an issue exposed by D71539, where IndVarSimplify tries
to access an invalid cached SCEV expression after making changes to the
underlying PHI instruction earlier.
When changing the incoming value of a PHI, forget the cached SCEV for
the PHI.
This reverts commit f6954bf804.
This breaks the test-suite O3 build:
/home/nikic/llvm-test-suite/build-O3/tools/timeit --summary Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.time /home/nikic/llvm-project/build/bin/clang++ -DNDEBUG -O3 -w -Werror=date-time -save-stats=obj -save-stats=obj -std=c++11 -MD -MT Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -MF Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.d -o Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -c ../Bitcode/Benchmarks/Halide/local_laplacian/local_laplacian.bc
While deleting: i64 %
Use still stuck around after Def is destroyed: %12620 = mul i64 %12619, <badref>
clang++: /home/nikic/llvm-project/llvm/lib/IR/Value.cpp:103: llvm::Value::~Value(): Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' failed.
This patch improves the effectiveness of BDCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!
Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.
Differential Revision: https://reviews.llvm.org/D110568
This patch improves the effectiveness of ADCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!
Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.
Differential Revision: https://reviews.llvm.org/D110462
In rG6a076fa9539e, a problem with updating the old/narrow phi nodes after IV
widening was introduced. If after widening of the IV the transformation is
*not* applied, the narrow phi node was incorrectly modified, which should only
happen if flattening happens. This can be seen in the added test widen-iv2.ll,
which incorrectly had 1 incoming value, but should have its original 2 incoming
values, which is now restored.
Differential Revision: https://reviews.llvm.org/D110234
As it contains a self-reference, the default copy/move ctors
would not be safe.
Move the DSEState::get() method into the ctor to make sure no move
occurs here even without NRVO.
This is a speculative fix for test failures on
llvm-clang-x86_64-expensive-checks-win.
This is a followup to D109844 (and alternative to D109907), which
integrates the new "earliest escape" tracking into AliasAnalysis.
This is done by replacing the pre-existing context-free capture
cache in AAQueryInfo with a replaceable (virtual) object with two
implementations: The SimpleCaptureInfo implements the previous
behavior (check whether object is captured at all), while
EarliestEscapeInfo implements the new behavior from DSE.
This combines the "earliest escape" analysis with the full power of
BasicAA: It subsumes the call handling from D109907, considers a
wider range of escape sources, and works with AA recursion. The
compile-time cost is slightly higher than with D109907.
Differential Revision: https://reviews.llvm.org/D110368
It is sufficient that the object has not been captured before the
load that produces the pointer we're loading. A capture after that
can not affect the already loaded pointer.
This is small part of D110368 applied separately.
It seems the crashes we saw wasn't caused by this (see comments on the review).
> This is basically D108837 but for jump threading. Free instructions
> should be ignored for the threading decision. JumpThreading already
> skips some free instructions (like pointer bitcasts), but does not
> skip various free intrinsics -- in fact, it currently gives them a
> fairly large cost of 2.
>
> Differential Revision: https://reviews.llvm.org/D110290
This reverts commit 4604695d7c.
This reverts the revert commit df56fc6ebb.
This version of the patch adjusts the location where the EarliestEscapes
cache is cleared when an instruction gets removed. The earliest escaping
instruction does not have to be a memory instruction.
It could be a ptrtoint instruction like in the added test
@earliest_escape_ptrtoint, which subsequently gets removed. We need to
invalidate the EarliestEscape entry referring to the ptrtoint when
deleting it.
This fixes the crash mentioned in
https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6
It caused compiler crashes, see comment on the code review for repro.
> This is basically D108837 but for jump threading. Free instructions
> should be ignored for the threading decision. JumpThreading already
> skips some free instructions (like pointer bitcasts), but does not
> skip various free intrinsics -- in fact, it currently gives them a
> fairly large cost of 2.
>
> Differential Revision: https://reviews.llvm.org/D110290
This reverts commit 1e3c6fc7cb.
This is basically D108837 but for jump threading. Free instructions
should be ignored for the threading decision. JumpThreading already
skips some free instructions (like pointer bitcasts), but does not
skip various free intrinsics -- in fact, it currently gives them a
fairly large cost of 2.
Differential Revision: https://reviews.llvm.org/D110290
At the moment, DSE only considers whether a pointer may be captured at
all in a function. This leads to cases where we fail to remove stores to
local objects because we do not check if they escape before potential
read-clobbers or after.
Doing context-sensitive escape queries in isReadClobber has been removed
a while ago in d1a1cce5b1 to save compile-time. See PR50220 for more
context.
This patch introduces a new capture tracker, which keeps track of the
'earliest' capture. An instruction A is considered earlier than instruction
B, if A dominates B. If 2 escapes do not dominate each other, the
terminator of the common dominator is chosen. If not all uses cannot be
analyzed, the earliest escape is set to the first instruction in the
function entry block.
If the query instruction dominates the earliest escape and is not in a
cycle, then pointer does not escape before the query instruction.
This patch uses this information when checking if a load of a loaded
underlying object may alias a write to a stack object. If the stack
object does not escape before the load, they do not alias.
I will share a follow-up patch to also use the information for call
instructions to fix PR50220.
In terms of compile-time, the impact is low in general,
NewPM-O3: +0.05%
NewPM-ReleaseThinLTO: +0.05%
NewPM-ReleaseLTO-g: +0.03
with the largest change being tramp3d-v4 (+0.30%)
http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions
Compared to always computing the capture information on demand, we get
the following benefits from the caching:
NewPM-O3: -0.03%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.04%
The biggest speedup is tramp3d-v4 (-0.21%).
http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions
Overall there is a small, but noticeable benefit from caching. I am not
entirely sure if the speedups warrant the extra complexity of caching.
The way the caching works also means that we might miss a few cases, as
it is less precise. Also, there may be a better way to cache things.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D109844
MergeICmps will currently sort (by offset) all comparisons in a chain,
including those that do not get merged. This is problematic in two ways:
* We may end up moving the original first block into the middle of
the chain, in which case the "extra work" instructions will also
be in the middle of the chain, resulting in invalid IR
(reported in https://reviews.llvm.org/D108782#3005583).
* Reordering branches is generally not legal, because it may
introduce branch on poison, which is UB (PR51845). The merging
done by MergeICmps is legal as long as we assume that memcmp()
works on frozen memory, but the reordering of unmerged comparisons
is definitely incorrect (without inserting freeze instructions),
so we should avoid it.
There are easier ways to fix the first issue, but I figured it was
worthwhile to do this properly to also fix the second one. What we
now do is to restore the original relative order of (potentially
merged) comparisons.
I took the liberty of dropping the MERGEICMPS_DOT_ON functionality,
because it would be more awkward to implement now (as the before and
after representation is different) and it doesn't seem terribly
useful nowadays.
Differential Revision: https://reviews.llvm.org/D110024
This fixes PR51730, a heap-use-after-free bug in
replaceConditionalBranchesOnConstant().
With the attached reproducer we were left with a function looking
something like this after replaceAndRecursivelySimplify():
[...]
cont2.i:
br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i
handler.type_mismatch3.i:
%3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ]
unreachable
cont4.i:
unreachable
[...]
with both the branch instruction and PHI node being in the worklist. As
a result of replacing the branch instruction with an unconditional
branch, the PHI node in %handler.type_mismatch3.i would be removed. This
then resulted in a heap-use-after-free bug due to accessing that removed
PHI node in the next worklist iteration.
This is solved by using a value handle worklist. I am a unsure if this
is the most idiomatic solution. Another solution could have been to
produce a worklist just containing the interesting branch instructions,
but I thought that it perhaps was a bit cleaner to keep all worklist
filtering in the loop that does the rewrites.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D109221
First (and biggest) change is to use "Killing/Dead" in place of "Later/Earlier" base for names in DSE. For example, [Maybe]DeadLoc - is a location killed by KillingI instruction. I believe such names are more descriptive and easy to understand than current ones.
Second, there are inconsistencies in naming where different names are used for the same thing. Fixed that too.
Third, reordered parameters of isPartialOverwrite, tryToMergePartialOverlappingStores, isOverwrite to make them consistent between each other. This greatly reduces potential mistakes.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D106947
We implement logic to convert a byte offset into a sequence of GEP
indices for that offset in a number of places. This patch adds a
DataLayout::getGEPIndicesForOffset() method, which implements the
core logic. I've updated SROA, ConstantFolding and InstCombine to
use it, and there's a few more places where it looks relevant.
Differential Revision: https://reviews.llvm.org/D110043
All transforms of IndVars have prerequisite requirement of LCSSA and LoopSimplify
form and rely on it. Added test that shows that this actually stands.
This reverts commit 6fec6552f5.
The patch was reverted on incorrect claim that this patch may break LCSSA form
when the loop is not in a simplify form. All IndVars' transform insure that
the loop is in simplify and LCSSA form, so if it wasn't broken before this
transform, it will also not be broken after it.
The scev-based salvaging for LSR can sometimes produce unnecessarily
verbose expressions. This patch adds logic to detect when the value to
be recovered and the induction variable differ by only a constant
offset. Then, the expression to derive the current iteration count can
be omitted from the dbg.value in favour of the offset.
Reviewed by: aprantl
Differential Revision: https://reviews.llvm.org/D109044
Nobody has complained about this, and the documentation for
LLVMContext::yield() states that LLVM is allowed to never call it.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D110008
To make the IR easier to analyze, this pass makes some minor transformations.
After that, even if it doesn't decide to optimize anything, it can't report that
it changed nothing and preserved all the analyses.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D109855
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.
Differential Revision: https://reviews.llvm.org/D109852
Added '-print-pipeline-passes' printing of parameters for those passes
declared with *_WITH_PARAMS macro in PassRegistry.def.
Note that it only prints the parameters declared inside *_WITH_PARAMS as
in a few cases there appear to be additional parameters not parsable.
The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def).
LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch
Differential Revision: https://reviews.llvm.org/D109310
After transformation, we assume the split condition of the pre-loop is always
true. In order to guarantee it, we need to check the start value of the split
cond AddRec satisfies the split condition.
Differential Revision: https://reviews.llvm.org/D109354
Implement TODO in optimizeLoopExits. Now if we have proved that some loop exit
is taken on 1st iteration, we make all branches in the following exiting blocks
always branch out of the loop and their conditions simplified away.
Patch by Dmitry Makogon!
Differential Revision: https://reviews.llvm.org/D108910
Reviewed By: lebedev.ri
This is a part of D108910.
We replace all loop PHIs with values coming from the loop preheader if
we proved that backedge is never taken.
Patch by Dmitry Makogon!
Differential Revision: https://reviews.llvm.org/D109596
Reviewed By: lebedev.ri
Pass the access type to getPtrStride(), so it is not determined
from the pointer element type. Many cases still fetch the element
type at a higher level though, so this only partially addresses
the issue.
LoopFlatten wasn't triggering on this motivating case after IV widening:
void foo(int *A, int N, int M) {
for (int i = 0; i < N; ++i)
for (int j = 0; j < M; ++j)
f(A[i*M+j]);
}
The reason was that the old induction phi nodes were getting in the way. These
narrow and dead induction phis are not always trivially dead, and having both
the narrow and wide IVs confused the analysis and caused it to bail. This adds
some extra bookkeeping for these old phis, so we can filter them out when
checks on phi nodes are performed. Other clean up passes will get rid of these
old phis and increment instructions.
As this was one of the motivating examples from the beginning, it was
surprising this wasn't triggering from C/C++ code. It looks like the IR and CFG
is just slightly different.
Differential Revision: https://reviews.llvm.org/D109309