This reverts commit 636c93ed11.
The original patch caused build failures on TSan buildbots. Commit 6ded69f294
fixes this issue by reducing the rate at which empty debug intrinsics
propagate, reducing the memory footprint and preventing a fatal spike.
This patch is a fix following the revert of 72ce759
(https://reviews.llvm.org/rG72ce759928e6dfee6a9efa310b966c19722352ba)
and fixes the failure that it caused.
The above patch failed on the Thread Sanitizer buildbot with an out of
memory error. After an investigation, the cause was identified as an
explosion in debug intrinsics while running the Jump Threading pass on
ModuleMap.ll. The above patched prevented debug intrinsics from being
dropped when their Basic Block was deleted due to being "empty". In this
case, one of the functions in ModuleMap.ll had (after many optimization
passes) a very large number of debug intrinsics representing a set of
repeatedly inlined variables. Previously the vast majority of these were
silently dropped during Jump Threading when their blocks were deleted,
but as of the above patch they survived for longer, causing a large
increase in the number of debug intrinsics. These intrinsics were then
repeatedly cloned by the Jump Threading pass as edges were threaded,
multiplying the intrinsic count further. The memory consumed by this
process spiralled out of control, crashing the buildbot that uses TSan
(which has an estimated 5-10x memory overhead compared to non-sanitized
builds).
This patch adds RemoveRedundantDbgInstrs to the Jump Threading pass, in
order to reduce the number of debug intrinsics down to a manageable
amount in cases where many intrinsics for the same variable end up
bunched together contiguously, as in this case.
Differential Revision: https://reviews.llvm.org/D73054
This causes a crash for the reproducer below
enum { a };
enum b { c, d };
e;
static _Bool g(struct f *h, enum b i) {
i &&j();
return a;
}
static k(char h, enum b i) {
_Bool l = g(e, i);
l;
}
m(h) {
k(h, c);
g(h, d);
}
This reverts commit aadb635e04.
Summary:
Right now the alignment of the lower half of a store is computed as
align/2, which fails for unaligned stores (align = 1), and is overly
pessimitic for, e.g. a 8 byte store aligned to 4 bytes.
Fixes PR44851
Fixes PR44877
Reviewers: gchatelet, spatel, lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74311
Summary:
Currently, `SCEVExpander::isHighCostExpansionHelper()` has the following logic:
```
if (auto *UDivExpr = dyn_cast<SCEVUDivExpr>(S)) {
// If the divisor is a power of two and the SCEV type fits in a native
// integer (and the LHS not expensive), consider the division cheap
// irrespective of whether it occurs in the user code since it can be
// lowered into a right shift.
if (auto *SC = dyn_cast<SCEVConstant>(UDivExpr->getRHS()))
if (SC->getAPInt().isPowerOf2()) {
if (isHighCostExpansionHelper(UDivExpr->getLHS(), L, At,
BudgetRemaining, TTI, Processed))
return true;
const DataLayout &DL =
L->getHeader()->getParent()->getParent()->getDataLayout();
unsigned Width = cast<IntegerType>(UDivExpr->getType())->getBitWidth();
return DL.isIllegalInteger(Width);
}
```
Since this test does not have a datalayout specified,
`SCEVExpander::isHighCostExpansionHelper()` says that
`[[TMP2:%.*]] = lshr exact i64 [[TMP1]], 5` is high-cost, and didn't perform it.
But future patches will change that logic to solely rely on cost-model,
without any such datalayout checks, so i think it is best to show
that that change is ephemeral, and can already happen without costmodel changes.
Reviewers: reames, fhahn, sanjoy, craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73717
This reverts commit rGcd5b308b828e, rGcd5b308b828e, rG8cedf0e2994c.
There are issues to be investigated for polly bots and bots turning on
EXPENSIVE_CHECKS.
This is apparently worse than 1-byte alignment. This does not attempt
to decompose 2-byte aligned wide stores, but will stop trying to
produce them.
Also fix bug in LoadStoreVectorizer which was decreasing the alignment
and vectorizing stack accesses. It was assuming a stack object was an
alloca that could have its base alignment changed, which is not true
if the pointer is derived from a function argument.
Update test scripts were limited because they performed a single action
on the entire file and if that action was controlled by arguments, like
the one introduced in D68819, there was no record of it.
This patch introduces the capability of changing the arguments passed to
the script "on-the-fly" while processing a test file. In addition, an
"on/off" switch was added so that processing can be disabled for parts
of the file where the content is simply copied. The last extension is a
record of the invocation arguments in the auto generated NOTE. These
arguments are also picked up in a subsequent invocation, allowing
updates with special options enabled without user interaction.
To change the arguments the string `UTC_ARGS:` has to be present in a
line, followed by "additional command line arguments". That is
everything that follows `UTC_ARGS:` will be added to a growing list
of "command line arguments" which is reparsed after every update.
Reviewed By: arichardson
Differential Revision: https://reviews.llvm.org/D69701
This restores commit 748bb5a0f1, along
with a fix for a Chromium test suite build issue (and a new test for
that case).
Differential Revision: https://reviews.llvm.org/D73242
The changeXXXAfterManifest functions are better suited to deal with
changes so we should prefer them. These functions also recursively
delete dead instructions which is why we see test changes.
This patch removes forcedconstant to simplify things for the
move to ValueLattice, which includes constant ranges, but no
forced constants.
This patch removes forcedconstant and changes ResolvedUndefsIn
to mark instructions with unknown operands as overdefined. This
means we do not do simplifications based on undef directly in SCCP
any longer, but this seems to hardly come up in practice (see stats
below), presumably because InstCombine & others take care
of most of the relevant folds already.
It is still beneficial to keep ResolvedUndefIn, as it allows us delaying
going to overdefined until we propagated all known information.
I also built MultiSource, SPEC2000 and SPEC2006 and compared
sccp.IPNumInstRemoved and sccp.NumInstRemoved. It looks like the impact
is quite low:
Tests: 244
Same hash: 238 (filtered out)
Remaining: 6
Metric: sccp.IPNumInstRemoved
Program base patch diff
test-suite...arks/VersaBench/dbms/dbms.test 4.00 3.00 -25.0%
test-suite...TimberWolfMC/timberwolfmc.test 38.00 34.00 -10.5%
test-suite...006/453.povray/453.povray.test 158.00 155.00 -1.9%
test-suite.../CINT2000/176.gcc/176.gcc.test 668.00 668.00 0.0%
test-suite.../CINT2006/403.gcc/403.gcc.test 1209.00 1209.00 0.0%
test-suite...arks/mafft/pairlocalalign.test 76.00 76.00 0.0%
Tests: 244
Same hash: 238 (filtered out)
Remaining: 6
Metric: sccp.NumInstRemoved
Program base patch diff
test-suite...arks/mafft/pairlocalalign.test 185.00 175.00 -5.4%
test-suite.../CINT2006/403.gcc/403.gcc.test 2059.00 2056.00 -0.1%
test-suite.../CINT2000/176.gcc/176.gcc.test 2358.00 2357.00 -0.0%
test-suite...006/453.povray/453.povray.test 317.00 317.00 0.0%
test-suite...TimberWolfMC/timberwolfmc.test 12.00 12.00 0.0%
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61314
This reverts commit d0c4d4fe09.
Revert "[DSE,MSSA] Move more passing test cases from todo to simple.ll."
This reverts commit 02266e64bb.
Revert "[DSE,MSSA] Adjust mda-with-dbg-values.ll to MSSA backed DSE."
This reverts commit 74f03e4ff0.
As discussed in PR41083:
https://bugs.llvm.org/show_bug.cgi?id=41083
...we can assert/crash in EarlyCSE using the current hashing scheme and
instructions with flags.
ValueTracking's matchSelectPattern() may rely on overflow (nsw, etc) or
other flags when detecting patterns such as min/max/abs composed of
compare+select. But the value numbering / hashing mechanism used by
EarlyCSE intersects those flags to allow more CSE.
Several alternatives to solve this are discussed in the bug report.
This patch avoids the issue by doing simple matching of min/max/abs
patterns that never requires instruction flags. We give up some CSE
power because of that, but that is not expected to result in much
actual performance difference because InstCombine will canonicalize
these patterns when possible. It even has this comment for abs/nabs:
/// Canonicalize all these variants to 1 pattern.
/// This makes CSE more likely.
(And this patch adds PhaseOrdering tests to verify that the expected
transforms are still happening in the standard optimization pipelines.
I left this code to use ValueTracking's "flavor" enum values, so we
don't have to change the callers' code. If we decide to go back to
using the ValueTracking call (by changing the hashing algorithm
instead), it should be obvious how to replace this chunk.
Differential Revision: https://reviews.llvm.org/D74285
Test that instcombine and early-cse can cooperate
to reduce sequences of select patterns that are not
composed of the same underlying instructions.
There's a bug in EarlyCSE (PR41083), and we can test
how much a possible fix (D74285) may affect optimization.
We were checking for extra uses of the negated operand even
if we were not going to create it as part of this canonicalization.
This was showing up as a regression when we limit EarlyCSE as
proposed in D74285.
This reverts commit b54a8ec1bc.
The commit triggered debug invariance (different output with/without
-g). The patch seems to have exposed a pre-existing invariance problem
in GlobalOpt, which I'll write a bug report for.
This patch adds a first version of a MemorySSA based DSE. It is missing
a lot of features, which will get added as follow-ups, to help to keep
the review manageable.
The patch uses the following general approach: given a MemoryDef, walk
upwards to find clobbering MemoryDefs that may be killed by the
starting def. Then check that there are no uses that may read the
location of the original MemoryDef in between both MemoryDefs. A bit
more concretely:
For all MemoryDefs StartDef:
1. Get the next dominating clobbering MemoryDef (DomAccess) by walking upwards.
2. Check that there no reads between DomAccess and the StartDef by checking
all uses starting at DomAccess and walking until we see StartDef.
3. For each found DomDef, check that:
1. There are no barrier instructions between DomDef and StartDef (like
throws or stores with ordering constraints).
2. StartDef is executed whenever DomDef is executed.
3. StartDef completely overwrites DomDef.
4. Erase DomDef from the function and MemorySSA.
The patch uses a very simple approach to guarantee that no throwing
instructions are between 2 stores: We only allow accesses to stack
objects, access that are in the same basic block if the block does not
contain any throwing instructions or accesses in functions that do
not contain any throwing instructions. This will get lifted later.
Besides adding support for the missing cases, there is plenty of additional
potential for improvements as follow-up work, e.g. the way we visit stores
(could be just a traversal of the MemorySSA, rather than collecting them
up-front), using the alias information discovered during walking to optimize
the MemorySSA.
This is loosely based on D40480 by Dave Green.
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72700
This copies the DSE tests into a MSSA subdirectory to test the MemorySSA
backed DSE implementation, without disturbing the original tests.
Differential Revision: https://reviews.llvm.org/D72145
This is a minimal but important advancement over the existing code. A
cast with an operand that is only used in the cast retains the no-alias
property of the operand.
Traversing PHI nodes is natural with the genericValueTraversal but also
a bit tricky. The problem is similar to the ones we have seen in AAAlign
and AADereferenceable, namely that we continue to increase the range in
each iteration. We use a pessimistic approach here to stop the
iterations. Nevertheless, optimistic information can now be propagated
through a PHI node.
The change is performed as stated by the FIXME and the tests are
adjusted. All changes look fine to me and values can be inferred as
undef without it being an error.
Casts can be handled natively by the ConstantRange class. We do limit it
to extends for now as we assume an integer type in different locations.
A TODO and a test case with a FIXME was added to remove that restriction
in the future.
We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.
There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:
InstCombine - can't happen without TTI, but we don't want target-specific
folds there.
SDAG - too late to assist other vectorization passes; TLI is not equipped
for these kind of cost queries; limited to a single basic block.
CGP - too late to assist other vectorization passes; would need to re-implement
basic cleanups like CSE/instcombine.
SLP - doesn't fit with existing transforms; limited to a single basic block.
This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.
We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.
Differential Revision: https://reviews.llvm.org/D73480
The LoopExtractor created new functions (by definition), which violates
the restrictions of a LoopPass.
The correct implementation of this pass should be as a ModulePass.
Includes reverting rL82990 implications on the LoopExtractor.
Fixes PR3082 and PR8929.
Differential Revision: https://reviews.llvm.org/D69069
In addition to the module pass, this patch introduces a CGSCC pass that
runs the Attributor on a strongly connected component of the call graph
(both old and new PM). The Attributor was always design to be used on a
subset of functions which makes this patch mostly mechanical.
The one change is that we give up `norecurse` deduction in the module
pass in favor of doing it during the CGSCC pass. This makes the
interfaces simpler but can be revisited if needed.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D70767
Parallel regions known to be read-only, e.g., after we removed all dead
write accesses, and terminating (`willreturn`) can be removed.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69954
The OpenMPOpt pass is a CGSCC pass in which OpenMP specific
optimizations can reside.
The OpenMPOpt pass uses the OpenMPKinds.def file to identify runtime
calls and their uses. This allows targeted transformations and eases
their implementation.
This initial patch deduplicates `__kmpc_global_thread_num` and
`omp_get_thread_num` calls. We can also identify arguments that are
equivalent to such a call result and use it instead. Later we can
determine "gtid" arguments based on the use in kernel functions etc.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69930
Bionic has had `__strlen_chk` for a while. Optimizing that into a
constant is quite profitable, when possible.
Differential Revision: https://reviews.llvm.org/D74079
While D72944 also fixes https://bugs.llvm.org/show_bug.cgi?id=44541,
it does so in a more roundabout manner and there might be other
loopholes to trigger the same issue. This is a more direct fix,
that prevents the transform if the min/max is based on a
non-canonical sub X, 0 instruction.
Differential Revision: https://reviews.llvm.org/D73849
As discussed on D73919, this replaces a few cases where we were
modifying multiple operands of instructions in-place with the
creation of a new instruction, which we generally prefer nowadays.
This tends to be more readable and less prone to worklist management
bugs.
Test changes are only superficial (instruction naming and order).
Fixes https://bugs.llvm.org/show_bug.cgi?id=44835. Skip the transform
if it wouldn't actually do anything (apart from removing and reinserting
the same instructions).
Note that the test case doesn't loop on current master anymore, only
on the LLVM 10 release branch. The issue is already mitigated on master
due to worklist order fixes, but we should fix the root cause there as well.
As a side note, we should probably assert in combineLoadToNewType()
that it does not combine to the same type. Not doing this here, because
this assertion would also be triggered in another place right now.
Differential Revision: https://reviews.llvm.org/D74278
This improves on the following patch, which removed ARC runtime calls
taking inert global variables:
https://reviews.llvm.org/D62433
rdar://problem/59137105