isNoopIntrinsic returns true for some intrinsics that are modeled in
MemorySSA but do not actually read or write any memory and do not block
DSE. Such intrinsics should not be considered as read-clobbers.
After investigation by @asbirlea, the issue that caused the
revert appears to be an issue in the original source, rather
than a problem with the compiler.
This patch enables MemorySSA DSE again.
This reverts commit 915310bf14.
Summary:
Adds support for "following" memory through MSSA PHI arguments. This will help catch more noop stores that exist between blocks.
Originally part of D79391.
Reviewers: fhahn, jfb, asbirlea
Differential Revision: https://reviews.llvm.org/D82588
There appears to be a mis-compile with MemorySSA-backed DSE in
combination with llvm.lifetime.end. It currently appears like
DSE is doing the right thing and the llvm.lifetime.end markers
are incorrect. The reverted patch uncovers the mis-compile.
This patch temporarily switches back to the legacy DSE
implementation, while we investigate.
This reverts commit 9d172c8e9c.
When looking for memory defs killed by memory terminators the code
currently incorrectly ignores the size argument of llvm.lifetime.end.
This patch updates the code to use isMemTerminator and updates
isMemTerminator to use isOverwrite() to make sure locations that are
outside the range marked as dead by llvm.lifetime.end are not
considered. Note that isOverwrite is only used for llvm.lifetime.end,
because free-like functions make the whole underlying object dead.
This switches to using DSE + MemorySSA by default again, after
fixing the issues reported after the first commit.
Notable fixes fc82006331, a0017c2bc2.
This reverts commit 3a59628f3c.
AliasAnalysis/MemoryLocation does not account for loops. Two
MemoryLocation can be must-overwrite, even if the first one writes
multiple locations in a loop.
This patch prevents removing such stores, by only considering candidates
that are known to be loop invariant, or executed in the same BB.
Currently the invariant check is quite conservative and only considers
Alloca and Alloca-like instructions and arguments as invariant base pointers.
It also considers GEPs with all constant indices and invariant bases as
invariant.
This can be improved in the future, but the current implementation has
only minor impact on the total number of stores eliminated (25903 vs
26047 for the baseline). There are some 2-10% swings for some individual
benchmarks. In roughly half of the cases, the number of stores removed
increases actually, because we skip candidates that are unlikely to be
valid candidates early.
When deleting stores at the end of a function, we have to do PHI
translation, otherwise we might miss reads in different iterations of a
loop. See multiblock-loop-carried-dependence.ll for details.
This fixes a mis-compile and surprisingly also increases the number of
eliminated stores from 26047 to 26572 for MultiSource/SPEC2000/SPEC2006
on X86 with -O3 -flto. This is most likely because we save budget by not
exploring through MemoryPhis, which are less likely to result in valid
candidates for elimination.
The issue was reported post-commit for fb109c42d9.
The tests have been updated and I plan to move them from the MSSA
directory up.
Some end-to-end tests needed small adjustments. One difference to the
legacy DSE is that legacy DSE also deletes trivially dead instructions
that are unrelated to memory operations. Because MemorySSA-backed DSE
just walks the MemorySSA, we only visit/check memory instructions. But
removing unrelated dead instructions is not really DSE's job and other
passes will clean up.
One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll,
but I think this comes down to legacy DSE not handling instructions that
may throw correctly in that case. To cover this with MemorySSA-backed
DSE, we need an update to llvm.coro.begin to treat it's return value to
belong to the same underlying object as the passed pointer.
There are some minor cases MemorySSA-backed DSE currently misses, e.g. related
to atomic operations, but I think those can be implemented after the switch.
This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html
For the MultiSource/SPEC2000/SPEC2006 the number of eliminated stores
goes from ~17500 (legayc DSE) to ~26300 (MemorySSA-backed). More numbers
and details in the thread on llvm-dev.
Impact on CTMark:
```
Legacy Pass Manager
exec instrs size-text
O3 + 0.60% - 0.27%
ReleaseThinLTO + 1.00% - 0.42%
ReleaseLTO-g. + 0.77% - 0.33%
RelThinLTO (link only) + 0.87% - 0.42%
RelLO-g (link only) + 0.78% - 0.33%
```
http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
```
New Pass Manager
exec instrs. size-text
O3 + 0.95% - 0.25%
ReleaseThinLTO + 1.34% - 0.41%
ReleaseLTO-g. + 1.71% - 0.35%
RelThinLTO (link only) + 0.96% - 0.41%
RelLO-g (link only) + 2.21% - 0.35%
```
http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
Reviewed By: asbirlea, xbolva00, nikic
Differential Revision: https://reviews.llvm.org/D87163
MemoryLocation has been taught about memcpy.inline, which means we can
get the memory locations read and written by it. This means DSE can
handle memcpy.inline
Atomic stores are modeled as MemoryDef to model the fact that they may
not be reordered, depending on the ordering constraints.
Atomic stores that are monotonic or weaker do not limit re-ordering, so
we do not have to treat them as potential read clobbers.
Note that llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll
already contains a set of negative test cases.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87386
This changes getDomMemoryDef to check if a Current is a valid
candidate for elimination before checking for reads. Before the change,
we were spending a lot of compile-time in checking for read accesses for
Current that might not even be removable.
This patch flips the logic, so we skip Current if they cannot be
removed before checking all their uses. This is much more efficient in
practice.
It also adds a more aggressive limit for checking partially overlapping
stores. The main problem with overlapping stores is that we do not know
if they will lead to elimination until seeing all of them. This patch
limits adds a new limit for overlapping store candidates, which keeps
the number of modified overlapping stores roughly the same.
This is another substantial compile-time improvement (while also
increasing the number of stores eliminated). Geomean -O3 -0.67%,
ReleaseThinLTO -0.97%.
http://llvm-compile-time-tracker.com/compare.php?from=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&to=2e630629b43f64b60b282e90f0d96082fde2dacc&stat=instructions
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86487
For DSE with MemorySSA it is beneficial to manually traverse the
defining access, instead of using a MemorySSA walker, so we can
better control the number of steps together with other limits and
also weed out invalid/unprofitable paths early on.
This patch requires a follow-up patch to be most effective, which I will
share soon after putting this patch up.
This temporarily XFAIL's the limit tests, because we now explore more
MemoryDefs that may not alias/clobber the killing def. This will be
improved/fixed by the follow-up patch.
This patch also renames some `Dom*` variables to `Earlier*`, because the
dominance relation is not really used/important here and potentially
confusing.
This patch allows us to aggressively cut down compile time, geomean
-O3 -0.64%, ReleaseThinLTO -1.65%, at the expense of fewer stores
removed. Subsequent patches will increase the number of removed stores
again, while keeping compile-time in check.
http://llvm-compile-time-tracker.com/compare.php?from=d8e3294118a8c5f3f97688a704d5a05b67646012&to=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&stat=instructions
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86486
Currently we repeatedly check the same uses for read clobbers in some
cases. We can avoid unnecessary checks by keeping track of the memory
accesses we already found read clobbers for. To do so, we just add
memory access causing read-clobbers to a set. Note that marking all
visited accesses as read-clobbers would be to pessimistic, as that might
include accesses not on any path to the actual read clobber.
If we do not find any read-clobbers, we can add all visited instructions
to another set and use that to skip the same accesses in the next call.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D75025
Using callCapturesBefore potentially improves the precision and the
number of stores we can remove. But in practice, it seems to have very
little impact in terms of stores removed. For example, for
SPEC2000/SPEC2006/MultiSource with -O3 -flto, ~50 more stores are
removed (out of ~26900 stores removed). But in terms of compile-time, it
is very expensive and the patch gives substantial compile-time
improvements: Geomean O3 -0.24%, ReleaseThinLTO -0.47%, ReleaseLTO-g
-0.39%.
http://llvm-compile-time-tracker.com/compare.php?from=612a0bff88ed906c83b82f079d4c49e5fecfb9d0&to=e6c86b96d20d97dd88e903a409bd8d39b6114312&stat=instructions
Avoid computing InvisibleToCallerBefore/AfterRet up front. In most
cases, this information is not really needed. Instead, introduce helper
functions to compute and cache the result on demand.
Notably, this also does not use PointerMayBeCapturedBefore for
isInvisibleToCallerBeforeRet, as it requires the killing MemoryDef as
starting instruction, making the caching ineffective. But it appears the
use of PointerMayBeCapturedBefore has very limited benefits in practice
(e.g. on SPEC2000/SPEC2006/MultiSource there are no binary changes with
-O3 -flto). Refrain from using it for now, to limit-compile-time.
This gives some nice compile-time improvements:
http://llvm-compile-time-tracker.com/compare.php?from=db9345f6810f379a36752dc52caf5230585d0ebd&to=b4d091047e1b8a3d377d200137b79d03aca65663&stat=instructions
Limit elimination of stores at the end of a function to MemoryDefs with
a single underlying object, to save compile time.
In practice, the case with multiple underlying objects seems not very
important in practice. For -O3 -flto on MultiSource/SPEC2000/SPEC2006
this results in a total of 2 more stores being eliminated.
We can always re-visit that in the future.
isWriteAtEndOfFunction needs to check all memory uses of Def, which is
much more expensive than getting the underlying objects in practice.
Switch the call order, as recommended by the TODO, which was added as
per an earlier review.
This shaves off a bit of compile-time.
Currently the code does not account for the fact that getDomMemoryDef
can be called with ScanLimit == 0, if we reached the limit while
processing an earlier access. Also tighten the check a bit more and bump
the scan limit now that it is handled properly.
In some cases, this brings a 2x speedup in terms of compile-time.
We are re-using tryToMergePartialOverlappingStores, which requires
earlier to domiante Later. In the long run,
tryToMergeParialOverlappingStores should be re-written using MemorySSA.
Fixes PR46513.
When the byref attribute is added, there will need to be two similar
functions for the existing cases which have an associate value copy,
and byref which does not. Most, but not all of the existing uses will
use the existing version.
The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
This fixes an instance where MemorySSA-using Dead Store Elimination is failing
to do a transformation that the non-MemorySSA-using version does.
Differential Revision: https://reviews.llvm.org/D83783
This patch adds support for eliminating stores by free & lifetime.end
calls. We can remove stores that are not read before calling a memory
terminator and we can eliminate all stores after a memory terminator
until we see a new lifetime.start. The second case seems to not really
trigger much in practice though.
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72410
Summary:
Avoid exposing details about how roots are stored. This enables subsequent
type-erasure changes.
v5:
- cleanup a unit test by using EXPECT_EQ instead of EXPECT_TRUE
Change-Id: I532b774cc71f2224e543bc7d79131d97f63f093d
Reviewers: arsenm, RKSimon, mehdi_amini, courbet
Subscribers: jvesely, wdng, hiraditya, kuhar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83085
This patch add support for eliminating MemoryDefs that do not have any
aliasing users, which indicates that there are no reads/writes to the
memory location until the end of the function.
To eliminate such defs, we have to ensure that the underlying object is
not visible in the caller and does not escape via returning. We need a
separate check for that, as InvisibleToCaller does not consider returns.
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker, george.burgess.iv
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72631
This patch extends storeIsNoop to also detect stores of 0 to an calloced
object. This basically ports the logic from legacy DSE to the MemorySSA
backed version.
It triggers in a few cases on MultiSource, SPEC2000, SPEC2006 with -O3
LTO:
Same hash: 218 (filtered out)
Remaining: 19
Metric: dse.NumNoopStores
Program base patch2 diff
test-suite...CFP2000/177.mesa/177.mesa.test 1.00 15.00 1400.0%
test-suite...6/482.sphinx3/482.sphinx3.test 1.00 14.00 1300.0%
test-suite...lications/ClamAV/clamscan.test 2.00 28.00 1300.0%
test-suite...CFP2006/433.milc/433.milc.test 1.00 8.00 700.0%
test-suite...pplications/oggenc/oggenc.test 2.00 9.00 350.0%
test-suite.../CINT2000/176.gcc/176.gcc.test 6.00 6.00 0.0%
test-suite.../CINT2006/403.gcc/403.gcc.test NaN 137.00 nan%
test-suite...libquantum/462.libquantum.test NaN 3.00 nan%
test-suite...6/464.h264ref/464.h264ref.test NaN 7.00 nan%
test-suite...decode/alacconvert-decode.test NaN 2.00 nan%
test-suite...encode/alacconvert-encode.test NaN 2.00 nan%
test-suite...ications/JM/ldecod/ldecod.test NaN 9.00 nan%
test-suite...ications/JM/lencod/lencod.test NaN 39.00 nan%
test-suite.../Applications/lemon/lemon.test NaN 2.00 nan%
test-suite...pplications/treecc/treecc.test NaN 4.00 nan%
test-suite...hmarks/McCat/08-main/main.test NaN 4.00 nan%
test-suite...nsumer-lame/consumer-lame.test NaN 3.00 nan%
test-suite.../Prolangs-C/bison/mybison.test NaN 1.00 nan%
test-suite...arks/mafft/pairlocalalign.test NaN 30.00 nan%
Reviewers: efriedma, zoecarver, asbirlea
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D82204
This updates the MemorySSA backed implementation to treat arguments
passed by value similar to allocas: in they are assumed to be invisible
in the caller. This is similar to how they are treated in legacy DSE.
Reviewers: efriedma, asbirlea, george.burgess.iv
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D82222
Currently we stop exploring candidates too early in some cases.
In particular, we can continue checking the defining accesses of
non-removable MemoryDefs and defs without analyzable write location
(read clobbers are already ruled out using MemorySSA at this point).
As we traverse the CFG backwards, we could end up reaching unreachable
blocks. For unreachable blocks, we won't have computed post order
numbers and because DomAccess is reachable, unreachable blocks cannot be
on any path from it.
This fixes a crash with unreachable blocks.
Port partial constant store merging logic to MemorySSA backed DSE. The
heavy lifting is done by the existing helper function. It is used in
context where we already ensured that the later instruction can
eliminate the earlier one, if it is a complete overwrite.
isOverwrite expects the later location as first argument and the earlier
result later. The adjusted call is intended to check whether CC
overwrites DefLoc.
This patch relaxes the post-dominance requirement for accesses to
objects visible after the function returns.
Instead of requiring the killing def to post-dominate the access to
eliminate, the set of 'killing blocks' (= blocks that completely
overwrite the original access) is collected.
If all paths from the access to eliminate and an exit block go through a
killing block, the access can be removed.
To check this property, we first get the common post-dominator block for
the killing blocks. If this block does not post-dominate the access
block, there may be a path from DomAccess to an exit block not involving
any killing block.
Otherwise we have to check if there is a path from the DomAccess to the
common post-dominator, that does not contain a killing block. If there
is no such path, we can remove DomAccess. For this check, we start at
the common post-dominator and then traverse the CFG backwards. Paths are
terminated when we hit a killing block or a block that is not executed
between DomAccess and a killing block according to the post-order
numbering (if the post order number of a block is greater than the one
of DomAccess, the block cannot be in in a path starting at DomAccess).
This gives the following improvements on the total number of stores
after DSE for MultiSource, SPEC2K, SPEC2006:
Tests: 237
Same hash: 206 (filtered out)
Remaining: 31
Metric: dse.NumRemainingStores
Program base new100 diff
test-suite...CFP2000/188.ammp/188.ammp.test 3624.00 3544.00 -2.2%
test-suite...ch/g721/g721encode/encode.test 128.00 126.00 -1.6%
test-suite.../Benchmarks/Olden/mst/mst.test 73.00 72.00 -1.4%
test-suite...CFP2006/433.milc/433.milc.test 3202.00 3163.00 -1.2%
test-suite...000/186.crafty/186.crafty.test 5062.00 5010.00 -1.0%
test-suite...-typeset/consumer-typeset.test 40460.00 40248.00 -0.5%
test-suite...Source/Benchmarks/sim/sim.test 642.00 639.00 -0.5%
test-suite...nchmarks/McCat/09-vor/vor.test 642.00 644.00 0.3%
test-suite...lications/sqlite3/sqlite3.test 35664.00 35563.00 -0.3%
test-suite...T2000/300.twolf/300.twolf.test 7202.00 7184.00 -0.2%
test-suite...lications/ClamAV/clamscan.test 19475.00 19444.00 -0.2%
test-suite...INT2000/164.gzip/164.gzip.test 2199.00 2196.00 -0.1%
test-suite...peg2/mpeg2dec/mpeg2decode.test 2380.00 2378.00 -0.1%
test-suite.../Benchmarks/Bullet/bullet.test 39335.00 39309.00 -0.1%
test-suite...:: External/Povray/povray.test 36951.00 36927.00 -0.1%
test-suite...marks/7zip/7zip-benchmark.test 67396.00 67356.00 -0.1%
test-suite...6/464.h264ref/464.h264ref.test 31497.00 31481.00 -0.1%
test-suite...006/453.povray/453.povray.test 51441.00 51416.00 -0.0%
test-suite...T2006/401.bzip2/401.bzip2.test 4450.00 4448.00 -0.0%
test-suite...Applications/kimwitu++/kc.test 23481.00 23471.00 -0.0%
test-suite...chmarks/MallocBench/gs/gs.test 6286.00 6284.00 -0.0%
test-suite.../CINT2000/254.gap/254.gap.test 13719.00 13715.00 -0.0%
test-suite.../Applications/SPASS/SPASS.test 30345.00 30338.00 -0.0%
test-suite...006/450.soplex/450.soplex.test 15018.00 15016.00 -0.0%
test-suite...ications/JM/lencod/lencod.test 27780.00 27777.00 -0.0%
test-suite.../CINT2006/403.gcc/403.gcc.test 105285.00 105276.00 -0.0%
There might be potential to pre-compute some of the information of which
blocks are on the path to an exit for each block, but the overall
benefit might be comparatively small.
On the set of benchmarks, 15738 times out of 20322 we reach the
CFG check, the CFG check is successful. The total number of iterations
in the CFG check is 187810, so on average we need less than 10 steps in
the check loop. Bumping the threshold in the loop from 50 to 150 gives a
few small improvements, but I don't think they warrant such a big bump
at the moment. This is all pending further tuning in the future.
Reviewers: dmgreen, bryant, asbirlea, Tyker, efriedma, george.burgess.iv
Reviewed By: george.burgess.iv
Differential Revision: https://reviews.llvm.org/D78932
- Now all SalvageDebugInfo() calls will mark undef if the salvage
attempt fails.
Reviewed by: vsk, Orlando
Differential Revision: https://reviews.llvm.org/D78369
This is D77454, except for stores. All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.
Differential Revision: https://reviews.llvm.org/D79968
Using the existing NumFastStores statistic can be misleading when
comparing the impact of DSE patches.
For example, consider the case where a store gets removed from a
function before it is inlined into another function. A less
powerful DSE might only remove the store from functions it has
been inlined into, which will result in more stores being removed, but
no difference in the actual number of stores after DSE.
The new stat provides the absolute number of stores surviving after
DSE.
Reviewers: dmgreen, bryant, asbirlea, jfb
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D78830
This patch slightly improves the formatting of the debug output, adds a
few missing outputs and makes some existing outputs more consistent with
the rest.
We can eliminate MemoryDefs of objects not accessible after the function
returns (e.g. alloca), if there are no reads between the MemoryDef and
any function exits. We can stop traversing paths that completely
overwrite the memory location of the MemoryDef.
This patch was split off D73763.
Reviewers: dmgreen, bryant, asbirlea, Tyker, efriedma, george.burgess.iv
Reviewed By: asbirlea, george.burgess.iv
Differential Revision: https://reviews.llvm.org/D77736
callCapturesBefore always returns ModRef , if UseInst isn't a call. As
we only call it if we already know Mod is set, this only destroys the
Must bit for non-calls.
For MemoryPhis, we have to avoid that the MemoryPhi may be executed
before before the access we are currently looking at.
To do this we do a post-order numbering of the basic blocks in the
function and bail out once we reach a MemoryPhi with a larger (or equal)
post-order block number than the current MemoryAccess.
This changes the order in which we visit stores for elimination.
This patch also adds support for exploring multiple paths. We keep a worklist (ToCheck) of memory accesses that might be eliminated by our starting MemoryDef or MemoryPhis for further exploration. For MemoryPhis, we add the incoming values to the worklist, for MemoryDefs we add the defining access.
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72148
DSE would mistakenly remove store (2):
a = calloc(n+1)
for (int i = 0; i < n; i++) {
store 1, a[i+1] // (1)
store 0, a[i] // (2)
}
The fix is to do PHI transaltion while looking for clobbering
instructions between the store and the calloc.
Reviewed By: efriedma, bjope
Differential Revision: https://reviews.llvm.org/D68006
Add a map from BasicBlocks to overlap intervals. For partial writes, we
can keep track of those in IOLs. We only add candidates that are valid
for eliminations.
Reviewers: dmgreen, bryant, asbirlea, Tyker
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D73757
Can be used like
-debug-counter=dse-memoryssa-skip=10,dse-memoryssa-counter-count=20
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72147
Essentially, fold OrderedBasicBlock into BasicBlock, and make it
auto-invalidate the instruction ordering when new instructions are
added. Notably, we don't need to invalidate it when removing
instructions, which is helpful when a pass mostly delete dead
instructions rather than transforming them.
The downside is that Instruction grows from 56 bytes to 64 bytes. The
resulting LLVM code is substantially simpler and automatically handles
invalidation, which makes me think that this is the right speed and size
tradeoff.
The important change is in SymbolTableTraitsImpl.h, where the numbering
is invalidated. Everything else should be straightforward.
We probably want to implement a fancier re-numbering scheme so that
local updates don't invalidate the ordering, but I plan for that to be
future work, maybe for someone else.
Reviewed By: lattner, vsk, fhahn, dexonsmith
Differential Revision: https://reviews.llvm.org/D51664
This reverts commit d0c4d4fe09.
Revert "[DSE,MSSA] Move more passing test cases from todo to simple.ll."
This reverts commit 02266e64bb.
Revert "[DSE,MSSA] Adjust mda-with-dbg-values.ll to MSSA backed DSE."
This reverts commit 74f03e4ff0.
This patch adds a first version of a MemorySSA based DSE. It is missing
a lot of features, which will get added as follow-ups, to help to keep
the review manageable.
The patch uses the following general approach: given a MemoryDef, walk
upwards to find clobbering MemoryDefs that may be killed by the
starting def. Then check that there are no uses that may read the
location of the original MemoryDef in between both MemoryDefs. A bit
more concretely:
For all MemoryDefs StartDef:
1. Get the next dominating clobbering MemoryDef (DomAccess) by walking upwards.
2. Check that there no reads between DomAccess and the StartDef by checking
all uses starting at DomAccess and walking until we see StartDef.
3. For each found DomDef, check that:
1. There are no barrier instructions between DomDef and StartDef (like
throws or stores with ordering constraints).
2. StartDef is executed whenever DomDef is executed.
3. StartDef completely overwrites DomDef.
4. Erase DomDef from the function and MemorySSA.
The patch uses a very simple approach to guarantee that no throwing
instructions are between 2 stores: We only allow accesses to stack
objects, access that are in the same basic block if the block does not
contain any throwing instructions or accesses in functions that do
not contain any throwing instructions. This will get lifted later.
Besides adding support for the missing cases, there is plenty of additional
potential for improvements as follow-up work, e.g. the way we visit stores
(could be just a traversal of the MemorySSA, rather than collecting them
up-front), using the alias information discovered during walking to optimize
the MemorySSA.
This is loosely based on D40480 by Dave Green.
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72700
This copies the DSE tests into a MSSA subdirectory to test the MemorySSA
backed DSE implementation, without disturbing the original tests.
Differential Revision: https://reviews.llvm.org/D72145
The patch makes sure that the LastThrowing pointer does not point to any instruction deleted by call to DeleteDeadInstruction.
While iterating through the instructions the pass maintains a pointer to the lastThrowing Instruction. A call to deleteDeadInstruction deletes a dead store and other instructions feeding the original dead instruction which also become dead. The instruction pointed by the lastThrowing pointer could also be deleted by the call to DeleteDeadInstruction and thus it becomes a dangling pointer. Because of this, we see an error in the next iteration.
In the patch, we maintain a list of throwing instructions encountered previously and use the last non deleted throwing instruction from the container.
Reviewers: fhahn, bcahoon, efriedma
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D65326
The patch makes sure that the LastThrowing pointer does not point to any instruction deleted by call to DeleteDeadInstruction.
While iterating through the instructions the pass maintains a pointer to the lastThrowing Instruction. A call to deleteDeadInstruction deletes a dead store and other instructions feeding the original dead instruction which also become dead. The instruction pointed by the lastThrowing pointer could also be deleted by the call to DeleteDeadInstruction and thus it becomes a dangling pointer. Because of this, we see an error in the next iteration.
In the patch, we maintain a list of throwing instructions encountered previously and use the last non deleted throwing instruction from the container.
Patch by Ankit <quic_aankit@quicinc.com>
Reviewers: fhahn, bcahoon, efriedma
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D65326
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.
Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.
Reviewers: chandlerc, hfinkel
Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66428
llvm-svn: 371284
Summary:
The DeadStoreElimination pass now skips doing
PartialStoreMerging when stores overlap according to
OW_PartialEarlierWithFullLater and at least one of
the stores is having a store size that is different
from the size of the type being stored.
This solves problems seen in
https://bugs.llvm.org/show_bug.cgi?id=41949
for which we in the past could end up with
mis-compiles or assertions.
The content and location of the padding bits is not
formally described (or undefined) in the LangRef
at the moment. So the solution is chosen based on
that we cannot assume anything about the padding bits
when having a store that clobbers more memory than
indicated by the type of the value that is stored
(such as storing an i6 using an 8-bit store instruction).
Fixes: https://bugs.llvm.org/show_bug.cgi?id=41949
Reviewers: spatel, efriedma, fhahn
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62250
llvm-svn: 361605
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.
It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.
Reviewers: hfinkel, materi, jkorous
Reviewed By: jkorous
Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61038
llvm-svn: 359072
Bug: https://bugs.llvm.org/show_bug.cgi?id=41175
In the bug test case the DSE pass is shortening the range of memory that a
memset is working on. A getelementptr is generated so that the new
starting address can be passed to memset. This instruction was not given
a DebugLoc.
To fix the bug, copy the DebugLoc from the memset instruction.
Patch by Orlando Cazalet-Hyams!
Differential Revision: https://reviews.llvm.org/D60556
llvm-svn: 358270
Updated to use DenseMap::insert instead of [] operator for insertion, to
avoid a crash caused by epoch checks.
This reverts commit 2b85de4383.
llvm-svn: 357257
By extending OrderedBB to allow removing and replacing cached
instructions, we can preserve OrderedBBs in DSE easily. This eliminates
one source of quadratic compile time in DSE.
Fixes PR38829.
Reviewers: rnk, efriedma, hfinkel
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59789
llvm-svn: 357208