From C11 and C++11 onwards, a forward-progress requirement has been
introduced for both languages. In the case of C, loops with non-constant
conditionals that do not have any observable side-effects (as defined by
6.8.5p6) can be assumed by the implementation to terminate, and in the
case of C++, this assumption extends to all functions. The clang
frontend will emit the `mustprogress` function attribute for C++
functions (D86233, D85393, D86841) and emit the loop metadata
`llvm.loop.mustprogress` for every loop in C11 or later that has a
non-constant conditional.
This patch modifies LoopDeletion so that only loops with
the `llvm.loop.mustprogress` metadata or loops contained in functions
that are required to make progress (`mustprogress` or `willreturn`) are
checked for observable side-effects. If these loops do not have an
observable side-effect, then we delete them.
Loops without observable side-effects that do not satisfy the above
conditions will not be deleted.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86844
Vectors where all elements have the same known constant range are treated as a
single constant range in the lattice. When bitcasting such vectors, there is a
mis-match between the width of the lattice value (single constant range) and
the original operands (vector). Go to overdefined in that case.
Fixes PR47991.
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
This is to simplify icmp instructions in the form like:
%cmp = icmp eq i32 (i8*, i8*)* bitcast (i32 (i32**, i32**)* @f32 to i32
%(i8*, i8*)), bitcast (i32 (i64**, i64**) @f64 to i32 (i8*, i8*)*)
Here @f32 and @f64 are two functions.
Differential Revision: https://reviews.llvm.org/D87850
If a module has many values that need to be resolved by
ResolvedUndefsIn, compilation takes quadratic time overall. Solve should
do a small amount of work, since not much is added to the worklists each
time markOverdefined is called. But ResolvedUndefsIn is linear over the
length of the function/module, so resolving one undef at a time is
quadratic in general.
To solve this, make ResolvedUndefsIn resolve every undef value at once,
instead of resolving them one at a time. This loses a little
optimization power, but can be a lot faster.
We still need a loop around ResolvedUndefsIn because markOverdefined
could change the set of blocks that are live. That should be uncommon,
hopefully. We could optimize it by tracking which blocks transition from
dead to live, instead of iterating over the whole module to find them.
But I'll leave that for later. (The whole function will become a lot
simpler once we start pruning branches on undef.)
The regression test changes seem minor. The specific cases in question
could probably be optimized with a bit more work, but they seem like
edge cases that don't really matter.
Fixes an "infinite" compile issue my team found on an internal workoad.
Differential Revision: https://reviews.llvm.org/D89080
-debug-pass is a legacy PM only option.
Some tests checks that the pass returned that it made a change,
which is not relevant to the NPM, since passes return PreservedAnalyses.
Some tests check that passes are freed at the proper time, which is also
not relevant to the NPM.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87945
For intrinsics supported by ConstantRange, compute the result range
based on the argument ranges. We do this independently of whether
some or all of the input ranges are full, as we can often still
constrain the result in some way.
Differential Revision: https://reviews.llvm.org/D87183
Currently IPSCCP (and others like CVP/GVN) blindly propagate pointer
equalities. In certain cases, that leads to dereferenceable pointers
being replaced, as in the example test case.
I think this is not allowed, as it introduces an access of an
un-dereferenceable pointer. Note that the pointer is inbounds, but one
past the last element, so it is valid, but not dereferenceable.
This patch is mostly to highlight the issue and start a discussion.
Currently it only checks for specifically looking
one-past-the-last-element pointers with array typed bases.
This causes the mis-compile outlined in
https://stackoverflow.com/questions/55754313/is-this-gcc-clang-past-one-pointer-comparison-behavior-conforming-or-non-standar
In the test case, if we replace %p with the GEP for the store, we
subsequently determine that the store and the load cannot alias, because
they are to different underlying objects.
Note that Alive2 seems to think that the replacement is valid:
https://alive2.llvm.org/ce/z/2rorhk
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85332
In IPSCCP when a function is optimized to return undef, it should clear the returned attribute for all its input arguments
and its corresponding call sites.
The bug is exposed when the value of an input argument of the function is assigned to a physical register and
because of the argument having a returned attribute, the value of this physical register will continue to be used
as the function return value right after the call instruction returns, even if the value that this register holds may
be clobbered during the function call. This potentially results in incorrect values being used afterwards.
Reviewed By: jdoerfert, fhahn
Differential Revision: https://reviews.llvm.org/D84220
Teach SCCP to create notconstant lattice values from inequality
assumes and nonnull metadata, and update getConstant() to make
use of them. Additionally isOverdefined() needs to be changed to
consider notconstant an overdefined value.
Handling inequality branches is delayed until our branch on undef
story in other passes has been improved.
Differential Revision: https://reviews.llvm.org/D83643
If an analysis is actually invalidated, there's already a log statement
for that: 'Invalidating analysis: FooAnalysis'.
Otherwise the statement is not very useful.
Reviewed By: asbirlea, ychen
Differential Revision: https://reviews.llvm.org/D84981
Determine whether switch edges are feasible based on range information,
and remove non-feasible edges lateron.
This does not try to determine whether the default edge is dead,
as we'd have to determine that the range is fully covered by the
cases for that.
Another limitation here is that we don't remove dead cases that
have the same successor as a live case. I'm not handling this
because I wanted to keep the edge removal based on feasible edges
only, rather than inspecting ranges again there -- this does not
seem like a particularly useful case to handle.
Differential Revision: https://reviews.llvm.org/D84270
Problem:
Right now, our "Running pass" is not accurate when passes are wrapped in adaptor because adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for a adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order.
Solution:
Doing things like PassManager::Debuglogging is very intrusive because we need to specify Debuglogging whenever adaptor is created. (Actually, right now we're not specifying Debuglogging for some sub-PassManagers. Check PassBuilder)
This patch move debug logging for pass as a PassInstrument callback. We could be sure that all running passes are logged and in the correct order.
This could also be used to implement hierarchy pass logging in legacy PM. We could also move logging of pass manager to this if we want.
The test fixes looks messy. It includes changes:
- Remove PassInstrumentationAnalysis
- Remove PassAdaptor
- If a PassAdaptor is for a real pass, the pass is added
- Pass reorder (to the correct order), related to PassAdaptor
- Add missing passes (due to Debuglogging not passed down)
Reviewed By: asbirlea, aeubanks
Differential Revision: https://reviews.llvm.org/D84774
As far as I know, ipconstprop has not been used in years and ipsccp has
been used instead. This has the potential for confusion and sometimes
leads people to spend time finding & reporting bugs as well as
updating it to work with the latest API changes.
This patch moves the tests over to SCCP. There's one functional difference
I am aware of: ipconstprop propagates for each call-site individually, so
for functions that are called with different constant arguments it can sometimes
produce better results than ipsccp (at much higher compile-time cost).But
IPSCCP can be thought to do so as well for internal functions and as mentioned
earlier, the pass seems unused in practice (and there are no plans on working
towards enabling it anytime).
Also discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143773.html
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84447
Reapply with DTU update moved after CFG update, which is a
requirement of the API.
-----
Non-feasible control-flow edges are currently removed by replacing
the branch condition with a constant and then calling
ConstantFoldTerminator. This happens in a rather roundabout manner,
by inspecting the users (effectively: predecessors) of unreachable
blocks, and further complicated by the need to explicitly materialize
the condition for "forced" edges. I would like to extend SCCP to
discard switch conditions that are non-feasible based on range
information, but this is incompatible with the current approach
(as there is no single constant we could use.)
Instead, this patch explicitly removes non-feasible edges. It
currently only needs to handle the case where there is a single
feasible edge. The llvm_unreachable() branch will need to be
implemented for the aforementioned switch improvement.
Differential Revision: https://reviews.llvm.org/D84264
This patch updates IPSCCP to drop argmemonly and
inaccessiblemem_or_argmemonly if it replaces a pointer argument.
Fixes PR46717.
Reviewers: efriedma, davide, nikic, jdoerfert
Reviewed By: efriedma, jdoerfert
Differential Revision: https://reviews.llvm.org/D84432
It breaks stage-2 build. Clang crashed when compiling
llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp
llvm/Support/GenericDomTree.h eraseNode: Node is not a leaf node
Non-feasible control-flow edges are currently removed by replacing
the branch condition with a constant and then calling
ConstantFoldTerminator. This happens in a rather roundabout manner,
by inspecting the users (effectively: predecessors) of unreachable
blocks, and further complicated by the need to explicitly materialize
the condition for "forced" edges. I would like to extend SCCP to
discard switch conditions that are non-feasible based on range
information, but this is incompatible with the current approach
(as there is no single constant we could use.)
Instead, this patch explicitly removes non-feasible edges. It
currently only needs to handle the case where there is a single
feasible edge. The llvm_unreachable() branch will need to be
implemented for the aforementioned switch improvement.
Differential Revision: https://reviews.llvm.org/D84264
As long as RenamedOp is not guaranteed to be accurate, we cannot
assert here and should just return false. This was already done
for the other conditions in this function.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46814.
And adjust the indbrtest4 test to actually test what it's supposed
to. BB1 is supposed to be eliminated here, but isn't, because
BB0 still branches to it. This was lost due to the incomplete CHECK
lines.
If we inferred a range for the function return value, we can add !range
at all call-sites of the function, if the range does not include undef.
Reviewers: efriedma, davide, nikic
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D83952
Both users of predicteinfo (NewGVN and SCCP) are interested in
getting a cmp constraint on the predicated value. They currently
implement separate logic for this. This patch adds a common method
for this in PredicateBase.
This enables a missing bit of PredicateInfo handling in SCCP: Now
the predicate on the condition itself is also used. For switches
it means we know that the switched-on value is the same as the case
value. For assumes/branches we know that the condition is true or
false.
Differential Revision: https://reviews.llvm.org/D83640
Some of the tests in the llvm/test/Transforms/IPConstantProp directory
actually only use -ipsccp. Those tests belong to the other (IP)SCCP
tests in llvm/test/Transforms/SCCP/ and this commits moves them there to
avoid confusion with IPConstantProp.
Currently SCCP does not combine the information of conditions joined by
AND in the true branch or OR in the false branch.
For branches on AND, 2 copies will be inserted for the true branch, with
one being the operand of the other as in the code below. We can combine
the information using intersection. Note that for the OR case, the
copies are inserted in the false branch, where using intersection is
safe as well.
define void @foo(i32 %a) {
entry:
%lt = icmp ult i32 %a, 100
%gt = icmp ugt i32 %a, 20
%and = and i1 %lt, %gt
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %lt = icmp ult i32 %a, 100 Edge: [label %entry,label %true] }
%a.0 = call i32 @llvm.ssa.copy.140247425954880(i32 %a)
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %gt = icmp ugt i32 %a, 20 Edge: [label %entry,label %false] }
%a.1 = call i32 @llvm.ssa.copy.140247425954880(i32 %a.0)
br i1 %and, label %true, label %false
true: ; preds = %entry
call void @use(i32 %a.1)
%true.1 = icmp ne i32 %a.1, 20
call void @use.i1(i1 %true.1)
ret void
false: ; preds = %entry
call void @use(i32 %a.1)
ret void
}
Reviewers: efriedma, davide, mssimpso, nikic
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D77808
When all else fails, use range metadata to constrain the result
of loads and calls. It should also be possible to use !nonnull,
but that would require some general support for inequalities in
SCCP first.
Differential Revision: https://reviews.llvm.org/D83179
Take assume predicates into account when visiting ssa.copy. The
handling is the same as for branch predicates, with the difference
that we're always on the true edge.
Differential Revision: https://reviews.llvm.org/D83257
This patch updates SCCP/IPSCCP to use the computed range info to turn
sexts into zexts, if the value is known to be non-negative. We already
to a similar transform in CorrelatedValuePropagation, but it seems like
we can catch a lot of additional cases by doing it in SCCP/IPSCCP as
well.
The transform is limited to ranges that are known to not include undef.
Currently constant ranges from conditions are treated as potentially
containing undef, due to PR46144. Once we flip this, the transform will
be more effective in practice.
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D81756
Currently SCCP does not widen PHIs, stores or along call edges
(arguments/return values), but on operations that directly extend ranges
(like binary operators).
This means PHIs, stores and call edges are not pessimized by widening
currently, while binary operators are. The main reason for widening
operators initially was that opting-out for certain operations was
more straight-forward in the initial implementation (and it did not
matter too much, as range support initially was only implemented for a
very limited set of operations.
During the discussion in D78391, it was suggested to consider flipping
widening to PHIs, stores and along call edges. After adding support for
tracking the number of range extensions in ValueLattice, limiting the
number of range extensions per value is straight forward.
This patch introduces a MaxWidenSteps option to the MergeOptions,
limiting the number of range extensions per value. For PHIs, it seems
natural allow an extension for each (active) incoming value plus 1. For
the other cases, a arbitrary limit of 10 has been chosen initially. It would
potentially make sense to set it depending on the users of a
function/global, but that still needs investigating. This potentially
leads to more state-changes and longer compile-times.
The results look quite promising (MultiSource, SPEC):
Same hash: 179 (filtered out)
Remaining: 58
Metric: sccp.IPNumInstRemoved
Program base widen-phi diff
test-suite...ks/Prolangs-C/agrep/agrep.test 58.00 82.00 41.4%
test-suite...marks/SciMark2-C/scimark2.test 32.00 43.00 34.4%
test-suite...rks/FreeBench/mason/mason.test 6.00 8.00 33.3%
test-suite...langs-C/football/football.test 104.00 128.00 23.1%
test-suite...cations/hexxagon/hexxagon.test 36.00 42.00 16.7%
test-suite...CFP2000/177.mesa/177.mesa.test 214.00 249.00 16.4%
test-suite...ngs-C/assembler/assembler.test 14.00 16.00 14.3%
test-suite...arks/VersaBench/dbms/dbms.test 10.00 11.00 10.0%
test-suite...oxyApps-C++/miniFE/miniFE.test 43.00 47.00 9.3%
test-suite...ications/JM/ldecod/ldecod.test 179.00 195.00 8.9%
test-suite...CFP2006/433.milc/433.milc.test 249.00 265.00 6.4%
test-suite.../CINT2000/175.vpr/175.vpr.test 98.00 104.00 6.1%
test-suite...peg2/mpeg2dec/mpeg2decode.test 70.00 74.00 5.7%
test-suite...CFP2000/188.ammp/188.ammp.test 71.00 75.00 5.6%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test 111.00 117.00 5.4%
test-suite...ce/Applications/Burg/burg.test 41.00 43.00 4.9%
test-suite...000/197.parser/197.parser.test 66.00 69.00 4.5%
test-suite...tions/lambda-0.1.3/lambda.test 23.00 24.00 4.3%
test-suite...urce/Applications/lua/lua.test 301.00 313.00 4.0%
test-suite...TimberWolfMC/timberwolfmc.test 76.00 79.00 3.9%
test-suite...lications/ClamAV/clamscan.test 991.00 1030.00 3.9%
test-suite...plications/d/make_dparser.test 53.00 55.00 3.8%
test-suite...fice-ispell/office-ispell.test 83.00 86.00 3.6%
test-suite...lications/obsequi/Obsequi.test 28.00 29.00 3.6%
test-suite.../Prolangs-C/bison/mybison.test 56.00 58.00 3.6%
test-suite.../CINT2000/254.gap/254.gap.test 170.00 176.00 3.5%
test-suite.../Applications/lemon/lemon.test 30.00 31.00 3.3%
test-suite.../CINT2000/176.gcc/176.gcc.test 1202.00 1240.00 3.2%
test-suite...pplications/treecc/treecc.test 79.00 81.00 2.5%
test-suite...chmarks/MallocBench/gs/gs.test 357.00 366.00 2.5%
test-suite...eeBench/analyzer/analyzer.test 103.00 105.00 1.9%
test-suite...T2006/445.gobmk/445.gobmk.test 1697.00 1724.00 1.6%
test-suite...006/453.povray/453.povray.test 1812.00 1839.00 1.5%
test-suite.../Benchmarks/Bullet/bullet.test 337.00 342.00 1.5%
test-suite.../CINT2000/252.eon/252.eon.test 426.00 432.00 1.4%
test-suite...T2000/300.twolf/300.twolf.test 214.00 217.00 1.4%
test-suite...pplications/oggenc/oggenc.test 244.00 247.00 1.2%
test-suite.../CINT2006/403.gcc/403.gcc.test 4008.00 4055.00 1.2%
test-suite...T2006/456.hmmer/456.hmmer.test 175.00 177.00 1.1%
test-suite...nal/skidmarks10/skidmarks.test 430.00 434.00 0.9%
test-suite.../Applications/sgefa/sgefa.test 115.00 116.00 0.9%
test-suite...006/447.dealII/447.dealII.test 1082.00 1091.00 0.8%
test-suite...6/482.sphinx3/482.sphinx3.test 141.00 142.00 0.7%
test-suite...ocBench/espresso/espresso.test 152.00 153.00 0.7%
test-suite...3.xalancbmk/483.xalancbmk.test 4003.00 4025.00 0.5%
test-suite...lications/sqlite3/sqlite3.test 548.00 551.00 0.5%
test-suite...marks/7zip/7zip-benchmark.test 5522.00 5551.00 0.5%
test-suite...nsumer-lame/consumer-lame.test 208.00 209.00 0.5%
test-suite...:: External/Povray/povray.test 1556.00 1563.00 0.4%
test-suite...000/186.crafty/186.crafty.test 298.00 299.00 0.3%
test-suite.../Applications/SPASS/SPASS.test 2019.00 2025.00 0.3%
test-suite...ications/JM/lencod/lencod.test 8427.00 8449.00 0.3%
test-suite...6/464.h264ref/464.h264ref.test 6797.00 6813.00 0.2%
test-suite...6/471.omnetpp/471.omnetpp.test 431.00 430.00 -0.2%
test-suite...006/450.soplex/450.soplex.test 446.00 447.00 0.2%
test-suite...0.perlbench/400.perlbench.test 1729.00 1727.00 -0.1%
test-suite...000/255.vortex/255.vortex.test 3815.00 3819.00 0.1%
Reviewers: efriedma, nikic, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D79036
For IR generated by a compiler, this is really simple: you just take the
datalayout from the beginning of the file, and apply it to all the IR
later in the file. For optimization testcases that don't care about the
datalayout, this is also really simple: we just use the default
datalayout.
The complexity here comes from the fact that some LLVM tools allow
overriding the datalayout: some tools have an explicit flag for this,
some tools will infer a datalayout based on the code generation target.
Supporting this properly required plumbing through a bunch of new
machinery: we want to allow overriding the datalayout after the
datalayout is parsed from the file, but before we use any information
from it. Therefore, IR/bitcode parsing now has a callback to allow tools
to compute the datalayout at the appropriate time.
Not sure if I covered all the LLVM tools that want to use the callback.
(clang? lli? Misc IR manipulation tools like llvm-link?). But this is at
least enough for all the LLVM regression tests, and IR without a
datalayout is not something frontends should generate.
This change had some sort of weird effects for certain CodeGen
regression tests: if the datalayout is overridden with a datalayout with
a different program or stack address space, we now parse IR based on the
overridden datalayout, instead of the one written in the file (or the
default one, if none is specified). This broke a few AVR tests, and one
AMDGPU test.
Outside the CodeGen tests I mentioned, the test changes are all just
fixing CHECK lines and moving around datalayout lines in weird places.
Differential Revision: https://reviews.llvm.org/D78403
Summary: Currenlty BPI unconditionally creates post dominator tree each time. While this is not incorrect we can save compile time by reusing existing post dominator tree (when it's valid) provided by analysis manager.
Reviewers: skatkov, taewookoh, yrouban
Reviewed By: skatkov
Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78987
Integer ranges can be used for loaded/stored values. Note that widening
can be disabled for loads/stores, as we only rely on instructions that
cause continued increases to ranges to be widened (like binary
operators).
Reviewers: efriedma, mssimpso, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78433
Currently an unknown/undef value is marked as overdefined when merged
with an empty range. An empty range can occur in unreachable/dead code.
When merging the new unknown state (= no value known yet) with an empty
range, there still isn't any information about the value yet and we can
stay in unknown.
This gives a few nice improvements on the number of instructions removed
by IPSCCP:
Same hash: 170 (filtered out)
Remaining: 67
Metric: sccp.IPNumInstRemoved
Program base patch diff
test-suite...rks/FreeBench/mason/mason.test 3.00 6.00 100.0%
test-suite...nchmarks/McCat/18-imp/imp.test 3.00 5.00 66.7%
test-suite...C/CFP2000/179.art/179.art.test 2.00 3.00 50.0%
test-suite...ijndael/security-rijndael.test 2.00 3.00 50.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 40.00 58.00 45.0%
test-suite...ce/Applications/Burg/burg.test 26.00 37.00 42.3%
test-suite...cCat/03-testtrie/testtrie.test 3.00 4.00 33.3%
test-suite...Source/Benchmarks/sim/sim.test 29.00 36.00 24.1%
test-suite.../Applications/spiff/spiff.test 9.00 11.00 22.2%
test-suite...s/FreeBench/neural/neural.test 5.00 6.00 20.0%
test-suite...pplications/treecc/treecc.test 66.00 79.00 19.7%
test-suite...langs-C/football/football.test 85.00 101.00 18.8%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test 90.00 105.00 16.7%
test-suite...oxyApps-C++/miniFE/miniFE.test 37.00 43.00 16.2%
test-suite...rks/FreeBench/pifft/pifft.test 26.00 30.00 15.4%
test-suite...lications/sqlite3/sqlite3.test 481.00 548.00 13.9%
test-suite...marks/7zip/7zip-benchmark.test 4875.00 5522.00 13.3%
test-suite.../CINT2000/176.gcc/176.gcc.test 1117.00 1197.00 7.2%
test-suite...0.perlbench/400.perlbench.test 1618.00 1732.00 7.0%
Reviewers: efriedma, nikic, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78667
visitExtractValueInst uses mergeInValue, so it already can handle
constant ranges. Initially the early exit was using isOverdefined to
keep things as NFC during the initial move to ValueLatticeElement.
As the function already supports constant ranges, it can just use
ValueState[&I].isOverdefined.
Reviewers: efriedma, mssimpso, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78393
For non-integer constants/expressions and overdefined, I think we can
just use SimplifyBinOp to do common folds. By just passing a context
with the DL, SimplifyBinOp should not try to get additional information
from looking at definitions.
For overdefined values, it should be enough to just pass the original
operand.
Note: The comment before the `if (isconstant(V1State)...` was wrong
originally: isConstant() also matches integer ranges with a single
element. It is correct now.
Reviewers: efriedma, davide, mssimpso, aartbik
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D76459
This patch updates the code that deals with conditions from predicate
info to make use of constant ranges.
For ssa_copy instructions inserted by PredicateInfo, we have 2 ranges:
1. The range of the original value.
2. The range imposed by the linked condition.
1. is known, 2. can be determined using makeAllowedICmpRegion. The
intersection of those ranges is the range for the copy.
With this patch, we get a nice increase in the number of instructions
eliminated by both SCCP and IPSCCP for some benchmarks:
For MultiSource, SPEC2000 & SPEC2006:
Tests: 237
Same hash: 170 (filtered out)
Remaining: 67
Metric: sccp.NumInstRemoved
Program base patch diff
test-suite...Source/Benchmarks/sim/sim.test 10.00 71.00 610.0%
test-suite...CFP2000/177.mesa/177.mesa.test 361.00 1626.00 350.4%
test-suite...encode/alacconvert-encode.test 141.00 602.00 327.0%
test-suite...decode/alacconvert-decode.test 141.00 602.00 327.0%
test-suite...CI_Purple/SMG2000/smg2000.test 1639.00 4093.00 149.7%
test-suite...peg2/mpeg2dec/mpeg2decode.test 75.00 163.00 117.3%
test-suite...T2006/401.bzip2/401.bzip2.test 358.00 513.00 43.3%
test-suite...rks/FreeBench/pifft/pifft.test 11.00 15.00 36.4%
test-suite...langs-C/unix-tbl/unix-tbl.test 4.00 5.00 25.0%
test-suite...lications/sqlite3/sqlite3.test 541.00 667.00 23.3%
test-suite.../CINT2000/254.gap/254.gap.test 243.00 299.00 23.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 25.00 29.00 16.0%
test-suite...marks/7zip/7zip-benchmark.test 1135.00 1304.00 14.9%
test-suite...lications/ClamAV/clamscan.test 1105.00 1268.00 14.8%
test-suite...urce/Applications/lua/lua.test 398.00 436.00 9.5%
Metric: sccp.IPNumInstRemoved
Program base patch diff
test-suite...C/CFP2000/179.art/179.art.test 1.00 3.00 200.0%
test-suite...006/447.dealII/447.dealII.test 429.00 1056.00 146.2%
test-suite...nch/fourinarow/fourinarow.test 3.00 7.00 133.3%
test-suite...CI_Purple/SMG2000/smg2000.test 818.00 1748.00 113.7%
test-suite...ks/McCat/04-bisect/bisect.test 3.00 5.00 66.7%
test-suite...CFP2000/177.mesa/177.mesa.test 165.00 255.00 54.5%
test-suite...ediabench/gsm/toast/toast.test 18.00 27.00 50.0%
test-suite...telecomm-gsm/telecomm-gsm.test 18.00 27.00 50.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 24.00 35.00 45.8%
test-suite...TimberWolfMC/timberwolfmc.test 43.00 62.00 44.2%
test-suite...encode/alacconvert-encode.test 46.00 66.00 43.5%
test-suite...decode/alacconvert-decode.test 46.00 66.00 43.5%
test-suite...langs-C/unix-tbl/unix-tbl.test 12.00 17.00 41.7%
test-suite...peg2/mpeg2dec/mpeg2decode.test 31.00 41.00 32.3%
test-suite.../CINT2000/254.gap/254.gap.test 117.00 154.00 31.6%
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D76611
This patch updates ValueLattice to distinguish between ranges that are
guaranteed to not include undef and ranges that may include undef.
A constant range guaranteed to not contain undef can be used to simplify
instructions to arbitrary values. A constant range that may contain
undef can only be used to simplify to a constant. If the value can be
undef, it might take a value outside the range. For example, consider
the snipped below
define i32 @f(i32 %a, i1 %c) {
br i1 %c, label %true, label %false
true:
%a.255 = and i32 %a, 255
br label %exit
false:
br label %exit
exit:
%p = phi i32 [ %a.255, %true ], [ undef, %false ]
%f.1 = icmp eq i32 %p, 300
call void @use(i1 %f.1)
%res = and i32 %p, 255
ret i32 %res
}
In the exit block, %p would be a constant range [0, 256) including undef as
%p could be undef. We can use the range information to replace %f.1 with
false because we remove the compare, effectively forcing the use of the
constant to be != 300. We cannot replace %res with %p however, because
if %a would be undef %cond may be true but the second use might not be
< 256.
Currently LazyValueInfo uses the new behavior just when simplifying AND
instructions and does not distinguish between constant ranges with and
without undef otherwise. I think we should address the remaining issues
in LVI incrementally.
Reviewers: efriedma, reames, aqjune, jdoerfert, sstefan1
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D76931
For casts with constant range operands, we can use
ConstantRange::castOp.
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D71938
For PHIs with multiple incoming values, we can improve precision by
using constant ranges for integers. We can over-approximate phis
by merging the incoming values.
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D71933
If one of the operands of a binary operator is a constant range, we can
use ConstantRange::binaryOp to approximate the result.
We still handle single element constant ranges as we did previously,
with ConstantExpr::get(), because ConstantRange::binaryOp still gives
worse results in a few cases for single element ranges.
Also note that we bail out early if any of the operands is still unknown.
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D71936
For selects with an unknown condition, we can approximate the result by
merging the state of both options. This automatically takes care of
the case where on operand is undef.
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D71935
This patch adds a new undef lattice state, which is used to represent
UndefValue constants or instructions producing undef.
The main difference to the unknown state is that merging undef values
with constants (or single element constant ranges) produces the
constant/constant range, assuming all uses of the merge result will be
replaced by the found constant.
Contrary, merging non-single element ranges with undef needs to go to
overdefined. Using unknown for UndefValues currently causes mis-compiles
in CVP/LVI (PR44949) and will become problematic once we use
ValueLatticeElement for SCCP.
Reviewers: efriedma, reames, davide, nikic
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75120
This patch switches SCCP to use ValueLatticeElement for lattice values,
instead of the local LatticeVal, as first step to enable integer range support.
This patch does not make use of constant ranges for additional operations
and the only difference for now is that integer constants are represented by
single element ranges. To preserve the existing behavior, the following helpers
are used
* isConstant(LV): returns true when LV is either a constant or a constant range with a single element. This should return true in the same cases where LV.isConstant() returned true previously.
* getConstant(LV): returns a constant if LV is either a constant or a constant range with a single element. This should return a constant in the same cases as LV.getConstant() previously.
* getConstantInt(LV): same as getConstant, but additionally casted to ConstantInt.
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D60582
For tracked globals that are unknown after solving, we expect all
non-store uses to be replaced.
This is a follow-up to f8045b250d, which removed forcedconstant.
We should not mark unknown loads as overdefined, as they either load
from an unknown pointer or an undef global. Restore the original logic
for loads.
This includes a fix for cases where things get marked as overdefined in
ResolvedUndefsIn, but we later discover a constant. To avoid crashing,
we consistently bail out on overdefined values in the visitors. This is
similar to the previous behavior with forcedconstant.
This reverts the revert commit 02b72f564c.
This version includes a fix for a set of crashes caused by marking
values depending on a yet unknown & tracked call as overdefined.
In some cases, we would later discover that the call has a constant
result and try to mark a user of it as constant, although it was already
marked as overdefined. Most instruction handlers bail out early if the
instruction is already overdefined. But that is not necessary for
CastInsts for example. By skipping values that depend on skipped
calls, we resolve the crashes and also improve the precision in some
cases (see resolvedundefsin-tracked-fn.ll).
Note that we may not skip PHI nodes that may depend on a skipped call,
but they can be safely marked as overdefined, as we bail out early if
the PHI node is overdefined.
This reverts the revert commit
a74b31a3e9cd844c7ce2087978568e3f5ec8519.
This causes a crash for the reproducer below
enum { a };
enum b { c, d };
e;
static _Bool g(struct f *h, enum b i) {
i &&j();
return a;
}
static k(char h, enum b i) {
_Bool l = g(e, i);
l;
}
m(h) {
k(h, c);
g(h, d);
}
This reverts commit aadb635e04.
This patch removes forcedconstant to simplify things for the
move to ValueLattice, which includes constant ranges, but no
forced constants.
This patch removes forcedconstant and changes ResolvedUndefsIn
to mark instructions with unknown operands as overdefined. This
means we do not do simplifications based on undef directly in SCCP
any longer, but this seems to hardly come up in practice (see stats
below), presumably because InstCombine & others take care
of most of the relevant folds already.
It is still beneficial to keep ResolvedUndefIn, as it allows us delaying
going to overdefined until we propagated all known information.
I also built MultiSource, SPEC2000 and SPEC2006 and compared
sccp.IPNumInstRemoved and sccp.NumInstRemoved. It looks like the impact
is quite low:
Tests: 244
Same hash: 238 (filtered out)
Remaining: 6
Metric: sccp.IPNumInstRemoved
Program base patch diff
test-suite...arks/VersaBench/dbms/dbms.test 4.00 3.00 -25.0%
test-suite...TimberWolfMC/timberwolfmc.test 38.00 34.00 -10.5%
test-suite...006/453.povray/453.povray.test 158.00 155.00 -1.9%
test-suite.../CINT2000/176.gcc/176.gcc.test 668.00 668.00 0.0%
test-suite.../CINT2006/403.gcc/403.gcc.test 1209.00 1209.00 0.0%
test-suite...arks/mafft/pairlocalalign.test 76.00 76.00 0.0%
Tests: 244
Same hash: 238 (filtered out)
Remaining: 6
Metric: sccp.NumInstRemoved
Program base patch diff
test-suite...arks/mafft/pairlocalalign.test 185.00 175.00 -5.4%
test-suite.../CINT2006/403.gcc/403.gcc.test 2059.00 2056.00 -0.1%
test-suite.../CINT2000/176.gcc/176.gcc.test 2358.00 2357.00 -0.0%
test-suite...006/453.povray/453.povray.test 317.00 317.00 0.0%
test-suite...TimberWolfMC/timberwolfmc.test 12.00 12.00 0.0%
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61314
We currently use integer ranges to merge concrete function arguments.
We use the ParamState range for those, but we only look up concrete
values in the regular state. For concrete function arguments that are
themselves arguments of the containing function, we can use the param
state directly and improve the precision in some cases.
Besides improving the results in some cases, this is also a small step towards
switching to ValueLatticeElement, by allowing D60582 to be a NFC.
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D71836
We have some code marks instructions with struct operands as overdefined,
but if the instruction is a call to a function with tracked arguments,
this breaks the assumption that the lattice values of all call sites
are not overdefined and will be replaced by a constant.
This also re-adds the assertion from D65222, with additionally skipping
non-callsite uses. This patch should address the cases reported in which
the assertion fired.
Fixes PR42738.
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D65439
llvm-svn: 367430
Currently there are a few pointer comparisons in ValueDFS_Compare, which
can cause non-deterministic ordering when materializing values. There
are 2 cases this patch fixes:
1. Order defs before uses used to compare pointers, which guarantees
defs before uses, but causes non-deterministic ordering between 2
uses or 2 defs, depending on the allocation order. By converting the
pointers to booleans, we can circumvent that problem.
2. comparePHIRelated was comparing the basic block pointers of edges,
which also results in a non-deterministic order and is also not
really meaningful for ordering. By ordering by their destination DFS
numbers we guarantee a deterministic order.
For the example below, we can end up with 2 different uselist orderings,
when running `opt -mem2reg -ipsccp` hundreds of times. Because the
non-determinism is caused by allocation ordering, we cannot reproduce it
with ipsccp alone.
declare i32 @hoge() local_unnamed_addr #0
define dso_local i32 @ham(i8* %arg, i8* %arg1) #0 {
bb:
%tmp = alloca i32
%tmp2 = alloca i32, align 4
br label %bb19
bb4: ; preds = %bb20
br label %bb6
bb6: ; preds = %bb4
%tmp7 = call i32 @hoge()
store i32 %tmp7, i32* %tmp
%tmp8 = load i32, i32* %tmp
%tmp9 = icmp eq i32 %tmp8, 912730082
%tmp10 = load i32, i32* %tmp
br i1 %tmp9, label %bb11, label %bb16
bb11: ; preds = %bb6
unreachable
bb13: ; preds = %bb20
br label %bb14
bb14: ; preds = %bb13
%tmp15 = load i32, i32* %tmp
br label %bb16
bb16: ; preds = %bb14, %bb6
%tmp17 = phi i32 [ %tmp10, %bb6 ], [ 0, %bb14 ]
br label %bb19
bb18: ; preds = %bb20
unreachable
bb19: ; preds = %bb16, %bb
br label %bb20
bb20: ; preds = %bb19
indirectbr i8* null, [label %bb4, label %bb13, label %bb18]
}
Reviewers: davide, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D64866
llvm-svn: 367049