Summary:
This patch separates the peeling specific parameters from the UnrollingPreferences,
and creates a new struct called PeelingPreferences. Functions which used the
UnrollingPreferences struct for peeling have been updated to use the PeelingPreferences struct.
Author: sidbav (Sidharth Baveja)
Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel), anhtuyen (Anh Tuyen Tran), nikic (Nikita Popov)
Reviewed By: Meinersbur (Michael Kruse)
Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80580
Summary: This allows to convert any SExt to a ZExt when we know none of the extended bits are used, specially in cases where there are multiple uses of the value.
Reviewers: dmgreen, eli.friedman, spatel, lebedev.ri, nikic
Reviewed By: lebedev.ri, nikic
Subscribers: hiraditya, dmgreen, craig.topper, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60413
The block front may be a PHI node, inserting a cast instructions like
BitCast, PtrToInt, IntToPtr among PHIs is not right.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D80975
Summary:
The test case started to hoist bitcasts to upper BB after D81730.
Reverted unintentional logic change. Some instructions may have zero cost but
will not be hoisted by different limitation so should be counted for threshold.
Reviewers: aprantl, arsenm, nhaehnle
Reviewed By: aprantl
Subscribers: wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82761
Currently SCCP does not combine the information of conditions joined by
AND in the true branch or OR in the false branch.
For branches on AND, 2 copies will be inserted for the true branch, with
one being the operand of the other as in the code below. We can combine
the information using intersection. Note that for the OR case, the
copies are inserted in the false branch, where using intersection is
safe as well.
define void @foo(i32 %a) {
entry:
%lt = icmp ult i32 %a, 100
%gt = icmp ugt i32 %a, 20
%and = and i1 %lt, %gt
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %lt = icmp ult i32 %a, 100 Edge: [label %entry,label %true] }
%a.0 = call i32 @llvm.ssa.copy.140247425954880(i32 %a)
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %gt = icmp ugt i32 %a, 20 Edge: [label %entry,label %false] }
%a.1 = call i32 @llvm.ssa.copy.140247425954880(i32 %a.0)
br i1 %and, label %true, label %false
true: ; preds = %entry
call void @use(i32 %a.1)
%true.1 = icmp ne i32 %a.1, 20
call void @use.i1(i1 %true.1)
ret void
false: ; preds = %entry
call void @use(i32 %a.1)
ret void
}
Reviewers: efriedma, davide, mssimpso, nikic
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D77808
Summary:
Almost all uses of these iterators, including implicit ones, really
only need the const variant (as it should be). The only exception is
in NewGVN, which changes the order of dominator tree child nodes.
Change-Id: I4b5bd71e32d71b0c67b03d4927d93fe9413726d4
Reviewers: arsenm, RKSimon, mehdi_amini, courbet, rriddle, aartbik
Subscribers: wdng, Prazek, hiraditya, kuhar, rogfer01, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, vkmr, Kayjukh, jurahul, msifontes, cfe-commits, llvm-commits
Tags: #clang, #mlir, #llvm
Differential Revision: https://reviews.llvm.org/D83087
This patch adds support for eliminating stores by free & lifetime.end
calls. We can remove stores that are not read before calling a memory
terminator and we can eliminate all stores after a memory terminator
until we see a new lifetime.start. The second case seems to not really
trigger much in practice though.
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72410
When all else fails, use range metadata to constrain the result
of loads and calls. It should also be possible to use !nonnull,
but that would require some general support for inequalities in
SCCP first.
Differential Revision: https://reviews.llvm.org/D83179
Take assume predicates into account when visiting ssa.copy. The
handling is the same as for branch predicates, with the difference
that we're always on the true edge.
Differential Revision: https://reviews.llvm.org/D83257
Summary: This patch makes code motion checks optional which are dependent on
specific analysis example, dominator tree, post dominator tree and dependence
info. The aim is to make the adoption of CodeMoverUtils easier for clients that
don't use analysis which were strictly required by CodeMoverUtils. This will
also help in diversifying code motion checks using other analysis example MSSA.
Authored By: RithikSharma
Reviewer: Whitney, bmahjour, etiotto
Reviewed By: Whitney
Subscribers: Prazek, hiraditya, george.burgess.iv, asbirlea, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D82566
The (previously-crashing) test-case would cause us to seemingly-harmlessly
replace some use with something else, but we can't replace it with itself,
so we would crash.
As reported in https://reviews.llvm.org/D83101#2133062
the new visitInsertElementInst()/visitExtractElementInst() functionality
is causing miscompiles (previously-crashing test added)
It is due to the fact how the infra of Scalarizer is dealing with DCE,
it was not updated or was it ready for such scalar value forwarding.
It always assumed that the moment we "scalarized" something,
it can go away, and did so with prejudice.
But that is no longer safe/okay to do.
Instead, let's prevent it from ever shooting itself into foot,
and let's just accumulate the instructions-to-be-deleted
in a vector, and collectively cleanup (those that are *actually* dead)
them all at the end.
All existing tests are not reporting any new garbage leftovers,
but maybe it's test coverage issue.
Summary:
Avoid exposing details about how roots are stored. This enables subsequent
type-erasure changes.
v5:
- cleanup a unit test by using EXPECT_EQ instead of EXPECT_TRUE
Change-Id: I532b774cc71f2224e543bc7d79131d97f63f093d
Reviewers: arsenm, RKSimon, mehdi_amini, courbet
Subscribers: jvesely, wdng, hiraditya, kuhar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83085
Summary:
Avoid exposing details about how children are stored. This will enable
subsequent type-erasure changes.
New methods are introduced to cover common access patterns.
Change-Id: Idb5f4b1b9c84e4cc71ddb39bb52a388682f5674f
Reviewers: arsenm, RKSimon, mehdi_amini, courbet
Subscribers: qcolombet, sdardis, wdng, hiraditya, jrtc27, zzheng, atanasyan, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83083
Compilers may evaluate call arguments in different order,
which would result in different order of IR, which would break the tests.
Spotted thanks to Dmitri Gribenko!
Summary:
I'm interested in taking the original C++ input,
for which we currently are stuck with an alloca
and producing roughly the lower IR,
with neither an alloca nor a vector ops:
https://godbolt.org/z/cRRWaJ
For that, as intermediate step, i'd to somehow perform scalarization.
As per @arsenmn suggestion, i'm trying to see if scalarizer can help me
avoid writing a bicycle.
I'm not sure if it's really intentional that variable insert is not handled currently.
If it really is, and is supposed to stay that way (?), i guess i could guard it..
See [[ https://bugs.llvm.org/show_bug.cgi?id=46524 | PR46524 ]].
Reviewers: bjope, cameron.mcinally, arsenm, jdoerfert
Reviewed By: jdoerfert
Subscribers: arphaman, uabelho, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82961
Summary:
It appears to be better IR-wise to aggressively scalarize it,
rather than relying on gathering it, and leaving it as-is.
Reviewers: jdoerfert, bjope, arsenm, cameron.mcinally
Reviewed By: jdoerfert
Subscribers: arphaman, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83101
Summary: As it can be clearly seen from the diff, this results in nicer IR.
Reviewers: jdoerfert, arsenm, bjope, cameron.mcinally
Reviewed By: jdoerfert
Subscribers: arphaman, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83102
Assume bundle can have more than one entry with the same name,
but at least AlignmentFromAssumptionsPass::extractAlignmentInfo() uses
getOperandBundle("align"), which internally assumes that it isn't the
case, and happily crashes otherwise.
Minimal reduced reproducer: run `opt -alignment-from-assumptions` on
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%0 = type { i64, %1*, i8*, i64, %2, i32, %3*, i8* }
%1 = type opaque
%2 = type { i8, i8, i16 }
%3 = type { i32, i32, i32, i32 }
; Function Attrs: nounwind
define i32 @f(%0* noalias nocapture readonly %arg, %0* noalias %arg1) local_unnamed_addr #0 {
bb:
call void @llvm.assume(i1 true) [ "align"(%0* %arg, i64 8), "align"(%0* %arg1, i64 8) ]
ret i32 0
}
; Function Attrs: nounwind willreturn
declare void @llvm.assume(i1) #1
attributes #0 = { nounwind "reciprocal-estimates"="none" }
attributes #1 = { nounwind willreturn }
This is what we'd have with -mllvm -enable-knowledge-retention
This reverts commit c95ffadb24.
This emits a remark when LoopDeletion deletes a dead loop, using the
source location of the loop's header. There are currently two reasons
for removing the loop: invariant loop or loop that never executes.
Differential Revision: https://reviews.llvm.org/D83113
LowerConstantIntrinsics fails to preserve the analysis result of
GlobalsAA. Not preserving the analysis might affect benchmark
performance. This change fixes this issue.
Patch by Ryan Santhiraraja <rsanthir@quicinc.com>
Reviewers: fpetrogalli, joerg, fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D82342
Summary:
- `ptrtoint` and `inttoptr` are defined as no-op casts if the integer
value as the same size as the pointer value. The pair of
`ptrtoint`/`inttoptr` is in fact a no-op cast sequence between
different address spaces. Teach `infer-address-spaces` to handle them
like a `bitcast`.
Reviewers: arsenm, chandlerc
Subscribers: jvesely, wdng, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D81938
For passes got skipped, this is confusing because the log said it is `running pass`
but it is skipped later.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D82511
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
Provided test case crashes otherwise.
If NewTy is already DL.getIntPtrType(NewTy),
CreateBitCast() won't actually create any bitcast,
so we are better off just doing the general thing.
Summary:
Get back `const` partially lost in one of recent changes.
Additionally specify explicit qualifiers in few places.
Reviewers: samparker
Reviewed By: samparker
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82383
This patch add support for eliminating MemoryDefs that do not have any
aliasing users, which indicates that there are no reads/writes to the
memory location until the end of the function.
To eliminate such defs, we have to ensure that the underlying object is
not visible in the caller and does not escape via returning. We need a
separate check for that, as InvisibleToCaller does not consider returns.
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker, george.burgess.iv
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72631
This patch extends storeIsNoop to also detect stores of 0 to an calloced
object. This basically ports the logic from legacy DSE to the MemorySSA
backed version.
It triggers in a few cases on MultiSource, SPEC2000, SPEC2006 with -O3
LTO:
Same hash: 218 (filtered out)
Remaining: 19
Metric: dse.NumNoopStores
Program base patch2 diff
test-suite...CFP2000/177.mesa/177.mesa.test 1.00 15.00 1400.0%
test-suite...6/482.sphinx3/482.sphinx3.test 1.00 14.00 1300.0%
test-suite...lications/ClamAV/clamscan.test 2.00 28.00 1300.0%
test-suite...CFP2006/433.milc/433.milc.test 1.00 8.00 700.0%
test-suite...pplications/oggenc/oggenc.test 2.00 9.00 350.0%
test-suite.../CINT2000/176.gcc/176.gcc.test 6.00 6.00 0.0%
test-suite.../CINT2006/403.gcc/403.gcc.test NaN 137.00 nan%
test-suite...libquantum/462.libquantum.test NaN 3.00 nan%
test-suite...6/464.h264ref/464.h264ref.test NaN 7.00 nan%
test-suite...decode/alacconvert-decode.test NaN 2.00 nan%
test-suite...encode/alacconvert-encode.test NaN 2.00 nan%
test-suite...ications/JM/ldecod/ldecod.test NaN 9.00 nan%
test-suite...ications/JM/lencod/lencod.test NaN 39.00 nan%
test-suite.../Applications/lemon/lemon.test NaN 2.00 nan%
test-suite...pplications/treecc/treecc.test NaN 4.00 nan%
test-suite...hmarks/McCat/08-main/main.test NaN 4.00 nan%
test-suite...nsumer-lame/consumer-lame.test NaN 3.00 nan%
test-suite.../Prolangs-C/bison/mybison.test NaN 1.00 nan%
test-suite...arks/mafft/pairlocalalign.test NaN 30.00 nan%
Reviewers: efriedma, zoecarver, asbirlea
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D82204
This updates the MemorySSA backed implementation to treat arguments
passed by value similar to allocas: in they are assumed to be invisible
in the caller. This is similar to how they are treated in legacy DSE.
Reviewers: efriedma, asbirlea, george.burgess.iv
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D82222
Summary:
- When promoting a pointer from memory to register, SROA skips pointers
from different address spaces. However, as `ptrtoint` and `inttoptr`
are defined as no-op casts if that integer type has the same as the
pointer value, generate the pair of `ptrtoint`/`inttoptr` (no-op cast)
sequence to convert pointers from different address spaces if they
have the same size.
Reviewers: arsenm, chandlerc, lebedev.ri
Subscribers:
Differential Revision: https://reviews.llvm.org/D81943
Add a new builtin-function __builtin_expect_with_probability and
intrinsic llvm.expect.with.probability.
The interface is __builtin_expect_with_probability(long expr, long
expected, double probability).
It is mainly the same as __builtin_expect besides one more argument
indicating the probability of expression equal to expected value. The
probability should be a constant floating-point expression and be in
range [0.0, 1.0] inclusive.
It is similar to builtin-expect-with-probability function in GCC
built-in functions.
Differential Revision: https://reviews.llvm.org/D79830
Currently we stop exploring candidates too early in some cases.
In particular, we can continue checking the defining accesses of
non-removable MemoryDefs and defs without analyzable write location
(read clobbers are already ruled out using MemorySSA at this point).
As we traverse the CFG backwards, we could end up reaching unreachable
blocks. For unreachable blocks, we won't have computed post order
numbers and because DomAccess is reachable, unreachable blocks cannot be
on any path from it.
This fixes a crash with unreachable blocks.
This patch updates SCCP/IPSCCP to use the computed range info to turn
sexts into zexts, if the value is known to be non-negative. We already
to a similar transform in CorrelatedValuePropagation, but it seems like
we can catch a lot of additional cases by doing it in SCCP/IPSCCP as
well.
The transform is limited to ranges that are known to not include undef.
Currently constant ranges from conditions are treated as potentially
containing undef, due to PR46144. Once we flip this, the transform will
be more effective in practice.
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D81756
Move code that may update the IR after precondition, so that if precondition
fail, the IR isn't modified.
Differential Revision: https://reviews.llvm.org/D81225
This patch updates LowerMatrixIntrinsics to preserve the alignment
specified at the original load/stores and the align attribute for the
pointer argument of the column.major.load/store intrinsics.
We can always use the specified alignment for the load of the first
column. For subsequent columns, the alignment may need to be reduced.
For ConstantInt strides, compute the offset for the start of the column in
bytes and use commonAlignment to get the largest valid alignment.
For non-ConstantInt strides, we need to take the common alignment of the
initial alignment and the element size in bytes.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, rjmccall
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D81960
Currently the matrix lowering turns volatile loads/stores into
non-volatile ones. This patch updates the lowering to preserve the
volatile bit.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D81498
This patch adjust the load/store matrix intrinsics, formerly known as
llvm.matrix.columnwise.load/store, to improve the naming and allow
passing of extra information (volatile).
The patch performs the following changes:
* Rename columnwise.load/store to column.major.load/store. This is more
expressive and also more in line with the naming in Clang.
* Changes the stride arguments from i32 to i64. The stride can be
larger than i32 and this makes things more uniform with the way
things are handled in Clang.
* A new boolean argument is added to indicate whether the load/store
is volatile. The lowering respects that when emitting vector
load/store instructions
* MatrixBuilder is updated to require both Alignment and IsVolatile
arguments, which are passed through to the generated intrinsic. The
alignment is set using the `align` attribute.
The changes are grouped together in a single patch, to have a single
commit that breaks the compatibility. We probably should be fine with
updating the intrinsics, as we did not yet officially support them in
the last stable release. If there are any concerns, we can add
auto-upgrade rules for the columnwise intrinsics though.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache, rjmccall, ftynse
Reviewed By: anemet, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D81472
In more complicated loops we can easily hit the complexity limits of
loop strength reduction. If we do and filtering occurs, it's all too
easy to remove the wrong formulae for post-inc preferring accesses due
to it attempting to maximise register re-use. The patch adds an
alternative filtering step when the target is preferring postinc to pick
postinc formulae instead, hopefully lowering the complexity to below the
limit so that aggressive filtering is not needed.
There is also a change in here to stop considering existing addrecs as
free under postinc. We should already be modelling them as a reg so
don't want it to cause us to get the cost wrong. (I'm not sure that code
makes sense in general, but there are X86 tests specifically for it
where it seems to be helping so have left it around for the standard
non-post-inc case).
Differential Revision: https://reviews.llvm.org/D80273
Port partial constant store merging logic to MemorySSA backed DSE. The
heavy lifting is done by the existing helper function. It is used in
context where we already ensured that the later instruction can
eliminate the earlier one, if it is a complete overwrite.
isOverwrite expects the later location as first argument and the earlier
result later. The adjusted call is intended to check whether CC
overwrites DefLoc.
This patch adds a new option to CriticalEdgeSplittingOptions to control
whether loop-simplify form must be preserved. It is them used by GVN to
indicate that loop-simplify form does not have to be preserved.
This fixes a crash exposed by 189efe295b.
If the critical edge we are splitting goes from a block inside a loop to
a block outside the loop, splitting the edge will create a new exit
block. As a result, the new block will branch to the original exit
block, which will add a non-loop predecessor, breaking loop-simplify
form. To preserve loop-simplify form, the predecessor blocks of the
original exit are split, but that does not work for blocks with
indirectbr terminators. If preserving loop-simplify form is requested,
bail out , before making any changes.
Reviewers: reames, hfinkel, davide, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D81582
Previously these functions either returned a "changed" flag or a "repeat
instruction" flag, and could also modify an iterator to control which
instruction would be processed next.
Simplify this by always returning a "changed" flag, and handling all of
the "repeat instruction" functionality by modifying the iterator.
No functional change intended except in this case:
// If the source and destination of the memcpy are the same, then zap it.
... where the previous code failed to process the instruction after the
zapped memcpy.
Differential Revision: https://reviews.llvm.org/D81540
This patch relaxes the post-dominance requirement for accesses to
objects visible after the function returns.
Instead of requiring the killing def to post-dominate the access to
eliminate, the set of 'killing blocks' (= blocks that completely
overwrite the original access) is collected.
If all paths from the access to eliminate and an exit block go through a
killing block, the access can be removed.
To check this property, we first get the common post-dominator block for
the killing blocks. If this block does not post-dominate the access
block, there may be a path from DomAccess to an exit block not involving
any killing block.
Otherwise we have to check if there is a path from the DomAccess to the
common post-dominator, that does not contain a killing block. If there
is no such path, we can remove DomAccess. For this check, we start at
the common post-dominator and then traverse the CFG backwards. Paths are
terminated when we hit a killing block or a block that is not executed
between DomAccess and a killing block according to the post-order
numbering (if the post order number of a block is greater than the one
of DomAccess, the block cannot be in in a path starting at DomAccess).
This gives the following improvements on the total number of stores
after DSE for MultiSource, SPEC2K, SPEC2006:
Tests: 237
Same hash: 206 (filtered out)
Remaining: 31
Metric: dse.NumRemainingStores
Program base new100 diff
test-suite...CFP2000/188.ammp/188.ammp.test 3624.00 3544.00 -2.2%
test-suite...ch/g721/g721encode/encode.test 128.00 126.00 -1.6%
test-suite.../Benchmarks/Olden/mst/mst.test 73.00 72.00 -1.4%
test-suite...CFP2006/433.milc/433.milc.test 3202.00 3163.00 -1.2%
test-suite...000/186.crafty/186.crafty.test 5062.00 5010.00 -1.0%
test-suite...-typeset/consumer-typeset.test 40460.00 40248.00 -0.5%
test-suite...Source/Benchmarks/sim/sim.test 642.00 639.00 -0.5%
test-suite...nchmarks/McCat/09-vor/vor.test 642.00 644.00 0.3%
test-suite...lications/sqlite3/sqlite3.test 35664.00 35563.00 -0.3%
test-suite...T2000/300.twolf/300.twolf.test 7202.00 7184.00 -0.2%
test-suite...lications/ClamAV/clamscan.test 19475.00 19444.00 -0.2%
test-suite...INT2000/164.gzip/164.gzip.test 2199.00 2196.00 -0.1%
test-suite...peg2/mpeg2dec/mpeg2decode.test 2380.00 2378.00 -0.1%
test-suite.../Benchmarks/Bullet/bullet.test 39335.00 39309.00 -0.1%
test-suite...:: External/Povray/povray.test 36951.00 36927.00 -0.1%
test-suite...marks/7zip/7zip-benchmark.test 67396.00 67356.00 -0.1%
test-suite...6/464.h264ref/464.h264ref.test 31497.00 31481.00 -0.1%
test-suite...006/453.povray/453.povray.test 51441.00 51416.00 -0.0%
test-suite...T2006/401.bzip2/401.bzip2.test 4450.00 4448.00 -0.0%
test-suite...Applications/kimwitu++/kc.test 23481.00 23471.00 -0.0%
test-suite...chmarks/MallocBench/gs/gs.test 6286.00 6284.00 -0.0%
test-suite.../CINT2000/254.gap/254.gap.test 13719.00 13715.00 -0.0%
test-suite.../Applications/SPASS/SPASS.test 30345.00 30338.00 -0.0%
test-suite...006/450.soplex/450.soplex.test 15018.00 15016.00 -0.0%
test-suite...ications/JM/lencod/lencod.test 27780.00 27777.00 -0.0%
test-suite.../CINT2006/403.gcc/403.gcc.test 105285.00 105276.00 -0.0%
There might be potential to pre-compute some of the information of which
blocks are on the path to an exit for each block, but the overall
benefit might be comparatively small.
On the set of benchmarks, 15738 times out of 20322 we reach the
CFG check, the CFG check is successful. The total number of iterations
in the CFG check is 187810, so on average we need less than 10 steps in
the check loop. Bumping the threshold in the loop from 50 to 150 gives a
few small improvements, but I don't think they warrant such a big bump
at the moment. This is all pending further tuning in the future.
Reviewers: dmgreen, bryant, asbirlea, Tyker, efriedma, george.burgess.iv
Reviewed By: george.burgess.iv
Differential Revision: https://reviews.llvm.org/D78932
blocks.
Summary: The current LoopFusion forget to update the incoming block of
the phis in second loop guard non loop successor from second loop guard
block to first loop guard block. A test case is provided to better
understand the problem.
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D81421
Slice::operator<() has a non-deterministic behavior. If we have
identical slices comparison will depend on the order or operands.
Normally that does not result in unstable compilation results
because the order in which slices are inserted into the vector
is deterministic and llvm::sort() normally behaves as a stable
sort, although that is not guaranteed.
However, there is test option -sroa-random-shuffle-slices which
is used to check exactly this aspect. The vector is first randomly
shuffled and then sorted. The same shuffling happens without this
option under expensive llvm checks.
I have managed to write a test which has hit this problem.
There are no fields in the Slice class to resolve the instability.
We only have offsets, IsSplittable and Use, but neither Use nor
User have anything suitable for predictable comparison.
I have switched to stable_sort which has to be sufficient and
removed that randon shuffle option.
Differential Revision: https://reviews.llvm.org/D81310
- Now all SalvageDebugInfo() calls will mark undef if the salvage
attempt fails.
Reviewed by: vsk, Orlando
Differential Revision: https://reviews.llvm.org/D78369
If an instruction is erased we also need to remove it from
Visited set. There is a very small chance that an another
newly created instruction will be created with the same
pointer value in place of an erased one.
Differential Revision: https://reviews.llvm.org/D80958
Now that we have an operand based form for the GC arguments to a statepoint intrinsic, update RS4GC to use it and update tests to reflect. This is pretty straight forward. I nearly landed without review, but figured a second set of eyes didn't hurt.
Differential Revision: https://reviews.llvm.org/D81121
Remove the requirement, that when performing accumulator elimination,
all other cases must return the same dynamic constant. We can do this by
initializing the accumulator with the identity value of the accumulation
operation, and inserting an additional operation before any return.
Differential Revision: https://reviews.llvm.org/D80844
Summary:
This patch simplifies FindMostPopularDest without changing the
functionality.
Given a list of jump threading destinations, the function finds the
most popular destination. To ensure determinism when there are
multiple destinations with the highest popularity, the function picks
the first one in the successor list with the highest popularity.
Without this patch:
- The function populates DestPopularity -- a histogram mapping
destinations to their respective occurrence counts.
- Then we iterate over DestPopularity, looking for the highest
popularity while building a vector of destinations with the highest
popularity.
- Finally, we iterate the successor list, looking for the destination
with the highest popularity.
With this patch:
- We implement DestPopularity with MapVector instead of DenseMap. We
populate the map with popularity 0 for all successors in the order
they appear in the successor list.
- We build the histogram in the same way as before.
- We simply use std::max_element on DestPopularity to find the most
popular destination. The use of MapVector ensures determinism.
Reviewers: wmi, efriedma
Reviewed By: wmi
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81030
gc.relocate intrinsic is special in that its second and third operands
are not real values, but indices into relocate's parent statepoint list
of GC pointers.
To be CSE'd, they need special handling in `isEqual()` and `getHashCode()`.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D80445
This is a reimplementation of the `orderNodes` function, as the old
implementation didn't take into account all cases.
The new implementation uses SCCs instead of Loops to take account of
irreducible loops.
Fix PR41509
Differential Revision: https://reviews.llvm.org/D79037
Currently SCCP does not widen PHIs, stores or along call edges
(arguments/return values), but on operations that directly extend ranges
(like binary operators).
This means PHIs, stores and call edges are not pessimized by widening
currently, while binary operators are. The main reason for widening
operators initially was that opting-out for certain operations was
more straight-forward in the initial implementation (and it did not
matter too much, as range support initially was only implemented for a
very limited set of operations.
During the discussion in D78391, it was suggested to consider flipping
widening to PHIs, stores and along call edges. After adding support for
tracking the number of range extensions in ValueLattice, limiting the
number of range extensions per value is straight forward.
This patch introduces a MaxWidenSteps option to the MergeOptions,
limiting the number of range extensions per value. For PHIs, it seems
natural allow an extension for each (active) incoming value plus 1. For
the other cases, a arbitrary limit of 10 has been chosen initially. It would
potentially make sense to set it depending on the users of a
function/global, but that still needs investigating. This potentially
leads to more state-changes and longer compile-times.
The results look quite promising (MultiSource, SPEC):
Same hash: 179 (filtered out)
Remaining: 58
Metric: sccp.IPNumInstRemoved
Program base widen-phi diff
test-suite...ks/Prolangs-C/agrep/agrep.test 58.00 82.00 41.4%
test-suite...marks/SciMark2-C/scimark2.test 32.00 43.00 34.4%
test-suite...rks/FreeBench/mason/mason.test 6.00 8.00 33.3%
test-suite...langs-C/football/football.test 104.00 128.00 23.1%
test-suite...cations/hexxagon/hexxagon.test 36.00 42.00 16.7%
test-suite...CFP2000/177.mesa/177.mesa.test 214.00 249.00 16.4%
test-suite...ngs-C/assembler/assembler.test 14.00 16.00 14.3%
test-suite...arks/VersaBench/dbms/dbms.test 10.00 11.00 10.0%
test-suite...oxyApps-C++/miniFE/miniFE.test 43.00 47.00 9.3%
test-suite...ications/JM/ldecod/ldecod.test 179.00 195.00 8.9%
test-suite...CFP2006/433.milc/433.milc.test 249.00 265.00 6.4%
test-suite.../CINT2000/175.vpr/175.vpr.test 98.00 104.00 6.1%
test-suite...peg2/mpeg2dec/mpeg2decode.test 70.00 74.00 5.7%
test-suite...CFP2000/188.ammp/188.ammp.test 71.00 75.00 5.6%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test 111.00 117.00 5.4%
test-suite...ce/Applications/Burg/burg.test 41.00 43.00 4.9%
test-suite...000/197.parser/197.parser.test 66.00 69.00 4.5%
test-suite...tions/lambda-0.1.3/lambda.test 23.00 24.00 4.3%
test-suite...urce/Applications/lua/lua.test 301.00 313.00 4.0%
test-suite...TimberWolfMC/timberwolfmc.test 76.00 79.00 3.9%
test-suite...lications/ClamAV/clamscan.test 991.00 1030.00 3.9%
test-suite...plications/d/make_dparser.test 53.00 55.00 3.8%
test-suite...fice-ispell/office-ispell.test 83.00 86.00 3.6%
test-suite...lications/obsequi/Obsequi.test 28.00 29.00 3.6%
test-suite.../Prolangs-C/bison/mybison.test 56.00 58.00 3.6%
test-suite.../CINT2000/254.gap/254.gap.test 170.00 176.00 3.5%
test-suite.../Applications/lemon/lemon.test 30.00 31.00 3.3%
test-suite.../CINT2000/176.gcc/176.gcc.test 1202.00 1240.00 3.2%
test-suite...pplications/treecc/treecc.test 79.00 81.00 2.5%
test-suite...chmarks/MallocBench/gs/gs.test 357.00 366.00 2.5%
test-suite...eeBench/analyzer/analyzer.test 103.00 105.00 1.9%
test-suite...T2006/445.gobmk/445.gobmk.test 1697.00 1724.00 1.6%
test-suite...006/453.povray/453.povray.test 1812.00 1839.00 1.5%
test-suite.../Benchmarks/Bullet/bullet.test 337.00 342.00 1.5%
test-suite.../CINT2000/252.eon/252.eon.test 426.00 432.00 1.4%
test-suite...T2000/300.twolf/300.twolf.test 214.00 217.00 1.4%
test-suite...pplications/oggenc/oggenc.test 244.00 247.00 1.2%
test-suite.../CINT2006/403.gcc/403.gcc.test 4008.00 4055.00 1.2%
test-suite...T2006/456.hmmer/456.hmmer.test 175.00 177.00 1.1%
test-suite...nal/skidmarks10/skidmarks.test 430.00 434.00 0.9%
test-suite.../Applications/sgefa/sgefa.test 115.00 116.00 0.9%
test-suite...006/447.dealII/447.dealII.test 1082.00 1091.00 0.8%
test-suite...6/482.sphinx3/482.sphinx3.test 141.00 142.00 0.7%
test-suite...ocBench/espresso/espresso.test 152.00 153.00 0.7%
test-suite...3.xalancbmk/483.xalancbmk.test 4003.00 4025.00 0.5%
test-suite...lications/sqlite3/sqlite3.test 548.00 551.00 0.5%
test-suite...marks/7zip/7zip-benchmark.test 5522.00 5551.00 0.5%
test-suite...nsumer-lame/consumer-lame.test 208.00 209.00 0.5%
test-suite...:: External/Povray/povray.test 1556.00 1563.00 0.4%
test-suite...000/186.crafty/186.crafty.test 298.00 299.00 0.3%
test-suite.../Applications/SPASS/SPASS.test 2019.00 2025.00 0.3%
test-suite...ications/JM/lencod/lencod.test 8427.00 8449.00 0.3%
test-suite...6/464.h264ref/464.h264ref.test 6797.00 6813.00 0.2%
test-suite...6/471.omnetpp/471.omnetpp.test 431.00 430.00 -0.2%
test-suite...006/450.soplex/450.soplex.test 446.00 447.00 0.2%
test-suite...0.perlbench/400.perlbench.test 1729.00 1727.00 -0.1%
test-suite...000/255.vortex/255.vortex.test 3815.00 3819.00 0.1%
Reviewers: efriedma, nikic, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D79036
These are the two operand sets which are expected to survive more than another week or so. Instead of bothering to update the deopt and gc-transition operands, we'll just wait until those are removed and delete the code.
For those following along, this is likely to be the last (major) change in this sequence for about a week. I want to wait until all of this has been merged downstream to ensure I haven't introduced any bugs (and migrate some downstream code to the new interfaces). Once that's done, we should be able to delete Statepoint/ImmutableStatepoint without too much work.
Summary:
Only column-major was supported so far. This adds row-major support as well.
Note that we probably also want very efficient SIMD implementations for the
various target platforms.
Bug:
https://bugs.llvm.org/show_bug.cgi?id=46085
Reviewers: nicolasvasilache, reidtatge, bkramer, fhahn, ftynse, andydavis1, craig.topper, dcaballe, mehdi_amini, anemet
Reviewed By: fhahn
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80673
Continues from D80598.
The key point of the change is to default to using operand bundles instead of the inline length prefix argument lists for statepoint nodes. An important subtlety to note is that the presence of a bundle has semantic meaning, even if it is empty. As such, we need to make a somewhat deeper change to the interface than is first obvious.
Existing code treats statepoint deopt arguments and the deopt bundle operands differently during inlining. The former is ignored (resulting in caller state being dropped), the later is merged.
We can't preserve the old behaviour for calls with deopt fed to RS4GC and then inlining, but we can avoid the no-deopt case changing. At least in internal testing, that seem to be the important one. (I'd argue the "stop merging after RS4GC" behaviour for the former was always "unexpected", but that the behaviour for non-deopt calls actually make sense.)
Differential Revision: https://reviews.llvm.org/D80674
This one is slightly odd since it counts as an address expression,
which previously could never fail. Allow the existing TTI hook to
return the value to use, and re-use it for handling how to handle
ptrmask.
Handles the no-op addrspacecasts for AMDGPU. We could probably do
something better based on analysis of the mask value based on the
address space, but leave that for now.
Now that all of the statepoint related routines have classes with isa support, let's cleanup.
I'm leaving the (dead) utitilities in tree for a few days so that I can do the same cleanup downstream without breakage.
Currently we can only eliminate call return pairs that either return the
result of the call or a dynamic constant. This patch removes that
limitation.
Differential Revision: https://reviews.llvm.org/D79660
Currently if instructions defined in a block are used in unreachable
blocks and SimpleLoopUnswitch attempts deleting the block, it triggers
assertion "Uses remain when a value is destroyed!".
This patch fixes it by replacing all uses of instructions from BB with
undefs before BB deletion.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D80551
Recommitting most of the remaining changes from
259eb619ff, but excluding the call to
getUserCost from getInstructionThroughput. Though there's still no
test changes, I doubt that this is an NFC...
With the two getIntrinsicInstrCosts folded into one, now fold in the
scalar/code-size orientated getIntrinsicCost. The remaining scalar
intrinsics were memcpy, cttz and ctlz which now have special handling
in the BasicTTI implementation.
This had required a change in the AMDGPU backend for fabs as it
should always be 'free'. I've also changed the X86 backend to return
the BaseT implementation when the CostKind isn't RecipThroughput.
Differential Revision: https://reviews.llvm.org/D80012
If the caller needs to reponsible for making sure the MaybeAlign
has a value, then we should just make the caller convert it to an Align
with operator*.
I explicitly deleted the relational comparison operators that
were being inherited from Optional. It's unclear what the meaning
of two MaybeAligns were one is defined and the other isn't
should be. So make the caller reponsible for defining the behavior.
I left the ==/!= operators from Optional. But now that exposed a
weird quirk that ==/!= between Align and MaybeAlign required the
MaybeAlign to be defined. But now we use the operator== from
Optional that takes an Optional and the Value.
Differential Revision: https://reviews.llvm.org/D80455
With the two getIntrinsicInstrCosts folded into one, now fold in the
scalar/code-size orientated getIntrinsicCost. This involved sinking
cost of the TTIImpl into the base implementation, as it performs no
target checks. The opcodes remaining were memcpy, cttz and ctlz which
now have special handling in the BasicTTI implementation.
getInstructionThroughput can now directly return the result of
getUserCost.
This had required a change in the AMDGPU backend for fabs and its
always 'free'. I've also changed the X86 backend to return '1' for
any intrinsic when the CostKind isn't RecipThroughput.
Though this intended to be a non-functional change, there are many
paths being combined here so I would be very surprised if this didn't
have an effect.
Differential Revision: https://reviews.llvm.org/D80012
Hide the method that allows setting probability for particular edge
and introduce a public method that sets probabilities for all
outgoing edges at once.
Setting individual edge probability is error prone. More over it is
difficult to check that the total probability is 1.0 because there is
no easy way to know when the user finished setting all
the probabilities.
Related bug is fixed in BranchProbabilityInfo::calcMetadataWeights().
Changing unreachable branch probabilities to raw(1) and distributing
the rest (oldProbability - raw(1)) over the reachable branches could
introduce total probability inaccuracy bigger than 1/numOfBranches.
Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79396
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.
This patch was originally committed as b8a3c34eee, but broke the
modules build, as LoopAccessAnalysis was using the Expander.
The code-gen part of LAA was moved to lib/Transforms recently, so this
patch can be landed again.
Reviewers: sanjoy.google, efriedma, reames
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D71537
After D76797 the dominator tree is no longer used in LVI, so we
can remove it as a pass dependency, and also get rid of the
dominator tree enabling/disabling logic in JumpThreading.
Apart from cleaning up the code, this also clarifies LVI
cache consistency, in that the LVI cache can no longer
depend on whether the DT was or wasn't enabled due to
pending DT updates at any given time.
Differential Revision: https://reviews.llvm.org/D76985
Now that load/store have required alignment, accept Align here.
This also avoids uses of getPointerElementType(), which is
incompatible with opaque pointers.
Now that load/store alignment is required, we no longer need most
of them. Also switch the getLoadStoreAlignment() helper to return
Align instead of MaybeAlign.
Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.
Differential Revision: https://reviews.llvm.org/D80044
This is D77454, except for stores. All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.
Differential Revision: https://reviews.llvm.org/D79968
It's really almost going to be misleading, see the example in
https://bugs.llvm.org/show_bug.cgi?id=45820
Maybe at some point we can do something fancier, but at least
this will fix a bug where we step on dead code while debugging.
The fact that loads and stores can have the alignment missing is a
constant source of confusion: code that usually works can break down in
rare cases. So fix the LoadInst API so the alignment is never missing.
To reduce the number of changes required to make this work, IRBuilder
and certain LoadInst constructors will grab the module's datalayout and
compute the alignment automatically. This is the same alignment
instcombine would eventually apply anyway; we're just doing it earlier.
There's a minor risk that the way we're retrieving the datalayout
could break out-of-tree code, but I don't think that's likely.
This is the last in a series of patches, so most of the necessary
changes have already been merged.
Differential Revision: https://reviews.llvm.org/D77454
This is relanding of rGbb308b020522420413c7d3f2989a88f2fc423c56 after
speculatively fixing buildbot lit test failure which was seen on two
bots (I cannot reproduce the lit test failure locally either).
[RS4GC] Fix algorithm to avoid setting vector BDV for scalar derived
pointer
Summary:
This is a more general fix to 59029b9eef (D75704).
This patch does the following:
updates isKnownBaseValue to account for base pointer and
derived pointer having differing types.
This inturn allows us to populate the
lattice (States) for such derived pointers.
It also updates all states where the base and derived pointers have
differing types (vector versus scalar) and conservatively marks these
states as conflictcs.
Note that in 59029b9eef, we were just fixing existing lattice values
and that too, only for uses of extractelement.
Reviewers: reames, skatkov, dantrushin
Reviewed By: skatkov
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D76305
Summary:
This is a more general fix to 59029b9eef (D75704).
This patch does the following:
1. updates isKnownBaseValue to account for base pointer and
derived pointer having differing types.
2. This inturn allows us to populate the
lattice (States) for such derived pointers.
3. It also updates all states where the base and derived pointers have
differing types (vector versus scalar) and conservatively marks these
states as conflictcs.
Note that in 59029b9eef, we were just fixing existing lattice values
and that too, only for uses of extractelement.
Reviewers: reames, skatkov, dantrushin
Reviewed By: skatkov
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76305
Summary:
Analyses that are statefull should not be retrieved through a proxy from
an outer IR unit, as these analyses are only invalidated at the end of
the inner IR unit manager.
This patch disallows getting the outer manager and provides an API to
get a cached analysis through the proxy. If the analysis is not
stateless, the call to getCachedResult will assert.
Reviewers: chandlerc
Subscribers: mehdi_amini, eraman, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72893
Summary:
Update check to include the check for unreachable.
Basic blocks ending in unreachable are special cased, as these blocks may be already unswitched. Before this patch this check is only done for the default destination.
The condition for the exit cases and the default case must be the same, because we should never leave edges from the switch instruction to a basic block that we are unswitching. In PR45355 we still have a remaining edge (that we're attempting to remove from the DT) because its the default edge to an unreachable-terminated block where we unswitch a case edge to that block.
Resolves PR45355.
Reviewers: chandlerc
Subscribers: hiraditya, uabelho, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78279
Use Align instead of using MaybeAlign; all the operations in question
have known alignment.
For getSliceAlign() in particular, in the cases where we used to return
None, it would be converted back to an Align by IRBuilder, so there's no
functional change there.
Split off from D77454.
Differential Revision: https://reviews.llvm.org/D79205
This patch adds a new TTI hook to allow targets to tell LSR that
a chain including some instruction is already profitable and
should not be optimized. This patch also adds an implementation
of this TTI hook for ARM so LSR doesn't optimize chains that include
the VCTP intrinsic.
Differential Revision: https://reviews.llvm.org/D79418
This is a reimplementation of the `orderNodes` function, as the old
implementation didn't take into account all cases.
Fix PR41509
Differential Revision: https://reviews.llvm.org/D79037
Hide the method that allows setting probability for particular
edge and introduce a public method that sets probabilities for
all outgoing edges at once.
Setting individual edge probability is error prone. More over
it is difficult to check that the total probability is 1.0
because there is no easy way to know when the user finished
setting all the probabilities.
Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79396
Fixes PR41696
The loop-reroll pass generates an invalid IR (or its assertion
fails in debug build) if values of the base instruction and
other root instructions (terms used in the loop-reroll pass)
are used outside the loop block. See IRs written in PR41696
as examples.
The current implementation of the loop-reroll pass can reroll
only loops that don't have values that are used outside the
loop, except reduced values (the last values of reduction chains).
This is described in the comment of the `LoopReroll::reroll`
function.
https://github.com/llvm/llvm-project/blob/llvmorg-10.0.0/llvm/lib/Transforms/Scalar/LoopRerollPass.cpp#L1600
This is checked in the `LoopReroll::DAGRootTracker::validate`
function.
https://github.com/llvm/llvm-project/blob/llvmorg-10.0.0/llvm/lib/Transforms/Scalar/LoopRerollPass.cpp#L1393
However, the base instruction and other root instructions skip
this check in the validation loop.
https://github.com/llvm/llvm-project/blob/llvmorg-10.0.0/llvm/lib/Transforms/Scalar/LoopRerollPass.cpp#L1229
Moving the check in front of the skip is the logically simplest
fix. However, inserting the check in an earlier stage is better
in terms of compilation time of unrerollable loops. This fix
inserts the check for the base instruction into the function
to validate possible base/root instructions. Check for other
root instructions is unnecessary because they don't match any
base instructions if they have uses outside the loop.
Differential Revision: https://reviews.llvm.org/D79549
Separate functions that require shared state into a class to avoid
needing to pass them though multiple functions just to be available
where needed.
The main motivation for this is that we would like to remove the
limitation that accumulator values be dynamic constant, which would
require additional shared state between call eliminations in the same
function, compounding this issue.
Differential Revision: https://reviews.llvm.org/D79299
This patch removes FC0.ExitBlock and FC1GuardBlock from DT and LI
after fusion of guarded loops. They become unreachable and LI
verification failed when they happened to be inside another loop.
Reviewed By: kbarton
Differential Revision: https://reviews.llvm.org/D78679
Summary:
Update the check for the default exit block to not only check that the
terminator is not unreachable, but also check that unreachable block has
*only* the unreachable instruction.
Reviewers: chandlerc
Subscribers: hiraditya, uabelho, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78277
loop nest.
Summary: As discussed in https://reviews.llvm.org/D73129.
Example
Before unroll and jam:
for
A
for
B
for
C
D
E
After unroll and jam (currently):
for
A
A'
for
B
for
C
D
B'
for
C'
D'
E
E'
After unroll and jam (Ideal):
for
A
A'
for
B
B'
for
C
C'
D
D'
E
E'
This is the first patch to change unroll and jam to work in the ideal
way.
This patch change the safety checks needed to make sure is safe to
unroll and jam in the ideal way.
Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto
Reviewed By: Meinersbur
Subscribers: fhahn, hiraditya, zzheng, llvm-commits, anhtuyen, prithayan
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D76132
LSR has some logic that tries to aggressively reuse registers in
formula. This can lead to sub-optimal decision in complex loops where
the backend it trying to use shouldFavorPostInc. This disables the
re-use in those situations.
Differential Revision: https://reviews.llvm.org/D79301
Make the kind of cost explicit throughout the cost model which,
apart from making the cost clear, will allow the generic parts to
calculate better costs. It will also allow some backends to
approximate and correlate the different costs if they wish. Another
benefit is that it will also help simplify the cost model around
immediate and intrinsic costs, where we currently have multiple APIs.
RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html
Differential Revision: https://reviews.llvm.org/D79002
This fixes potential reference invalidations, when no lattice value is
assigned for CopyOf. As the state of CopyOf won't change while in
handleCallResult, we can get a copy once and use that.
Should fix PR45749.
Summary: There is no need to create BPI explicitly. It should be requested through AM in a normal way.
Reviewers: skatkov
Reviewed By: skatkov
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79080
Summary: Currenlty BPI unconditionally creates post dominator tree each time. While this is not incorrect we can save compile time by reusing existing post dominator tree (when it's valid) provided by analysis manager.
Reviewers: skatkov, taewookoh, yrouban
Reviewed By: skatkov
Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78987
This allows forward declarations of PointerCheck, which in turn reduce
the number of times LoopAccessAnalysis needs to be included.
Ultimately this helps with moving runtime check generation to
Transforms/Utils/LoopUtils.h, without having to include it there.
Reviewers: anemet, Ayal
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D78458
There are several different types of cost that TTI tries to provide
explicit information for: throughput, latency, code size along with
a vague 'intersection of code-size cost and execution cost'.
The vectorizer is a keen user of RecipThroughput and there's at least
'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
help with this cost. The latency cost has a single use and a single
implementation. The intersection cost appears to cover most of the
rest of the API.
getUserCost is explicitly called from within TTI when the user has
been explicit in wanting the code size (also only one use) as well
as a few passes which are concerned with a mixture of size and/or
a relative cost. In many cases these costs are closely related, such
as when multiple instructions are required, but one evident diverging
cost in this function is for div/rem.
This patch adds an argument so that the cost required is explicit,
so that we can make the important distinction when necessary.
Differential Revision: https://reviews.llvm.org/D78635
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Integer ranges can be used for loaded/stored values. Note that widening
can be disabled for loads/stores, as we only rely on instructions that
cause continued increases to ranges to be widened (like binary
operators).
Reviewers: efriedma, mssimpso, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78433
Using the existing NumFastStores statistic can be misleading when
comparing the impact of DSE patches.
For example, consider the case where a store gets removed from a
function before it is inlined into another function. A less
powerful DSE might only remove the store from functions it has
been inlined into, which will result in more stores being removed, but
no difference in the actual number of stores after DSE.
The new stat provides the absolute number of stores surviving after
DSE.
Reviewers: dmgreen, bryant, asbirlea, jfb
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D78830
This patch slightly improves the formatting of the debug output, adds a
few missing outputs and makes some existing outputs more consistent with
the rest.
Summary:
This is RFC for fixes in poison-related functions of ValueTracking.
These functions assume that a value can be poison bitwisely, but the semantics
of bitwise poison is not clear at the moment.
Allowing a value to have bitwise poison adds complexity to reasoning about
correctness of optimizations.
This patch makes the analysis functions simply assume that a value is
either fully poison or not, which has been used to understand the correctness
of a few previous optimizations.
The bitwise poison semantics seems to be only used by these functions as well.
In terms of implementation, using value-wise poison concept makes existing
functions do more precise analysis, which is what this patch contains.
Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr
Reviewed By: nikic
Subscribers: fhahn, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78503
Summary:
This is RFC for fixes in poison-related functions of ValueTracking.
These functions assume that a value can be poison bitwisely, but the semantics
of bitwise poison is not clear at the moment.
Allowing a value to have bitwise poison adds complexity to reasoning about
correctness of optimizations.
This patch makes the analysis functions simply assume that a value is
either fully poison or not, which has been used to understand the correctness
of a few previous optimizations.
The bitwise poison semantics seems to be only used by these functions as well.
In terms of implementation, using value-wise poison concept makes existing
functions do more precise analysis, which is what this patch contains.
Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr
Reviewed By: nikic
Subscribers: fhahn, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78503
visitExtractValueInst uses mergeInValue, so it already can handle
constant ranges. Initially the early exit was using isOverdefined to
keep things as NFC during the initial move to ValueLatticeElement.
As the function already supports constant ranges, it can just use
ValueState[&I].isOverdefined.
Reviewers: efriedma, mssimpso, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78393
Use Instruction::comesBefore() instead of OrderedInstructions
inside InstructionPrecedenceTracking. This also removes the
dominator tree dependency.
Differential Revision: https://reviews.llvm.org/D78461
Summary:
The indexing operator in Scatterer may result in building new
instructions. When using multiple such operators in a function
argument list the order in which we build instructions depend on
argument evaluation order (which is undefined in C++).
This patch avoid such problems by expanding the components using
the [] operator prior to the function call.
Problem was seen when comparing output, while builing LLVM with
different compilers (clang vs gcc).
Reviewers: foad, cameron.mcinally, uabelho
Reviewed By: foad
Subscribers: hiraditya, mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78455
Some includes are not required and forward declarations can be used
instead. This also exposed a few places that were not directly including
required files.
This makes it easier to extend the merge options in the future and also
reduces the risk of accidentally setting a wrong option.
Reviewers: efriedma, nikic, reames, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78368
There are also some adjustments to use MaybeAlign in here due
to CallBase::getParamAlignment() being deprecated. It would
be a little cleaner if getOrEnforceKnownAlignment was migrated
to Align/MaybeAlign.
Differential Revision: https://reviews.llvm.org/D78345
There are also some adjustments to use MaybeAlign in here due
to CallBase::getParamAlignment() being deprecated. It would
be cleaner if getOrEnforceKnownAlignment was migrated
to Align/MaybeAlign.
Differential Revision: https://reviews.llvm.org/D78345
Users of ValueLatticeElement currently have to ensure constant ranges
are not extended indefinitely. For example, in SCCP, mergeIn goes to
overdefined if a constantrange value is repeatedly merged with larger
constantranges. This is a simple form of widening.
In some cases, this leads to an unnecessary loss of information and
things can be improved by allowing a small number of extensions in the
hope that a fixed point is reached after a small number of steps.
To make better decisions about widening, it is helpful to keep track of
the number of range extensions. That state is tied directly to a
concrete ValueLatticeElement and some unused bits in the class can be
used. The current patch preserves the existing behavior by default:
CheckWiden defaults to false and if CheckWiden is true, a single change
to the range is allowed.
Follow-up patches will slightly increase the threshold for widening.
Reviewers: efriedma, davide, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78145
The Float2IntPass got a class member called Roots, but Roots
was also passed around to member function as a reference. This
patch simply remove those references.
Now Reassociate Pass invalidates the analysis results of AAManager and BasicAA,
but it saves GlobalsAA, although it seems that it should preserve them, since
it affects only Unary and Binary operators.
Author: kpolushin (Kirill)
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D77137
The current strategy LICM uses when sinking for debuginfo is
that of picking the debug location of one of the uses.
This causes stepping to be wrong sometimes, see, e.g. PR45523.
This patch introduces a generalization of getMergedLocation(),
that operates on a vector of locations instead of two, and try
to merge all them together, and use the new API in LICM.
<rdar://problem/61750950>
We can eliminate MemoryDefs of objects not accessible after the function
returns (e.g. alloca), if there are no reads between the MemoryDef and
any function exits. We can stop traversing paths that completely
overwrite the memory location of the MemoryDef.
This patch was split off D73763.
Reviewers: dmgreen, bryant, asbirlea, Tyker, efriedma, george.burgess.iv
Reviewed By: asbirlea, george.burgess.iv
Differential Revision: https://reviews.llvm.org/D77736