This patch adds basic support for the freeze instruction to JumpThreading
by making ComputeValueKnownInPredecessorsImpl look into its operand.
Reviewed By: efriedma, nikic
Differential Revision: https://reviews.llvm.org/D84598
While this doesn't appear to help with the perf issue being exposed by
D84108, the function as-is is very weird, convoluted, and what's worse,
recursive.
There was no need for `SpeculativelyAvaliableAndUsedForSpeculation`;
a tri-state choice is enough. We never even check for that state.
The basic idea here is that we need to perform a depth-first traversal
of the predecessors of the basic block in question, either finding a
preexisting state for the block in a map, or inserting a "placeholder"
`SpeculativelyAvaliable`.
If we encounter an `Unavaliable` block, then we need to give up the search
and back-propagate the `Unavaliable` state to each successor of
said block, more specifically to each `SpeculativelyAvaliable`
we've just created.
However, if we have traversed the entirety of the predecessors and have not
encountered an `Unavaliable` block, then it must mean the value is fully
available. We could update each inserted `SpeculativelyAvaliable` into
an `Avaliable`, but we don't need to, as the assertion exercises,
because we can assume that if we see a `SpeculativelyAvaliable` entry,
it is actually `Avaliable`: had we found, at the time we produced it,
that it has an `Unavaliable` predecessor,
we would have updated its successors, including this block,
to `Unavaliable`.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D84181
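The traversal described above can be sketched outside of LLVM roughly as follows. This is a minimal illustration under assumptions, not the GVN code itself: blocks are plain integers, the names `Cfg`, `Avail`, and `isFullyAvailable` are hypothetical, and the state names use the conventional spelling.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real GVN types.
enum class Avail { Unavailable, Available, SpeculativelyAvailable };

struct Cfg {
  std::vector<std::vector<int>> Preds, Succs;
  Cfg(int NumBlocks, const std::vector<std::pair<int, int>> &Edges)
      : Preds(NumBlocks), Succs(NumBlocks) {
    for (const auto &E : Edges) {
      Succs[E.first].push_back(E.second);
      Preds[E.second].push_back(E.first);
    }
  }
};

// Returns true if the value is available on every path into BB.
bool isFullyAvailable(const Cfg &G, int BB, std::map<int, Avail> &State) {
  assert(BB >= 0 && static_cast<std::size_t>(BB) < G.Preds.size());
  std::vector<int> Worklist{BB};
  std::set<int> NewSpeculative; // placeholders inserted by this query
  int UnavailableBB = -1;

  // Depth-first walk over predecessors, without recursion.
  while (!Worklist.empty()) {
    int Curr = Worklist.back();
    Worklist.pop_back();
    auto Ins = State.emplace(Curr, Avail::SpeculativelyAvailable);
    if (!Ins.second) { // a state already existed for this block
      if (Ins.first->second == Avail::Unavailable) {
        UnavailableBB = Curr;
        break;
      }
      continue; // Available, or speculative and thus assumed available
    }
    if (G.Preds[Curr].empty()) { // no predecessors: nothing provides the value
      Ins.first->second = Avail::Unavailable;
      UnavailableBB = Curr;
      break;
    }
    NewSpeculative.insert(Curr);
    Worklist.insert(Worklist.end(), G.Preds[Curr].begin(), G.Preds[Curr].end());
  }

  if (UnavailableBB < 0)
    return true;

  // Back-propagate to successors that still hold a placeholder we created.
  Worklist = G.Succs[UnavailableBB];
  while (!Worklist.empty()) {
    int Curr = Worklist.back();
    Worklist.pop_back();
    auto It = State.find(Curr);
    if (It == State.end() || It->second != Avail::SpeculativelyAvailable)
      continue;
    if (NewSpeculative.erase(Curr) == 0)
      continue;
    It->second = Avail::Unavailable;
    Worklist.insert(Worklist.end(), G.Succs[Curr].begin(), G.Succs[Curr].end());
  }
  return false;
}
```

In this sketch, placeholders left behind by a successful query are never rewritten to `Available`; later queries simply treat them as available, which mirrors the argument made in the message.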
SplitBlockPredecessors() can not split blocks that have such terminators,
and in two other places we already ensure that we don't end up calling
SplitBlockPredecessors() on such blocks. Do so in one more place.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46857
Reapply with DTU update moved after CFG update, which is a
requirement of the API.
-----
Non-feasible control-flow edges are currently removed by replacing
the branch condition with a constant and then calling
ConstantFoldTerminator. This happens in a rather roundabout manner,
by inspecting the users (effectively: predecessors) of unreachable
blocks, and further complicated by the need to explicitly materialize
the condition for "forced" edges. I would like to extend SCCP to
discard switch conditions that are non-feasible based on range
information, but this is incompatible with the current approach
(as there is no single constant we could use.)
Instead, this patch explicitly removes non-feasible edges. It
currently only needs to handle the case where there is a single
feasible edge. The llvm_unreachable() branch will need to be
implemented for the aforementioned switch improvement.
Differential Revision: https://reviews.llvm.org/D84264
This patch updates IPSCCP to drop argmemonly and
inaccessiblemem_or_argmemonly if it replaces a pointer argument.
Fixes PR46717.
Reviewers: efriedma, davide, nikic, jdoerfert
Reviewed By: efriedma, jdoerfert
Differential Revision: https://reviews.llvm.org/D84432
This is the second of two patches to address PR46753. We basically allow
SROA to promote allocas that are used in droppable instructions, for
now that means `llvm.assume`. The (transitive) uses are replaced by
`undef` in the droppable instructions.
See also D83976.
Reviewed By: Tyker
Differential Revision: https://reviews.llvm.org/D83978
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.
This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
It breaks stage-2 build. Clang crashed when compiling
llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp
llvm/Support/GenericDomTree.h eraseNode: Node is not a leaf node
This patch adds the ability to peel off iterations of the first loop in loop
fusion. This can allow for both loops to have the same trip count, making it
legal for them to be fused together.
Here is a simple scenario where peeling can be used in loop fusion:
for (i = 0; i < 10; ++i)
  a[i] = a[i] + 3;
for (j = 1; j < 10; ++j)
  b[j] = b[j] + 5;
Here we can make use of peeling, and then fuse the two loops together. We
can peel off the 0th iteration of loop i, and then combine loops i and j for
i = 1 to 9.
a[0] = a[0] + 3;
for (i = 1; i < 10; ++i) {
  a[i] = a[i] + 3;
  b[i] = b[i] + 5;
}
Currently peeling with loop fusion is only supported for loops with constant
trip counts and a single exit point. Both unguarded and guarded loops are
supported.
Reviewed By: bmahjour (Bardia Mahjour), MaskRay (Fangrui Song)
Differential Revision: https://reviews.llvm.org/D82927
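As a sanity check of the scenario above, the two forms can be compared directly. This is only an illustrative sketch: the function names are hypothetical, and it assumes `a` and `b` are distinct arrays that do not alias.

```cpp
#include <array>
#include <cassert>

constexpr int N = 10;
using Arr = std::array<int, N>;

// Original form: two separate loops with trip counts 10 and 9.
void separateLoops(Arr &a, Arr &b) {
  for (int i = 0; i < N; ++i)
    a[i] = a[i] + 3;
  for (int j = 1; j < N; ++j)
    b[j] = b[j] + 5;
}

// Peeled and fused form: iteration 0 of the first loop is peeled off,
// so both loops run for i = 1..9 and can share one body.
void peeledAndFused(Arr &a, Arr &b) {
  a[0] = a[0] + 3;
  for (int i = 1; i < N; ++i) {
    a[i] = a[i] + 3;
    b[i] = b[i] + 5;
  }
}

// Runs both forms on identical inputs and compares the results.
bool bothFormsAgree() {
  Arr a1{}, b1{}, a2{}, b2{};
  for (int i = 0; i < N; ++i) {
    a1[i] = a2[i] = i;
    b1[i] = b2[i] = 2 * i;
  }
  separateLoops(a1, b1);
  peeledAndFused(a2, b2);
  return a1 == a2 && b1 == b2;
}
```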
Non-feasible control-flow edges are currently removed by replacing
the branch condition with a constant and then calling
ConstantFoldTerminator. This happens in a rather roundabout manner,
by inspecting the users (effectively: predecessors) of unreachable
blocks, and further complicated by the need to explicitly materialize
the condition for "forced" edges. I would like to extend SCCP to
discard switch conditions that are non-feasible based on range
information, but this is incompatible with the current approach
(as there is no single constant we could use.)
Instead, this patch explicitly removes non-feasible edges. It
currently only needs to handle the case where there is a single
feasible edge. The llvm_unreachable() branch will need to be
implemented for the aforementioned switch improvement.
Differential Revision: https://reviews.llvm.org/D84264
This patch clarifies the failing point of having input or output vectors
of differing types. Before, lowering would fail elsewhere (e.g. in
`fmul` creation), which may not have been immediately clear.
As a side effect, the `getElementType` and `getVectorTy` functions
required the `const` qualifier to be added.
Reviewers: fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D84374
This reverts commit bb8850d34d.
It broke 3 check-llvm-transforms-loopfusion tests in an ASAN build.
LoopFuse.cpp `for (BasicBlock *Pred : predecessors(BB)) {` may operate on a deleted BB.
Summary:
This patch adds the ability to peel off iterations of the first loop in loop
fusion. This can allow for both loops to have the same trip count, making it
legal for them to be fused together.
Here is a simple scenario where peeling can be used in loop fusion:
for (i = 0; i < 10; ++i)
  a[i] = a[i] + 3;
for (j = 1; j < 10; ++j)
  b[j] = b[j] + 5;
Here we can make use of peeling, and then fuse the two loops together. We can
peel off the 0th iteration of loop i, and then combine loops i and j for
i = 1 to 9.
a[0] = a[0] + 3;
for (i = 1; i < 10; ++i) {
  a[i] = a[i] + 3;
  b[i] = b[i] + 5;
}
Currently peeling with loop fusion is only supported for loops with constant
trip counts and a single exit point. Both unguarded and guarded loops are
supported.
Author: sidbav (Sidharth Baveja)
Reviewers: kbarton, Meinersbur, bkramer, Whitney, skatkov, ashlykov, fhahn, bmahjour
Reviewed By: bmahjour
Subscribers: bmahjour, mgorny, hiraditya, zzheng
Tags: LLVM
Differential Revision: https://reviews.llvm.org/D82927
If we inferred a range for the function return value, we can add !range
at all call-sites of the function, if the range does not include undef.
Reviewers: efriedma, davide, nikic
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D83952
This patch uses the TileInfo introduced in D77550 to generate a loop
nest for tiled matrix multiplication, instead of generating the
unrolled code for the whole multiplication. This makes code-generation
more scalable for larger matrices.
Initially loops are only used if both the number of rows and columns are
divisible by the tile size. Other cases will be added as follow-up.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D81308
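The shape of such a loop nest can be sketched in plain C++. This is a rough illustration under assumptions, not the code the pass emits: matrices are column-major vectors of integers, the names `multiplyNaive` and `multiplyTiled` are hypothetical, and the non-divisible case simply falls back to the untiled loops here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Column-major storage: element (r, c) of a matrix with Rows rows
// lives at index c * Rows + r.
using Matrix = std::vector<long>;

// Reference: A is R x K, B is K x C, result is R x C.
Matrix multiplyNaive(const Matrix &A, const Matrix &B, int R, int K, int C) {
  Matrix Out(R * C, 0);
  for (int c = 0; c < C; ++c)
    for (int k = 0; k < K; ++k)
      for (int r = 0; r < R; ++r)
        Out[c * R + r] += A[k * R + r] * B[c * K + k];
  return Out;
}

// Tiled loop nest: the outer three loops walk over tiles, the inner
// three multiply one pair of tiles and accumulate into the result tile.
Matrix multiplyTiled(const Matrix &A, const Matrix &B, int R, int K, int C,
                     int Tile) {
  assert(Tile > 0);
  // Loops are only used when rows and columns divide evenly.
  if (R % Tile != 0 || C % Tile != 0)
    return multiplyNaive(A, B, R, K, C);
  Matrix Out(R * C, 0);
  for (int c0 = 0; c0 < C; c0 += Tile)
    for (int r0 = 0; r0 < R; r0 += Tile)
      for (int k0 = 0; k0 < K; k0 += Tile) {
        int kEnd = std::min(k0 + Tile, K); // last inner tile may be partial
        for (int c = c0; c < c0 + Tile; ++c)
          for (int k = k0; k < kEnd; ++k)
            for (int r = r0; r < r0 + Tile; ++r)
              Out[c * R + r] += A[k * R + r] * B[c * K + k];
      }
  return Out;
}
```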
This patch adds a new variant of the matrix lowering pass that only does
a minimal lowering and only depends on TTI. The main purpose of this pass
is to have a pass with minimal dependencies to run as part of the backend
pipeline.
At the moment, the only difference to the regular lowering pass is that it
does not support remarks. But subsequent patches will add support for tiling
to the lowering pass, which will require more analysis, which we do not want
to run in the backend, as the lowering should happen in the middle-end in
practice and running it in the backend is mostly for convenience when
running llc.
Reviewers: anemet, Gerolf, efriedma, hfinkel
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D76867
Common code sinking is already guarded with a (default-off!) flag,
so add a flag for hoisting, too.
D84108 will hopefully make hoisting off-by-default too.
Both users of PredicateInfo (NewGVN and SCCP) are interested in
getting a cmp constraint on the predicated value. They currently
implement separate logic for this. This patch adds a common method
for this in PredicateBase.
This enables a missing bit of PredicateInfo handling in SCCP: Now
the predicate on the condition itself is also used. For switches
it means we know that the switched-on value is the same as the case
value. For assumes/branches we know that the condition is true or
false.
Differential Revision: https://reviews.llvm.org/D83640
Each concrete instance of a predicate has a condition (also noted in the
original PredicateBase comment) and to me it seems like there is no
clear benefit of having both PredicateBase and PredicateWithCondition
and they can be folded together.
Reviewers: nikic, efriedma
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84089
Yes, if the operands are non-positive this comes at the cost
of two extra negations. But a. division is already just
ridiculously costly, two more subtractions can't hurt much :)
and b. we have better/more analyses/folds for an unsigned division,
we could end up narrowing its bitwidth, converting it to lshr, etc.
This is essentially a take two on 0fdcca07ad,
which didn't fix the potential regression i was seeing,
because ValueTracking's computeKnownBits() doesn't make use
of dominating conditions in its analysis.
While i could teach it that, this seems like the more general fix.
This big hammer actually does catch said potential regression.
Over vanilla test-suite + RawSpeed + darktable
(10M IR instrs, 1M IR BB, 1M X86 ASM instrs), this fires/converts 5 more
(+2%) SDiv's, the total instruction count at the end of middle-end pipeline
is only +6, so out of +10 extra negations, ~half are folded away,
and asm instr count is only +1, so practically speaking all extra
negations are folded away and are therefore free.
Sadly, all these new UDiv's remained, none folded away.
But there are two fewer basic blocks.
https://rise4fun.com/Alive/VS6
Name: v0
Pre: C0 >= 0 && C1 >= 0
%r = sdiv i8 C0, C1
=>
%r = udiv i8 C0, C1
Name: v1
Pre: C0 <= 0 && C1 >= 0
%r = sdiv i8 C0, C1
=>
%t0 = udiv i8 -C0, C1
%r = sub i8 0, %t0
Name: v2
Pre: C0 >= 0 && C1 <= 0
%r = sdiv i8 C0, C1
=>
%t0 = udiv i8 C0, -C1
%r = sub i8 0, %t0
Name: v3
Pre: C0 <= 0 && C1 <= 0
%r = sdiv i8 C0, C1
=>
%r = udiv i8 -C0, -C1
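The four cases above can be checked exhaustively for i8 with a small C++ model. This is a sketch under assumptions: the helper names are hypothetical, and the operand signs are tested at run time here, whereas the transform relies on signs that are already known at compile time.

```cpp
#include <cassert>
#include <cstdint>

// Computes a signed 8-bit division using only an unsigned division
// plus negations, following the v0..v3 patterns.
std::int8_t sdivViaUdiv(std::int8_t a, std::int8_t b) {
  assert(b != 0);
  // Negate negative operands in 8-bit unsigned arithmetic; -128 wraps to 128.
  std::uint8_t ua = a < 0 ? static_cast<std::uint8_t>(0u - static_cast<std::uint8_t>(a))
                          : static_cast<std::uint8_t>(a);
  std::uint8_t ub = b < 0 ? static_cast<std::uint8_t>(0u - static_cast<std::uint8_t>(b))
                          : static_cast<std::uint8_t>(b);
  std::uint8_t q = static_cast<std::uint8_t>(ua / ub);
  // Operands of differing sign: negate the quotient (cases v1 and v2).
  if ((a < 0) != (b < 0))
    q = static_cast<std::uint8_t>(0u - q);
  return static_cast<std::int8_t>(q);
}

// Compares against ordinary signed division for every defined input pair.
bool matchesSignedDivisionOnI8() {
  for (int a = -128; a <= 127; ++a)
    for (int b = -128; b <= 127; ++b) {
      if (b == 0 || (a == -128 && b == -1))
        continue; // sdiv is undefined for these inputs
      std::int8_t Expected = static_cast<std::int8_t>(a / b);
      if (sdivViaUdiv(static_cast<std::int8_t>(a), static_cast<std::int8_t>(b)) != Expected)
        return false;
    }
  return true;
}
```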
When the byref attribute is added, there will need to be two similar
functions: one for the existing cases, which have an associated value copy,
and one for byref, which does not. Most, but not all, of the existing uses will
use the existing version.
The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
There is no need to add functions with void return types to the set of
tracked return values. This does not change functionality, because
such functions do not have return values and we never update or access
them.
This reverts commit 1067d3e176,
which reverted commit b2018198c3,
because it introduced a Dependency Cycle between Transforms/Scalar and
Transforms/Utils.
So let's just move SimplifyCFGOptions.h into Utils/, thus avoiding
the cycle.
This reverts commit b2018198c3.
This commit introduced a Dependency Cycle between Transforms/Scalar and
Transforms/Utils. Transforms/Scalar already depends on Transforms/Utils,
so if SimplifyCFGOptions.h is moved to Scalar, and Utils/Local.h still
depends on it, we have a cycle.
Taking so many parameters is simply unmaintainable.
We don't want to include the entire llvm/Transforms/Utils/Local.h into
llvm/Transforms/Scalar.h so i've split SimplifyCFGOptions into
its own header.
This fixes an instance where MemorySSA-using Dead Store Elimination is failing
to do a transformation that the non-MemorySSA-using version does.
Differential Revision: https://reviews.llvm.org/D83783
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as an "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
The current implementation of Tail Recursion Elimination has a very restrictive
prerequisite: AllCallsAreTailCalls, i.e. it requires that no function
call receives a pointer to the local stack. Generally, function calls that
receive a pointer to the local stack but do not capture it should not
break TRE. This fix allows us to do TRE if it is proved that no pointer
to the local stack escapes.
Reviewed by: efriedma
Differential Revision: https://reviews.llvm.org/D82085
Summary:
This patch separates the peeling specific parameters from the UnrollingPreferences,
and creates a new struct called PeelingPreferences. Functions which used the
UnrollingPreferences struct for peeling have been updated to use the PeelingPreferences struct.
Author: sidbav (Sidharth Baveja)
Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel), anhtuyen (Anh Tuyen Tran), nikic (Nikita Popov)
Reviewed By: Meinersbur (Michael Kruse)
Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80580
Summary: This allows converting any SExt to a ZExt when we know none of the extended bits are used, especially in cases where there are multiple uses of the value.
Reviewers: dmgreen, eli.friedman, spatel, lebedev.ri, nikic
Reviewed By: lebedev.ri, nikic
Subscribers: hiraditya, dmgreen, craig.topper, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60413
The front of the block may be a PHI node; inserting cast instructions like
BitCast, PtrToInt, IntToPtr among PHIs is not right.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D80975
Summary:
The test case started to hoist bitcasts to the upper BB after D81730.
Reverted an unintentional logic change. Some instructions may have zero cost but
will not be hoisted because of a different limitation, so they should still be
counted towards the threshold.
Reviewers: aprantl, arsenm, nhaehnle
Reviewed By: aprantl
Subscribers: wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82761
Currently SCCP does not combine the information of conditions joined by
AND in the true branch or OR in the false branch.
For branches on AND, 2 copies will be inserted for the true branch, with
one being the operand of the other as in the code below. We can combine
the information using intersection. Note that for the OR case, the
copies are inserted in the false branch, where using intersection is
safe as well.
define void @foo(i32 %a) {
entry:
  %lt = icmp ult i32 %a, 100
  %gt = icmp ugt i32 %a, 20
  %and = and i1 %lt, %gt
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %lt = icmp ult i32 %a, 100 Edge: [label %entry,label %true] }
  %a.0 = call i32 @llvm.ssa.copy.140247425954880(i32 %a)
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %gt = icmp ugt i32 %a, 20 Edge: [label %entry,label %false] }
  %a.1 = call i32 @llvm.ssa.copy.140247425954880(i32 %a.0)
  br i1 %and, label %true, label %false
true: ; preds = %entry
  call void @use(i32 %a.1)
  %true.1 = icmp ne i32 %a.1, 20
  call void @use.i1(i1 %true.1)
  ret void
false: ; preds = %entry
  call void @use(i32 %a.1)
  ret void
}
Reviewers: efriedma, davide, mssimpso, nikic
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D77808
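To illustrate why intersection is the right combination for the true branch of an AND, here is a deliberately simplified range model. It is an assumption-laden sketch: `Range` and `intersect` are hypothetical names, ranges are half-open and never wrap around, which is not how LLVM's ConstantRange behaves in general.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Half-open unsigned range [Lo, Hi), without wrap-around.
struct Range {
  std::uint64_t Lo, Hi;
  bool contains(std::uint64_t V) const { return Lo <= V && V < Hi; }
  bool empty() const { return Lo >= Hi; }
};

// A value satisfying both conditions must lie in both ranges.
Range intersect(Range A, Range B) {
  return {std::max(A.Lo, B.Lo), std::min(A.Hi, B.Hi)};
}
```

For the example above, `%a ult 100` gives [0, 100) and `%a ugt 20` gives [21, 2^32). Their intersection is [21, 100), which excludes 20, so `icmp ne i32 %a.1, 20` is known to be true in the true branch.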
Summary:
Almost all uses of these iterators, including implicit ones, really
only need the const variant (as it should be). The only exception is
in NewGVN, which changes the order of dominator tree child nodes.
Change-Id: I4b5bd71e32d71b0c67b03d4927d93fe9413726d4
Reviewers: arsenm, RKSimon, mehdi_amini, courbet, rriddle, aartbik
Subscribers: wdng, Prazek, hiraditya, kuhar, rogfer01, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, vkmr, Kayjukh, jurahul, msifontes, cfe-commits, llvm-commits
Tags: #clang, #mlir, #llvm
Differential Revision: https://reviews.llvm.org/D83087
This patch adds support for eliminating stores by free & lifetime.end
calls. We can remove stores that are not read before calling a memory
terminator and we can eliminate all stores after a memory terminator
until we see a new lifetime.start. The second case seems to not really
trigger much in practice though.
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72410
When all else fails, use range metadata to constrain the result
of loads and calls. It should also be possible to use !nonnull,
but that would require some general support for inequalities in
SCCP first.
Differential Revision: https://reviews.llvm.org/D83179
Take assume predicates into account when visiting ssa.copy. The
handling is the same as for branch predicates, with the difference
that we're always on the true edge.
Differential Revision: https://reviews.llvm.org/D83257
Summary: This patch makes optional those code motion checks that depend on
specific analyses, for example the dominator tree, post-dominator tree, and dependence
info. The aim is to make the adoption of CodeMoverUtils easier for clients that
don't use the analyses which were strictly required by CodeMoverUtils. This will
also help in diversifying code motion checks using other analyses, for example MSSA.
Authored By: RithikSharma
Reviewer: Whitney, bmahjour, etiotto
Reviewed By: Whitney
Subscribers: Prazek, hiraditya, george.burgess.iv, asbirlea, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D82566
The (previously-crashing) test-case would cause us to seemingly-harmlessly
replace some use with something else, but we can't replace it with itself,
so we would crash.
As reported in https://reviews.llvm.org/D83101#2133062
the new visitInsertElementInst()/visitExtractElementInst() functionality
is causing miscompiles (previously-crashing test added).
This is due to how the Scalarizer infrastructure deals with DCE:
it was neither updated nor ready for such scalar value forwarding.
It always assumed that the moment we "scalarized" something,
it can go away, and did so with prejudice.
But that is no longer safe/okay to do.
Instead, let's prevent it from ever shooting itself in the foot,
and let's just accumulate the instructions-to-be-deleted
in a vector, and collectively clean them all up (those that are *actually* dead)
at the end.
None of the existing tests report any new garbage leftovers,
but maybe it's a test coverage issue.
Summary:
Avoid exposing details about how roots are stored. This enables subsequent
type-erasure changes.
v5:
- cleanup a unit test by using EXPECT_EQ instead of EXPECT_TRUE
Change-Id: I532b774cc71f2224e543bc7d79131d97f63f093d
Reviewers: arsenm, RKSimon, mehdi_amini, courbet
Subscribers: jvesely, wdng, hiraditya, kuhar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83085
Summary:
Avoid exposing details about how children are stored. This will enable
subsequent type-erasure changes.
New methods are introduced to cover common access patterns.
Change-Id: Idb5f4b1b9c84e4cc71ddb39bb52a388682f5674f
Reviewers: arsenm, RKSimon, mehdi_amini, courbet
Subscribers: qcolombet, sdardis, wdng, hiraditya, jrtc27, zzheng, atanasyan, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83083
Compilers may evaluate call arguments in different order,
which would result in different order of IR, which would break the tests.
Spotted thanks to Dmitri Gribenko!
Summary:
I'm interested in taking the original C++ input,
for which we currently are stuck with an alloca
and producing roughly the lower IR,
with neither an alloca nor vector ops:
https://godbolt.org/z/cRRWaJ
For that, as an intermediate step, i'd like to somehow perform scalarization.
As per @arsenmn's suggestion, i'm trying to see if scalarizer can help me
avoid writing a bicycle.
I'm not sure if it's really intentional that variable insert is not handled currently.
If it really is, and is supposed to stay that way (?), i guess i could guard it..
See [[ https://bugs.llvm.org/show_bug.cgi?id=46524 | PR46524 ]].
Reviewers: bjope, cameron.mcinally, arsenm, jdoerfert
Reviewed By: jdoerfert
Subscribers: arphaman, uabelho, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82961
Summary:
It appears to be better IR-wise to aggressively scalarize it,
rather than relying on gathering it, and leaving it as-is.
Reviewers: jdoerfert, bjope, arsenm, cameron.mcinally
Reviewed By: jdoerfert
Subscribers: arphaman, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83101
Summary: As it can be clearly seen from the diff, this results in nicer IR.
Reviewers: jdoerfert, arsenm, bjope, cameron.mcinally
Reviewed By: jdoerfert
Subscribers: arphaman, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83102
An assume bundle can have more than one entry with the same name,
but at least AlignmentFromAssumptionsPass::extractAlignmentInfo() uses
getOperandBundle("align"), which internally assumes that this isn't the
case, and happily crashes otherwise.
Minimal reduced reproducer: run `opt -alignment-from-assumptions` on
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%0 = type { i64, %1*, i8*, i64, %2, i32, %3*, i8* }
%1 = type opaque
%2 = type { i8, i8, i16 }
%3 = type { i32, i32, i32, i32 }
; Function Attrs: nounwind
define i32 @f(%0* noalias nocapture readonly %arg, %0* noalias %arg1) local_unnamed_addr #0 {
bb:
call void @llvm.assume(i1 true) [ "align"(%0* %arg, i64 8), "align"(%0* %arg1, i64 8) ]
ret i32 0
}
; Function Attrs: nounwind willreturn
declare void @llvm.assume(i1) #1
attributes #0 = { nounwind "reciprocal-estimates"="none" }
attributes #1 = { nounwind willreturn }
This is what we'd have with -mllvm -enable-knowledge-retention
This reverts commit c95ffadb24.
This emits a remark when LoopDeletion deletes a dead loop, using the
source location of the loop's header. There are currently two reasons
for removing the loop: invariant loop or loop that never executes.
Differential Revision: https://reviews.llvm.org/D83113
LowerConstantIntrinsics fails to preserve the analysis result of
GlobalsAA. Not preserving the analysis might affect benchmark
performance. This change fixes this issue.
Patch by Ryan Santhiraraja <rsanthir@quicinc.com>
Reviewers: fpetrogalli, joerg, fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D82342
Summary:
- `ptrtoint` and `inttoptr` are defined as no-op casts if the integer
value has the same size as the pointer value. The pair of
`ptrtoint`/`inttoptr` is in fact a no-op cast sequence between
different address spaces. Teach `infer-address-spaces` to handle them
like a `bitcast`.
Reviewers: arsenm, chandlerc
Subscribers: jvesely, wdng, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D81938
For passes that get skipped, this is confusing because the log says `running pass`
even though the pass is skipped later.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D82511
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as an "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
Provided test case crashes otherwise.
If NewTy is already DL.getIntPtrType(NewTy),
CreateBitCast() won't actually create any bitcast,
so we are better off just doing the general thing.
Summary:
Restore `const` that was partially lost in a recent change.
Additionally, specify explicit qualifiers in a few places.
Reviewers: samparker
Reviewed By: samparker
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82383
This patch adds support for eliminating MemoryDefs that do not have any
aliasing users, which indicates that there are no reads/writes to the
memory location until the end of the function.
To eliminate such defs, we have to ensure that the underlying object is
not visible in the caller and does not escape via returning. We need a
separate check for that, as InvisibleToCaller does not consider returns.
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker, george.burgess.iv
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72631
This patch extends storeIsNoop to also detect stores of 0 to a calloced
object. This basically ports the logic from legacy DSE to the
MemorySSA-backed version.
It triggers in a few cases on MultiSource, SPEC2000, SPEC2006 with -O3
LTO:
Same hash: 218 (filtered out)
Remaining: 19
Metric: dse.NumNoopStores
Program                                       base   patch2      diff
test-suite...CFP2000/177.mesa/177.mesa.test   1.00    15.00   1400.0%
test-suite...6/482.sphinx3/482.sphinx3.test   1.00    14.00   1300.0%
test-suite...lications/ClamAV/clamscan.test   2.00    28.00   1300.0%
test-suite...CFP2006/433.milc/433.milc.test   1.00     8.00    700.0%
test-suite...pplications/oggenc/oggenc.test   2.00     9.00    350.0%
test-suite.../CINT2000/176.gcc/176.gcc.test   6.00     6.00      0.0%
test-suite.../CINT2006/403.gcc/403.gcc.test    NaN   137.00      nan%
test-suite...libquantum/462.libquantum.test    NaN     3.00      nan%
test-suite...6/464.h264ref/464.h264ref.test    NaN     7.00      nan%
test-suite...decode/alacconvert-decode.test    NaN     2.00      nan%
test-suite...encode/alacconvert-encode.test    NaN     2.00      nan%
test-suite...ications/JM/ldecod/ldecod.test    NaN     9.00      nan%
test-suite...ications/JM/lencod/lencod.test    NaN    39.00      nan%
test-suite.../Applications/lemon/lemon.test    NaN     2.00      nan%
test-suite...pplications/treecc/treecc.test    NaN     4.00      nan%
test-suite...hmarks/McCat/08-main/main.test    NaN     4.00      nan%
test-suite...nsumer-lame/consumer-lame.test    NaN     3.00      nan%
test-suite.../Prolangs-C/bison/mybison.test    NaN     1.00      nan%
test-suite...arks/mafft/pairlocalalign.test    NaN    30.00      nan%
Reviewers: efriedma, zoecarver, asbirlea
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D82204
This updates the MemorySSA-backed implementation to treat arguments
passed by value similarly to allocas: they are assumed to be invisible
to the caller. This is similar to how they are treated in legacy DSE.
Reviewers: efriedma, asbirlea, george.burgess.iv
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D82222
Summary:
- When promoting a pointer from memory to register, SROA skips pointers
from different address spaces. However, as `ptrtoint` and `inttoptr`
are defined as no-op casts if the integer type has the same size as the
pointer value, generate the pair of `ptrtoint`/`inttoptr` (no-op cast)
sequence to convert pointers from different address spaces if they
have the same size.
Reviewers: arsenm, chandlerc, lebedev.ri
Subscribers:
Differential Revision: https://reviews.llvm.org/D81943
Add a new builtin function __builtin_expect_with_probability and
intrinsic llvm.expect.with.probability.
The interface is __builtin_expect_with_probability(long expr, long
expected, double probability).
It is mostly the same as __builtin_expect, except for one more argument
indicating the probability that the expression equals the expected value. The
probability should be a constant floating-point expression in the
range [0.0, 1.0] inclusive.
It is similar to the __builtin_expect_with_probability built-in function
in GCC.
Differential Revision: https://reviews.llvm.org/D79830
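A possible use of the builtin, shown as a hedged sketch: the macro `EXPECT_WITH_PROB` and the function `sumNonNegative` are hypothetical names introduced for illustration, and the macro falls back to the plain expression on compilers that do not provide the builtin. The hint only affects optimization, not the computed result.

```cpp
#include <cassert>
#include <cstddef>

// Use the builtin when the compiler advertises it; otherwise drop the hint.
#if defined(__has_builtin)
#if __has_builtin(__builtin_expect_with_probability)
#define EXPECT_WITH_PROB(expr, expected, prob) \
  __builtin_expect_with_probability((expr), (expected), (prob))
#endif
#endif
#ifndef EXPECT_WITH_PROB
#define EXPECT_WITH_PROB(expr, expected, prob) (expr)
#endif

// Sums the non-negative entries, telling the compiler that the
// "is negative" test is expected to be false about 90% of the time.
long sumNonNegative(const long *Values, std::size_t N) {
  assert(Values != nullptr || N == 0);
  long Sum = 0;
  for (std::size_t I = 0; I < N; ++I) {
    if (EXPECT_WITH_PROB(Values[I] < 0, 0, 0.9))
      continue;
    Sum += Values[I];
  }
  return Sum;
}
```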
Currently we stop exploring candidates too early in some cases.
In particular, we can continue checking the defining accesses of
non-removable MemoryDefs and defs without analyzable write location
(read clobbers are already ruled out using MemorySSA at this point).
As we traverse the CFG backwards, we could end up reaching unreachable
blocks. For unreachable blocks, we won't have computed post order
numbers and because DomAccess is reachable, unreachable blocks cannot be
on any path from it.
This fixes a crash with unreachable blocks.
This patch updates SCCP/IPSCCP to use the computed range info to turn
sexts into zexts, if the value is known to be non-negative. We already
do a similar transform in CorrelatedValuePropagation, but it seems like
we can catch a lot of additional cases by doing it in SCCP/IPSCCP as
well.
The transform is limited to ranges that are known to not include undef.
Currently constant ranges from conditions are treated as potentially
containing undef, due to PR46144. Once we flip this, the transform will
be more effective in practice.
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D81756
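The underlying fact is easy to state in C++ terms: sign extension and zero extension agree exactly when the sign bit of the source value is clear. The helpers below are hypothetical and only model i32 to i64 extension.

```cpp
#include <cassert>
#include <cstdint>

// Models `sext i32 %v to i64`: the sign bit is replicated into the high bits.
std::uint64_t sext32to64(std::int32_t V) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(V));
}

// Models `zext i32 %v to i64`: the high bits are filled with zeros.
std::uint64_t zext32to64(std::int32_t V) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(V));
}

// True if both extensions agree on every value in [Lo, Hi].
bool extensionsAgreeOn(std::int32_t Lo, std::int32_t Hi) {
  assert(Lo <= Hi);
  for (std::int64_t V = Lo; V <= Hi; ++V)
    if (sext32to64(static_cast<std::int32_t>(V)) !=
        zext32to64(static_cast<std::int32_t>(V)))
      return false;
  return true;
}
```

A range known to be non-negative, such as [0, 1000], passes the check, while any range containing a negative value does not, which is why the transform is gated on the computed range.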
Move code that may update the IR after the precondition checks, so that if a
precondition fails, the IR isn't modified.
Differential Revision: https://reviews.llvm.org/D81225
This patch updates LowerMatrixIntrinsics to preserve the alignment
specified at the original load/stores and the align attribute for the
pointer argument of the column.major.load/store intrinsics.
We can always use the specified alignment for the load of the first
column. For subsequent columns, the alignment may need to be reduced.
For ConstantInt strides, compute the offset for the start of the column in
bytes and use commonAlignment to get the largest valid alignment.
For non-ConstantInt strides, we need to take the common alignment of the
initial alignment and the element size in bytes.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, rjmccall
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D81960
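The alignment arithmetic described above can be sketched with plain integers. This is an illustration under assumptions: alignments are modelled as `uint64_t` powers of two, `commonAlign` is a stand-in for the computation that commonAlignment performs, and the two `columnAlign...` helpers are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two that divides both Alignment and Offset.
// An Offset of 0 leaves the alignment unchanged.
std::uint64_t commonAlign(std::uint64_t Alignment, std::uint64_t Offset) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
  std::uint64_t Bits = Alignment | Offset;
  return Bits & (~Bits + 1); // isolate the lowest set bit
}

// Constant stride: the byte offset of the column start is known exactly.
std::uint64_t columnAlignConstStride(std::uint64_t InitialAlign,
                                     std::uint64_t Stride,
                                     std::uint64_t ColumnIdx,
                                     std::uint64_t ElemBytes) {
  return commonAlign(InitialAlign, Stride * ColumnIdx * ElemBytes);
}

// Unknown stride: only element-size alignment can be guaranteed.
std::uint64_t columnAlignUnknownStride(std::uint64_t InitialAlign,
                                       std::uint64_t ElemBytes) {
  return commonAlign(InitialAlign, ElemBytes);
}
```

For example, with an initial alignment of 16, a stride of 3 and 8-byte elements, column 1 starts at byte offset 24 and can only be assumed 8-byte aligned, while column 2 starts at offset 48 and keeps the 16-byte alignment.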
Currently the matrix lowering turns volatile loads/stores into
non-volatile ones. This patch updates the lowering to preserve the
volatile bit.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D81498
This patch adjusts the load/store matrix intrinsics, formerly known as
llvm.matrix.columnwise.load/store, to improve the naming and allow
passing of extra information (volatile).
The patch performs the following changes:
* Rename columnwise.load/store to column.major.load/store. This is more
expressive and also more in line with the naming in Clang.
* Change the stride arguments from i32 to i64. The stride can be
larger than i32 and this makes things more uniform with the way
things are handled in Clang.
* A new boolean argument is added to indicate whether the load/store
is volatile. The lowering respects that when emitting vector
load/store instructions.
* MatrixBuilder is updated to require both Alignment and IsVolatile
arguments, which are passed through to the generated intrinsic. The
alignment is set using the `align` attribute.
The changes are grouped together in a single patch, to have a single
commit that breaks the compatibility. We probably should be fine with
updating the intrinsics, as we did not yet officially support them in
the last stable release. If there are any concerns, we can add
auto-upgrade rules for the columnwise intrinsics though.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache, rjmccall, ftynse
Reviewed By: anemet, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D81472
In more complicated loops we can easily hit the complexity limits of
loop strength reduction. If we do and filtering occurs, it's all too
easy to remove the wrong formulae for post-inc preferring accesses due
to it attempting to maximise register re-use. The patch adds an
alternative filtering step when the target is preferring postinc to pick
postinc formulae instead, hopefully lowering the complexity to below the
limit so that aggressive filtering is not needed.
There is also a change in here to stop considering existing addrecs as
free under postinc. We should already be modelling them as a reg, so we
don't want this to cause us to get the cost wrong. (I'm not sure that
code makes sense in general, but there are X86 tests specifically for it
where it seems to be helping, so I have left it around for the standard
non-post-inc case).
Differential Revision: https://reviews.llvm.org/D80273
Port partial constant store merging logic to MemorySSA backed DSE. The
heavy lifting is done by the existing helper function. It is used in a
context where we have already ensured that the later instruction can
eliminate the earlier one, if it is a complete overwrite.
isOverwrite expects the later location as its first argument and the
earlier location after it. The adjusted call is intended to check
whether CC overwrites DefLoc.
This patch adds a new option to CriticalEdgeSplittingOptions to control
whether loop-simplify form must be preserved. It is then used by GVN to
indicate that loop-simplify form does not have to be preserved.
This fixes a crash exposed by 189efe295b.
If the critical edge we are splitting goes from a block inside a loop to
a block outside the loop, splitting the edge will create a new exit
block. As a result, the new block will branch to the original exit
block, which will add a non-loop predecessor, breaking loop-simplify
form. To preserve loop-simplify form, the predecessor blocks of the
original exit are split, but that does not work for blocks with
indirectbr terminators. If preserving loop-simplify form is requested,
bail out before making any changes.
Reviewers: reames, hfinkel, davide, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D81582
Previously these functions either returned a "changed" flag or a "repeat
instruction" flag, and could also modify an iterator to control which
instruction would be processed next.
Simplify this by always returning a "changed" flag, and handling all of
the "repeat instruction" functionality by modifying the iterator.
No functional change intended except in this case:
// If the source and destination of the memcpy are the same, then zap it.
... where the previous code failed to process the instruction after the
zapped memcpy.
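The convention can be sketched with a toy instruction list. Everything
here is made up for the illustration (ToyMemCpy, InstList, the function
names, and the choice to let the handler advance the iterator); it is not
the pass's code. The handler returns only a "changed" flag and leaves the
iterator at the instruction to visit next, so the instruction following a
zapped memcpy is still processed.

```cpp
#include <cassert>
#include <list>

// Toy "instruction": a memcpy between two objects identified by ints.
struct ToyMemCpy {
  int Dst;
  int Src;
};
using InstList = std::list<ToyMemCpy>;

// Contract in this sketch: return true if anything changed, and leave It
// pointing at the next instruction that should be processed.
inline bool processMemCpy(InstList &Insts, InstList::iterator &It) {
  assert(It != Insts.end() && "iterator must point at an instruction");
  if (It->Dst == It->Src) {
    // Source and destination are the same: zap it. erase() returns the
    // following instruction, so that one is visited next, not skipped.
    It = Insts.erase(It);
    return true;
  }
  ++It;
  return false;
}

// Driver: no separate "repeat" flag, the iterator alone decides what
// comes next.
inline bool runOnBlock(InstList &Insts) {
  bool Changed = false;
  for (InstList::iterator It = Insts.begin(); It != Insts.end();)
    Changed |= processMemCpy(Insts, It);
  return Changed;
}
```

Two back-to-back self-copies are both removed in a single run, because
the second one is visited right after the first is erased.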
Differential Revision: https://reviews.llvm.org/D81540
This patch relaxes the post-dominance requirement for accesses to
objects visible after the function returns.
Instead of requiring the killing def to post-dominate the access to
eliminate, the set of 'killing blocks' (= blocks that completely
overwrite the original access) is collected.
If all paths from the access to eliminate and an exit block go through a
killing block, the access can be removed.
To check this property, we first get the common post-dominator block for
the killing blocks. If this block does not post-dominate the access
block, there may be a path from DomAccess to an exit block not involving
any killing block.
Otherwise we have to check if there is a path from the DomAccess to the
common post-dominator, that does not contain a killing block. If there
is no such path, we can remove DomAccess. For this check, we start at
the common post-dominator and then traverse the CFG backwards. Paths are
terminated when we hit a killing block or a block that is not executed
between DomAccess and a killing block according to the post-order
numbering (if the post order number of a block is greater than the one
of DomAccess, the block cannot be on a path starting at DomAccess).
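The backwards walk described above can be sketched on a toy CFG. This is
a simplified model, not the DSE implementation: blocks are plain
indices, ToyCFG and allPathsHitKillingBlock are names invented for the
example, and the common post-dominator is assumed to be computed
elsewhere and to post-dominate the access block.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy CFG: Preds[B] lists the predecessors of block B and PostOrder[B] is
// B's post-order number.
struct ToyCFG {
  std::vector<std::vector<int>> Preds;
  std::vector<int> PostOrder;
};

// Returns true if every path from AccessBB to CommonPostDom passes through
// a killing block. Walks backwards from CommonPostDom and conservatively
// returns false once MaxSteps blocks have been visited.
inline bool allPathsHitKillingBlock(const ToyCFG &G, int AccessBB,
                                    int CommonPostDom,
                                    const std::set<int> &KillingBlocks,
                                    std::size_t MaxSteps = 50) {
  std::vector<int> Worklist{CommonPostDom};
  std::set<int> Seen{CommonPostDom};
  for (std::size_t I = 0; I < Worklist.size(); ++I) {
    int BB = Worklist[I];
    if (KillingBlocks.count(BB))
      continue; // this path is terminated by a killing block
    if (BB == AccessBB)
      return false; // reached the access without meeting a killing block
    if (I >= MaxSteps)
      return false; // too expensive, stay conservative
    for (int Pred : G.Preds[BB]) {
      // A larger post-order number means Pred cannot be on a path that
      // starts at AccessBB, so it is not explored.
      if (G.PostOrder[Pred] > G.PostOrder[AccessBB])
        continue;
      if (Seen.insert(Pred).second)
        Worklist.push_back(Pred);
    }
  }
  return true;
}
```

For a CFG where block 0 branches to blocks 1, 2 and 4, which all jump to
block 3, the check fails with killing blocks {1, 2} (the path through 4
bypasses them) and succeeds with {1, 2, 4}.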
This gives the following improvements on the total number of stores
after DSE for MultiSource, SPEC2K, SPEC2006:
Tests: 237
Same hash: 206 (filtered out)
Remaining: 31
Metric: dse.NumRemainingStores
Program                                           base     new100   diff
test-suite...CFP2000/188.ammp/188.ammp.test    3624.00    3544.00  -2.2%
test-suite...ch/g721/g721encode/encode.test     128.00     126.00  -1.6%
test-suite.../Benchmarks/Olden/mst/mst.test      73.00      72.00  -1.4%
test-suite...CFP2006/433.milc/433.milc.test    3202.00    3163.00  -1.2%
test-suite...000/186.crafty/186.crafty.test    5062.00    5010.00  -1.0%
test-suite...-typeset/consumer-typeset.test   40460.00   40248.00  -0.5%
test-suite...Source/Benchmarks/sim/sim.test     642.00     639.00  -0.5%
test-suite...nchmarks/McCat/09-vor/vor.test     642.00     644.00   0.3%
test-suite...lications/sqlite3/sqlite3.test   35664.00   35563.00  -0.3%
test-suite...T2000/300.twolf/300.twolf.test    7202.00    7184.00  -0.2%
test-suite...lications/ClamAV/clamscan.test   19475.00   19444.00  -0.2%
test-suite...INT2000/164.gzip/164.gzip.test    2199.00    2196.00  -0.1%
test-suite...peg2/mpeg2dec/mpeg2decode.test    2380.00    2378.00  -0.1%
test-suite.../Benchmarks/Bullet/bullet.test   39335.00   39309.00  -0.1%
test-suite...:: External/Povray/povray.test   36951.00   36927.00  -0.1%
test-suite...marks/7zip/7zip-benchmark.test   67396.00   67356.00  -0.1%
test-suite...6/464.h264ref/464.h264ref.test   31497.00   31481.00  -0.1%
test-suite...006/453.povray/453.povray.test   51441.00   51416.00  -0.0%
test-suite...T2006/401.bzip2/401.bzip2.test    4450.00    4448.00  -0.0%
test-suite...Applications/kimwitu++/kc.test   23481.00   23471.00  -0.0%
test-suite...chmarks/MallocBench/gs/gs.test    6286.00    6284.00  -0.0%
test-suite.../CINT2000/254.gap/254.gap.test   13719.00   13715.00  -0.0%
test-suite.../Applications/SPASS/SPASS.test   30345.00   30338.00  -0.0%
test-suite...006/450.soplex/450.soplex.test   15018.00   15016.00  -0.0%
test-suite...ications/JM/lencod/lencod.test   27780.00   27777.00  -0.0%
test-suite.../CINT2006/403.gcc/403.gcc.test  105285.00  105276.00  -0.0%
There might be potential to pre-compute some of the information of which
blocks are on the path to an exit for each block, but the overall
benefit might be comparatively small.
On this set of benchmarks, the CFG check succeeds in 15738 of the 20322
times we reach it. The total number of iterations in the CFG check is
187810, so on average we need fewer than 10 steps in
the check loop. Bumping the threshold in the loop from 50 to 150 gives a
few small improvements, but I don't think they warrant such a big bump
at the moment. This is all pending further tuning in the future.
Reviewers: dmgreen, bryant, asbirlea, Tyker, efriedma, george.burgess.iv
Reviewed By: george.burgess.iv
Differential Revision: https://reviews.llvm.org/D78932
blocks.
Summary: LoopFusion currently forgets to update the incoming block of
the phis in the second loop guard's non-loop successor from the second
loop guard block to the first loop guard block. A test case is provided
to illustrate the problem.
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D81421
Slice::operator<() has non-deterministic behavior. If we have
identical slices, the comparison will depend on the order of operands.
Normally that does not result in unstable compilation results
because the order in which slices are inserted into the vector
is deterministic and llvm::sort() normally behaves as a stable
sort, although that is not guaranteed.
However, there is a test option -sroa-random-shuffle-slices which
is used to check exactly this aspect. The vector is first randomly
shuffled and then sorted. The same shuffling happens without this
option under expensive llvm checks.
I have managed to write a test which has hit this problem.
There are no fields in the Slice class to resolve the instability.
We only have offsets, IsSplittable and Use, but neither Use nor
User have anything suitable for predictable comparison.
I have switched to stable_sort, which has to be sufficient, and
removed that random shuffle option.
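A small sketch of why stable_sort is enough here. ToySlice is a
simplified stand-in for the real class: the Tag field only exists so the
example can tell equivalent slices apart, and the ordering in sliceLess is
an assumption modelled on the fields listed above (offsets and
IsSplittable).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified slice. Tag plays the role of the Use that the comparison
// cannot look at.
struct ToySlice {
  uint64_t Begin;
  uint64_t End;
  bool IsSplittable;
  int Tag;
};

// Orders by start offset, then unsplittable before splittable, then larger
// end offset first. Slices that agree on all three are equivalent.
inline bool sliceLess(const ToySlice &L, const ToySlice &R) {
  if (L.Begin != R.Begin)
    return L.Begin < R.Begin;
  if (L.IsSplittable != R.IsSplittable)
    return !L.IsSplittable;
  return L.End > R.End;
}

// Sorts a copy of the slices and returns the resulting order of tags.
// std::stable_sort keeps equivalent slices in their insertion order.
inline std::vector<int> sortedTags(std::vector<ToySlice> Slices) {
  std::stable_sort(Slices.begin(), Slices.end(), sliceLess);
  std::vector<int> Tags;
  for (const ToySlice &S : Slices)
    Tags.push_back(S.Tag);
  return Tags;
}
```

Because insertion order is deterministic, the relative order of identical
slices after the sort is deterministic too, which a plain unstable sort
does not promise.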
Differential Revision: https://reviews.llvm.org/D81310
- Now all SalvageDebugInfo() calls will mark undef if the salvage
attempt fails.
Reviewed by: vsk, Orlando
Differential Revision: https://reviews.llvm.org/D78369
If an instruction is erased, we also need to remove it from the
Visited set. There is a very small chance that a newly created
instruction will be allocated at the same pointer value as the
erased one.
Differential Revision: https://reviews.llvm.org/D80958
Now that we have an operand based form for the GC arguments to a statepoint intrinsic, update RS4GC to use it and update the tests to reflect this. This is pretty straightforward. I nearly landed without review, but figured a second set of eyes didn't hurt.
Differential Revision: https://reviews.llvm.org/D81121
Remove the requirement that, when performing accumulator elimination,
all other cases must return the same dynamic constant. We can do this by
initializing the accumulator with the identity value of the accumulation
operation, and inserting an additional operation before any return.
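As a source-level analogy (the pass itself works on IR, and the
functions below are invented for the example), the transform corresponds
to turning sumRec into sumLoop. The two base cases return different
run-time values, which is exactly the situation the old requirement
ruled out.

```cpp
#include <cassert>
#include <cstdint>

// Recursive form: an accumulating recursion with two base cases that
// return different values only known at run time.
inline int64_t sumRec(int64_t N, int64_t A, int64_t B) {
  assert(N >= 0 && "sketch only handles non-negative N");
  if (N == 0)
    return A;
  if (N == 1)
    return B;
  return N + sumRec(N - 2, A, B);
}

// Loop form: the accumulator starts at the identity of the accumulation
// operation, and one extra operation is applied before every return.
inline int64_t sumLoop(int64_t N, int64_t A, int64_t B) {
  assert(N >= 0 && "sketch only handles non-negative N");
  int64_t Acc = 0; // identity value of addition
  for (;;) {
    if (N == 0)
      return Acc + A; // additional add inserted before the return
    if (N == 1)
      return Acc + B; // same for the other base case
    Acc += N;
    N -= 2;
  }
}
```

For example, with N = 4 both forms compute 4 + 2 + A.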
Differential Revision: https://reviews.llvm.org/D80844
Summary:
This patch simplifies FindMostPopularDest without changing the
functionality.
Given a list of jump threading destinations, the function finds the
most popular destination. To ensure determinism when there are
multiple destinations with the highest popularity, the function picks
the first one in the successor list with the highest popularity.
Without this patch:
- The function populates DestPopularity -- a histogram mapping
destinations to their respective occurrence counts.
- Then we iterate over DestPopularity, looking for the highest
popularity while building a vector of destinations with the highest
popularity.
- Finally, we iterate the successor list, looking for the destination
with the highest popularity.
With this patch:
- We implement DestPopularity with MapVector instead of DenseMap. We
populate the map with popularity 0 for all successors in the order
they appear in the successor list.
- We build the histogram in the same way as before.
- We simply use std::max_element on DestPopularity to find the most
popular destination. The use of MapVector ensures determinism.
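The new scheme can be sketched with standard containers. This is an
approximation, not the JumpThreading code: an insertion-ordered vector of
pairs stands in for MapVector, ints stand in for basic blocks, and
findEntry and the function signature are assumptions made for the
example.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using DestCount = std::pair<int, unsigned>; // (destination id, popularity)

// Linear lookup by destination id; returns Map.end() if it is absent.
inline std::vector<DestCount>::iterator findEntry(std::vector<DestCount> &Map,
                                                  int Dest) {
  return std::find_if(Map.begin(), Map.end(),
                      [Dest](const DestCount &E) { return E.first == Dest; });
}

// Returns the most popular destination in Dests. Ties go to the
// destination that appears first in Successors.
inline int findMostPopularDest(const std::vector<int> &Successors,
                               const std::vector<int> &Dests) {
  assert(!Successors.empty() && "need at least one successor");
  std::vector<DestCount> Popularity;
  // Seed every successor with popularity 0, in successor-list order.
  for (int S : Successors)
    if (findEntry(Popularity, S) == Popularity.end())
      Popularity.emplace_back(S, 0u);
  // Build the histogram. Destinations that are not successors are ignored
  // in this sketch.
  for (int D : Dests) {
    auto It = findEntry(Popularity, D);
    if (It != Popularity.end())
      ++It->second;
  }
  // std::max_element returns the first of several equal maxima, so the
  // insertion order decides ties.
  auto Best = std::max_element(
      Popularity.begin(), Popularity.end(),
      [](const DestCount &L, const DestCount &R) { return L.second < R.second; });
  return Best->first;
}
```

With successors {10, 20, 30} and destinations {30, 20, 30, 20}, both 20
and 30 have popularity 2, and 20 is picked because it comes first in the
successor list.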
Reviewers: wmi, efriedma
Reviewed By: wmi
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81030
gc.relocate intrinsic is special in that its second and third operands
are not real values, but indices into relocate's parent statepoint list
of GC pointers.
To be CSE'd, they need special handling in `isEqual()` and `getHashCode()`.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D80445
This is a reimplementation of the `orderNodes` function, as the old
implementation didn't take into account all cases.
The new implementation uses SCCs instead of Loops to take account of
irreducible loops.
Fixes PR41509.
Differential Revision: https://reviews.llvm.org/D79037