This commit fixes how metadata is handled in CloneModule to be sound,
and improves how it's handled in CloneFunctionInto (although the latter
is still awkward when called within a module).
Ruiling Song pointed out in PR48841 that CloneModule was changed to
unsoundly use the RF_ReuseAndMutateDistinctMDs flag (renamed in
fa35c1f80f for clarity). This flag papered
over a crash caused by other various changes made to CloneFunctionInto
over the past few years that made it unsound to use cloning between
different modules.
(This commit partially addresses PR48841, fixing the repro from
preprocessed source but not textual IR. MDNodeMapper::mapDistinctNode
became unsound in df763188c9 and this
commit does not address that regression.)
RF_ReuseAndMutateDistinctMDs is designed for the IRMover to use,
avoiding unnecessary clones of all referenced metadata when linking
between modules (with IRMover, the source module is discarded after
linking). It never makes sense to use when you're not discarding the
source. This commit drops its incorrect use in CloneModule.
Sadly, the right thing to do with metadata when cloning a function is
complicated, and this patch doesn't totally fix it.
The first problem is that there are two different types of referenceable
metadata and it's not obvious what to with one of them when remapping.
- `!0 = !{!1}` is metadata's version of a constant. Programatically it's
called "uniqued" (probably a better term would be "constant") because,
like `ConstantArray`, it's stored in uniquing tables. Once it's
constructed, it's illegal to change its arguments.
- `!0 = distinct !{!1}` is a bit closer to a global variable. It's legal
to change the operands after construction.
What should be done with distinct metadata when cloning functions within
the same module?
- Should new, cloned nodes be created?
- Should all references point to the same, old nodes?
The answer depends on whether that metadata is effectively owned by a
function.
And that's the second problem. Referenceable metadata's ownership model
is not clear or explicit. Technically, it's all stored on an
LLVMContext. However, any metadata that is `distinct`, that transitively
references a `distinct` node, or that transitively references a
GlobalValue is specific to a Module and is effectively owned by it. More
specifically, some metadata is effectively owned by a specific Function
within a module.
Effectively function-local metadata was introduced somewhere around
c10d0e5ccd, which made it illegal for two
functions to share a DISubprogram attachment.
When cloning a function within a module, you need to clone the
function-local debug info and suppress cloning of global debug info (the
status quo suppresses cloning some global debug info but not all). When
cloning a function to a new/different module, you need to clone all of
the debug info.
Here's what I think we should do (eventually? soon? not this patch
though):
- Distinguish explicitly (somehow) between pure constant metadata owned
by the LLVMContext, global metadata owned by the Module, and local
metadata owned by a GlobalValue (such as a function).
- Update CloneFunctionInto to trigger cloning of all "local" metadata
(only), perhaps by adding a bit to RemapFlag. Alternatively, split
out a separate function CloneFunctionMetadataInto to prime the
metadata map that callers are updated to call ahead of time as
appropriate.
Here's the somewhat more isolated fix in this patch:
- Converted the `ModuleLevelChanges` parameter to `CloneFunctionInto` to
an enum called `CloneFunctionChangeType` that is one of
LocalChangesOnly, GlobalChanges, DifferentModule, and ClonedModule.
- The code maintaining the "functions uniquely own subprograms"
invariant is now only active in the first two cases, where a function
is being cloned within a single module. That's necessary because this
code inhibits cloning of (some) "global" metadata that's effectively
owned by the module.
- The code maintaining the "all compile units must be explicitly
referenced by !llvm.dbg.cu" invariant is now only active in the
DifferentModule case, where a function is being cloned into a new
module in isolation.
- CoroSplit.cpp's call to CloneFunctionInto in CoroCloner::create
uses LocalChangeOnly, since fa635d730f
only set `ModuleLevelChanges` to trigger cloning of local metadata.
- CloneModule drops its unsound use of RF_ReuseAndMutateDistinctMDs
and special handling of !llvm.dbg.cu.
- Fixed some outdated header docs and left a couple of FIXMEs.
Differential Revision: https://reviews.llvm.org/D96531
Rename the `RF_MoveDistinctMDs` flag passed into `MapValue` and
`MapMetadata` to `RF_ReuseAndMutateDistinctMDs` in order to more
precisely describe its effect and clarify the header documentation.
Found this while helping to investigate PR48841, which pointed out an
unsound use of the flag in `CloneModule()`. For now I've just added a
FIXME there, but I'm hopeful that the new (more precise) name will
prevent other similar errors.
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.
Below is an example of splitting the block bb1 at its first instruction.
/// Original IR
bb0:
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
br bb1
bb1:
br bb1.split
bb1.split:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
br bb1.split
bb1.split
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
Differential Revision: https://reviews.llvm.org/D92200
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.
Below is an example of splitting the block bb1 at its first instruction.
/// Original IR
bb0:
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
br bb1
bb1:
br bb1.split
bb1.split:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
br bb1.split
bb1.split
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
Differential Revision: https://reviews.llvm.org/D92200
... so just ensure that we pass DomTreeUpdater it into it.
Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.
Below is an example of splitting the block bb1 at its first instruction.
/// Original IR
bb0:
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
br bb1
bb1:
br bb1.split
bb1.split:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
br bb1.split
bb1.split
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
Differential Revision: https://reviews.llvm.org/D92200
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D92489
Currently SCEVExpander creates inttoptr for non-integral pointers if the
base is a null constant for example. This results in invalid IR.
This patch changes InsertNoopCastOfTo to emit a GEP & bitcast to convert
to a non-integral pointer. First, a GEP of i8* null is generated and the
integral value is used as index. The GEP is then bitcasted to the target
type.
This was exposed by D71539.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87827
When trying to enable -debug-info-kind=constructor there was an assert
that occurs during debug info cloning ("mismatched subprogram between
llvm.dbg.value variable and !dbg attachment").
It appears that during llvm::CloneFunctionInto, a DISubprogram could be
duplicated when MapMetadata is called, and then added to the MD map again
when DIFinder gets a list of subprograms. This results in two different
versions of the DISubprogram.
This patch switches the order so that the DIFinder subprograms are
added before MapMetadata is called.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46784
Differential Revision: https://reviews.llvm.org/D86185
This reverts the revert commit dc28675768.
It includes a fix for Polly, which uses SCEVExpander on IR that is not
in LCSSA form. Set PreserveLCSSA = false in that case, to ensure we do
not introduce LCSSA phis where there were none before.
This reverts commit 99166fd4fb, because it
breaks the polly builders.
polly/test/Isl/CodeGen/invariant_load_escaping_second_scop.ll fails
because a apparently unnecessary LCSSA phi node is introduced.
Make the bots green again, while I take a closer look.
This patch teaches SCEVExpander to directly preserve LCSSA.
As it is currently, SCEV does not look through PHI nodes in loops,
as it might break LCSSA form. Once SCEVExpander can preserve
LCSSA form, it should be safe for SCEV to look through PHIs.
To preserve LCSSA form, this patch uses formLCSSAForInstructions
on operands of newly created instructions, if the definition is inside
a different loop than the new instruction.
The final value we return from expandCodeFor may also need LCSSA
phis, depending on the insert point. As no user for it exists there yet,
create a temporary instruction at the insert point, which can be passed
to formLCSSAForInstructions. This temporary instruction is removed
after LCSSA construction.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D71538
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.
This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
Summary: This patch adds more test case focusing on data dependency.
Authored By: RithikSharma
Reviewer: Whitney, bmahjour, etiotto
Reviewed By: Whitney
Subscribers: llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D83543
Summary: This patch makes code motion checks optional which are dependent on
specific analysis example, dominator tree, post dominator tree and dependence
info. The aim is to make the adoption of CodeMoverUtils easier for clients that
don't use analysis which were strictly required by CodeMoverUtils. This will
also help in diversifying code motion checks using other analysis example MSSA.
Authored By: RithikSharma
Reviewer: Whitney, bmahjour, etiotto
Reviewed By: Whitney
Subscribers: Prazek, hiraditya, george.burgess.iv, asbirlea, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D82566
Move ScalarEvolution::forgetLoopDispositions implementation to ScalarEvolution.cpp to remove the dependency.
Add implicit header dependency to source files where necessary.
code motion
Summary: Currently isSafeToMoveBefore uses DFS numbering for determining
the relative position of instruction and insert point which is not
always correct. This PR proposes the use of Dominator Tree depth for the
same. If a node is at a higher level than the insert point then it is
safe to say that we want to move in the forward direction.
Authored By: RithikSharma
Reviewer: Whitney, nikic, bmahjour, etiotto, fhahn
Reviewed By: Whitney
Subscribers: fhahn, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80084
Hide the method that allows setting probability for particular edge
and introduce a public method that sets probabilities for all
outgoing edges at once.
Setting individual edge probability is error prone. More over it is
difficult to check that the total probability is 1.0 because there is
no easy way to know when the user finished setting all
the probabilities.
Related bug is fixed in BranchProbabilityInfo::calcMetadataWeights().
Changing unreachable branch probabilities to raw(1) and distributing
the rest (oldProbability - raw(1)) over the reachable branches could
introduce total probability inaccuracy bigger than 1/numOfBranches.
Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79396
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.
This patch was originally committed as b8a3c34eee, but broke the
modules build, as LoopAccessAnalysis was using the Expander.
The code-gen part of LAA was moved to lib/Transforms recently, so this
patch can be landed again.
Reviewers: sanjoy.google, efriedma, reames
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D71537
Splitting critical edges for indirect branches
the SplitIndirectBrCriticalEdges() function may break branch
probabilities if target basic block happens to have unset
a probability for any of its successors. That is because in
such cases the getEdgeProbability(Target) function returns
probability 1/NumOfSuccessors and it is called after Target
was split (thus Target has a single successor). As the result
the correspondent successor of the split block gets
probability 100% but 1/NumOfSuccessors is expected (or better
be left unset).
Reviewers: yamauchi
Differential Revision: https://reviews.llvm.org/D78806
Summary:
Updated CallPromotionUtils and impacted sites. Parameters that are
expected to be non-null, and return values that are guranteed non-null,
were replaced with CallBase references rather than pointers.
Left FIXME in places where more changes are facilitated by CallBase, but
aren't CallSites: Instruction* parameters or return values, for example,
where the contract that they are actually CallBase values.
Reviewers: davidxl, dblaikie, wmi
Reviewed By: dblaikie
Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77930
Since intrinsics can now specify when an argument is required to be
constant, it is now OK to replace arguments with variables if they
aren't. This means intrinsics must now be accurately marked with
immarg.
Summary:
Assume bundles need to be usable by Analysis and Transforms/Utils isn't.
so this commit moves utilities to deal with asusme bundles to IR.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75618
Summary: Finding what information is know about a value from a use is generally useful and can be done quickly.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75616
Summary: This patch adds a new way to query operand bundles of an llvm.assume that is much better suited to some users like the Attributor that need to do many queries on the operand bundles of llvm.assume. Some modifications of the IR like replaceAllUsesWith can cause information in the map to be outdated, so this API is more suited to analysis passes and passes that don't make modification that could invalidate the map.
Reviewers: jdoerfert, sstefan1, uenoku
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75020
Summary: This fixes the crash that led to the revert of D69591.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75307
Summary:
Future patches will make use of TTI to perform cost-model-driven `SCEVExpander::isHighCostExpansionHelper()`
This is a fully NFC patch to make things reviewable.
Reviewers: reames, mkazantsev, wmi, sanjoy
Reviewed By: mkazantsev
Subscribers: hiraditya, zzheng, javed.absar, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73704
replaceDbgDeclare is used to update the descriptions of stack variables
when they are moved (e.g. by ASan or SafeStack). A side effect of
replaceDbgDeclare is that it moves dbg.declares around in the
instruction stream (typically by hoisting them into the entry block).
This behavior was introduced in llvm/r227544 to fix an assertion failure
(llvm.org/PR22386), but no longer appears to be necessary.
Hoisting a dbg.declare generally does not create problems. Usually,
dbg.declare either describes an argument or an alloca in the entry
block, and backends have special handling to emit locations for these.
In optimized builds, LowerDbgDeclare places dbg.values in the right
spots regardless of where the dbg.declare is. And no one uses
replaceDbgDeclare to handle things like VLAs.
However, there doesn't seem to be a positive case for moving
dbg.declares around anymore, and this reordering can get in the way of
understanding other bugs. I propose getting rid of it.
Testing: stage2 RelWithDebInfo sanitized build, check-llvm
rdar://59397340
Differential Revision: https://reviews.llvm.org/D74517
Summary: It attempts to devirtualize a call on alloca through vtable loads.
Reviewers: davidxl
Subscribers: mgorny, Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71308
During extraction, stale llvm.assume handles may be retained in the
original function. The setup is:
1) CodeExtractor unregisters assumptions in the blocks that are to be
extracted.
2) Extraction happens. There are now two functions: f1 and f1.extracted.
3) Leftover assumptions in f1 (/not/ removed as they were not in the set of
blocks to be extracted) now have affected-value llvm.assume handles in
f1.extracted.
When assumptions for a value used in f1 are looked up, ValueTracking can assert
as some of the handles are in the wrong function. To fix this, simply erase the
llvm.assume calls in the extracted function.
Alternatives include flushing the assumption cache in the original function, or
walking all values used in the original function to prune stale affected-value
handles. Both seem more expensive.
Testing: check-llvm, LNT run with -mllvm -hot-cold-split enabled
rdar://58460728
Summary:
Currently IsControlFlowEquivalent determine if two blocks are control
flow equivalent by checking if A dominates B and B post dominates A.
There exists blocks that are control flow equivalent even if they don't
satisfy the A dominates B and B post dominates A condition.
For example,
if (cond)
A
if (cond)
B
In the PR, we determine if two blocks are control flow equivalent by
also checking if the two sets of conditions A and B depends on are
equivalent.
Reviewer: jdoerfert, Meinersbur, dmgreen, etiotto, bmahjour, fhahn,
hfinkel, kbarton
Reviewed By: fhahn
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D71578
In case of loops with multiple exit where all-but-one exit are deoptimizing
it might happen that the first rotation will end up with latch having a deoptimizing
exit. This makes the loop unsuitable for trip-count analysis (say, getLoopEstimatedTripCount)
as well as for loop transformations that know how to handle multple deoptimizing exits.
It pretty much means that canonical form in multple-deoptimizing-exits case should be
with non-deoptimizing exit at latch.
Teach loop-rotation to reach this canonical form by repeating rotation.
-loop-rotate-multi option introduced to control this behavior, currently disabled by default.
Reviewers: skatkov, asbirlea, reames, fhahn
Reviewed By: skatkov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73058
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.
Reviewers: sanjoy.google, efriedma, reames
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D71537
Commit d77ae1552f
("[DebugInfo] Support to emit debugInfo for extern variables")
added deebugInfo for extern variables for BPF target.
The commit is reverted by 891e25b02d
as the committed tests using %clang instead of %clang_cc1 causing
test failed in certain scenarios as reported by Reid Kleckner.
This patch fixed the tests by using %clang_cc1.
Differential Revision: https://reviews.llvm.org/D71818
Extern variable usage in BPF is different from traditional
pure user space application. Recent discussion in linux bpf
mailing list has two use cases where debug info types are
required to use extern variables:
- extern types are required to have a suitable interface
in libbpf (bpf loader) to provide kernel config parameters
to bpf programs.
https://lore.kernel.org/bpf/CAEf4BzYCNo5GeVGMhp3fhysQ=_axAf=23PtwaZs-yAyafmXC9g@mail.gmail.com/T/#t
- extern types are required so kernel bpf verifier can
verify program which uses external functions more precisely.
This will make later link with actual external function no
need to reverify.
https://lore.kernel.org/bpf/87eez4odqp.fsf@toke.dk/T/#m8d5c3e87ffe7f2764e02d722cb0d8cbc136880ed
This patch added clang support to emit debuginfo for extern variables
with a TargetInfo hook to enable it. The debuginfo for the
extern variable is emitted only if that extern variable is
referenced in the current compilation unit.
Currently, only BPF target enables to generate debug info for
extern variables. The emission of such debuginfo is disabled for C++
at this moment since BPF only supports a subset of C language.
Emission with C++ can be enabled later if an appropriate use case
is identified.
-fstandalone-debug permits us to see more debuginfo with the cost
of bloated binary size. This patch did not add emission of extern
variable debug info with -fstandalone-debug. This can be
re-evaluated if there is a real need.
Differential Revision: https://reviews.llvm.org/D70696
AssumptionCache can be null in SimplifyCFGOptions. However, FoldCondBranchOnPHI() was not properly handling that when passing a null AssumptionCache to simplifyCFG.
Patch by Rodrigo Caetano Rocha <rcor.cs@gmail.com>
Reviewers: fhahn, lebedev.ri, spatel
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D69963
Summary:
This is one more prep step necessary before the code gen pass instrumentation
code could go in.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70988
Summary:
Related bug: https://bugs.llvm.org/show_bug.cgi?id=40648
Static helper function rewriteDebugUsers in Local.cpp deletes dbg.value
intrinsics when it cannot move or rewrite them, or salvage the deleted
instruction's value. It should instead undef them in this case.
This patch fixes that and I've added a test which covers the failing test
case in bz40648. I've updated the unit test Local.ReplaceAllDbgUsesWith
to check for this behaviour (and fixed a typo in the test which would
cause the old test to always pass).
Reviewers: aprantl, vsk, djtodoro, probinson
Reviewed By: vsk
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D70604
moved before another instruction.
Summary:Added an API to check if an instruction can be safely moved
before another instruction. In future PRs, we will like to add support
of moving instructions between blocks that are not control flow
equivalent, and add other APIs to enhance usability, e.g. moving basic
blocks, moving list of instructions...
Loop Fusion will be its first user. When there is intervening code in
between two loops, fusion is currently unable to fuse them. Loop Fusion
can use this utility to check if the intervening code can be safely
moved before or after the two loops, and move them, then it can
successfully fuse them.
Reviewer:kbarton,jdoerfert,Meinersbur,bmahjour,etiotto
Reviewed By:bmahjour
Subscribers:mgorny,hiraditya,llvm-commits
Tag:LLVM
Differential Revision:https://reviews.llvm.org/D70049
The attribute is stored at the `FunctionIndex` attribute set, with the
name "vector-function-abi-variant".
The get/set methods of the attribute have assertion to verify that:
1. Each name in the attribute is a valid VFABI mangled name.
2. Each name in the attribute correspond to a function declared in the
module.
Differential Revision: https://reviews.llvm.org/D69976
Summary:
(Split of off D67120)
SizeOpts/MachineSizeOpts changes for profile guided size optimization.
(A second try after previously committed as r375254 and reverted as r375375.)
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69409
Factor out CodeExtractor's analysis of allocas (for shrinkwrapping
purposes), and allow the analysis to be reused.
This resolves a quadratic compile-time bug observed when compiling
AMDGPUDisassembler.cpp.o.
Pre-patch (Release + LTO clang):
```
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
176.5278 ( 57.8%) 0.4915 ( 18.5%) 177.0192 ( 57.4%) 177.4112 ( 57.3%) Hot Cold Splitting
```
Post-patch (ReleaseAsserts clang):
```
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
1.4051 ( 3.3%) 0.0079 ( 0.3%) 1.4129 ( 3.2%) 1.4129 ( 3.2%) Hot Cold Splitting
```
Testing: check-llvm, and comparing the AMDGPUDisassembler.cpp.o binary
pre- vs. post-patch.
An alternate approach is to hide CodeExtractorAnalysisCache from clients
of CodeExtractor, and to recompute the analysis from scratch inside of
CodeExtractor::extractCodeRegion(). This eliminates some redundant work
in the shrinkwrapping legality check. However, some clients continue to
exhibit O(n^2) compile time behavior as computing the analysis is O(n).
rdar://55912966
Differential Revision: https://reviews.llvm.org/D68616
llvm-svn: 374089
Terminators like invoke can have users outside the current basic block.
We have to replace those users with undef, before replacing the
terminator.
This fixes a crash exposed by rL373430.
Reviewers: brzycki, asbirlea, davide, spatel
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D68327
llvm-svn: 373513
There are no users that pass in LazyValueInfo, so we can simplify the
function a bit.
Reviewers: brzycki, asbirlea, davide
Reviewed By: davide
Differential Revision: https://reviews.llvm.org/D68297
llvm-svn: 373488
...cloning a function from a different module
Currently when a function with debug info is cloned from a different module, the
cloned function may have hanging DICompileUnits, so that the module with the
cloned function fails debug info verification.
The proposed fix inserts all DICompileUnits reachable from the cloned function
to "llvm.dbg.cu" metadata operands of the cloned function module.
Reviewed By: aprantl, efriedma
Differential Revision: https://reviews.llvm.org/D66510
Patch by Oleg Pliss (Oleg.Pliss@azul.com)
llvm-svn: 370265
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
loop
Summary:
Do the cloning in two steps, first allocate all the new loops, then
clone the basic blocks in the same order as the original loop.
Reviewer: Meinersbur, fhahn, kbarton, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, hiraditya, llvm-commits
Tag: https://reviews.llvm.org/D64224
Differential Revision:
llvm-svn: 365366
Refactor DIExpression::With* into a flag enum in order to be less
error-prone to use (as discussed on D60866).
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D61943
llvm-svn: 361137
Summary:
Create a method to forget everything in SCEV.
Add a cl::opt and PassManagerBuilder option to use this in LoopUnroll.
Motivation: Certain Halide applications spend a very long time compiling in forgetLoop, and prefer to forget everything and rebuild SCEV from scratch.
Sample difference in compile time reduction: 21.04 to 14.78 using current ToT release build.
Testcase showcasing this cannot be opensourced and is fairly large.
The option disabled by default, but it may be desirable to enable by
default. Evidence in favor (two difference runs on different days/ToT state):
File Before (s) After (s)
clang-9.bc 7267.91 6639.14
llvm-as.bc 194.12 194.12
llvm-dis.bc 62.50 62.50
opt.bc 1855.85 1857.53
File Before (s) After (s)
clang-9.bc 8588.70 7812.83
llvm-as.bc 196.20 194.78
llvm-dis.bc 61.55 61.97
opt.bc 1739.78 1886.26
Reviewers: sanjoy
Subscribers: mehdi_amini, jlebar, zzheng, javed.absar, dmgreen, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60144
llvm-svn: 358304
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.
The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.
For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.
This is a recommit of r356442 with trivial fixes for the failing tests.
Differential Revision: https://reviews.llvm.org/D56587
llvm-svn: 356451
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.
The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.
For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.
Differential Revision: https://reviews.llvm.org/D56587
llvm-svn: 356442
Summary:
Extract the functionality of eliminating unreachable basic blocks
within a function, previously encapsulated within the
-unreachableblockelim pass, and make it available as a function within
BlockUtils.h. No functional change intended other than making the logic
reusable.
Exposing this logic makes it easier to implement
https://reviews.llvm.org/D59068, which fixes coroutines bug
https://bugs.llvm.org/show_bug.cgi?id=40979.
Reviewers: mkazantsev, wmi, davidxl, silvas, davide
Reviewed By: davide
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59069
llvm-svn: 355846
When CodeExtractor saves the result of InvokeInst at the first insertion
point of the 'normal destination' basic block, this block can be omitted
in the outlined region, so store is placed outside of the function. The
suggested solution is to process saving outputs after creating exit
stubs for new function, and stores will be placed in that blocks before
return in this case.
Patch by Sergei Kachkov!
Fixes llvm.org/PR40455.
Differential Revision: https://reviews.llvm.org/D57919
llvm-svn: 353562
DomTreeUpdater depends on headers from Analysis, but is in IR. This is a
layering violation since Analysis depends on IR. Relocate this code from IR
to Analysis to fix the layering violation.
llvm-svn: 353265
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57172
llvm-svn: 352911
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
When CodeExtractor outlines values which are used by the original
function, it must store those values in some in-out parameter. This
store instruction must not be inserted in between a PHI and an EH pad
instruction, as that results in invalid IR.
This fixes the following verifier failure seen while outlining within
ObjC methods with live exit values:
The unwind destination does not have an exception handling instruction!
%call35 = invoke i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %exn.adjusted, i8* %1)
to label %invoke.cont34 unwind label %lpad33, !dbg !4183
The unwind destination does not have an exception handling instruction!
invoke void @objc_exception_throw(i8* %call35) #12
to label %invoke.cont36 unwind label %lpad33, !dbg !4184
LandingPadInst not the first non-PHI instruction in the block.
%3 = landingpad { i8*, i32 }
catch i8* null, !dbg !1411
rdar://46540815
llvm-svn: 348562
If a PHI node out of extracted region has multiple incoming values from it,
split this PHI on two parts. First PHI has incomings only from region and
extracts with it (they are placed to the separate basic block that added to the
list of outlined), and incoming values in original PHI are replaced by first
PHI. Similar solution is already used in CodeExtractor for PHIs in entry block
(severSplitPHINodes method). It covers PR39433 bug.
Patch by Sergei Kachkov!
Differential Revision: https://reviews.llvm.org/D55018
llvm-svn: 348205
This will hold flags specific to subprograms. In the future
we could potentially free up scarce bits in DIFlags by moving
subprogram-specific flags from there to the new flags word.
This patch does not change IR/bitcode formats, that will be
done in a follow-up.
Differential Revision: https://reviews.llvm.org/D54597
llvm-svn: 347239
This patch updates DuplicateInstructionsInSplitBetween to update a DTU
instead of applying updates to the DT directly.
Given that there only are 2 users, also updated them in this patch to
avoid churn.
I slightly moved the code in CallSiteSplitting around to reduce the
places where we have to pass in DTU. If necessary, I could split those
changes in a separate patch.
This fixes missing DT updates when dealing with musttail calls in
CallSiteSplitting, by using DTU->deleteBB.
Reviewers: junbuml, kuhar, NutshellySima, indutny, brzycki
Reviewed By: NutshellySima
llvm-svn: 346769
The current splitting algorithm works in three stages:
1) Identify cold blocks, then
2) Use forward/backward propagation to mark hot blocks, then
3) Grow a SESE region of blocks *outside* of the set of hot blocks and
start outlining.
While testing this pass on Apple internal frameworks I noticed that some
kinds of control flow (e.g. loops) are never outlined, even though they
unconditionally lead to / follow cold blocks. I noticed two other issues
related to how cold regions are identified:
- An inconsistency can arise in the internal state of the hotness
propagation stage, as a block may end up in both the ColdBlocks set
and the HotBlocks set. Further inconsistencies can arise as these sets
do not match what's in ProfileSummaryInfo.
- It isn't necessary to limit outlining to single-exit regions.
This patch teaches the splitting algorithm to identify maximal cold
regions and outline them. A maximal cold region is defined as the set of
blocks post-dominated by a cold sink block, or dominated by that sink
block. This approach can successfully outline loops in the cold path. As
a side benefit, it maintains less internal state than the current
approach.
Due to a limitation in CodeExtractor, blocks within the maximal cold
region which aren't dominated by a single entry point (a so-called "max
ancestor") are filtered out.
Results:
- X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to
47KB pre-patch, or a ~3x improvement. Did not see a performance impact
across two runs.
- AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB
of TEXT were outlined. Ditto re: performance impact.
- Outlining results improve marginally in the internal frameworks I
tested.
Follow-ups:
- Outline more than once per function, outline large single basic
blocks, & try to remove unconditional branches in outlined functions.
Differential Revision: https://reviews.llvm.org/D53627
llvm-svn: 345209
In this patch, I'm adding an extra check to the Latch's terminator in llvm::UnrollRuntimeLoopRemainder,
similar to how it is already done in the llvm::UnrollLoop.
The compiler would crash if this function is called with a malformed loop.
Patch by Rodrigo Caetano Rocha!
Differential Revision: https://reviews.llvm.org/D51486
llvm-svn: 342958
These classes don't make any changes to IR and have no reason to be in
Transform/Utils. This patch moves them to Analysis folder. This will allow
us reusing these classes in some analyzes, like MustExecute.
llvm-svn: 341015
This is a bit awkward in a handful of places where we didn't even have
an instruction and now we have to see if we can build one. But on the
whole, this seems like a win and at worst a reasonable cost for removing
`TerminatorInst`.
All of this is part of the removal of `TerminatorInst` from the
`Instruction` type hierarchy.
llvm-svn: 340701
In the past, DbgInfoIntrinsic has a strong assumption that these
intrinsics all have variables and expressions attached to them.
However, it is too strong to derive the class for other debug entities.
Now, it has problems for debug labels.
In order to make DbgInfoIntrinsic as a base class for 'debug info', I
create a class for 'variable debug info', DbgVariableIntrinsic.
DbgDeclareInst, DbgAddrIntrinsic, and DbgValueInst will be derived from it.
Differential Revision: https://reviews.llvm.org/D50220
llvm-svn: 338984
Summary:
Previously, `removeUnreachableBlocks` still returns true (which indicates the CFG is changed) even when all the unreachable blocks found is awaiting deletion in the DDT class.
This makes code pattern like
```
// Code modified from lib/Transforms/Scalar/SimplifyCFGPass.cpp
bool EverChanged = removeUnreachableBlocks(F, nullptr, DDT);
...
do {
EverChanged = someMightHappenModifications();
EverChanged |= removeUnreachableBlocks(F, nullptr, DDT);
} while (EverChanged);
```
become a dead loop.
Fix this by detecting whether a BasicBlock is already awaiting deletion.
Reviewers: kuhar, brzycki, dmgreen, grosser, davide
Reviewed By: kuhar, brzycki
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49738
llvm-svn: 338882