Current approach doesn't work well in cases when multiple paths are predicted to be "cold". By "cold" paths I mean those containing "unreachable" instruction, call marked with 'cold' attribute and 'unwind' handler of 'invoke' instruction. The issue is that heuristics are applied one by one until the first match and essentially ignores relative hotness/coldness
of other paths.
New approach unifies processing of "cold" paths by assigning predefined absolute weight to each block estimated to be "cold". Then we propagate these weights up/down IR similarly to existing approach. And finally set up edge probabilities based on estimated block weights.
One important difference is how we propagate weight up. Existing approach propagates the same weight to all blocks that are post-dominated by a block with some "known" weight. This is useless at least because it always gives 50\50 distribution which is assumed by default anyway. Worse, it causes the algorithm to skip further heuristics and can miss setting more accurate probability. New algorithm propagates the weight up only to the blocks that dominates and post-dominated by a block with some "known" weight. In other words, those blocks that are either always executed or not executed together.
In addition new approach processes loops in an uniform way as well. Essentially loop exit edges are estimated as "cold" paths relative to back edges and should be considered uniformly with other coldness/hotness markers.
Reviewed By: yrouban
Differential Revision: https://reviews.llvm.org/D79485
This is a follow-up patch of D87045.
The patch implements "loop-nest mode" for `LPMUpdater` and `FunctionToLoopPassAdaptor` in which only top-level loops are operated.
`createFunctionToLoopPassAdaptor` decides whether the returned adaptor is in loop-nest mode or not based on the given pass. If the pass is a loop-nest pass or the pass is a `LoopPassManager` which contains only loop-nest passes, the loop-nest version of adaptor is returned; otherwise, the normal (loop) version of adaptor is returned.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D87531
MSSA DSE starts at a killing store, finds an earlier store and
then checks that the earlier store is not read along any paths
(without being killed first). However, it uses the memory location
of the killing store for that, not the earlier store that we're
attempting to eliminate.
This has a number of problems:
* Mismatches between what BasicAA considers aliasing and what DSE
considers an overwrite (even though both are correct in isolation)
can result in miscompiles. This is PR48279, which D92045 tries to
fix in a different way. The problem is that we're using a location
from a store that is potentially not executed and thus may be UB,
in which case analysis results can be arbitrary.
* Metadata on the killing store may be used to determine aliasing,
but there is no guarantee that the metadata is valid, as the specific
killing store may not be executed. Using the metadata on the earlier
store is valid (it is the store we're removing, so on any execution
where its removal may be observed, it must be executed).
* The location is imprecise. For full overwrites the killing store
will always have a location that is larger or equal than the earlier
access location, so it's beneficial to use the earlier access
location. This is not the case for partial overwrites, in which
case either location might be smaller. There is some room for
improvement here.
Using the earlier access location means that we can no longer cache
which accesses are read for a given killing store, as we may be
querying different locations. However, it turns out that simply
dropping the cache has no notable impact on compile-time.
Differential Revision: https://reviews.llvm.org/D93523
The SROA pass tries to be lazy for removing dead instructions that are collected during iterative run of the pass in the DeadInsts list. However it does not remove instructions from the dead list while running eraseFromParent() on those instructions.
This causes (rare) null pointer dereferences. For example, in the speculatePHINodeLoads() instruction, in the following code snippet:
```
while (!PN.use_empty()) {
LoadInst *LI = cast<LoadInst>(PN.user_back());
LI->replaceAllUsesWith(NewPN);
LI->eraseFromParent();
}
```
If the Load instruction LI belongs to the DeadInsts list, it should be removed when eraseFromParent() is called. However, the bug does not show up in most cases, because immediately in the same function, a new LoadInst is created in the following line:
```
LoadInst *Load = PredBuilder.CreateAlignedLoad(
LoadTy, InVal, Alignment,
(PN.getName() + ".sroa.speculate.load." + Pred->getName()));
```
This new LoadInst object takes the same memory address of the just deleted LI using eraseFromParent(), therefore the bug does not materialize. In very rare cases, the addresses differ and therefore, a dangling pointer is created, causing a crash.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D92431
Part of the <=> changes in C++20 make certain patterns of writing equality
operators ambiguous with themselves (sorry!).
This patch goes through and adjusts all the comparison operators such that
they should work in both C++17 and C++20 modes. It also makes two other small
C++20-specific changes (adding a constructor to a type that cases to be an
aggregate, and adding casts from u8 literals which no longer have type
const char*).
There were four categories of errors that this review fixes.
Here are canonical examples of them, ordered from most to least common:
// 1) Missing const
namespace missing_const {
struct A {
#ifndef FIXED
bool operator==(A const&);
#else
bool operator==(A const&) const;
#endif
};
bool a = A{} == A{}; // error
}
// 2) Type mismatch on CRTP
namespace crtp_mismatch {
template <typename Derived>
struct Base {
#ifndef FIXED
bool operator==(Derived const&) const;
#else
// in one case changed to taking Base const&
friend bool operator==(Derived const&, Derived const&);
#endif
};
struct D : Base<D> { };
bool b = D{} == D{}; // error
}
// 3) iterator/const_iterator with only mixed comparison
namespace iter_const_iter {
template <bool Const>
struct iterator {
using const_iterator = iterator<true>;
iterator();
template <bool B, std::enable_if_t<(Const && !B), int> = 0>
iterator(iterator<B> const&);
#ifndef FIXED
bool operator==(const_iterator const&) const;
#else
friend bool operator==(iterator const&, iterator const&);
#endif
};
bool c = iterator<false>{} == iterator<false>{} // error
|| iterator<false>{} == iterator<true>{}
|| iterator<true>{} == iterator<false>{}
|| iterator<true>{} == iterator<true>{};
}
// 4) Same-type comparison but only have mixed-type operator
namespace ambiguous_choice {
enum Color { Red };
struct C {
C();
C(Color);
operator Color() const;
bool operator==(Color) const;
friend bool operator==(C, C);
};
bool c = C{} == C{}; // error
bool d = C{} == Red;
}
Differential revision: https://reviews.llvm.org/D78938
Details: Jump Threading does not make sense for the targets with divergent CF
since they do not use branch prediction for speculative execution.
Also in the high level IR there is no enough information to conclude that the branch is divergent or uniform.
This may cause errors in further CF lowering.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93302
A first real transformation that didn't already knew how to do that,
but it's pretty tame - either change successor of all the predecessors
of a block and carefully delay deletion of the block until afterwards
the DomTree updates are appled, or add a successor to the block.
There wasn't a great test coverage for this, so i added extra, to be sure.
... so just ensure that we pass DomTreeUpdater it into it.
Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
... so just ensure that we pass DomTreeUpdater it into it.
Apparently, there were no dedicated tests just for that functionality,
so i'm adding one here.
Per http://llvm.org/OpenProjects.html#llvm_loopnest, the goal of this
patch (and other following patches) is to create facilities that allow
implementing loop nest passes that run on top-level loop nests for the
New Pass Manager.
This patch extends the functionality of LoopPassManager to handle
loop-nest passes by specializing the definition of LoopPassManager that
accepts both kinds of passes in addPass.
Only loop passes are executed if L is not a top-level one, and both
kinds of passes are executed if L is top-level. Currently, loop nest
passes should have the following run method:
PreservedAnalyses run(LoopNest &, LoopAnalysisManager &,
LoopStandardAnalysisResults &, LPMUpdater &);
Reviewed By: Whitney, ychen
Differential Revision: https://reviews.llvm.org/D87045
Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transform in certain cases,
where the successor is dominated by predecessor,
because we then don't have PHI's, and can't recreate them,
well, without handrolling 'is dominated by' check,
which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
decrease the number of `Dominator Tree Construction` by 5
(from 28 now, i.e. -18%) in `-O3` old-pm pipeline
(as per `llvm/test/Other/opt-O3-pipeline.ll`)
This might or might not be beneficial for compile time.
So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.
Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.
But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.
Then, eventually, the default should be flipped,
and eventually^2 the flag removed.
And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
ResultPtr is guaranteed to be non-null - and using dyn_cast_or_null causes unnecessary static analyzer warnings.
We can't say the same for FirstResult AFAICT, so keep dyn_cast_or_null for that.
This adds support for loops like
unsigned clz(unsigned x) {
unsigned w = sizeof (x) * CHAR_BIT;
while (x) {
w--;
x >>= 1;
}
return w;
}
and
unsigned clz(unsigned x) {
unsigned w = sizeof (x) * CHAR_BIT - 1;
while (x >>= 1) {
w--;
}
return w;
}
To support these we look for add x, -1 as well as add x, 1 that
we already matched. If the value was -1 we need to subtract from
the initial counter value instead of adding to it.
Fixes PR48404.
Differential Revision: https://reviews.llvm.org/D92745
his is a preparation patch for supporting multiple exits in the loop vectorizer, by itself it should be mostly NFC. This patch moves the loop structure checks from LAA to their respective consumers (where duplicates don't already exist). Moving the checks does end up changing some of the optimization warnings and debug output slightly, but nothing that appears to be a regression.
Why do this? Well, after auditing the code, I can't actually find anything in LAA itself which relies on having all instructions within a loop execute an equal number of times. This patch simply makes this explicit so that if one consumer - say LV in the near future (hopefully) - wants to handle a broader class of loops, it can do so.
Differential Revision: https://reviews.llvm.org/D92066
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo). Before rewrite (but after introduction of new
induction variables) use SCEV to compute an equivalent set of values for
each @llvm.dbg.value in the loop body (among the loop header PHI-nodes).
After rewrite (and dead PHI elimination) update those @llvm.dbg.value
now referencing undef by picking a remaining value from its equivalence
set. Allow match with offset by inserting compensation code in the
DIExpression.
Fixes : PR38815
Differential Revision: https://reviews.llvm.org/D87494
CVP currently handles switches by checking an equality predicate
on all edges from predecessor blocks. Of course, this can only
work if the value being switched over is defined in a different block.
Replace this implementation with a call to getPredicateAt(), which
also does the predecessor edge predicate check (if not defined in
the same block), but can also do quite a bit more: It can reason
about phi-nodes by checking edge predicates for incoming values,
it can reason about assumes, and it can reason about block values.
As such, this makes the implementation both simpler and more
powerful. The compile-time impact on CTMark is in the noise.
This is the first in a series of patches that attempts to migrate
existing cost instructions to return a new InstructionCost class
in place of a simple integer. This new class is intended to be
as light-weight and simple as possible, with a full range of
arithmetic and comparison operators that largely mirror the same
sets of operations on basic types, such as integers. The main
advantage to using an InstructionCost is that it can encode a
particular cost state in addition to a value. The initial
implementation only has two states - Normal and Invalid - but these
could be expanded over time if necessary. An invalid state can
be used to represent an unknown cost or an instruction that is
prohibitively expensive.
This patch adds the new class and changes the getInstructionCost
interface to return the new class. Other cost functions, such as
getUserCost, etc., will be migrated in future patches as I believe
this to be less disruptive. One benefit of this new class is that
it provides a way to unify many of the magic costs in the codebase
where the cost is set to a deliberately high number to prevent
optimisations taking place, e.g. vectorization. It also provides
a route to represent the extremely high, and unknown, cost of
scalarization of scalable vectors, which is not currently supported.
Differential Revision: https://reviews.llvm.org/D91174
This patch adds new PM support for the pass and the pass can be now used
during middle-end transforms. The old pass is remamed to
ScalarizeMaskedMemIntrinLegacyPass.
Reviewed-By: skatkov, aeubanks
Differential Revision: https://reviews.llvm.org/D92743
ScalarizeMaskedMemIntrinsic is currently a codeGen level pass. The pass
is actually operating on IR level and does not use any code gen specific
passes. It is useful to move it into transforms directory so that it
can be more widely used as a mid-level transform as well (apart from
usage in codegen pipeline).
In particular, we have a usecase downstream where we would like to use
this pass in our mid-level pipeline which operates on IR level.
The next change will be to add support for new PM.
Reviewers: craig.topper, apilipenko, skatkov
Reviewed-By: skatkov
Differential Revision: https://reviews.llvm.org/D92407
Currently in some places we use signed type to represent size of an access and put explicit casts from unsigned to signed.
For example: int64_t EarlierSize = int64_t(Loc.Size.getValue());
Even though it doesn't loos bits (immidiatly) it may overflow and we end up with negative size. Potentially that cause later code to work incorrectly. A simple expample is a check that size is not negative.
I think it would be safer and clearer if we use unsigned type for the size and handle it appropriately.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D92648
The CountPrev variable was only used to forward a value from
the if statement to the conditional operator under the same
condition.
While there move some variable declarations to their first
assignment.
IsSigned and its accessor, isSigned, were introduced on Oct 25, 2017
in commit 9ac7021a25. The last use was
removed on Nov 20, 2017 in commit
268467869b.
Currently PassBuilder.cpp is by far the file that takes longest to
compile. This is due to tons of templates being instantiated per pass.
Follow PassManager by using wrappers around passes to avoid making
the adaptors templated on the pass type. This allows us to move various
adaptors' run methods into .cpp files.
This reduces the compile time of PassBuilder.cpp on my machine from 66
to 39 seconds. It also reduces the size of opt from 685M to 676M.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D92616
Currently we have to duplicate the same checks in isPotentiallyReassociatable and tryReassociate. With simple pattern like add/mul this may be not a big deal. But the situation gets much worse when I try to add support for min/max. Min/Max may be represented by several instructions and can take different forms. In order reduce complexity for upcoming min/max support we need to restructure the code a bit to avoid mentioned code duplication.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D88286
Currently we delete optimized instructions as we go. That has several negative consequences. First it complicates traversal logic itself. Second if newly generated instruction has been deleted the traversal is repeated from scratch.
But real motivation for the change is upcoming change with support for min/max reassociation. Here we employ SCEV expander to generate code. As a result newly generated instructions may be inserted not right before original instruction (because SCEV may do hoisting) and there is no way to know 'next' instruction.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D88285
This patch teaches the jump threading pass to call BPI->eraseBlock
when it folds a conditional branch.
Without this patch, BranchProbabilityInfo could end up with stale edge
probabilities for the basic block containing the conditional branch --
one edge probability with less than 1.0 and the other for a removed
edge.
Differential Revision: https://reviews.llvm.org/D92608
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D92489
In this patch I have added support for a new loop hint called
vectorize.scalable.enable that says whether we should enable scalable
vectorization or not. If a user wants to instruct the compiler to
vectorize a loop with scalable vectors they can now do this as
follows:
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2
...
!2 = !{!2, !3, !4}
!3 = !{!"llvm.loop.vectorize.width", i32 8}
!4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
Setting the hint to false simply reverts the behaviour back to the
default, using fixed width vectors.
Differential Revision: https://reviews.llvm.org/D88962
This is a straightforward port of MemCpyOpt to MemorySSA following
the approach of D26739. MemDep queries are replaced with MSSA queries
without changing the overall structure of the pass. Some care has
to be taken to account for differences between these APIs
(MemDep also returns reads, MSSA doesn't).
Differential Revision: https://reviews.llvm.org/D89207
We were not correctly splitting a blocks for chains of length 1.
Before that change, additional instructions for blocks in chains of
length 1 were not split off from the block before removing (this was
done correctly for chains of longer size).
If this first block contained an instruction referenced elsewhere,
deleting the block, would result in invalidation of the produced value.
This caused a miscompile which motivated D92297 (before D17993,
nonnull and dereferenceable attributed were not added so MergeICmps were
not triggered.) The new test gep-references-bb.ll demonstrate the issue.
The regression was introduced in
rG0efadbbcdeb82f5c14f38fbc2826107063ca48b2.
This supersedes D92364.
Test case by MaskRay (Fangrui Song).
Differential Revision: https://reviews.llvm.org/D92375
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).
This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.
The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.
Differential Revision: https://reviews.llvm.org/D91649
When bailing out in rewriteLoopExitValues() you could be left with PHI
nodes in the DeadInsts vector. Those would be not handled by the use of
RecursivelyDeleteTriviallyDeadInstructions() in IndVarSimplify. This
resulted in the IndVarSimplify pass returning an incorrect modified
status. This was caught by the expensive check introduced in D86589.
This patches changes IndVarSimplify so that it deletes those PHI nodes,
using RecursivelyDeleteDeadPHINode().
This fixes PR47486.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D91153
LoopLoadElim may end up expanding an AddRec from a loop
which is not the current loop. This loop may not be in simplify
form. We figure it out after the no-return point, so cannot bail
in this case.
AddRec requires simplify form to expand. The only way to ensure
this does not crash is to simplify all loops beforehand.
The issue only exists in new PM. Old PM requests LoopSimplify
required pass and it simplifies all loops before the opt begins.
Differential Revision: https://reviews.llvm.org/D91525
Reviewed By: asbirlea, aeubanks
Some older code - and code copied from older code - still directly tested against the singelton result of SE::getCouldNotCompute. Using the isa<SCEVCouldNotCompute> form is both shorter, and more readable.
I disabled the widening in fa5cb4b because it run in an assert, which was
related to replacing values with different types. I forgot that an extend could
also be a zero-extend, which I have added now. This means that the approach now
is to create and insert a trunc value of the outerloop for each user, and use
that to replace IV values.
Differential Revision: https://reviews.llvm.org/D91690
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.
A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues:
1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.
We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality.
Let's now look at an example. Given the following LLVM IR:
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
%cmp = icmp eq i32 %x, 0
br i1 %cmp, label %bb1, label %bb2
bb1:
br label %bb3
bb2:
br label %bb3
bb3:
ret void
}
```
The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
%cmp = icmp eq i32 %x, 0
call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
br i1 %cmp, label %bb1, label %bb2
bb1:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
br label %bb3
bb2:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
br label %bb3
bb3:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
ret void
}
```
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D86490
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager. Enable memoryssa for loopsink with new pass manager. This
combination exposed a bug that was previously fixed for loopsink
without memoryssa. When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation. This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.
Respond to review comments. Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control. Update tests accordingly.
Respond to review comments. Add options controlling whether memoryssa is
used for loop sink, defaulting to off. Expand testing based on these
options.
Respond to review comments. Properly indicated preserved analyses.
This relanding addresses a compile-time performance problem by moving
test for profile data earlier to avoid unnecessary computations.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
This reuses the existing lower-matrix-intrinsics pass rather than going
the legacy pass route of creating a new pass.
Use this new variant in the NPM -O0 pipeline.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D91811
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
Some nested loops may share the same ExitingBB, so after we finishing FoldExit,
we need to notify OuterLoop and SCEV to drop any stored trip count.
Patched by: guopeilin
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D91325
This reverts commit 562addba65.
Reverted change too quickly, the failing test cases passed on the next build.
So reverting revert (to include the changes).
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager. Enable memoryssa for loopsink with new pass manager. This
combination exposed a bug that was previously fixed for loopsink
without memoryssa. When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation. This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.
Respond to review comments. Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control. Update tests accordingly.
Respond to review comments. Add options controlling whether memoryssa is
used for loop sink, defaulting to off. Expand testing based on these
options.
Respond to review comments. Properly indicated preserved analyses.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
As Wei Mi is reporting in post-commit review
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20201116/853479.html
teaching -reassociate about add-like-or's (70472f3) results in breaking apart
load widening patterns, and reassociating them.
For now, simply exclude any such `or` that appears to be a root of
load widening idiom from the or->add transformation.
Note that the heuristic is greedy, it doesn't ensure that loads
can *actually* be widened into a single load.
This patch generalizes the extraction of a constraint for a given
condition. It allows decompose to return a vector of c * X pairs, which
allows de-composing multiple instructions in the future.
It also adds more clarifying comments.
In some cases we can handle IV and iter count of different types. It's a typical situation
after IV have been widened. This patch adds support for such cases, when legal.
Differential Revision: https://reviews.llvm.org/D88528
Reviewed By: skatkov
Support more instructions in SpeculativeExecution pass:
- ExtractElement
- InsertElement
- ShuffleVector
Differential Revision: https://reviews.llvm.org/D91633
I don't see any reason not to unconditionally retrieve TLI, it's fairly
cheap.
Fixes calls-errno.ll under NPM.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D91476
When processing conditional branches, if the condition is an AND of 2 compares
and the true successor only has the current block as predecessor, queue both
conditions for the true successor.
When instructions are cloned from block BB to PredBB in the method
DuplicateCondBranchOnPHIIntoPred() number of successors of PredBB
changes from 1 to number of successors of BB. So we have to copy
branch probabilities from BB to PredBB.
Reviewed By: Kazu Hirata
Differential Revision: https://reviews.llvm.org/D90841
- In certain cases, a generic pointer could be assumed as a pointer to
the global memory space or other spaces. With a dedicated target hook
to query that address space from a given value, infer-address-space
pass could infer and propagate that to all its users.
Differential Revision: https://reviews.llvm.org/D91121
When processing conditional branches, if the condition is an OR of 2 compares
and the false successor only has the current block as predecessor, queue both
negated conditions for the false successor
Widen the IV to the widest available and legal integer type, which makes this
transformations always safe so that we can skip overflow checks.
Motivation is to let this pass trigger on 64-bit targets too, and this is the
last patch in a serie to achieve this: D90402 moves pass LoopFlatten to just
before IndVarSimplify so that IVs are not already widened, D90421 factors out
widening from IndVarSimplify into Utils/SimplifyIndVar so that we can also use
it in LoopFlatten.
Differential Revision: https://reviews.llvm.org/D90640
This patch teaches the jump threading pass to call BPI->eraseBlock
when it folds a conditional branch.
Without this patch, BranchProbabilityInfo could end up with stale edge
probabilities for the basic block containing the conditional branch --
one edge probability with less than 1.0 and the other for a removed
edge.
This patch is one of the steps before we can safely re-apply D91017.
Differential Revision: https://reviews.llvm.org/D91511
In the last change to IRCE the BPI is ignored if BFI is present, however
BFI and BPI have a different thresholds. Specifically BPI approach checks only
latch exit probability so it is expected if the loop has only one exit block (latch)
the behavior with BFI and BPI should be the same,
BPI approach by default uses threshold 10, so it considers the loop with estimated
number of iterations less then 10 should not be considered for IRCE optimization.
BFI approach uses the default value 3 and this is inconsistent.
The CL modifies the code to use the same threshold for both approaches..
The test is updated due to it has two side-exits (except latch) and each of them has a
probability 1/16, so BFI estimates the number of runtime iteration is about to 7
(1/16 + 1/16 + some for latch) and test fails.
Reviewers: mkazantsev, ebrevnov
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D91230
This is used to test RemoveRedundantDbgInstrs(), which is used by other
passes.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D91477
See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485
the implementation is known-broken for certain inputs,
the bugreport was up for a significant amount of timer,
and there has been no activity to address it.
Therefore, just completely rip out all of misexpect handling.
I suspect, fixing it requires redesigning the internals of MD_misexpect.
Should anyone commit to fixing the implementation problem,
starting from clean slate may be better anyways.
This reverts commit 7bdad08429,
and some of it's follow-ups, that don't stand on their own.
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.
It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.
The intended uses cases for this new metadata is annotating
'interesting' instructions and the remarks should provide additional
insight into transformations applied to a program.
To motivate this, consider these specific questions we would like to get answered:
* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?
Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
Reviewed By: thegameg, jdoerfert
Differential Revision: https://reviews.llvm.org/D91188
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848