- UserParent = PN->getIncomingBlock(*I->use_begin());
+ UserParent = PN->getIncomingBlock(*SingleUse);
The first use of I may be droppable (llvm.assume).
When compiling llvm/lib/IR/AutoUpgrade.cpp with a bootstrapped clang
with ThinLTO with minimized bitcode files, I see such a case in
the function _ZN4llvm20UpgradeIntrinsicCallEPNS_8CallInstEPNS_8FunctionE
clang -c -fthinlto-index=AutoUpgrade.o.thinlto.bc AutoUpgrade.bc -O3
Unfortunately it is really difficult to get a minimized reproduce.
Previously, we would ignore alloca alignment when building the frame
and just use the natural alignment of the allocated type. If an alloca
is over-aligned for its IR type, this could lead to a frame entry with
inadequate alignment for the downstream uses of the alloca.
Since highly-aligned fields also tend to produce poor layouts under a
naive layout algorithm, I've also switched coroutine frames to use the
new optimal struct layout algorithm.
In order to communicate the frame size and alignment to later passes,
I needed to set align+dereferenceable attributes on the frame-pointer
parameter of the resume function. This is clearly the right thing to
do, but the align attribute currently seems to result in assumptions
being added during inlining that the optimizer cannot easily remove.
Summary:
This patch allows code-sinking in InstCombine to be performed when instruction have uses in llvm.assume.
Use are considered droppable when it is preferable to modify the User such that the use disappears rather than to prevent a transformation because of the use.
for now uses are considered droppable if they are in an llvm.assume.
Reviewers: jdoerfert, nikic, spatel, lebedev.ri, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73832
Summary:
Rename `succ_const_iterator` to `const_succ_iterator` and
`succ_const_range` to `const_succ_range` for consistency with the
predecessor iterators, and the corresponding iterators in
MachineBasicBlock.
Reviewers: nicholas, dblaikie, nlewycky
Subscribers: hiraditya, bmahjour, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75952
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit introduces a VPValue for VPWidenMemoryInstructionRecipe to use as
the stored value. The recipe is generated with a VPValue wrapping the stored
value of the scalar store. This reduces ingredient def-use usage by ILV as a
step towards full VPlan-based def-use relations.
Differential Revision: https://reviews.llvm.org/D76373
This patch integrates operand bundle llvm.assumes [0] with the
Attributor. Most IRAttributes will now look at uses of the associated
value and if there are llvm.assume operand bundle uses with the right
tag we will check if they are in the must-be-executed-context (around
the context instruction). Droppable users, which is currently only
llvm::assume, are handled special in some places now as well.
[0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D74888
Summary:
DivRemPairs is unsound with respect to undef values.
```
// bb1:
// %rem = srem %x, %y
// bb2:
// %div = sdiv %x, %y
// -->
// bb1:
// %div = sdiv %x, %y
// %mul = mul %div, %y
// %rem = sub %x, %mul
```
If X can be undef, X should be frozen first.
For example, let's assume that Y = 1 & X = undef:
```
%div = sdiv undef, 1 // %div = undef
%rem = srem undef, 1 // %rem = 0
=>
%div = sdiv undef, 1 // %div = undef
%mul = mul %div, 1 // %mul = undef
%rem = sub %x, %mul // %rem = undef - undef = undef
```
http://volta.cs.utah.edu:8080/z/m7Xrx5
Same for Y. If X = 1 and Y = (undef | 1), %rem in src is either 1 or 0,
but %rem in tgt can be one of many integer values.
This resolves https://bugs.llvm.org/show_bug.cgi?id=42619 .
This miscompilation disappears if undef value is removed, but it may take a while.
DivRemPair happens pretty late during the optimization pipeline, so this optimization seemed as a good candidate to fix without major regression using freeze than other broken optimizations.
Reviewers: spatel, lebedev.ri, george.burgess.iv
Reviewed By: spatel
Subscribers: wuzish, regehr, nlopes, nemanjai, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76483
coroutine frame
Currently we move all allocas into the frame when build coroutine frame in
CoroSplit pass. However, this can be relaxed.
Since CoroSplit pass run after Inline pass, we can use lifetime intrinsic to
do such analysis: If the scope of lifetime intrinsic is not across any suspend
point, rather than move the allocas to frame, we can just move them to entry bb
of corresponding function. This reduce the frame size.
More importantly, this also avoid data race in multithread environment.
Consider one inline function by coroutine: it starts a thread which access
local variables, while after inline the movement of allocs to frame also access
them. cause data race.
Differential Revision: https://reviews.llvm.org/D75664
PR35760 shows an example program which, when compiled with `clang -O0`
or gcc at any optimization level, prints '0'. However, llvm transforms
the program in a way that causes it to print '1'.
Fix the issue by having `AllUsesOfValueWillTrapIfNull` return false when
analyzing a load from a global which is used by an `icmp`. This special
case was untested [0] so this is just deleting dead code.
An alternative fix might be to change the GlobalStatus analysis for the
global to report "Stored" instead of "StoredOnce". However, "StoredOnce"
is appropriate when only one value other than the initializer is stored
to the global.
[0]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/Transforms/IPO/GlobalOpt.cpp.html#L662
Differential Revision: https://reviews.llvm.org/D76645
Since intrinsics can now specify when an argument is required to be
constant, it is now OK to replace arguments with variables if they
aren't. This means intrinsics must now be accurately marked with
immarg.
Two one-use checks were added with rGfdcb27105537,
but only the first one is necessary to limit an
increase in instruction count. The second transform
only creates one instruction, so it is always a
reasonable canonicalization/optimization.
Validation of the found runtime library functions declarations types
(return and argument types) with the expected types.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D76058
We special cased must-tail calls all over the place because they cannot
be modified as other calls can be. However, we already centralized the
modification API so we can centralize the handling as well. This
simplifies the code and allows to remove must-tail calls completely.
D75801 removed the last and only user of this option, so we can
drop it now. The original idea behind this was to only run expensive
transforms under -O3, but apart from the one known bits transform,
this has never really taken off. I believe nowadays the recommendation
is to put expensive transforms in AggressiveInstCombine instead,
though that isn't terribly popular either :)
Differential Revision: https://reviews.llvm.org/D76540
We need to insert into the Visited set at the same time we insert
into the worklist. Otherwise we may end up pushing the same
instruction to the worklist multiple times, and only adding it to
the visited set later.
If ExpensiveCombines is enabled (which is the case with -O3 on the
legacy PM and always on the new PM), InstCombine tries to compute
the known bits of all instructions in the hope that all bits end up
being known, which is fairly expensive.
How effective is it? If we add some statistics on how often the
constant folding succeeds and how many KnownBits calculations are
performed and run test-suite we get:
"instcombine.NumConstPropKnownBits": 642,
"instcombine.NumConstPropKnownBitsComputed": 18744965,
In other words, we get one fold for every 30000 KnownBits calculations.
However, the truth is actually much worse: Currently, known bits are
computed before performing other folds, so there is a high chance
that cases that get folded by known bits would also have been
handled by other folds.
What happens if we compute known bits after all other folds
(hacky implementation: https://gist.github.com/nikic/751f25b3b9d9e0860db5dde934f70f46)?
"instcombine.NumConstPropKnownBits": 0,
"instcombine.NumConstPropKnownBitsComputed": 18105547,
So it turns out despite doing 18 million known bits calculations,
the known bits fold does not do anything useful on test-suite.
I was originally planning to move this into AggressiveInstCombine
so it only runs once in the pipeline, but seeing this, I think
we're better off removing it entirely.
As this is the only use of the "expensive combines" mechanism,
it may be removed afterwards, but I'll leave that to a separate patch.
Differential Revision: https://reviews.llvm.org/D75801
Ideally SimplifyDemanded should compute the same known bits as
computeKnownBits(). This patch addresses one discrepancy, where
ValueTracking is more powerful: If we have a shl nsw shift, we
know that the sign bit of the input and output must be the same.
If this results in a conflict, the result is poison.
This is implemented in
2c4ca6832f/lib/Analysis/ValueTracking.cpp (L1175-L1179)
and
2c4ca6832f/lib/Analysis/ValueTracking.cpp (L904-L908).
This implements the same basic logic in SimplifyDemanded. It's
slightly stronger, because I return undef instead of zero for the
poison case (which is not an option inside ValueTracking).
As mentioned in https://reviews.llvm.org/D75801#inline-698484,
we could detect poison in more cases, this just establishes parity
with the existing logic.
Differential Revision: https://reviews.llvm.org/D76489
The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range.
This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).
If a call argument has the "returned" attribute, we can simplify
the call to the value of that argument. This was already partially
handled by InstSimplify/InstCombine for the case where the argument
is an integer constant, and the result is thus known via known bits.
The non-constant (or non-int) argument cases weren't handled though.
This previously landed as an InstSimplify transform, but was reverted
due to assertion failures when compiling the Linux kernel. The reason
is that simplifying a call to another call breaks assumptions in
call graph updating during inlining. As the code is not easy to fix,
and there is no particularly strong motivation for having this in
InstSimplify, the transform is only performed in InstCombine instead.
Differential Revision: https://reviews.llvm.org/D75815
This is the same change as D75824, but for two cases where
InstCombine performs the same optimization: Replacing an instruction
whose bits are fully known with a constant. This is not (generally)
legal for musttail calls.
Differential Revision: https://reviews.llvm.org/D76457
This patch sets the stage for supporting both row and column major
layouts for matrixes. It renames ColumnMatrixTy to MatrixTy, adds
booleans indicating the underlying layout to both MatrixTy and ShapeInfo
and generalizes the methods of MatrixTy to support both row and column
major layouts.
Reviewers: Gerolf, anemet, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D76324
For MemoryPhis, we have to avoid that the MemoryPhi may be executed
before before the access we are currently looking at.
To do this we do a post-order numbering of the basic blocks in the
function and bail out once we reach a MemoryPhi with a larger (or equal)
post-order block number than the current MemoryAccess.
This changes the order in which we visit stores for elimination.
This patch also adds support for exploring multiple paths. We keep a worklist (ToCheck) of memory accesses that might be eliminated by our starting MemoryDef or MemoryPhis for further exploration. For MemoryPhis, we add the incoming values to the worklist, for MemoryDefs we add the defining access.
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72148
For now, when final suspend can be simplified by simplifySuspendPoint,
handleFinalSuspend is executed as well to remove last case in switch
instruction. This patch fixes it.
Differential Revision: https://reviews.llvm.org/D76345
This logic can be shared with the tiled code generation.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D75565
Summary:
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44611 by
preventing an infinite loop in the jump threading pass when
-jump-threading-across-loop-headers is on. Specifically, without this
patch, jump threading through two basic blocks would trigger on the
same area of the CFG over and over, resulting in an infinite loop.
Consider testcase PR44611-across-header-hang.ll in this patch. The
first opportunity to thread through two basic blocks is:
from bb_body2 through bb_header and bb_body1 to bb_body2.
The pass duplicates bb_header and bb_body1 as, say, bb_header.thread1
and bb_body1.thread1. Since bb_header contains a successor edge back
to itself, bb_header.thread1 also contains a successor edge to
bb_header, immediately giving rise to the next jump threading
opportunity:
from bb_header.thread1 through bb_header and bb_body1 to bb_body2.
After that, we repeatedly thread an incoming edge into bb_header
through bb_header and bb_body1 to bb_body2. In other words, we keep
peeling one iteration from bb_header's self loop.
The patch fixes the problem by preventing the pass from duplicating a
basic block containing a self loop.
Reviewers: wmi, junparser, efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76390
This patch slightly generalizes the code to emit loads and stores of a
matrix and adds helpers to load/store a tile of a larger matrix.
This will be used in a follow-up patch introducing initial tiling.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D75564
If we know the SSE shift amount is out of range then we can simplify to zero value (logical) or a 'signsplat' bitwidth-1 shift (arithmetic). This allows us to remove the equivalent ConstantInt constant folding path from simplifyX86immShift.
The slli/srli/srai 'immediate' vector shifts (although its not immediate anymore to match gcc) can be replaced with generic shifts if the shift amount is known to be in range.
For PHIs with multiple incoming values, we can improve precision by
using constant ranges for integers. We can over-approximate phis
by merging the incoming values.
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D71933
If one of the operands of a binary operator is a constant range, we can
use ConstantRange::binaryOp to approximate the result.
We still handle single element constant ranges as we did previously,
with ConstantExpr::get(), because ConstantRange::binaryOp still gives
worse results in a few cases for single element ranges.
Also note that we bail out early if any of the operands is still unknown.
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D71936
Summary:
DataLayout::getTypeAllocSize() return TypeSize. For cases where scalable
property doesn't matter (check for zero-sized alloca), we should explicitly
call getKnownMinSize() to avoid implicit type conversion to uint64_t, which is
invalid for scalable vector type.
Reviewers: sdesmalen, efriedma, spatel, apazos
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76386
The latest improvements to VPValue printing make this mapping clear when
printing the operand. Printing the mapping separately is not required
any longer.
Reviewers: rengolin, hsaito, Ayal, gilr
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D76375
Now that printing VPValues uses the underlying IR value name, if
available, recording the underlying value here improves printing.
Reviewers: rengolin, hsaito, Ayal, gilr
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D76374
The existence of the class is more confusing than helpful, I think; the
commonality is mostly just "GEP is legal", which can be queried using
APIs on GetElementPtrInst.
Differential Revision: https://reviews.llvm.org/D75660
When the an underlying value is available, we can use its name for
printing, as discussed in D73078.
Reviewers: rengolin, hsaito, Ayal, gilr
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D76200
For selects with an unknown condition, we can approximate the result by
merging the state of both options. This automatically takes care of
the case where on operand is undef.
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D71935
Functions include their arguments in the use-list. Changed function
values mean that the result of the function changed. We only need
to update the call sites with the new function result and do not
have to propagate the call arguments.
To do so, this patch splits up the visitCallSite into handleCallResult
and handleCallArguments and updates markUsersAsChanged to only update
call results for functions.
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75846
Summary: Prevent InstCombine from removing llvm.assume for which the arguement is true when they have operand bundles with usefull information.
Reviewers: jdoerfert, nikic, lebedev.ri
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76147
According to LangRef:
If len is not a positive integer multiple of element_size, then the behaviour of the intrinsic is undefined.
Add InstCombine rule to transform intrinsic to undef operation.
This is a follow-up for D76116.
Reviewers: reames
Reviewed By: reames
Subscribers: hiraditya, jfb, dantrushin, llvm-commits
Differential Revision: https://reviews.llvm.org/D76215
This patch adds a new undef lattice state, which is used to represent
UndefValue constants or instructions producing undef.
The main difference to the unknown state is that merging undef values
with constants (or single element constant ranges) produces the
constant/constant range, assuming all uses of the merge result will be
replaced by the found constant.
Contrary, merging non-single element ranges with undef needs to go to
overdefined. Using unknown for UndefValues currently causes mis-compiles
in CVP/LVI (PR44949) and will become problematic once we use
ValueLatticeElement for SCCP.
Reviewers: efriedma, reames, davide, nikic
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75120
a tail call
This reapplies the patch in https://reviews.llvm.org/rG1f5b471b8bf4,
which was reverted because it was causing crashes.
https://bugs.chromium.org/p/chromium/issues/detail?id=1061289#c2
Check that HasSafePathToCall is true before checking the call is a tail
call.
Original commit message:
Previosly ARC optimizer removed the autoreleaseRV/retainRV pair in the
following code, which caused the object returned by @something to be
placed in the autorelease pool because the call to @something isn't a
tail call:
```
%call = call i8* @something(...)
%2 = call i8* @objc_retainAutoreleasedReturnValue(i8* %call)
%3 = call i8* @objc_autoreleaseReturnValue(i8* %2)
ret i8* %3
```
Fix the bug by checking whether @something is a tail call.
rdar://problem/59275894
This reverts commit 5c3117b0a9
This should not be necessary after
7593a480db, and Florian Hahn has confirmed
that the problem no longer reproduces with this patch.
I happened to notice this code because the FIXME talks about
OrderedBasicBlock.
Reviewed By: fhahn, dexonsmith
Differential Revision: https://reviews.llvm.org/D76075
Summary:
SLPVectorizer try to vectorize list of scalar instructions of the same type,
instructions already vectorized are rejected through isValidElementType().
Without this patch, tryToVectorizeList() will first try to determine vectorization
factor of a list of Instructions before checking whether each instruction has unsupported
type or not. For instructions already vectorized for SVE, it will crash at getVectorElementSize(),
where it try to return a fixed size.
This patch make sure invalid element types are rejected before trying to get vectorization
factor. This make sure we are not trying to vectorize instructions already vectorized.
Reviewers: sdesmalen, efriedma, spatel, RKSimon, ABataev, apazos, rengolin
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76017
Summary:
during inling Create and insert an llvm.assume with attributes to preserve them.
to prevent any changes for now generation of llvm.assume is under a flag disabled by default.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75825
This patch add mayContainUnboundedCycle helper function which checks whether a function has any cycle which we don't know if it is bounded or not.
Loops with maximum trip count are considered bounded, any other cycle not.
It also contains some fixed tests and some added tests contain bounded and
unbounded loops and non-loop cycles.
Reviewed By: jdoerfert, uenoku, baziotis
Differential Revision: https://reviews.llvm.org/D74691
Resolution for below fixme:
(ii) Check whether the value is captured in the scope using AANoCapture.
FIXME: This is conservative though, it is better to look at CFG and
check only uses possibly executed before this callsite.
Propagates caller argument's noalias attribute to callee.
Reviewed by: jdoerfert, uenoku
Reviewers: jdoerfert, sstefan1, uenoku
Subscribers: uenoku, sstefan1, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D71617
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.
This change is needed for D73753.
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
This patch switches SCCP to use ValueLatticeElement for lattice values,
instead of the local LatticeVal, as first step to enable integer range support.
This patch does not make use of constant ranges for additional operations
and the only difference for now is that integer constants are represented by
single element ranges. To preserve the existing behavior, the following helpers
are used
* isConstant(LV): returns true when LV is either a constant or a constant range with a single element. This should return true in the same cases where LV.isConstant() returned true previously.
* getConstant(LV): returns a constant if LV is either a constant or a constant range with a single element. This should return a constant in the same cases as LV.getConstant() previously.
* getConstantInt(LV): same as getConstant, but additionally casted to ConstantInt.
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D60582
Summary:
This patch replaces incorrectt assert with a check. Previously it asserts that
if SCEV cannot prove `isKnownPredicate(A != B)`, then it should be able to prove
`isKnownPredicate(A == B)`.
Both these fact may be not provable. It is shown in the provided test:
Could not prove: `{-294,+,-2}<%bb1> != 0`
Asserting: `{-294,+,-2}<%bb1> == 0`
Obviously, this SCEV is not equal to zero, but 0 is in its range so we cannot
also prove that it is not zero.
Instead of assert, we should be checking the required conditions explicitly.
Reviewers: lebedev.ri, fhahn, sanjoy, fedor.sergeev
Reviewed By: lebedev.ri
Subscribers: hiraditya, zzheng, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76050
This patch adds support for propagating matrix expressions along the
inlined-at chain and emitting remarks at the traversed function scopes.
To motivate this new behavior, consider the example below. Without the
remark 'up-leveling', we would only get remarks in load.h and store.h,
but we cannot generate a remark describing the full expression in
toplevel.cpp, which is the place where the user has the best chance of
spotting/fixing potential problems.
With this patch, we generate a remark for the load in load.h, one for
the store in store.h and one for the complete expression in
toplevel.cpp. For a bigger example, please see remarks-inlining.ll.
load.h:
template <typename Ty, unsigned R, unsigned C> Matrix<Ty, R, C> load(Ty *Ptr) {
Matrix<Ty, R, C> Result;
Result.value = *reinterpret_cast <typename Matrix<Ty, R, C>::matrix_t *>(Ptr);
return Result;
}
store.h:
template <typename Ty, unsigned R, unsigned C> void store(Matrix<Ty, R, C> M1, Ty *Ptr) {
*reinterpret_cast<typename decltype(M1)::matrix_t *>(Ptr) = M1.value;
}
toplevel.cpp
void test(double *A, double *B, double *C) {
store(add(load<double, 3, 5>(A), load<double, 3, 5>(B)), C);
}
For a given function, we traverse the inlined-at chain for each
matrix instruction (= instructions with shape information). We collect
the matrix instructions in each DISubprogram we visit. This produces a
mapping of DISubprogram -> (List of matrix instructions visible in the
subpogram). We then generate remarks using the list of instructions for
each subprogram in the inlined-at chain. Note that the list of instructions
for a subprogram includes the instructions from its own subprograms
recursively. For example using the example above, for the subprogram
'test' this includes inline functions 'load' and 'store'. This allows
surfacing the remarks at a level useful to users.
Please note that the current approach may create a lot of extra remarks.
Additional heuristics to cut-off the traversal can be implemented in the
future. For example, it might make sense to stop 'up-leveling' once all
matrix instructions are at the same debug location.
Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D73600
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.
Differential Revision: https://reviews.llvm.org/D75525
This essentially reverts some of the SimplifyLibcalls part changes of D45736 [SimplifyLibcalls] Replace locked IO with unlocked IO.
C11 7.21.5.2 The fflush function
> If stream is a null pointer, the fflush function performs this flushing action on all streams for which the behavior is defined above.
i.e. fopen'ed FILE* is inherently captured.
POSIX.1-2017 getc_unlocked, getchar_unlocked, putc_unlocked, putchar_unlocked - stdio with explicit client locking
> These functions can safely be used in a multi-threaded program if and only if they are called while the invoking thread owns the ( FILE *) object, as is the case after a successful call to the flockfile() or ftrylockfile() functions.
After a thread fopen'ed a FILE*, when it is calling foobar() which is now replaced by foobar_unlocked(),
if another thread is concurrently calling fflush(0), the behavior is undefined.
C11 7.22.4.4 The exit function
> Next, all open streams with unwritten buffered data are flushed, all open streams are closed, and all files created by the tmpfile function are removed.
The replacement is only feasible if the program is single threaded, or exit or fflush(0) is never called.
See also http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180528/556615.html
for how the replacement makes libc interceptors difficult to implement.
dalias: in a worst case, it's unbounded data corruption because of concurrent access to pointers
without synchronization. f->wpos or rpos could get outside of the buffer, thread A could do
f->wpos += j after knowing j is in bounds, while thread B also changes it concurrently.
This can produce exploitable conditions depending on libc internals.
Revert the SimplifyLibcalls part change because the cons obviously
overweigh the pros. Even when the replacement is feasible, the benefit
is indemonstrable, more so in an application instead of an artificial
glibc benchmark. Theoretically the replacement could be beneficial when
calling getc_unlocked/putc_unlocked in a loop, but then it is better
using a blocked IO operation and the user is likely aware of that.
The function attribute inference is still useful and thus kept.
Reviewed By: xbolva00
Differential Revision: https://reviews.llvm.org/D75933
Summary: Rewrite the fsub-0.0 idiom to fneg and always emit fneg for fp
negation. This also extends the scalarization cost in instcombine for unary
operators to result in the same IR rewrites for fneg as for the idiom.
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D75467
SimplifyAddWithRemainder currently also matches for vector types, but
tries to create an integer constant, which causes a crash.
By using Constant::getIntegerValue() we can support both the scalar and
vector cases.
The 2 added test cases crash without the fix.
Reviewers: spatel, lebedev.ri
Reviewed By: spatel, lebedev.ri
Differential Revision: https://reviews.llvm.org/D75906
SimplifyCFG should not merge empty return blocks and leave a CallBr behind
with a duplicated destination since the verifier will then trigger an
assert. This patch checks for this case and avoids the transformation.
CodeGenPrepare has a similar check which also has a FIXME comment about why
this is needed. It seems perhaps better if these two passes would eventually
instead update the CallBr instruction instead of just checking and avoiding.
This fixes https://bugs.llvm.org/show_bug.cgi?id=45062.
Review: Craig Topper
Differential Revision: https://reviews.llvm.org/D75620
It seems like the SLPVectorizer is currently not aware of vector
versions of functions provided by libraries like Accelerate [1].
This patch updates SLPVectorizer to use the same infrastructure
the LoopVectorizer uses to detect vectorizable library functions.
For calls, it computes the cost of an intrinsic call (existing behavior)
and the cost of a vector function library call, if available. Like
LoopVectorizer, it assumes the cost of the vector function is simply the
cost of a call to a vector function.
[1] https://developer.apple.com/documentation/accelerate
Reviewers: ABataev, RKSimon, spatel
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D75878
a tail call
Previosly ARC optimizer removed the autoreleaseRV/retainRV pair in the
following code, which caused the object returned by @something to be
placed in the autorelease pool because the call to @something isn't a
tail call:
```
%call = call i8* @something(...)
%2 = call i8* @objc_retainAutoreleasedReturnValue(i8* %call)
%3 = call i8* @objc_autoreleaseReturnValue(i8* %2)
ret i8* %3
```
Fix the bug by checking whether @something is a tail call.
rdar://problem/59275894
When simplifying a call without uses, replaceInstUsesWith() is
going to do nothing, but we'll skip all following folds. We can
only run into this problem with calls that both simplify and are
not trivially dead if unused, which currently seems to happen only
with calls to undef, as the test diff shows. When extending
SimplifyCall() to handle "returned" attributes, this becomes a much
bigger problem, so I'm fixing this first.
Differential Revision: https://reviews.llvm.org/D75814
This patch introduces the propagation of known information based on path exploration.
For example,
```
int u(int c, int *p){
if(c) {
return *p;
} else {
return *p + 1;
}
}
```
An argument `p` is dereferenced whatever c's value is.
For an instruction `CtxI`, we accumulate branch instructions in the must-be-executed-context of `CtxI` and then, we take the conjunction of the successors' known state.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D65593
opcode (extelt V0, Ext0), (ext V1, Ext1) --> extelt (opcode (splat V0, Ext0), V1), Ext1
The first part of this patch generalizes the cost calculation to accept
different extraction indexes. The second part creates a shuffle+extract
before feeding into the existing code to create a vector op+extract.
The patch conservatively uses "TargetTransformInfo::SK_PermuteSingleSrc"
rather than "TargetTransformInfo::SK_Broadcast" (splat specifically
from element 0) because we do not have a more general "SK_Splat"
currently. That does not affect any of the current regression tests,
but we might be able to find some cost model target specialization where
that comes into play.
I suspect that we can expose some missing x86 horizontal op codegen with
this transform, so I'm speculatively adding a debug flag to disable the
binop variant of this transform to allow easier testing.
The test changes show that we're sensitive to cost model diffs (as we
should be), so that means that patches like D74976
should have better coverage.
Differential Revision: https://reviews.llvm.org/D75689
Summary:
Assume bundles need to be usable by Analysis and Transforms/Utils isn't.
so this commit moves utilities to deal with asusme bundles to IR.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75618
Summary: Finding what information is know about a value from a use is generally useful and can be done quickly.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75616
Fixes a regression from D75801. SimplifyDemandedUseBits() is also
supposed to compute the known bits (of the demanded subset) of the
instruction. For unknown instructions it does so by directly calling
computeKnownBits(). For known instructions it will compute known
bits itself. However, for instructions where only some cases are
handled directly (e.g. a constant shift amount) the known bits
invocation for the unhandled case is sometimes missing. This patch
adds the missing calls and thus removes the main discrepancy with
ExpensiveCombines mode.
Differential Revision: https://reviews.llvm.org/D75804
It is possible that an instruction to be changed to unreachable is
in the same block with a terminator that can be constant-folded.
In this case, as of now, the instruction will be changed to
unreachable before the terminator is folded. But, then the
whole BB becomes invalidated and so when we go ahead to fold
the terminator, we trap.
Change the order of these two.
Differential Revision: https://reviews.llvm.org/D75780
With the addition of the LLD time tracing it made sense to include coverage
for LLVM's various passes. Doing so ensures that ThinLTO is also covered
with a time trace.
Before:
{F11333974}
After:
{F11333928}
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D74516
As mentioned in the comments, extractelement is special
since we actually want a scalar base for that element we extracted from
the vector (i.e. not a vector base).
This same logic should apply to uses of the extractelement such as phis
and selects which have the same BDV as the extractelement.
Howeber, for these uses we conservatively mark the BDV state as
conflict, since setting the EE's new base BDV does not always dominate
these uses.
Added testcase showcases the problem where the BDV identification chokes
on the incorrect cast from vector to scalar for the phi use of
extractelement.
Tests-Run: make check, internal fuzzer testing
Reviewers: reames, skatkov, dantrushin
Reviewed-By: dantrushin
Differential Revision: https://reviews.llvm.org/D75704
If we infer the dso_local flag for -fpic, dso_local should be dropped
when we convert a GlobalVariable a declaration. dso_local causes the
generation of direct access (e.g. R_X86_64_PC32). Such relocations referencing
STB_GLOBAL STV_DEFAULT objects are not allowed in a -shared link.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D74749
Summary:
The widenIVUse avoids generating trunc by evaluating the use as AddRec, this
will not work when:
1) SCEV traces back to an instruction inside the loop that SCEV can not
expand, eg. add %indvar, (load %addr)
2) SCEV finds a loop variant, eg. add %indvar, %loopvariant
While SCEV fails to avoid trunc, we can still try to use instruction
combining approach to prove trunc is not required. This can be further
extended with other instruction combining checks, but for now we handle the
following case (sub can be "add" and "mul", "nsw + sext" can be "nus + zext")
```
Src:
%c = sub nsw %b, %indvar
%d = sext %c to i64
Dst:
%indvar.ext1 = sext %indvar to i64
%m = sext %b to i64
%d = sub nsw i64 %m, %indvar.ext1
```
Therefore, as long as the result of add/sub/mul is extended to wide type with
right extension and overflow wrap combination, no
trunc is required regardless of how %b is generated. This pattern is common
when calculating address in 64 bit architecture.
Note that this patch reuse almost all the code from D49151 by @az:
https://reviews.llvm.org/D49151
It extends it by providing proof of why trunc is unnecessary in more general case,
it should also resolve some of the concerns from the following discussion with @reames.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180910/585945.html
Reviewers: sanjoy, efriedma, sebpop, reames, az, javed.absar, amehsan
Reviewed By: az, amehsan
Subscribers: hiraditya, llvm-commits, amehsan, reames, az
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73059
Summary:
This performs better for sample PGO.
NFC as PGSOColdCodeOnlyForSamplePGO is still true.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75550
Currently when printing VPValues we use the object address, which makes
it hard to distinguish VPValues as they usually are large numbers with
varying distance between them.
This patch adds a simple slot tracker, similar to the ModuleSlotTracker
used for IR values. In order to dump a VPValue or anything containing a
VPValue, a slot tracker for the enclosing VPlan needs to be created. The
existing VPlanPrinter can take care of that for the existing code. We
assign consecutive numbers to each VPValue we encounter in a reverse
post order traversal of the VPlan.
Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D73078
After structurization, some phi nodes can have a single incoming edge
and can be simplified away. This change runs a simplify query on all
phis that are either modified or added by the structurizer. This also
moves some phis closer to their use as a side benefit.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D75500
As the test case shows if there is an ExtractValueInst in the Ret block, function dupRetToEnableTailCallOpts can't duplicate it into the block containing call. So later no tail call is generated in CodeGen.
This patch adds the ExtractValueInst handling code in function dupRetToEnableTailCallOpts and FoldReturnIntoUncondBranch, and later tail call can be generated for this case.
Differential Revision: https://reviews.llvm.org/D74242
This teaches Loop Strength Reduction the details about masked load and
store address operands, so that it can have a better time optimising
them as it would for normal loads and stores.
Differential Revision: https://reviews.llvm.org/D75371
This makes sure that the constant expression bitcast goes through
target-dependent constant folding, and thus avoids an additional
iteration of InstCombine.
Spin-off from D75407. As described there, ConstantFoldConstant()
currently returns null for non-ConstantExpr/ConstantVector inputs,
but otherwise always returns non-null, independently of whether
any folding has happened or not.
This is confusing and makes consumer code more complicated.
I would expect either that ConstantFoldConstant() returns only if
it actually folded something, or that it always returns non-null.
I'm going to the latter possibility here, which appears to be more
useful considering existing usage.
Differential Revision: https://reviews.llvm.org/D75543
The initial placement of vector-combine in the opt pipeline revealed phase ordering bugs:
https://bugs.llvm.org/show_bug.cgi?id=45015https://bugs.llvm.org/show_bug.cgi?id=42022
This patch contains a few independent changes:
1. Move the pass up in the pipeline, so it happens just after loop-vectorization.
This is only to keep vectorization passes together in the pipeline at the moment.
I don't have evidence of interaction between these yet.
2. Add an -early-cse pass after -vector-combine to clean up redundant ops. This was
partly proposed as far back as rL219644 (which is why it's effectively being moved
in the old PM code). This is important because the subsequent -instcombine doesn't
work as well without EarlyCSE. With the CSE, -instcombine is able to squash
shuffles together in 1 of the tests (because those are simple "select" shuffles).
3. Remove the -vector-combine pass that was running after SLP. We may want to do that
eventually, but I don't have a test case to support it yet.
Differential Revision: https://reviews.llvm.org/D75145
Summary:
https://gist.github.com/modocache/ed7c62f6e570766c0f39b35dad675c2f
is an example of a small C++ program that uses C++20 coroutines that
is difficult to debug, due to the loss of debug info for variables that
"spill" across coroutine suspension boundaries. This patch addresses
that issue by inserting 'llvm.dbg.declare' intrinsics that point the
debugger to the variables' location at an offset to the coroutine frame.
With this patch, I confirmed that running the 'frame variable' commands in
https://gist.github.com/modocache/ed7c62f6e570766c0f39b35dad675c2f at
the specified breakpoints results in the correct values being printed
for coroutine frame variables 'i' and 'j' when using an lldb built from
trunk, as well as with gdb 8.3 (lldb 9.0.1, however, could not print the
values). The added test case also verifies this improved behavior.
The existing coro-debug.ll test case is also modified to reflect the
locations at which Clang actually places calls to 'dbg.declare', and
additional checks are added to ensure this patch works as intended in that
example as well.
Reviewers: vsk, jmorse, GorNishanov, lewissbaker, wenlei
Subscribers: EricWF, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75338
Summary: This patch adds a new way to query operand bundles of an llvm.assume that is much better suited to some users like the Attributor that need to do many queries on the operand bundles of llvm.assume. Some modifications of the IR like replaceAllUsesWith can cause information in the map to be outdated, so this API is more suited to analysis passes and passes that don't make modification that could invalidate the map.
Reviewers: jdoerfert, sstefan1, uenoku
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75020
This patch adds a getPlan accessor to VPBlockBase, which finds the entry
block of the plan containing the block and returns the plan set for this
block.
VPBlockBase contains a VPlan pointer, but it should only be set for
the entry block of a plan. This allows moving blocks without updating
the pointer for each moved block and in the future we might introduce a
parent relationship between plans and blocks, similar to the one in LLVM IR.
Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D74445
Also adds a force-reduction-intrinsics option for testing, for forcing
the generation of reduction intrinsics even when the backend is not
requesting them.
Summary:
This is to avoid generating duplicate llvm.dbg.value instrinsic if it already exists after the Instruction.
Before inserting llvm.dbg.value instruction, LLVM checks if the same instruction is already present before the instruction to avoid duplicates.
Currently it misses to check if it already exists after the instruction.
flang generates IR like this.
%4 = load i32, i32* %i1_311, align 4, !dbg !42
call void @llvm.dbg.value(metadata i32 %4, metadata !35, metadata !DIExpression()), !dbg !33
When this IR is processed in llvm, it ends up inserting duplicates.
%4 = load i32, i32* %i1_311, align 4, !dbg !42
call void @llvm.dbg.value(metadata i32 %4, metadata !35, metadata !DIExpression()), !dbg !33
call void @llvm.dbg.value(metadata i32 %4, metadata !35, metadata !DIExpression()), !dbg !33
We have now updated LdStHasDebugValue to include the cases when instruction is already
followed by same dbg.value instruction we intend to insert.
Now,
Definition and usage of function LdStHasDebugValue are deleted.
RemoveRedundantDbgInstrs is called for the cleanup of duplicate dbg.value's
Testing:
Added unit test for validation
check-llvm
check-debuginfo (the debug info integration tests)
Reviewers: aprantl, probinson, dblaikie, jmorse, jini.susan.george
SouraVX, awpandey, dstenb, vsk
Reviewed By: aprantl, jmorse, dstenb, vsk
Differential Revision: https://reviews.llvm.org/D74030
One of the checks has been removed as it seem invalid.
The LoopStep size is always almost a 32-bit.
Differential Revision: https://reviews.llvm.org/D75079
This reverts commit 80d0a137a5, and the
follow on fix in 873c0d0786. It is
causing test failures after a multi-stage clang bootstrap. See
discussion on D73242 and D75201.
Summary:
Fixes an issue that cropped up after the changes in D73242 to delay
the lowering of type tests. LTT couldn't handle any type tests with
non-string type id (which happens for local vtables, which we try to
promote during the compile step but cannot always when there are no
exported symbols).
We can simply treat the same as having an Unknown resolution, which
delays their lowering, still allowing such type tests to be used in
subsequent optimization (e.g. planned usage during ICP). The final
lowering which simply removes these handles them fine.
Beefed up an existing ThinLTO test for such unpromoted type ids so that
the internal vtable isn't removed before lower type tests, which hides
the problem.
Reviewers: evgeny777, pcc
Subscribers: inglorion, hiraditya, steven_wu, dexonsmith, aganea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75201
Summary:
Current peeling implementation bails out in case of loop nests.
The patch introduces a field in TargetTransformInfo structure that
certain targets can use to relax the constraints if it's
profitable (disabled by default).
Also additional option is added to enable peeling manually for
experimenting and testing purposes.
Reviewers: fhahn, lebedev.ri, xbolva00
Reviewed By: xbolva00
Subscribers: RKSimon, xbolva00, hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D70304
getReductionVars, getInductionVars and getFirstOrderRecurrences were all
being returned from LoopVectorizationLegality as pointers to lists. This
just changes them to be references, cleaning up the interface slightly.
Differential Revision: https://reviews.llvm.org/D75448
getFirstInsertionPt's return value must be checked for validity before
casting it to Instruction*. Don't attempt to insert casts after a phi in
a catchswitch block.
Fixes PR45033, introduced in D37832.
Reviewed By: davidxl, hfinkel
Differential Revision: https://reviews.llvm.org/D75381
Summary:
This patch defines two freeze instructions to have the same value number if they are equivalent.
This is allowed because GVN replaces all uses of a duplicated instruction with another.
If it partially rewrites use, it is not allowed. e.g)
```
a = freeze(x)
b = freeze(x)
use(a)
use(a)
use(b)
=>
use(a)
use(b) // This is not allowed!
use(b)
```
Reviewers: fhahn, reames, spatel, efriedma
Reviewed By: fhahn
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75398
This removes everything but int_x86_avx512_mask_vcvtph2ps_512 which provides the SAE variant, but even this can use the fpext generic if the rounding control is the default.
Differential Revision: https://reviews.llvm.org/D75162
Try again with an up-to-date version of D69471 (99317124 was a stale
revision).
---
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
Summary:
When -dfsan-event-callbacks is specified, insert a call to
__dfsan_mem_transfer_callback on every memcpy and memmove.
Reviewers: vitalybuka, kcc, pcc
Reviewed By: kcc
Subscribers: eugenis, hiraditya, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D75386
This change adds an assertion to prevent tricky bug related to recursive
approach of building vectorization tree. For loop below takes number of
operands directly from tree entry rather than from scalars.
If the entry at this moment turns out incomplete (i.e. not all operands set)
then not all the dependencies will be seen by the scheduler.
This can lead to failed scheduling (and thus failed vectorization)
for perfectly vectorizable tree.
Here is code example which is likely to fire the assertion:
for (i : VL0->getNumOperands()) {
...
TE->setOperand(i, Operands);
buildTree_rec(Operands, Depth + 1,...);
}
Correct way is two steps process: first set all operands to a tree entry
and then recursively process each operand.
Differential Revision: https://reviews.llvm.org/D75296
This aims to fix a missed inlining case.
If there's a virtual call in the callee on an alloca (stack allocated object) in
the caller, and the callee is inlined into the caller, the post-inline cleanup
would devirtualize the virtual call, but if the next iteration of
DevirtSCCRepeatedPass doesn't happen (under the new pass manager), which is
based on a heuristic to determine whether to reiterate, we may miss inlining the
devirtualized call.
This enables inlining in clang/test/CodeGenCXX/member-function-pointer-calls.cpp.
This is a second commit after a revert
https://reviews.llvm.org/rG4569b3a86f8a4b1b8ad28fe2321f936f9d7ffd43 and a fix
https://reviews.llvm.org/rG41e06ae7ba91.
Differential Revision: https://reviews.llvm.org/D69591
Summary: This fixes the crash that led to the revert of D69591.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75307
This patch deletes some dead code out of SLP vectorizer.
Couple of changes taken out of D57059 to slightly lighten it
plus one more similar case fixed.
Differential Revision: https://reviews.llvm.org/D75276
Summary:
Final patch in series to fix inlining between functions with different
nobuiltin attributes/options, which was specifically an issue in LTO.
See discussion on D61634 for background.
The prior patch in this series (D67923) enabled per-Function TLI
construction that identified the nobuiltin attributes.
Here I have allowed inlining to proceed if the callee's nobuiltins are a
subset of the caller's nobuiltins, but not in the reverse case, which
should be conservatively correct. This is controlled by a new option,
-inline-caller-superset-nobuiltin, which is enabled by default.
Reviewers: hfinkel, gchatelet, chandlerc, davidxl
Subscribers: arsenm, jvesely, nhaehnle, mehdi_amini, eraman, hiraditya, haicheng, dexonsmith, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74162
Summary:
This patch makes EarlyCSE fold equivalent freeze instructions.
Another optimization that I think will be useful is to remove freeze if its operand is used as a branch condition or at llvm.assume:
```
%c = ...
br i1 %c, label %A, ..
A:
%d = freeze %c ; %d can be optimized to %c because %c cannot be poison or undef (or 'br %c' would be UB otherwise)
```
If it make sense for EarlyCSE to support this as well, I will make a patch for this.
Reviewers: spatel, reames, lebedev.ri
Reviewed By: lebedev.ri
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75334
SROA will drop the explicit alignment on allocas when the ABI guarantees
enough alignment. Because the alignment on new load/store instructions
are set based on the alloca's alignment, that means SROA would end up
dropping the alignment from atomic loads and stores, which is not
allowed (see bug). For those, make sure to always carry over the
alignment from the previous instruction.
Differential revision: https://reviews.llvm.org/D75266
Summary:
For now just insert the callback for stores, similar to how MSan tracks
origins. In the future we may want to add callbacks for loads, memcpy,
function calls, CMPs, etc.
Reviewers: pcc, vitalybuka, kcc, eugenis
Reviewed By: vitalybuka, kcc, eugenis
Subscribers: eugenis, hiraditya, #sanitizers, llvm-commits, kcc
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D75312
DSE would mistakenly remove store (2):
a = calloc(n+1)
for (int i = 0; i < n; i++) {
store 1, a[i+1] // (1)
store 0, a[i] // (2)
}
The fix is to do PHI transaltion while looking for clobbering
instructions between the store and the calloc.
Reviewed By: efriedma, bjope
Differential Revision: https://reviews.llvm.org/D68006
When InstCombine initially populates the worklist, it already
performs constant folding and DCE. However, as the instructions
are initially visited in program order, this DCE can pick up only
the last instruction of a dead chain, the rest would only get
picked up in the main InstCombine run.
To avoid this, we instead perform the DCE in separate pass over the
collected instructions in reverse order, which will allow us to
pick up full dead instruction chains. We already need to do this
reverse iteration anyway to populate the worklist, so this
shouldn't add extra cost.
This by itself only fixes a small part of the problem though:
The same basic issue also applies during the main InstCombine loop.
We generally always want DCE to occur as early as possible,
because it will allow one-use folds to happen. Address this by also
performing DCE while adding deferred instructions to the main worklist.
This drops the number of tests that perform more than 2 InstCombine
iterations from ~80 to ~40. There's some spurious test changes due
to operand order / icmp toggling.
Differential Revision: https://reviews.llvm.org/D75008
Use UnaryOperator::CreateFNeg instead.
Summary:
With the introduction of the native fneg instruction, the
fsub -0.0, %x idiom is obsolete. This patch makes LLVM
emit fneg instead of the idiom in all places.
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D75130
A recent commit
(https://reviews.llvm.org/rG66c120f02560ef528a60924104ead66f330190f1) changed
the cost for calls to functions that have a vector version for some
vectorization factor. However, no check is performed for whether the
vectorization factor matches the current one being cost modeled. This leads to
attempts to widen call instructions to a vectorization factor for which such a
function does not exist, which in turn leads to an assertion failure.
This patch adds the check for vectorization factor (i.e. not just that the
called function has a vector version for some VF, but that it has a vector
version for this VF).
Differential revision: https://reviews.llvm.org/D74944
As suggested in D75145 -
I'm not sure why, but several passes have this kind of disable/enable flag
implemented at the pass manager level. But that means we have to duplicate
the flag for both pass managers and add code to check the flag every time
the pass appears in the pipeline.
We want a debug option to see if this pass is misbehaving regardless of the
pass managers, so just add a disablement check at the single point before
any transforms run.
Differential Revision: https://reviews.llvm.org/D75204
CVP currently does not simplify cmps with instructions in the same
block, because LVI getPredicateAt() currently does not provide
much useful information for that case (D69686 would change that,
but is stuck.) However, if the instruction is a Phi node, then
LVI can compute the result of the predicate by threading it into
the predecessor blocks, which allows it simplify some conditions
that nothing else can handle. Relevant code:
6d6a4590c5/llvm/lib/Analysis/LazyValueInfo.cpp (L1904-L1927)
Differential Revision: https://reviews.llvm.org/D72169
InstCombine removes pairs of start+end intrinsics that don't
have anything in between them. Currently this is done by starting
at the start intrinsic and scanning forwards. This patch changes
it to start at the end intrinsic and scan backwards.
The motivation here is as follows: When we process the start
intrinsic, we have not yet looked at the following instructions,
which may still get folded/removed. If they do, we will only be
able to remove the start/end pair on the next iteration. When we
process the end intrinsic, all the instructions before it have
already been visited, and we don't run into this problem.
Differential Revision: https://reviews.llvm.org/D75011
DevirtSCCRepeatedPass iteration. Needs ReviewPublic
This aims to fix a missed inlining case.
If there's a virtual call in the callee on an alloca (stack allocated object) in
the caller, and the callee is inlined into the caller, the post-inline cleanup
would devirtualize the virtual call, but if the next iteration of
DevirtSCCRepeatedPass doesn't happen (under the new pass manager), which is
based on a heuristic to determine whether to reiterate, we may miss inlining the
devirtualized call.
This enables inlining in clang/test/CodeGenCXX/member-function-pointer-calls.cpp.
See discussion on PR44792.
This reverts commit 02ce9d8ef5.
It also reverts the follow-up commits
8f46269f0 "[profile] Don't dump counters when forking and don't reset when calling exec** functions"
62c7d8402 "[profile] gcov_mutex must be static"
Summary:
Loop unswitch hoists branches on loop-invariant conditions. However, if this
condition is poison/undef and the branch wasn't originally reachable, loop
unswitch introduces UB (since the optimized code will branch on poison/undef and
the original one didn't)).
We fix this problem by freezing the condition to ensure we don't introduce UB.
We will now transform the following:
while (...) {
if (C) { A }
else { B }
}
Into:
C' = freeze(C)
if (C') {
while (...) { A }
} else {
while (...) { B }
}
This patch fixes the root cause of the following bug reports (which use the old loop unswitch, but can be reproduced with minor changes in the code and -enable-nontrivial-unswitch):
- https://llvm.org/bugs/show_bug.cgi?id=27506
- https://llvm.org/bugs/show_bug.cgi?id=31652
Reviewers: reames, majnemer, chenli, sanjoy, hfinkel
Reviewed By: reames
Subscribers: hiraditya, jvesely, nhaehnle, filcab, regehr, trentxintong, nlopes, llvm-commits, mzolotukhin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D29015
If we deduplicate OpenMP runtime calls we have multiple `ident_t*` that
represent information like source location. So far, we simply kept the
one used by the replacement call. However, as exposed by PR44893, that
can cause problems if we have stack allocated `ident_t` objects. While
we need to revisit the use of these as well, it is clear that we
eventually want to merge source location information in some way. With
this patch we add the infrastructure to do so but without doing the
actual merge. Instead we pick a global `ident_t` from the replaced
calls, if possible, or create a new one with an unknown location
instead.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D74925
body
We started seeing cases where ARC optimizer would move retain calls into
loop bodies, causing imbalance in the number of retain and release
calls, after changes were made to delete inert ARC calls since the inert
calls that used to block code motion are gone.
Fix the bug by setting the CFG hazard flag when visiting a loop header.
rdar://problem/56908836
Summary:
Replacing uses of IV outside of the loop is likely generally useful,
but `rewriteLoopExitValues()` is cautious, and if it isn't told to always
perform the replacement, and there are hard uses of IV in loop,
it doesn't replace.
In [[ https://bugs.llvm.org/show_bug.cgi?id=44668 | PR44668 ]],
that prevents `-indvars` from replacing uses of induction variable
after the loop, which might be one of the optimization failures
preventing that code from being vectorized.
Instead, now that the cost model is fixed, i believe we should be
a little bit more optimistic, and also perform replacement
if we believe it is within our budget.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=44668 | PR44668 ]].
Reviewers: reames, mkazantsev, asbirlea, fhahn, skatkov
Reviewed By: mkazantsev
Subscribers: nikic, hiraditya, zzheng, javed.absar, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73501
Summary:
In future patches`SCEVExpander::isHighCostExpansionHelper()` will respect the budget allocated by performing TTI cost modelling.
This is a fully NFC patch to make things reviewable.
Reviewers: reames, mkazantsev, wmi, sanjoy
Reviewed By: mkazantsev
Subscribers: hiraditya, zzheng, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73705
Summary:
Future patches will make use of TTI to perform cost-model-driven `SCEVExpander::isHighCostExpansionHelper()`
This is a fully NFC patch to make things reviewable.
Reviewers: reames, mkazantsev, wmi, sanjoy
Reviewed By: mkazantsev
Subscribers: hiraditya, zzheng, javed.absar, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73704
This reverts commit 8d22100f66.
There was a functional regression reported (https://bugs.llvm.org/show_bug.cgi?id=44996). I'm not actually sure the patch is wrong, but I don't have time to investigate currently, and this line of work isn't something I'm likely to get back to quickly.
Much like with reassociateShiftAmtsOfTwoSameDirectionShifts(),
as input, we have the following pattern:
icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0
We want to rewrite that as:
icmp eq/ne (and (x shift (Q+K)), y), 0 iff (Q+K) u< bitwidth(x)
While we know that originally (Q+K) would not overflow
(because 2 * (N-1) u<= iN -1), we may have looked past extensions of
shift amounts. so it may now overflow in smaller bitwidth.
To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow would happen, (Q+K) u< bitwidth(x) check would be bogus.
https://bugs.llvm.org/show_bug.cgi?id=44802
As input, we have the following pattern:
Sh0 (Sh1 X, Q), K
We want to rewrite that as:
Sh x, (Q+K) iff (Q+K) u< bitwidth(x)
While we know that originally (Q+K) would not overflow
(because 2 * (N-1) u<= iN -1), we may have looked past extensions of
shift amounts. so it may now overflow in smaller bitwidth.
To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow would happen, (Q+K) u< bitwidth(x) check would be bogus.
https://bugs.llvm.org/show_bug.cgi?id=44802
Code duplication (subsequently removed by refactoring) allowed
a logic discrepancy to creep in here.
We were being conservative about creating a vector binop -- but
not a vector cmp -- in the case where a vector op has the same
estimated cost as the scalar op. We want to be more aggressive
here because that can allow other combines based on reduced
instruction count/uses.
We can reverse the transform in DAGCombiner (potentially with a
more accurate cost model) if this causes regressions.
AFAIK, this does not conflict with InstCombine. We have a
scalarize transform there, but it relies on finding a constant
operand or a matching insertelement, so that means it eliminates
an extractelement from the sequence (so we won't have 2 extracts
by the time we get here if InstCombine succeeds).
Differential Revision: https://reviews.llvm.org/D75062
This patch adds bindings to C and Go for
addCoroutinePassesToExtensionPoints, which is used to add coroutine
passes to the correct locations in PassManagerBuilder.
Differential Revision: https://reviews.llvm.org/D51642
Summary:
There is no need to write out gcdas when forking because we can just reset the counters in the parent process.
Let say a counter is N before the fork, then fork and this counter is set to 0 in the child process.
In the parent process, the counter is incremented by P and in the child process it's incremented by C.
When dump is ran at exit, parent process will dump N+P for the given counter and the child process will dump 0+C, so when the gcdas are merged the resulting counter will be N+P+C.
About exec** functions, since the current process is replaced by an another one there is no need to reset the counters but just write out the gcdas since the counters are definitely lost.
To avoid to have lists in a bad state, we just lock them during the fork and the flush (if called explicitely) and lock them when an element is added.
Reviewers: marco-c
Reviewed By: marco-c
Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits, sylvestre.ledru
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D74953
This version fixes a buildbot failure cause by picking the wrong insert
point for XORs. We cannot pick the XOR binary operator as insert point,
as it is not guaranteed that both input operands for the overflow
intrinsic are defined before it.
This reverts the revert commit
c7fc0e5da6.
Add a map from BasicBlocks to overlap intervals. For partial writes, we
can keep track of those in IOLs. We only add candidates that are valid
for eliminations.
Reviewers: dmgreen, bryant, asbirlea, Tyker
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D73757
Summary:
Blocks in a loop can be in any order as long as the loop header is the
first block in Blocks.
With some order of Blocks, cloneLoopWithPreheader would trigger the
assertion in addBasicBlockToLoop.
Example:
define void @test(i64 %N) {
preheader.i:
br label %header.i
header.i:
%i = phi i64 [ 0, %preheader.i ], [ %inc.i, %latch.i ]
br label %header.j
header.j:
%j = phi i64 [ 0, %header.i ], [ %inc.j, %latch.j ]
br label %header.k
header.k:
%k = phi i64 [ 0, %header.j ], [ %inc.k, %latch.k ]
call void @baz(i64 %i, i64 %j, i64 %k)
br label %latch.k
latch.k:
%inc.k = add nsw i64 %k, 1
%cmp.k = icmp slt i64 %inc.k, %N
br i1 %cmp.k, label %header.k, label %latch.j
latch.j:
%inc.j = add nsw i64 %j, 1
%cmp.j = icmp slt i64 %inc.j, %N
br i1 %cmp.j, label %header.j, label %latch.i
latch.i:
%inc.i = add nsw i64 %i, 1
%cmp.i = icmp slt i64 %inc.i, %N
br i1 %cmp.i, label %header.i, label %exit.i
exit.i:
ret void
}
declare void @baz(i64, i64, i64)
If the blocks of loop-i is in the order: header.i, latch.k, header.k,
header.j, latch.j, latch.i,
then cloneLoopWithPreheader would trigger the assertion in
addBasicBlockToLoop
assert(contains(SameHeader) && getHeader() == SameHeader->getHeader() &&
"Incorrect LI specified for this loop!");
As latch.k is in both loop-j and loop-k, it would be set as the header
of both loops after adding latch.k.
If we update loop headers during cloning blocks, then after adding
header.k,
the header of loop-k would be updated with header.k,
while the header of loop-j stays as latch.k.
When adding header.j, SameHeader is loop-k, SameHeader->getHeader() is
header.k, but getHeader() is latch.k, which trigger the assertion.
Reviewer: jdoerfert, Meinersbur, fhahn, kbarton, hfinkel, bmahjour,
etiotto
Reviewed By: Meinersbur
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D74382
This should be the last step in the current cleanup.
Follow-ups should resolve the TODO about cost calc
and enable the more general case where we extract
different elements.
This fixes a small mistake from D72944: The worklist add should
happen before assigning the new operand, not after.
In case an actual replacement happens, the old operand needs to
be added for DCE. If no actual replacement happens, then old/new
are the same, so it doesn't matter.
This drops one iteration from the annotated test case.
Followup to D73919 with another batch of replacements of
setOperand() -> replaceOperand(), to make sure the old
operand gets DCEd right away.
Differential Revision: https://reviews.llvm.org/D74932
ToVectorTy is defined and used in multiple places. Hoist it to
VectorUtils.h to avoid duplication and improve re-usability.
Reviewers: rengolin, hsaito, Ayal, gilr, fpetrogalli
Reviewed By: fpetrogalli
Differential Revision: https://reviews.llvm.org/D74959
This changes the SimplifyLibCalls utility to accept an IRBuilderBase,
which allows us to pass through the IRBuilder used by InstCombine.
This will ensure that new instructions get added to the worklist.
The annotated test-case drops from 4 to 2 InstCombine iterations thanks
to this.
To achieve this, I'm adding an IRBuilderBase::OperandBundlesGuard,
which is basically the same as the existing InsertPointGuard and
FastMathFlagsGuard, but for operand bundles. Also add a
setDefaultOperandBundles() method so these can be set outside the
constructor.
Differential Revision: https://reviews.llvm.org/D74792
Can be used like
-debug-counter=dse-memoryssa-skip=10,dse-memoryssa-counter-count=20
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72147
For tracked globals that are unknown after solving, we expect all
non-store uses to be replaced.
This is a follow-up to f8045b250d, which removed forcedconstant.
We should not mark unknown loads as overdefined, as they either load
from an unknown pointer or an undef global. Restore the original logic
for loads.
Summary:
After updating cost model in AMDGPU target (47a5c36b37) the pass started to
ignore some BBs since they got all instructions estimated as free.
Reviewers: arsenm, chandlerc, nhaehnle
Reviewed By: nhaehnle
Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74825
We can look through calls with `returned` argument attributes when we
collect subsuming positions. This allows us to get existing attributes
from more places.
We are often interested in an assumed constant and sometimes it has to
be an integer constant. Before we only looked for the latter, now we can
ask for either.
If we propagate function pointers across function boundaries we can
create new call edges. These need to be represented in the CG if we run
as a CGSCC pass. In the new pass manager that is currently not handled
by the CallGraphUpdater so we need to prevent the situation for now.
We usually will ask for liveness of an argument anyway so we ended up
lazily creating the attribute anyway. However, that is not always the
case and even if it is we should go the eager route here. Various tests
show how this can improve the outcome. One test exposed a problem with
type mismatches between argument and call site argument, a fix is
included. For liveness various more tests were added as well.
If a function pointer is casted into a different type the resulting
expression can be a constant. If so, it can be used multiple times which
cannot be handled by the AbstractCallSite constructor alone. Instead, we
follow the cast expression uses now explicitly during the call site
traversal.
In builds with assertions enabled (!NDEBUG), IndVarSimplify does an
additional query to ScalarEvolution which may change future SCEV queries
since it fills the internal cache differently. The result is actually
only used with the -verify-indvars command line option. We fix the issue
by only calling SE->getBackedgeTakenCount(L) if -verify-indvars is
enabled such that only -verify-indvars shows the behavior, but not debug
builds themselves. Also add a remark to the description of
-verify-indvars about this behavior.
Fixes llvm.org/PR44815
Differential Revision: https://reviews.llvm.org/D74810
Instcombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match
the expected pattern in CodeGenPerpare via UAddWithOverflow.
This causes a regression over Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV
This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D74228
Summary:
Depends on https://reviews.llvm.org/D71900.
The fourth in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure. This patch implements
'coro-cleanup'.
No existing regression tests check the behavior of coro-cleanup on its
own, so this patch adds one. (A test named 'coro-cleanup.ll' exists, but
it relies on the entire coroutines pipeline being run. It's updated to
test the new pass manager in the 5th patch of this series.)
Reviewers: GorNishanov, lewissbaker, chandlerc, junparser, deadalnix, wenlei
Reviewed By: wenlei
Subscribers: wenlei, EricWF, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71901
Essentially, fold OrderedBasicBlock into BasicBlock, and make it
auto-invalidate the instruction ordering when new instructions are
added. Notably, we don't need to invalidate it when removing
instructions, which is helpful when a pass mostly delete dead
instructions rather than transforming them.
The downside is that Instruction grows from 56 bytes to 64 bytes. The
resulting LLVM code is substantially simpler and automatically handles
invalidation, which makes me think that this is the right speed and size
tradeoff.
The important change is in SymbolTableTraitsImpl.h, where the numbering
is invalidated. Everything else should be straightforward.
We probably want to implement a fancier re-numbering scheme so that
local updates don't invalidate the ordering, but I plan for that to be
future work, maybe for someone else.
Reviewed By: lattner, vsk, fhahn, dexonsmith
Differential Revision: https://reviews.llvm.org/D51664
Fixes https://bugs.llvm.org/show_bug.cgi?id=44922 (caused by 4698bf145d)
ThreadThroughTwoBasicBlocks assumes PredBBBranch is conditional. The following code can segfault.
AddPHINodeEntriesForMappedBlock(PredBBBranch->getSuccessor(1), PredBB, NewBB,
ValueMapping);
We can also allow unconditional PredBB, but the produced code is not
better.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D74747
The index of an ExtractElementInst is not guaranteed to be a
ConstantInt. It can be any integer value. Check explicitly for
ConstantInts.
The new test cases illustrate scenarios where we crash without
this patch. I've also added another test case to check the matching
of extractelement vector ops works.
Reviewers: RKSimon, ABataev, dtemirbulatov, vporpo
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D74758
When simplifying demanded bits, we currently only report the
instruction on which SimplifyDemandedBits was called as changed.
However, this is a recursive call, and the actually modified
instruction will usually be further up the chain. Additionally,
all the intermediate instructions should also be revisited,
as additional combines may be possible after the demanded bits
simplification. We fix this by explicitly adding them back to the
worklist.
Differential Revision: https://reviews.llvm.org/D72944
The select-of-cttz transform can currently duplicate cttz intrinsics
and zext/trunc ops. The cause is that it unnecessarily duplicates
the intrinsic and the zext/trunc when setting the "undef_on_zero"
flag to false. However, it's always legal to set the flag from true
to false, so we can make this replacement even if there are extra users.
Differential Revision: https://reviews.llvm.org/D74685
Fix for https://bugs.llvm.org/show_bug.cgi?id=44754. We already have
a fold that converts icmp (and (ashr X, C3), C2), C1 into
icmp (and C2'), C1', but it imposed overly strict requirements on the
transform.
Relax this by checking that both C2 and C1 don't shift out bits
(in a signed sense) when forming the new constants.
Alive proofs (https://rise4fun.com/Alive/PTz0):
Name: ashr_legal
Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) == C1
%a = ashr i16 %x, C3
%b = and i16 %a, C2
%c = icmp i16 %b, C1
=>
%d = and i16 %x, C2 << C3
%c = icmp i16 %d, C1 << C3
Name: ashr_shiftout_eq
Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) != C1
%a = ashr i16 %x, C3
%b = and i16 %a, C2
%c = icmp eq i16 %b, C1
=>
%c = false
Note that >> corresponds to ashr here. The case of an equality
comparison has some special handling in this transform, because
it will form to a true/false result if the condition on the comparison
constant it violated.
Differential Revision: https://reviews.llvm.org/D74294
This patch adds a simplification if an OR weakens the overflow condition
for umul.with.overflow by treating any non-zero result as overflow. In that
case, we overflow if both umul.with.overflow operands are != 0, as in that
case the result can only be 0, iff the multiplication overflows.
Code like this is generated by code using __builtin_mul_overflow with
negative integer constants, e.g.
bool test(unsigned long long v, unsigned long long *res) {
return __builtin_mul_overflow(v, -4775807LL, res);
}
```
----------------------------------------
Name: D74141
%res = umul_overflow {i8, i1} %a, %b
%mul = extractvalue {i8, i1} %res, 0
%overflow = extractvalue {i8, i1} %res, 1
%cmp = icmp ne %mul, 0
%ret = or i1 %overflow, %cmp
ret i1 %ret
=>
%t0 = icmp ne i8 %a, 0
%t1 = icmp ne i8 %b, 0
%ret = and i1 %t0, %t1
ret i1 %ret
%res = umul_overflow {i8, i1} %a, %b
%mul = extractvalue {i8, i1} %res, 0
%cmp = icmp ne %mul, 0
%overflow = extractvalue {i8, i1} %res, 1
Done: 1
Optimization is correct!
```
Reviewers: nikic, lebedev.ri, spatel, Bigcheese, dexonsmith, aemerson
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D74141
Summary:
Depends on https://reviews.llvm.org/D71899.
The third in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure. This patch implements 'coro-elide'.
The new pass manager infrastructure does not implicitly repeat CGSCC
pass pipelines when a function is devirtualized, and so the tests
for the new pass manager that rely on that behavior now explicitly
specify `repeat<2>`.
Reviewers: GorNishanov, lewissbaker, chandlerc, jdoerfert, junparser, deadalnix, wenlei
Reviewed By: wenlei
Subscribers: wenlei, EricWF, Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71900
Summary:
This patch has four dependencies:
1. The first in this series of patches that implement coroutine passes in the
new pass manager: https://reviews.llvm.org/D71898.
2. A patch that introduces an API for CGSCC passes to add new reference
edges to a `LazyCallGraph`, `updateCGAndAnalysisManagerForCGSCCPass`:
https://reviews.llvm.org/D72025.
3. A patch that introduces a `CallGraphUpdater` helper class that is
capable of mutating internal `LazyCallGraph` state in order to insert
new function nodes into a specific SCC: https://reviews.llvm.org/D70927.
4. And finally, a small edge case fix for updating `LazyCallGraph` that
patch 3 above happens to run into: https://reviews.llvm.org/D72226.
This is the second in a series of patches that ports the LLVM coroutines
passes to the new pass manager infrastructure. This patch implements
'coro-split'.
Some notes:
* Using the new CGSCC pass manager resulted in IR being printed in the
reverse order in some tests. To prevent FileCheck checks from failing due
to these reversed orders, this patch splits up test files that test
multiple different coroutine functions: specifically
coro-alloc-with-param.ll, coro-split-eh.ll, and coro-eh-aware-edge-split.ll.
* CoroSplit.cpp contained 2 overloads of `splitCoroutine`, one of which
dispatched to the other based on the coroutine ABI being used (C++20
switch-based versus Swift returned-continuation-based). I found this
confusing, especially with the additional branching based on `CallGraph`
vs. `LazyCallGraph`, so I removed the ABI-checking overload of
`splitCoroutine`.
Reviewers: GorNishanov, lewissbaker, chandlerc, jdoerfert, junparser, deadalnix, wenlei
Reviewed By: wenlei
Subscribers: wenlei, qcolombet, EricWF, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71899
Summary:
The first in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure. This patch implements
'coro-early'.
NB: All coroutines passes begin by checking that coroutine intrinsics are
declared within the LLVM IR module they're operating on. To do so, they call
`coro::declaresIntrinsics`. The next 3 patches in this series, which add new
pass manager implementations of the 'coro-split', 'coro-elide', and
'coro-cleanup' passes, use a similar pattern as the one used here: a static
function is shared across both old and new passes to detect if relevant
coroutine intrinsics are delcared. To make this pattern easier to read, this
patch adds `const` keywords to the parameters of `coro::declaresIntrinsics`.
Reviewers: GorNishanov, lewissbaker, junparser, chandlerc, deadalnix, wenlei
Reviewed By: wenlei
Subscribers: ychen, wenlei, EricWF, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71898
Relative to the original commit, this fixes some warnings,
and is based on the deletion of the IRBuilder copy constructor
in D74693. The automatic copy constructor would no longer be
safe.
-----
Related llvm-dev thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/138951.html
This patch moves the IRBuilder from templating over the constant
folder and inserter towards making both of these virtual.
There are a couple of motivations for this:
1. It's not possible to share code between use-sites that use
different IRBuilder folders/inserters (short of templating the code
and moving it into headers).
2. Methods currently defined on IRBuilderBase (which is not templated)
do not use the custom inserter, resulting in subtle bugs (e.g.
incorrect InstCombine worklist management). It would be possible to
move those into the templated IRBuilder, but...
3. The vast majority of the IRBuilder implementation has to live
in the header, because it depends on the template arguments.
4. We have many unnecessary dependencies on IRBuilder.h,
because it is not easy to forward-declare. (Significant parts of
the backend depend on it via TargetLowering.h, for example.)
This patch addresses the issue by making the following changes:
* IRBuilderDefaultInserter::InsertHelper becomes virtual.
IRBuilderBase accepts a reference to it.
* IRBuilderFolder is introduced as a virtual base class. It is
implemented by ConstantFolder (default), NoFolder and TargetFolder.
IRBuilderBase has a reference to this as well.
* All the logic is moved from IRBuilder to IRBuilderBase. This means
that methods can in the future replace their IRBuilder<> & uses
(or other specific IRBuilder types) with IRBuilderBase & and thus
be usable with different IRBuilders.
* The IRBuilder class is now a thin wrapper around IRBuilderBase.
Essentially it only stores the folder and inserter and takes care
of constructing the base builder.
What this patch doesn't do, but should be simple followups after this change:
* Fixing use of the inserter for creation methods originally defined
on IRBuilderBase.
* Replacing IRBuilder<> uses in arguments with IRBuilderBase, where useful.
* Moving code from the IRBuilder header to the source file.
From the user perspective, these changes should be mostly transparent:
The only thing that consumers using a custom inserted may need to do is
inherit from IRBuilderDefaultInserter publicly and mark their InsertHelper
as public.
Differential Revision: https://reviews.llvm.org/D73835
D73835 will make IRBuilder no longer trivially copyable. This patch
deletes the copy constructor in advance, to separate out the breakage.
Currently, the IRBuilder copy constructor is usually used by accident,
not by intention. In rG7c362b25d7a9 I've fixed a number of cases where
functions accepted IRBuilder rather than IRBuilder &, thus performing
an unnecessary copy. In rG5f7b92b1b4d6 I've fixed cases where an
IRBuilder was copied, while an InsertPointGuard should have been used
instead.
The only non-trivial use of the copy constructor is the
getIRBForDbgInsertion() helper, for which I separated construction and
setting of the insertion point in this patch.
Differential Revision: https://reviews.llvm.org/D74693
getOperationCost() is not the cost we wanted; that's not the
throughput value that the rest of the calculation uses.
We may want to switch everything in this code to use the
getInstructionThroughput() wrapper to avoid these kinds of
problems, but I'll look at that as a follow-up because that
can create other logical diffs via using optional parameters
(we'd need to speculatively create the vector instruction to
make a fair(er) comparison).
Rather than mixing creation of new instructions and in-place
modification here, create a new log2 intrinsic. This should be
NFC apart from worklist order changes.
Related llvm-dev thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/138951.html
This patch moves the IRBuilder from templating over the constant
folder and inserter towards making both of these virtual.
There are a couple of motivations for this:
1. It's not possible to share code between use-sites that use
different IRBuilder folders/inserters (short of templating the code
and moving it into headers).
2. Methods currently defined on IRBuilderBase (which is not templated)
do not use the custom inserter, resulting in subtle bugs (e.g.
incorrect InstCombine worklist management). It would be possible to
move those into the templated IRBuilder, but...
3. The vast majority of the IRBuilder implementation has to live
in the header, because it depends on the template arguments.
4. We have many unnecessary dependencies on IRBuilder.h,
because it is not easy to forward-declare. (Significant parts of
the backend depend on it via TargetLowering.h, for example.)
This patch addresses the issue by making the following changes:
* IRBuilderDefaultInserter::InsertHelper becomes virtual.
IRBuilderBase accepts a reference to it.
* IRBuilderFolder is introduced as a virtual base class. It is
implemented by ConstantFolder (default), NoFolder and TargetFolder.
IRBuilderBase has a reference to this as well.
* All the logic is moved from IRBuilder to IRBuilderBase. This means
that methods can in the future replace their IRBuilder<> & uses
(or other specific IRBuilder types) with IRBuilderBase & and thus
be usable with different IRBuilders.
* The IRBuilder class is now a thin wrapper around IRBuilderBase.
Essentially it only stores the folder and inserter and takes care
of constructing the base builder.
What this patch doesn't do, but should be simple followups after this change:
* Fixing use of the inserter for creation methods originally defined
on IRBuilderBase.
* Replacing IRBuilder<> uses in arguments with IRBuilderBase, where useful.
* Moving code from the IRBuilder header to the source file.
From the user perspective, these changes should be mostly transparent:
The only thing that consumers using a custom inserted may need to do is
inherit from IRBuilderDefaultInserter publicly and mark their InsertHelper
as public.
Differential Revision: https://reviews.llvm.org/D73835
This includes a fix for cases where things get marked as overdefined in
ResolvedUndefsIn, but we later discover a constant. To avoid crashing,
we consistently bail out on overdefined values in the visitors. This is
similar to the previous behavior with forcedconstant.
This reverts the revert commit 02b72f564c.
In addition to a single bit per memory locations, e.g., globals and
arguments, we now collect more information about the actual accesses,
e.g., what instruction caused it, was it a read/write/read+write, and
what the underlying base pointer was. Follow up patches will make
explicit use of this.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D73527
While the function return updateImpl did only look at call sites the
manifest method looked at return values. If we don't do this during the
updateImpl we might create new abstract attributes during manifest. This
is a problem when it comes to liveness information.
This caused an error when passes iterated over cached assumptions in the
tracker and assumed them to be `null` or an instruction. I failed to
create a test case so far.
In addition to memory behavior attributes (readonly/writeonly) we now
derive memory location attributes (argmemonly/inaccessiblememonly/...).
The former is part of AAMemoryBehavior and the latter part of
AAMemoryLocation. While they are similar in nature it got messy when
they were put in a single AA. Location attributes for arguments and
floating values will follow later.
Note that both memory attributes kinds can derive readnone. If there are
no accesses AAMemoryBehavior will derive readnone. If there are accesses
but only to stack (=local) locations AAMemoryLocation will derive
readnone.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D73426
Due to the genericValueTraversal we might visit values for which we did
not create an AAValueConstantRange object, e.g., as they are behind a
PHI or select or call with `returned` argument. As a consequence we need
to validate the types as we are about to query AAValueConstantRange for
operands.
Summary:
Potential fix for: https://bugs.llvm.org/show_bug.cgi?id=44889 and https://bugs.llvm.org/show_bug.cgi?id=44408
In the legacy pass manager, loop rotate need not compute MemorySSA when not being in the same loop pass manager with other loop passes.
There isn't currently a way to differentiate between the two cases, so this attempts to limit the usage in LoopRotate to only update MemorySSA when the analysis is already available.
The side-effect of this is that it will split the Loop pipeline.
This issue does not apply to the new pass manager, where we have a flag specifying if all loop passes in that loop pass manager preserve MemorySSA.
Reviewers: dmgreen, fedor.sergeev, nikic
Subscribers: Prazek, hiraditya, george.burgess.iv, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74574
replaceDbgDeclare is used to update the descriptions of stack variables
when they are moved (e.g. by ASan or SafeStack). A side effect of
replaceDbgDeclare is that it moves dbg.declares around in the
instruction stream (typically by hoisting them into the entry block).
This behavior was introduced in llvm/r227544 to fix an assertion failure
(llvm.org/PR22386), but no longer appears to be necessary.
Hoisting a dbg.declare generally does not create problems. Usually,
dbg.declare either describes an argument or an alloca in the entry
block, and backends have special handling to emit locations for these.
In optimized builds, LowerDbgDeclare places dbg.values in the right
spots regardless of where the dbg.declare is. And no one uses
replaceDbgDeclare to handle things like VLAs.
However, there doesn't seem to be a positive case for moving
dbg.declares around anymore, and this reordering can get in the way of
understanding other bugs. I propose getting rid of it.
Testing: stage2 RelWithDebInfo sanitized build, check-llvm
rdar://59397340
Differential Revision: https://reviews.llvm.org/D74517
binop (extelt X, C), (extelt Y, C) --> extelt (binop X, Y), C
This is a transform that has been considered for canonicalization (instcombine)
in the past because it reduces instruction count. But as shown in the x86 tests,
it's impossible to know if it's profitable without a cost model. There are many
potential target constraints to consider.
We have implemented similar transforms in the backend (DAGCombiner and
target-specific), but I don't think we have this exact fold there either (and if
we did it in SDAG, it wouldn't work across blocks).
Note: this patch was intended to handle the more general case where the extract
indexes do not match, but it got too big, so I scaled it back to this pattern
for now.
Differential Revision: https://reviews.llvm.org/D74495
This reverts commit 61b35e4111.
This commit causes a timeout in chromium builds; likely to have a
similar cause to the previous timeout issue caused by this commit (see
6ded69f294 for more details). It is possible that there is no way to
fix this bug that will not cause this issue; further investigations as
to the efficiency of handling large amounts of debug info will be
necessary.
Reapply 8a56d64d76 with minor fixes.
The problem was that cancellation can cause new edges to the parallel
region exit block which is not outlined. The CodeExtractor will encode
the information which "exit" was taken as a return value. The fix is to
ensure we do not return any value from the outlined function, to prevent
control to value conversion we ensure a single exit block for the
outlined region.
This reverts commit 3aac953afa.
In order to fix PR44560 and to prepare for loop transformations we now
finalize a function late, which will also do the outlining late. The
logic is as before but the actual outlining step happens now after the
function was fully constructed. Once we have loop transformations we
can apply them in the finalize step before the outlining.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D74372
We used coarse-grained liveness before, thus we looked if the
instruction was executed, but we did not use fine-grained liveness,
hence if the instruction was needed or could be deleted even if the
surrounding ones are live. This patches introduces this level of
liveness checks together with other liveness queries, e.g., for uses.
For more control we enforce that all liveness queries go through the
Attributor.
Test have been adjusted to reflect the changes or augmented to prevent
deletion of the parts we want to check.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D73313
If we have a replacement for a value, via AAValueSimplify, the original
value will lose all its uses. Thus, as long as a value is simplified we
can skip the uses in checkForAllUses, given that these uses are
transitive uses for the simplified version and will therefore affect the
simplified version as necessary.
Since this allowed us to remove calls without side-effects and a known
return value, we need to make sure not to eliminate `musttail` calls.
Those we keep around, or later remove the entire `musttail` call chain.
We relied on wouldInstructionBeTriviallyDead before but that functions
does not take assumed information, especially for calls, into account.
The replacement, AAIsDead::isAssumeSideEffectFree, does.
This change makes AAIsDeadCallSiteReturn more complex as we can have
a dead call or only dead users.
The test have been modified to include a side effect where there was
none in order to keep the coverage.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D73311
Various parts of the LLVM code generator assume that the address
argument of a dbg.declare is not a `ptrtoint`-of-alloca. ASan breaks
this assumption, and this results in local variables sometimes being
unavailable at -O0.
GlobalISel, SelectionDAG, and FastISel all do not appear to expect
dbg.declares to have a `ptrtoint` as an operand. This means that they do
not place entry block allocas in the usual side table reserved for local
variables available in the whole function scope. This isn't always a
problem, as LLVM can try to lower the dbg.declare to a DBG_VALUE, but
those DBG_VALUEs can get dropped for all the usual reasons DBG_VALUEs
get dropped. In the ObjC test case I'm looking at, the cause happens to
be that `replaceDbgDeclare` has hoisted dbg.declares into the entry
block, causing LiveDebugValues to "kill" the DBG_VALUEs because the
lexical dominance check fails.
To address this, I propose:
1) Have ASan (always) pass an alloca to dbg.declares (this patch). This
is a narrow bugfix for -O0 debugging.
2) Make replaceDbgDeclare not move dbg.declares around. This should be a
generic improvement for optimized debug info, as it would prevent the
lexical dominance check in LiveDebugValues from killing as many
variables.
This means reverting llvm/r227544, which fixed an assertion failure
(llvm.org/PR22386) but no longer seems to be necessary. I was able to
complete a stage2 build with the revert in place.
rdar://54688991
Differential Revision: https://reviews.llvm.org/D74369
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.
I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.
NFC.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74482
This version includes a fix for a set of crashes caused by marking
values depending on a yet unknown & tracked call as overdefined.
In some cases, we would later discover that the call has a constant
result and try to mark a user of it as constant, although it was already
marked as overdefined. Most instruction handlers bail out early if the
instruction is already overdefined. But that is not necessary for
CastInsts for example. By skipping values that depend on skipped
calls, we resolve the crashes and also improve the precision in some
cases (see resolvedundefsin-tracked-fn.ll).
Note that we may not skip PHI nodes that may depend on a skipped call,
but they can be safely marked as overdefined, as we bail out early if
the PHI node is overdefined.
This reverts the revert commit
a74b31a3e9cd844c7ce2087978568e3f5ec8519.
Summary:
Passes ORE, BPI, BFI are not being preserved by Loop passes, hence it
is incorrect to retrieve these passes as cached.
This patch makes the loop passes in question compute a new instance.
In some of these cases, however, it may be beneficial to change the Loop pass to
a Function pass instead, similar to the change for LoopUnrollAndJam.
Reviewers: chandlerc, dmgreen, jdoerfert, reames
Subscribers: mehdi_amini, hiraditya, zzheng, steven_wu, dexonsmith, Whitney, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72891
This reverts commit 636c93ed11.
The original patch caused build failures on TSan buildbots. Commit 6ded69f294
fixes this issue by reducing the rate at which empty debug intrinsics
propagate, reducing the memory footprint and preventing a fatal spike.
This patch is a fix following the revert of 72ce759
(https://reviews.llvm.org/rG72ce759928e6dfee6a9efa310b966c19722352ba)
and fixes the failure that it caused.
The above patch failed on the Thread Sanitizer buildbot with an out of
memory error. After an investigation, the cause was identified as an
explosion in debug intrinsics while running the Jump Threading pass on
ModuleMap.ll. The above patched prevented debug intrinsics from being
dropped when their Basic Block was deleted due to being "empty". In this
case, one of the functions in ModuleMap.ll had (after many optimization
passes) a very large number of debug intrinsics representing a set of
repeatedly inlined variables. Previously the vast majority of these were
silently dropped during Jump Threading when their blocks were deleted,
but as of the above patch they survived for longer, causing a large
increase in the number of debug intrinsics. These intrinsics were then
repeatedly cloned by the Jump Threading pass as edges were threaded,
multiplying the intrinsic count further. The memory consumed by this
process spiralled out of control, crashing the buildbot that uses TSan
(which has an estimated 5-10x memory overhead compared to non-sanitized
builds).
This patch adds RemoveRedundantDbgInstrs to the Jump Threading pass, in
order to reduce the number of debug intrinsics down to a manageable
amount in cases where many intrinsics for the same variable end up
bunched together contiguously, as in this case.
Differential Revision: https://reviews.llvm.org/D73054
This causes a crash for the reproducer below
enum { a };
enum b { c, d };
e;
static _Bool g(struct f *h, enum b i) {
i &&j();
return a;
}
static k(char h, enum b i) {
_Bool l = g(e, i);
l;
}
m(h) {
k(h, c);
g(h, d);
}
This reverts commit aadb635e04.
This is apparently worse than 1-byte alignment. This does not attempt
to decompose 2-byte aligned wide stores, but will stop trying to
produce them.
Also fix bug in LoadStoreVectorizer which was decreasing the alignment
and vectorizing stack accesses. It was assuming a stack object was an
alloca that could have its base alignment changed, which is not true
if the pointer is derived from a function argument.
As an approximation to a dead edge we can check if the terminator is
dead. If so, the corresponding operand use in a PHI node is dead even if
the PHI node itself is not.
This restores commit 748bb5a0f1, along
with a fix for a Chromium test suite build issue (and a new test for
that case).
Differential Revision: https://reviews.llvm.org/D73242
The changeXXXAfterManifest functions are better suited to deal with
changes so we should prefer them. These functions also recursively
delete dead instructions which is why we see test changes.