LoopInfo can be easily preserved by passing it to the functions that
modify the CFG (SplitCriticalEdge and MergeBlockIntoPredecessor.
SplitCriticalEdge also preserves LoopSimplify and LCSSA form when when passing in
LoopInfo. The test case shows that we preserve LoopSimplify and
LoopInfo. Adding addPreservedID(LCSSAID) did not preserve LCSSA for some
reason.
Also I am not sure if it is possible to preserve those in the new pass
manager, as they aren't analysis passes.
Reviewers: reames, hfinkel, davide, jdoerfert
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D65137
llvm-svn: 367332
Summary:
This patch extends the use of the OptimizationRemarkEmitter to provide
information about loops that are not fused, and loops that are not eligible for
fusion. In particular, it uses the OptimizationRemarkAnalysis to identify loops
that are not eligible for fusion and the OptimizationRemarkMissed to identify
loops that cannot be fused.
It also reuses the statistics to provide the messages used in the
OptimizationRemarks. This provides common message strings between the
optimization remarks and the statistics.
I would like feedback on this approach, in general. If people are OK with this,
I will flesh out additional remarks in subsequent commits.
Subscribers: hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63844
llvm-svn: 367327
Summary:
I have stumbled into this by accident while preparing to extend backend `x s% C ==/!= 0` handling.
While we did happen to handle this fold in most of the cases,
the folding is indirect - we fold `x u% y` to `x & (y-1)` (iff `y` is power-of-two),
or first turn `x s% -y` to `x u% y`; that does handle most of the cases.
But we can't turn `x s% INT_MIN` to `x u% -INT_MIN`,
and thus we end up being stuck with `(x s% INT_MIN) == 0`.
There is no such restriction for the more general fold:
https://rise4fun.com/Alive/IIeS
To be noted, the fold does not enforce that `y` is a constant,
so it may indeed increase instruction count.
This is consistent with what `x u% y`->`x & (y-1)` already does.
I think it makes sense, it's at most one (simple) extra instruction,
while `rem`ainder is really much more un-simple (and likely **very** costly).
Reviewers: spatel, RKSimon, nikic, xbolva00, craig.topper
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65046
llvm-svn: 367322
test-suite/MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG broke:
Only PHI nodes may reference their own value!
%sub33 = srem i32 %sub33, %ranks_in_i
This reverts commit r367288.
llvm-svn: 367289
Summary:
While `-div-rem-pairs` pass can decompose rem in div+rem pair when div-rem pair
is unsupported by target, nothing performs the opposite fold.
We can't do that in InstCombine or DAGCombine since neither of those has access to TTI.
So it makes most sense to teach `-div-rem-pairs` about it.
If we matched rem in expanded form, we know we will be able to place div-rem pair
next to each other so we won't regress the situation.
Also, we shouldn't decompose rem if we matched already-decomposed form.
This is surprisingly straight-forward otherwise.
https://bugs.llvm.org/show_bug.cgi?id=42673
Reviewers: spatel, RKSimon, efriedma, ZaMaZaN4iK, bogner
Reviewed By: bogner
Subscribers: bogner, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65298
llvm-svn: 367288
Globals that are associated with globals with type metadata need to appear
in the merged module because they will reference the global's section directly.
Differential Revision: https://reviews.llvm.org/D65312
llvm-svn: 367242
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.
llvm-svn: 367227
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.
llvm-svn: 367194
The test case from:
https://bugs.llvm.org/show_bug.cgi?id=42771
...shows a ~30x slowdown caused by the awkward loop iteration (rL207302) that is
seemingly done just to avoid invalidating the instruction iterator. We can instead
delay instruction deletion until we reach the end of the block (or we could delay
until we reach the end of all blocks).
There's a test diff here for a degenerate case with llvm.assume that is not
meaningful in itself, but serves to verify this change in logic.
This change probably doesn't result in much overall compile-time improvement
because we call '-instsimplify' as a standalone pass only once in the standard
-O2 opt pipeline currently.
Differential Revision: https://reviews.llvm.org/D65336
llvm-svn: 367173
unreachable loop.
updatePredecessorProfileMetadata in jumpthreading tries to find the
first dominating predecessor block for a PHI value by searching upwards
the predecessor block chain.
But jumpthreading may see some temporary IR state which contains
unreachable bb not being cleaned up. If an unreachable loop happens to
be on the predecessor block chain, keeping chasing the predecessor
block will run into an infinite loop.
The patch fixes it.
Differential Revision: https://reviews.llvm.org/D65310
llvm-svn: 367154
(Y * (1.0 - Z)) + (X * Z) -->
Y - (Y * Z) + (X * Z) -->
Y + Z * (X - Y)
This is part of solving:
https://bugs.llvm.org/show_bug.cgi?id=42716
Factoring eliminates an instruction, so that should be a good canonicalization.
The potential conversion to FMA would be handled by the backend based on target
capabilities.
Differential Revision: https://reviews.llvm.org/D65305
llvm-svn: 367101
To avoid duplicates in loop metadata, if the string to add is
already there, just update the value.
Reviewers: reames, Ashutosh
Reviewed By: reames
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D65265
llvm-svn: 367087
Just move the utility function to LoopUtils.cpp to re-use it in loop peeling.
Reviewers: reames, Ashutosh
Reviewed By: reames
Subscribers: hiraditya, asbirlea, llvm-commits
Differential Revision: https://reviews.llvm.org/D65264
llvm-svn: 367085
changes were made to the patch since then.
--------
[NewPM] Port Sancov
This patch contains a port of SanitizerCoverage to the new pass manager. This one's a bit hefty.
Changes:
- Split SanitizerCoverageModule into 2 SanitizerCoverage for passing over
functions and ModuleSanitizerCoverage for passing over modules.
- ModuleSanitizerCoverage exists for adding 2 module level calls to initialization
functions but only if there's a function that was instrumented by sancov.
- Added legacy and new PM wrapper classes that own instances of the 2 new classes.
- Update llvm tests and add clang tests.
llvm-svn: 367053
Currently there are a few pointer comparisons in ValueDFS_Compare, which
can cause non-deterministic ordering when materializing values. There
are 2 cases this patch fixes:
1. Order defs before uses used to compare pointers, which guarantees
defs before uses, but causes non-deterministic ordering between 2
uses or 2 defs, depending on the allocation order. By converting the
pointers to booleans, we can circumvent that problem.
2. comparePHIRelated was comparing the basic block pointers of edges,
which also results in a non-deterministic order and is also not
really meaningful for ordering. By ordering by their destination DFS
numbers we guarantee a deterministic order.
For the example below, we can end up with 2 different uselist orderings,
when running `opt -mem2reg -ipsccp` hundreds of times. Because the
non-determinism is caused by allocation ordering, we cannot reproduce it
with ipsccp alone.
declare i32 @hoge() local_unnamed_addr #0
define dso_local i32 @ham(i8* %arg, i8* %arg1) #0 {
bb:
%tmp = alloca i32
%tmp2 = alloca i32, align 4
br label %bb19
bb4: ; preds = %bb20
br label %bb6
bb6: ; preds = %bb4
%tmp7 = call i32 @hoge()
store i32 %tmp7, i32* %tmp
%tmp8 = load i32, i32* %tmp
%tmp9 = icmp eq i32 %tmp8, 912730082
%tmp10 = load i32, i32* %tmp
br i1 %tmp9, label %bb11, label %bb16
bb11: ; preds = %bb6
unreachable
bb13: ; preds = %bb20
br label %bb14
bb14: ; preds = %bb13
%tmp15 = load i32, i32* %tmp
br label %bb16
bb16: ; preds = %bb14, %bb6
%tmp17 = phi i32 [ %tmp10, %bb6 ], [ 0, %bb14 ]
br label %bb19
bb18: ; preds = %bb20
unreachable
bb19: ; preds = %bb16, %bb
br label %bb20
bb20: ; preds = %bb19
indirectbr i8* null, [label %bb4, label %bb13, label %bb18]
}
Reviewers: davide, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D64866
llvm-svn: 367049
We'd like to determine the idom of exit block after peeling one iteration.
Let Exit is exit block.
Let ExitingSet - is a set of predecessors of Exit block. They are exiting blocks.
Let Latch' and ExitingSet' are copies after a peeling.
We'd like to find an idom'(Exit) - idom of Exit after peeling.
It is an evident that idom'(Exit) will be the nearest common dominator of ExitingSet and ExitingSet'.
idom(Exit) is a nearest common dominator of ExitingSet.
idom(Exit)' is a nearest common dominator of ExitingSet'.
Taking into account that we have a single Latch, Latch' will dominate Header and idom(Exit).
So the idom'(Exit) is nearest common dominator of idom(Exit)' and Latch'.
All these basic blocks are in the same loop, so what we find is
(nearest common dominator of idom(Exit) and Latch)'.
Reviewers: reames, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D65292
llvm-svn: 367044
Later code in TryToSimplifyUncondBranchFromEmptyBlock() assumes that
we have cleaned up unreachable blocks, but that was not happening
with this switch transform.
llvm-svn: 367037
This reverts commit bc4a63fd3c, this is a
speculative revert to fix a number of sanitizer bots (like
sanitizer-x86_64-linux-bootstrap-ubsan) that have started to see stage2
compiler crashes, presumably due to a miscompile.
llvm-svn: 367029
We do not need the SmallPtrSet to avoid adding duplicates to
OpsToRename, because we already keep a ValueInfo mapping. If we see an
op for the first time, Infos will be empty and we can also add it to
OpsToRename.
We process operands by visiting BBs depth-first and then iterate over
all instructions & users, so the order should be deterministic.
Therefore we can skip one round of sorting, which we purely needed for
guaranteeing a deterministic order when iterating over the SmallPtrSet.
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D64816
llvm-svn: 367028
trunc (load X) --> load (bitcast X to narrow type)
We have this transform in DAGCombiner::ReduceLoadWidth(), but the truncated
load pattern can interfere with other instcombine transforms, so I'd like to
allow the fold sooner.
Example:
https://bugs.llvm.org/show_bug.cgi?id=16739
...in that report, we have bitcasts bracketing these ops, so those could get
eliminated too.
We've generally ruled out widening of loads early in IR ( LoadCombine -
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105291.html ), but
that reasoning may not apply to narrowing if we can preserve information
such as the dereferenceable range.
Differential Revision: https://reviews.llvm.org/D64432
llvm-svn: 367011
We should only zap returns in functions, where all live users have a
replace-able value (are not overdefined). Unused return values should be
undefined.
This should make it easier to detect bugs like in PR42738.
Alternatively we could bail out of zapping the function returns, but I
think it would be better to address those divergences between function
and call-site values where they are actually caused.
Reviewers: davide, efriedma
Reviewed By: davide, efriedma
Differential Revision: https://reviews.llvm.org/D65222
llvm-svn: 366998
This refactors boolean 'OptForSize' that was passed around in a lot of places.
It controlled folding of the tail loop, the scalar epilogue, into the main loop
but code-size reasons may not be the only reason to do this. Thus, this is a
first step to generalise the concept of tail-loop folding, and hence OptForSize
has been renamed and is using an enum ScalarEpilogueStatus that holds the
status how the epilogue should be lowered.
This will be followed up by D65197, that picks up the predicate loop hint and
performs the tail-loop folding.
Differential Revision: https://reviews.llvm.org/D64916
llvm-svn: 366993
We can treat icmp eq X, MIN_UINT as icmp ule X, MIN_UINT and allow
it to merge with icmp ugt X, C. Similar for the other constants.
We can do simliar for icmp ne X, (U)INT_MIN/MAX in foldAndOfICmps. And we already handled UINT_MIN there.
Fixes PR42691.
Differential Revision: https://reviews.llvm.org/D65017
llvm-svn: 366945
This is a follow up to D64971. While we need to insert the deref after
the offset, it needs to come before the remaining elements in the
original expression since the deref needs to happen before the LLVM
fragment if present.
Differential Revision: https://reviews.llvm.org/D65172
llvm-svn: 366865
The original code failed to account for the fact that one exit can have a pointer exit count without all of them having pointer exit counts. This could cause two separate bugs:
1) We might exit the loop early, and leave optimizations undone. This is what triggered the assertion failure in the reported test case.
2) We might optimize one exit, then exit without indicating a change. This could result in an analysis invalidaton bug if no other transform is done by the rest of indvars.
Note that the pointer exit counts are a really fragile concept. They show up only when we have a pointer IV w/o a datalayout to provide their size. It's really questionable to me whether the complexity implied is worth it.
llvm-svn: 366829
This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.
The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.
We do see a couple of regressions that need to be addressed:
AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.
The code owners have confirmed its ok for these cases to fixed up in future patches.
Differential Revision: https://reviews.llvm.org/D63281
llvm-svn: 366799
cast<CallInst> shouldn't return null and we dereference the pointer in a lot of other places, causing both MSVC + cppcheck to warn about dereferenced null pointers
llvm-svn: 366793
Summary:
Deduce dereferenceable attribute in Attributor.
These will be added in a later patch.
* dereferenceable(_or_null)_globally (D61652)
* Deduction based on load instruction (similar to D64258)
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64876
llvm-svn: 366788
Summary: Adds a binding to the internalize pass that allows the caller to pass a function pointer that acts as the visibility-preservation predicate. Previously, one could only pass an unsigned value (not LLVMBool?) that directed the pass to consider "main" or not.
Reviewers: whitequark, deadalnix, harlanhaskins
Reviewed By: whitequark, harlanhaskins
Subscribers: kren1, hiraditya, llvm-commits, harlanhaskins
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62456
llvm-svn: 366777
[Attributor] Liveness analysis.
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D64162
llvm-svn: 366769
[Attributor] Liveness analysis.
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential revision: https://reviews.llvm.org/D64162
llvm-svn: 366753
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential revision: https://reviews.llvm.org/D64162
llvm-svn: 366736
Porting function return value attribute noalias to attributor.
This will be followed with a patch for callsite and function argumets.
Reviewers: jdoerfert
Subscribers: lebedev.ri, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D63067
llvm-svn: 366728
While debugging code that uses SafeStack, we've noticed that LLVM
produces an invalid DWARF. Concretely, in the following example:
int main(int argc, char* argv[]) {
std::string value = "";
printf("%s\n", value.c_str());
return 0;
}
DWARF would describe the value variable as being located at:
DW_OP_breg14 R14+0, DW_OP_deref, DW_OP_constu 0x20, DW_OP_minus
The assembly to get this variable is:
leaq -32(%r14), %rbx
The order of operations in the DWARF symbols is incorrect in this case.
Specifically, the deref is incorrect; this appears to be incorrectly
re-inserted in repalceOneDbgValueForAlloca.
With this change which inserts the deref after the offset instead of
before it, LLVM produces correct DWARF:
DW_OP_breg14 R14-32
Differential Revision: https://reviews.llvm.org/D64971
llvm-svn: 366726
The bytes inserted before an overaligned global need to be padded according
to the alignment set on the original global in order for the initializer
to meet the global's alignment requirements. The previous implementation
that padded to the pointer width happened to be correct for vtables on most
platforms but may do the wrong thing if the vtable has a larger alignment.
This issue is visible with a prototype implementation of HWASAN for globals,
which will overalign all globals including vtables to 16 bytes.
There is also no padding requirement for the bytes inserted after the global
because they are never read from nor are they significant for alignment
purposes, so stop inserting padding there.
Differential Revision: https://reviews.llvm.org/D65031
llvm-svn: 366725
We were previously ignoring alignment entirely when combining globals
together in this pass. There are two main things that we need to do here:
add additional padding before each global to meet the alignment requirements,
and set the combined global's alignment to the maximum of all of the original
globals' alignments.
Since we now need to calculate layout as we go anyway, use the calculated
layout to produce GlobalLayout instead of using StructLayout.
Differential Revision: https://reviews.llvm.org/D65033
llvm-svn: 366722
Current algorithm to update branch weights of latch block and its copies is
based on the assumption that number of peeling iterations is approximately equal
to trip count.
However it is not correct. According to profitability check in one case we can decide to peel
in case it helps to reduce the number of phi nodes. In this case the number of peeled iteration
can be less then estimated trip count.
This patch introduces another way to set the branch weights to peeled of branches.
Let F is a weight of the edge from latch to header.
Let E is a weight of the edge from latch to exit.
F/(F+E) is a probability to go to loop and E/(F+E) is a probability to go to exit.
Then, Estimated TripCount = F / E.
For I-th (counting from 0) peeled off iteration we set the the weights for
the peeled latch as (TC - I, 1). It gives us reasonable distribution,
The probability to go to exit 1/(TC-I) increases. At the same time
the estimated trip count of remaining loop reduces by I.
As a result after peeling off N iteration the weights will be
(F - N * E, E) and trip count of loop becomes
F / E - N or TC - N.
The idea is taken from the review of the patch D63918 proposed by Philip.
Reviewers: reames, mkuper, iajbar, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D64235
llvm-svn: 366665
For equality, the function called getTrue/getFalse with the VT
of the comparison input. But getTrue/getFalse need the boolean VT.
So if this code ever executed, it would assert.
I believe these cases are removed by InstSimplify so we don't get here.
So this patch just fixes up an assert to exclude the equality
possibility and removes the broken code.
llvm-svn: 366649
AddOne/SubOne create new Constant objects. That seems heavy for
comparing ConstantInts which wrap APInts. Just do the math on
on the APInts and compare them.
llvm-svn: 366648
If the blockaddress is not destoryed, the destination block will still
be marked as having its address taken, limiting further transformations.
I think there are other places where the dead blockaddress constants are kept
around, I'll look into that as follow up.
Reviewers: craig.topper, brzycki, davide
Reviewed By: brzycki, davide
Differential Revision: https://reviews.llvm.org/D64936
llvm-svn: 366633
Summary:
This patch removes the `default` case from some switches on
`llvm::Triple::ObjectFormatType`, and cases for the missing enumerators
(`UnknownObjectFormat`, `Wasm`, and `XCOFF`) are then added.
For `UnknownObjectFormat`, the effect of the action for the `default`
case is maintained; otherwise, where `llvm_unreachable` is called,
`report_fatal_error` is used instead.
Where the `default` case returns a default value, `report_fatal_error`
is used for XCOFF as a placeholder. For `Wasm`, the effect of the action
for the `default` case in maintained.
The code is structured to avoid strongly implying that the `Wasm` case
is present for any reason other than to make the switch cover all
`ObjectFormatType` enumerator values.
Reviewers: sfertile, jasonliu, daltenty
Reviewed By: sfertile
Subscribers: hiraditya, aheejin, sunfish, llvm-commits, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64222
llvm-svn: 366544
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
f. `((x << MaskShAmt) a>> MaskShAmt) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
f. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)
Normally, the inner pattern is sign-extend,
but for our purposes it's no different to other patterns:
alive proofs:
f: https://rise4fun.com/Alive/7U3
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64524
llvm-svn: 366540
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
e. `((x << MaskShAmt) l>> MaskShAmt) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
e. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)
alive proofs:
e: https://rise4fun.com/Alive/0FT
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64521
llvm-svn: 366539
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
d. `(x & ((-1 << MaskShAmt) >> MaskShAmt)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
d. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)
alive proofs:
d: https://rise4fun.com/Alive/I5Y
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64519
llvm-svn: 366538
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
c. `(x & (-1 >> MaskShAmt)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
c. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)
alive proofs:
c: https://rise4fun.com/Alive/RgJh
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64517
llvm-svn: 366537
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
b. `(x & (~(-1 << maskNbits))) << shiftNbits`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
b. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`
alive proof:
b: https://rise4fun.com/Alive/y8M
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64514
llvm-svn: 366536
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
a. `(x & ((1 << MaskShAmt) - 1)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
a. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`
alive proof:
a: https://rise4fun.com/Alive/wi9
Indeed, not all of these patterns are canonical.
But since this fold will only produce a single instruction
i'm really interested in handling even uncanonical patterns,
since i have this general kind of pattern in hotpaths,
and it is not totally outlandish for bit-twiddling code.
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Reviewers: spatel, nikic, huihuiz, xbolva00
Reviewed By: xbolva00
Subscribers: efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64512
llvm-svn: 366535
This will let us instrument globals during initialization. This required
making the new PM pass a module pass, which should still provide access to
analyses via the ModuleAnalysisManager.
Differential Revision: https://reviews.llvm.org/D64843
llvm-svn: 366379
Summary:
Deduce the "willreturn" attribute for functions.
For now, intrinsics are not willreturn. More annotation will be done in another patch.
Reviewers: jdoerfert
Subscribers: jvesely, nhaehnle, nicholas, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63046
llvm-svn: 366335
I don't have an IR sample which is actually failing, but the issue described in the comment is theoretically possible, and should be guarded against even if there's a different root cause for the bot failures.
llvm-svn: 366241
As there are some reported miscompiles with AVX512 and performance regressions
in Eigen. Verified with the original committer and testcases will be forthcoming.
This reverts commit r364964.
llvm-svn: 366154
Add "memtag" sanitizer that detects and mitigates stack memory issues
using armv8.5 Memory Tagging Extension.
It is similar in principle to HWASan, which is a software implementation
of the same idea, but there are enough differencies to warrant a new
sanitizer type IMHO. It is also expected to have very different
performance properties.
The new sanitizer does not have a runtime library (it may grow one
later, along with a "debugging" mode). Similar to SafeStack and
StackProtector, the instrumentation pass (in a follow up change) will be
inserted in all cases, but will only affect functions marked with the
new sanitize_memtag attribute.
Reviewers: pcc, hctim, vitalybuka, ostannard
Subscribers: srhines, mehdi_amini, javed.absar, kristof.beyls, hiraditya, cryptoad, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64169
llvm-svn: 366123
There are scenarios where mutually recursive functions may cause the SCC
to contain both read only and write only functions. This removes an
assertion when adding read attributes which caused a crash with a the
provided test case, and instead just doesn't add the attributes.
Patch by Luke Lau <luke.lau@intel.com>
Differential Revision: https://reviews.llvm.org/D60761
llvm-svn: 366090
It is possible that loop exit has two predecessors in a loop body.
In this case after the peeling the iDom of the exit should be a clone of
iDom of original exit but no a clone of a block coming to this exit.
Reviewers: reames, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D64618
llvm-svn: 366050
We do not compute the scalarization overhead in getVectorIntrinsicCost
and TTI::getIntrinsicInstrCost requires the full arguments list.
llvm-svn: 366049
This CL enables peeling of the loop with multiple exits where
one exit should be from latch and others are basic blocks with
call to deopt.
The peeling is enabled under the flag which is false by default.
Reviewers: reames, mkuper, iajbar, fhahn
Reviewed By: reames
Subscribers: xbolva00, hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D63923
llvm-svn: 366048
With this patch the getLoopEstimatedTripCount function will
accept also the loops where there are more than one exit but
all exits except latch block should ends up with a call to deopt.
This side exits should not impact the estimated trip count.
Reviewers: reames, mkuper, danielcdh
Reviewed By: reames
Subscribers: fhahn, lebedev.ri, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D64553
llvm-svn: 366042
Extract the code from LoopUnrollRuntime into utility function to
re-use it in D63923.
Reviewers: reames, mkuper
Reviewed By: reames
Subscribers: fhahn, hiraditya, zzheng, dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D64548
llvm-svn: 366040
Loop invariant operands do not need to be scalarized, as we are using
the values outside the loop. We should ignore them when computing the
scalarization overhead.
Fixes PR41294
Reviewers: hsaito, rengolin, dcaballe, Ayal
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D59995
llvm-svn: 366030
Attributor::getAAFor will now only return AbstractAttributes with a
valid AbstractState. This simplifies call sites as they only need to
check if the returned pointer is non-null. It also reduces the potential
for accidental misuse.
llvm-svn: 365983
Some function in the Attributor framework are unnecessarily
marked virtual. This patch removes virtual keyword
Reviewers: jdoerfert
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D64637
llvm-svn: 365925
Continue in the spirit of D63618, and use exit count reasoning to prove away loop exits which can not be taken since the backedge taken count of the loop as a whole is provably less than the minimal BE count required to take this particular loop exit.
As demonstrated in the newly added tests, this triggers in a number of cases where IndVars was previously unable to discharge obviously redundant exit tests. And some not so obvious ones.
Differential Revision: https://reviews.llvm.org/D63733
llvm-svn: 365920
This patch contains a port of SanitizerCoverage to the new pass manager. This one's a bit hefty.
Changes:
- Split SanitizerCoverageModule into 2 SanitizerCoverage for passing over
functions and ModuleSanitizerCoverage for passing over modules.
- ModuleSanitizerCoverage exists for adding 2 module level calls to initialization
functions but only if there's a function that was instrumented by sancov.
- Added legacy and new PM wrapper classes that own instances of the 2 new classes.
- Update llvm tests and add clang tests.
Differential Revision: https://reviews.llvm.org/D62888
llvm-svn: 365838
Introduce and deduce "nosync" function attribute to indicate that a function
does not synchronize with another thread in a way that other thread might free memory.
Reviewers: jdoerfert, jfb, nhaehnle, arsenm
Subscribers: wdng, hfinkel, nhaenhle, mehdi_amini, steven_wu,
dexonsmith, arsenm, uenoku, hiraditya, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D62766
llvm-svn: 365830
Summary:
We will need to handle IntToPtr which I will submit in a separate patch as it's
not going to be NFC.
Reviewers: eugenis, pcc
Reviewed By: eugenis
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D63940
llvm-svn: 365709
Summary:
The map kept in loop rotate is used for instruction remapping, in order
to simplify the clones of instructions. Thus, if an instruction can be
simplified, its simplified value is placed in the map, even when the
clone is added to the IR. MemorySSA in contrast needs to know about that
clone, so it can add an access for it.
To resolve this: keep a different map for MemorySSA.
Reviewers: george.burgess.iv
Subscribers: jlebar, Prazek, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63680
llvm-svn: 365672
An alloca which can be sunk into the extraction region may have more
than one bitcast use. Move these uses along with the alloca to prevent
use-before-def.
Testing: check-llvm, stage2 build of clang
Fixes llvm.org/PR42451.
Differential Revision: https://reviews.llvm.org/D64463
llvm-svn: 365660
Summary:
Transform
pow(C,x)
To
exp2(log2(C)*x)
if C > 0, C != inf, C != NaN (and C is not power of 2, since we have some fold for such case already).
log(C) is folded by the compiler and exp2 is much faster to compute than pow.
Reviewers: spatel, efriedma, evandro
Reviewed By: evandro
Subscribers: lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64099
llvm-svn: 365637
Only instructions with two or more unique successors should be considered for unswitching.
Patch Author: Daniil Suchkov.
Reviewers: reames, asbirlea, skatkov
Reviewed By: skatkov
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D64404
llvm-svn: 365611
For a given set of live values, the spill cost will always be the
same for each call. Compute the cost once and multiply it by the
number of calls.
(I'm not sure this spill cost modeling makes sense if there are
multiple calls, as the spill cost will likely be shared across
calls in that case. But that's how it currently works.)
llvm-svn: 365552
A short granule is a granule of size between 1 and `TG-1` bytes. The size
of a short granule is stored at the location in shadow memory where the
granule's tag is normally stored, while the granule's actual tag is stored
in the last byte of the granule. This means that in order to verify that a
pointer tag matches a memory tag, HWASAN must check for two possibilities:
* the pointer tag is equal to the memory tag in shadow memory, or
* the shadow memory tag is actually a short granule size, the value being loaded
is in bounds of the granule and the pointer tag is equal to the last byte of
the granule.
Pointer tags between 1 to `TG-1` are possible and are as likely as any other
tag. This means that these tags in memory have two interpretations: the full
tag interpretation (where the pointer tag is between 1 and `TG-1` and the
last byte of the granule is ordinary data) and the short tag interpretation
(where the pointer tag is stored in the granule).
When HWASAN detects an error near a memory tag between 1 and `TG-1`, it
will show both the memory tag and the last byte of the granule. Currently,
it is up to the user to disambiguate the two possibilities.
Because this functionality obsoletes the right aligned heap feature of
the HWASAN memory allocator (and because we can no longer easily test
it), the feature is removed.
Also update the documentation to cover both short granule tags and
outlined checks.
Differential Revision: https://reviews.llvm.org/D63908
llvm-svn: 365551
Note: I don't actually plan to implement all of the cases at the moment, I'm just documenting them for completeness. There's a couple of cases left which are practically useful for me in debugging loop transforms, and I'll probably stop there for the moment.
llvm-svn: 365550
These are sources of poison which don't come from flags, but are clearly documented in the LangRef. Left off support for scalable vectors for the moment, but should be easy to add if anyone is interested.
llvm-svn: 365543
Implements a transform pass which instruments IR such that poison semantics are made explicit. That is, it provides a (possibly partial) executable semantics for every instruction w.r.t. poison as specified in the LLVM LangRef. There are obvious parallels to the sanitizer tools, but this pass is focused purely on the semantics of LLVM IR, not any particular source language.
The target audience for this tool is developers working on or targetting LLVM from a frontend. The idea is to be able to take arbitrary IR (with the assumption of known inputs), and evaluate it concretely after having made poison semantics explicit to detect cases where either a) the original code executes UB, or b) a transform pass introduces UB which didn't exist in the original program.
At the moment, this is mostly the framework and still needs to be fleshed out. By reusing existing code we have decent coverage, but there's a lot of cases not yet handled. What's here is good enough to handle interesting cases though; for instance, one of the recent LFTR bugs involved UB being triggered by integer induction variables with nsw/nuw flags would be reported by the current code.
(See comment in PoisonChecking.cpp for full explanation and context)
Differential Revision: https://reviews.llvm.org/D64215
llvm-svn: 365536
This makes the functions in Loads.h require a type to be specified
independently of the pointer Value so that when pointers have no structure
other than address-space, it can still do its job.
Most callers had an obvious memory operation handy to provide this type, but a
SROA and ArgumentPromotion were doing more complicated analysis. They get
updated to merge the properties of the various instructions they were
considering.
llvm-svn: 365468
This patch modifies the loop peeling transformation so that
it does not expect that there is only one loop exit from latch.
It modifies only transformation. Update of branch weights remains
only for exit from latch.
The motivation is that in follow-up patch I plan to enable loop peeling for
loops with multiple exits but only if other exits then from latch one goes to
block with call to deopt.
For now this patch is NFC.
Reviewers: reames, mkuper, iajbar, fhahn
Reviewed By: reames, fhahn
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D63921
llvm-svn: 365441
D63921 requires getExitEdges fills a vector of Edge pairs where
BasicBlocks are not constant.
The rest Loop API mostly returns non-const BasicBlocks, so to be more consistent with
other Loop API getExitEdges is modified to return non-const BasicBlocks as well.
This is an alternative solution to D64060.
Reviewers: reames, fhahn
Reviewed By: reames, fhahn
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D64309
llvm-svn: 365437
A while back, I added support for NE latches formed by LFTR. I didn't think that quite through, as LFTR will also produce the inverse EQ form for some loops and I hadn't handled that. This change just adds handling for that case as well.
llvm-svn: 365419
Deduce the "returned" argument attribute by collecting all potentially
returned values.
Not only the unique return value, if any, can be used by subsequent
attributes but also the set of all potentially returned values as well
as the mapping from returned values to return instructions that they
originate from (see AAReturnedValues::checkForallReturnedValues).
Change in statistics (-stats) for LLVM-TS + Spec2006, totaling ~19% more "returned" arguments.
ADDED: attributor NumAttributesManifested n/a -> 637
ADDED: attributor NumAttributesValidFixpoint n/a -> 25545
ADDED: attributor NumFnArgumentReturned n/a -> 637
ADDED: attributor NumFnKnownReturns n/a -> 25545
ADDED: attributor NumFnUniqueReturned n/a -> 14118
CHANGED: deadargelim NumRetValsEliminated 470 -> 449 ( -4.468%)
REMOVED: functionattrs NumReturned 535 -> n/a
CHANGED: indvars NumElimIdentity 138 -> 164 ( +18.841%)
Reviewers: homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes, nicholas, reames, efriedma, chandlerc
Subscribers: hiraditya, bollu, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D59919
llvm-svn: 365407
Forming the canonical splat shuffle improves analysis and
may allow follow-on transforms (although some possibilities
are missing as shown in the test diffs).
The backend generically turns these patterns into build_vector,
so there should be no codegen regressions. All targets are
expected to be able to lower splats efficiently.
llvm-svn: 365379
loop
Summary:
Do the cloning in two steps, first allocate all the new loops, then
clone the basic blocks in the same order as the original loop.
Reviewer: Meinersbur, fhahn, kbarton, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, hiraditya, llvm-commits
Tag: https://reviews.llvm.org/D64224
Differential Revision:
llvm-svn: 365366
We recognize a splat from element 0 in (VectorUtils) llvm::getSplatValue()
and also in ShuffleVectorInst::isZeroEltSplatMask(), so this converts
to that form for better matching.
The backend generically turns these patterns into build_vector,
so there should be no codegen difference.
llvm-svn: 365342
This patch adds a function attribute, nofree, to indicate that a function does
not, directly or indirectly, call a memory-deallocation function (e.g., free,
C++'s operator delete).
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D49165
llvm-svn: 365336
We had versions of this code scattered around, so consolidate into one location.
Not strictly NFC since the order of intermediate results may change in some places, but since these operations are associatives, should not change results.
llvm-svn: 365259
It's possible that some function can load and store the same
variable using the same constant expression:
store %Derived* @foo, %Derived** bitcast (%Base** @bar to %Derived**)
%42 = load %Derived*, %Derived** bitcast (%Base** @bar to %Derived**)
The bitcast expression was mistakenly cached while processing loads,
and never examined later when processing store. This caused @bar to
be mistakenly treated as read-only variable. See load-store-caching.ll.
llvm-svn: 365188
We allow forming a splat (broadcast) shuffle, but we were conservatively limiting
that to cases where all elements of the vector are specified. It should be safe
from a codegen perspective to allow undefined lanes of the vector because the
expansion of a splat shuffle would become the chain of inserts again.
Forming splat shuffles can reduce IR and help enable further IR transforms.
Motivating bugs:
https://bugs.llvm.org/show_bug.cgi?id=42174https://bugs.llvm.org/show_bug.cgi?id=16739
Differential Revision: https://reviews.llvm.org/D63848
llvm-svn: 365147
This reverts r365040 (git commit 5cacb91475)
Speculatively reverting, since this appears to have broken check-lld on
Linux. Partial analysis in https://crbug.com/981168.
llvm-svn: 365097
If the block being cloned contains a PHI node, in general, we need to
clone that PHI node, even though it's trivial. If the operand of the PHI
is an instruction in the block being cloned, the correct value for the
operand doesn't exist until SSAUpdater constructs it.
We usually don't hit this issue because we try to avoid threading across
loop headers, but it's possible to hit this in some cases involving
irreducible CFGs. I added a flag to allow threading across loop headers
to make the testcase easier to understand.
Thanks to Brian Rzycki for reducing the testcase.
Fixes https://bugs.llvm.org/show_bug.cgi?id=42085.
Differential Revision: https://reviews.llvm.org/D63913
llvm-svn: 365094
As noted in the test change, this is not trivially NFC, but all of the changes in output are cases where the SCEVExpander form is more canonical/optimal than the hand generation.
llvm-svn: 365075
The motivation for this is two fold:
1) Make the output (and thus tests) a bit more readable to a human trying to understand the result of the transform
2) Reduce spurious diffs in a potential future change to restructure all of this logic to use SCEVExpander (which hoists by default)
llvm-svn: 365066
Summary:
Handling callbr is very similar to handling an inline assembly call:
MSan must checks the instruction's inputs.
callbr doesn't (yet) have outputs, so there's nothing to unpoison,
and conservative assembly handling doesn't apply either.
Fixes PR42479.
Reviewers: eugenis
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64072
llvm-svn: 365008
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).
Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk
Reviewed By: RKSimon, dtemirbulatov
Subscribers: hiraditya, phosek, rnk, rcorcs, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60897
llvm-svn: 364964
Summary:
If LTOUnit splitting is disabled, the module summary analysis computes
the summary information necessary to perform single implementation
devirtualization during the thin link with the index and no IR. The
information collected from the regular LTO IR in the current hybrid WPD
algorithm is summarized, including:
1) For vtable definitions, record the function pointers and their offset
within the vtable initializer (subsumes the information collected from
IR by tryFindVirtualCallTargets).
2) A record for each type metadata summarizing the vtable definitions
decorated with that metadata (subsumes the TypeIdentiferMap collected
from IR).
Also added are the necessary bitcode records, and the corresponding
assembly support.
The follow-on index-based WPD patch is D55153.
Depends on D53890.
Reviewers: pcc
Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54815
llvm-svn: 364960
I was actually wondering if there was some nicer way than m_Value()+cast,
but apparently what i was really "subconsciously" thinking about
was correctness issue.
hasNoUnsignedWrap()/hasNoUnsignedWrap() exist for Instruction,
not for BinaryOperator, so let's just use m_Instruction(),
thus both avoiding a cast, and a crash.
Fixes https://bugs.llvm.org/show_bug.cgi?id=42484,
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15587
llvm-svn: 364915
Extends the transform from:
rL364341
...to include another (more common?) pattern that tests whether a
value is a power-of-2 (including or excluding zero).
llvm-svn: 364856
Summary:
To be noted, this pattern is not unhandled by instcombine per-se,
it is somehow does end up being folded when one runs opt -O3,
but not if it's just -instcombine. Regardless, that fold is
indirect, depends on some other folds, and is thus blind
when there are extra uses.
This does address the regression being exposed in D63992.
https://godbolt.org/z/7DGltUhttps://rise4fun.com/Alive/EPO0
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42459 | PR42459 ]]
Reviewers: spatel, nikic, huihuiz
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63993
llvm-svn: 364792
Summary:
Given pattern:
`icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0`
we should move shifts to the same hand of 'and', i.e. rewrite as
`icmp eq/ne (and (x shift (Q+K)), y), 0` iff `(Q+K) u< bitwidth(x)`
It might be tempting to not restrict this to situations where we know
we'd fold two shifts together, but i'm not sure what rules should there be
to avoid endless combine loops.
We pick the same shift that was originally used to shift the variable we picked to shift:
https://rise4fun.com/Alive/6x1v
Should fix [[ https://bugs.llvm.org/show_bug.cgi?id=42399 | PR42399]].
Reviewers: spatel, nikic, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63829
llvm-svn: 364791
This is the opposite direction of D62158 (we have to choose 1 form or the other).
Now that we have FMF on the select, this becomes more palatable. And the benefits
of having a single IR instruction for this operation (less chances of missing folds
based on extra uses, etc) overcome my previous comments about the potential advantage
of larger pattern matching/analysis.
Differential Revision: https://reviews.llvm.org/D62414
llvm-svn: 364721
What we want to know here is whether we're already using this value
for the loop condition, so make the query about that. We can extend
this to a more general "based-on" relationship, rather than a direct
icmp use later.
llvm-svn: 364715
This transform came up in D62414, but we should deal with it first.
We have LLVM intrinsics that correspond exactly to libm calls (unlike
most libm calls, these libm calls never set errno).
This holds without any fast-math-flags, so we should always canonicalize
to those intrinsics directly for better optimization.
Currently, we convert to fcmp+select only when we have FMF (nnan) because
fcmp+select does not preserve the semantics of the call in the general case.
Differential Revision: https://reviews.llvm.org/D63214
llvm-svn: 364714
The whole indvars pass works on loops in simplified form, so there
is always a unique latch. Convert the condition into an assertion
in needsLFTR (though we also assert this in later LFTR functions).
Additionally update the comment on getLoopTest() now that we are
dealing with multiple exits.
llvm-svn: 364713
Fixes https://bugs.llvm.org/show_bug.cgi?id=41998. Usually when we
have a truncated exit count we'll truncate the IV when comparing
against the limit, in which case exit count overflow in post-inc
form doesn't matter. However, for pointer IVs we don't do that, so
we have to be careful about incrementing the IV in the wide type.
I'm fixing this by removing the IVCount variable (which was
ExitCount or ExitCount+1) and replacing it with a UsePostInc flag,
and then moving the actual limit adjustment to the individual cases
(which are: pointer IV where we add to the wide type, integer IV
where we add to the narrow type, and constant integer IV where we
add to the wide type).
Differential Revision: https://reviews.llvm.org/D63686
llvm-svn: 364709
This shaves an instruction (and a GOT entry in PIC code) off prologues of
functions with stack variables.
Differential Revision: https://reviews.llvm.org/D63472
llvm-svn: 364608
This patch introduces a new function attribute, willreturn, to indicate
that a call of this function will either exhibit undefined behavior or
comes back and continues execution at a point in the existing call stack
that includes the current invocation.
This attribute guarantees that the function does not have any endless
loops, endless recursion, or terminating functions like abort or exit.
Patch by Hideto Ueno (@uenoku)
Reviewers: jdoerfert
Subscribers: mehdi_amini, hiraditya, steven_wu, dexonsmith, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62801
llvm-svn: 364555
FunctionComparator attempts to produce a stable comparison of two Function
instances by looking at all available properties. Since ByVal attributes now
contain a Type pointer, they are not trivially ordered and FunctionComparator
should use its own Type comparison logic to sort them.
llvm-svn: 364523
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).
Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk
Reviewed By: RKSimon, dtemirbulatov
Subscribers: rnk, rcorcs, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60897
llvm-svn: 364478
Summary:
This diff enables address sanitizer on Emscripten.
On Emscripten, real memory starts at the value passed to --global-base.
All memory before this is used as shadow memory, and thus the shadow mapping
function is simply dividing by 8.
Reviewers: tlively, aheejin, sbc100
Reviewed By: sbc100
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D63742
llvm-svn: 364468
This allows later passes (in particular InstCombine) to optimize more
cases.
One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0.
llvm-svn: 364412
This patch generalizes the UnrollLoop utility to support loops that exit
from the header instead of the latch. Usually, LoopRotate would take care
of must of those cases, but in some cases (e.g. -Oz), LoopRotate does
not kick in.
Codesize impact looks relatively neutral on ARM64 with -Oz + LTO.
Program master patch diff
External/S.../CFP2006/447.dealII/447.dealII 629060.00 627676.00 -0.2%
External/SPEC/CINT2000/176.gcc/176.gcc 1245916.00 1244932.00 -0.1%
MultiSourc...Prolangs-C/simulator/simulator 86100.00 86156.00 0.1%
MultiSourc...arks/Rodinia/backprop/backprop 66212.00 66252.00 0.1%
MultiSourc...chmarks/Prolangs-C++/life/life 67276.00 67312.00 0.1%
MultiSourc...s/Prolangs-C/compiler/compiler 69824.00 69788.00 -0.1%
MultiSourc...Prolangs-C/assembler/assembler 86672.00 86696.00 0.0%
Reviewers: efriedma, vsk, paquette
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D61962
llvm-svn: 364398
This follows up the transform from rL363956 to use the ctpop intrinsic when checking for power-of-2-or-zero.
This is matching the isPowerOf2() patterns used in PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314
But there's at least 1 instcombine follow-up needed to match the alternate form:
(v & (v - 1)) == 0;
We should have all of the backend expansions handled with:
rL364319
(x86-specific changes still needed for optimal code based on subtarget)
And the larger patterns to exclude zero as a power-of-2 are joining with this change after:
rL364153 ( D63660 )
rL364246
Differential Revision: https://reviews.llvm.org/D63777
llvm-svn: 364341
This is the Demorgan'd 'not' of the pattern handled in:
D63660 / rL364153
This is another intermediate IR step towards solving PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314
We can test if a value is not a power-of-2 using ctpop(X) > 1,
so combining that with an is-zero check of the input is the
same as testing if not exactly 1 bit is set:
(X == 0) || (ctpop(X) u> 1) --> ctpop(X) != 1
llvm-svn: 364246
Inference of nowrap flags in CVP has been disabled, because it
triggered a bug in LFTR (https://bugs.llvm.org/show_bug.cgi?id=31181).
This issue has been fixed in D60935, so we should be able to reenable
nowrap flag inference now.
Differential Revision: https://reviews.llvm.org/D62776
llvm-svn: 364228
Prefer the more exact intrinsic to remove a use of the input value
and possibly make further transforms easier (we will still need
to match patterns with funnel-shift of wider types as pieces of
bswap, especially if we want to canonicalize to funnel-shift with
constant shift amount). Discussed in D46760.
llvm-svn: 364187
In rL364135, I taught IndVars to fold exiting branches in loops with a zero backedge taken count (i.e. loops that only run one iteration). This extends that to eliminate the dead comparison left around.
llvm-svn: 364155
This is another intermediate IR step towards solving PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314
We can test if a value is power-of-2-or-0 using ctpop(X) < 2,
so combining that with a non-zero check of the input is the
same as testing if exactly 1 bit is set:
(X != 0) && (ctpop(X) u< 2) --> ctpop(X) == 1
Differential Revision: https://reviews.llvm.org/D63660
llvm-svn: 364153
This turned out to be surprisingly effective. I was originally doing this just for completeness sake, but it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than it's isKnownPredicate reasoning.
Once this is in, I'm thinking about trying to build on the same infrastructure to eliminate provably untaken checks. There may be something generally interesting here.
Differential Revision: https://reviews.llvm.org/D63618
llvm-svn: 364135
The VM layout on iOS is not stable between releases. On 64-bit iOS and
its derivatives we use a dynamic shadow offset that enables ASan to
search for a valid location for the shadow heap on process launch rather
than hardcode it.
This commit extends that approach for 32-bit iOS plus derivatives and
their simulators.
rdar://50645192
rdar://51200372
rdar://51767702
Reviewed By: delcypher
Differential Revision: https://reviews.llvm.org/D63586
llvm-svn: 364105
This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).
Committed on behalf of @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D60897
llvm-svn: 364084
Summary:
The motivation for this was to propagate fast-math flags like nnan and
ninf on vector floating point operations to the corresponding scalar
operations to take advantage of follow-on optimizations. But I think
the same argument applies to all of our IR flags: if they apply to the
vector operation then they also apply to all the individual scalar
operations, and they might enable follow-on optimizations.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63593
llvm-svn: 364051
Also, add a FIXME for the unsafe transform on a unary FNeg. A unary FNeg can only be transformed to a FMul by -1.0 when the nnan flag is present. The unary FNeg project is a WIP, so the unsafe transformation is acceptable until that work is complete.
The bogus assert with introduced in D63445.
llvm-svn: 363998
Currently, many profiling tests on Solaris FAIL like
Command Output (stderr):
--
Undefined first referenced
symbol in file
__llvm_profile_register_names_function /tmp/lit_tmp_Nqu4eh/infinite_loop-9dc638.o
__llvm_profile_register_function /tmp/lit_tmp_Nqu4eh/infinite_loop-9dc638.o
Solaris 11.4 ld supports the non-standard GNU ld extension of adding
__start_SECNAME and __stop_SECNAME labels to sections whose names are valid
as C identifiers. Given that we already use Solaris 11.4-only features
like ld -z gnu-version-script-compat and fully working .preinit_array
support in compiler-rt, we don't need to worry about older versions of
Solaris ld.
The patch documents that support (although the comment in
lib/Transforms/Instrumentation/InstrProfiling.cpp
(needsRuntimeRegistrationOfSectionRange) is quite cryptic what it's
actually about), and adapts the affected testcase not to expect the
alternativeq __llvm_profile_register_functions and __llvm_profile_init.
It fixes all affected tests.
Tested on amd64-pc-solaris2.11.
Differential Revision: https://reviews.llvm.org/D41111
llvm-svn: 363984
Summary:
The getClobberingMemoryAccess API checks for clobbering accesses in a loop by walking the backedge. This may check if a memory access is being
clobbered by the loop in a previous iteration, depending how smart AA got over the course of the updates in MemorySSA (it does not occur when built from scratch).
If no clobbering access is found inside the loop, it will optimize to an access outside the loop. This however does not mean that access is safe to sink.
Given:
```
for i
load a[i]
store a[i]
```
The access corresponding to the load can be optimized to outside the loop, and the load can be hoisted. But it is incorrect to sink it.
In order to sink the load, we'd need to check no Def clobbers the Use in the same iteration. With this patch we currently restrict sinking to either
Defs not existing in the loop, or Defs preceding the load in the same block. An easy extension is to ensure the load (Use) post-dominates all Defs.
Caught by PR42294.
This issue also shed light on the converse problem: hoisting stores in this same scenario would be illegal. With this patch we restrict
hoisting of stores to the case when their corresponding Defs are dominating all Uses in the loop.
Reviewers: george.burgess.iv
Subscribers: jlebar, Prazek, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63582
llvm-svn: 363982
I can't actually come up with a test case this triggers on without an out of tree change, but in theory, it's a bug in the recently added multiple exit LFTR support. The root issue is that an exiting block common to two loops can (in theory) have computable exit counts for both loops. Rewriting the exit of an inner loop in terms of the outer loops IV would cause the inner loop to either a) run forever, or b) terminate on the first iteration.
In practice, we appear to get lucky and not have the exit count computable for the outer loop, except when it's trivially zero. Given we bail on zero exit counts, we don't appear to ever trigger this. But I can't come up with a reason we *can't* compute an exit count for the outer loop on the common exiting block, so this may very well be triggering in some cases.
llvm-svn: 363964
The form that compares against 0 is better because:
1. It removes a use of the input value.
2. It's the more standard form for this pattern: https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
3. It results in equal or better codegen (tested with x86, AArch64, ARM, PowerPC, MIPS).
This is a root cause for PR42314, but probably doesn't completely answer the codegen request:
https://bugs.llvm.org/show_bug.cgi?id=42314
Alive proof:
https://rise4fun.com/Alive/9kG
Name: is power-of-2
%neg = sub i32 0, %x
%a = and i32 %neg, %x
%r = icmp eq i32 %a, %x
=>
%dec = add i32 %x, -1
%a2 = and i32 %dec, %x
%r = icmp eq i32 %a2, 0
Name: is not power-of-2
%neg = sub i32 0, %x
%a = and i32 %neg, %x
%r = icmp ne i32 %a, %x
=>
%dec = add i32 %x, -1
%a2 = and i32 %dec, %x
%r = icmp ne i32 %a2, 0
llvm-svn: 363956
Teach IndVarSimply's LinearFunctionTestReplace transform to handle multiple exit loops. LFTR does two key things 1) it rewrites (all) exit tests in terms of a common IV potentially eliminating one in the process and 2) it moves any offset/indexing/f(i) style logic out of the loop.
This turns out to actually be pretty easy to implement. SCEV already has all the information we need to know what the backedge taken count is for each individual exit. (We use that when computing the BE taken count for the loop as a whole.) We basically just need to iterate through the exiting blocks and apply the existing logic with the exit specific BE taken count. (The previously landed NFC makes this super obvious.)
I chose to go ahead and apply this to all loop exits instead of only latch exits as originally proposed. After reviewing other passes, the only case I could find where LFTR form was harmful was LoopPredication. I've fixed the latch case, and guards aren't LFTRed anyways. We'll have some more work to do on the way towards widenable_conditions, but that's easily deferred.
I do want to note that I added one bit after the review. When running tests, I saw a new failure (no idea why didn't see previously) which pointed out LFTR can rewrite a constant condition back to a loop varying one. This was theoretically possible with a single exit, but the zero case covered it in practice. With multiple exits, we saw this happening in practice for the eliminate-comparison.ll test case because we'd compute a ExitCount for one of the exits which was guaranteed to never actually be reached. Since LFTR ran after simplifyAndExtend, we'd immediately turn around and undo the simplication work we'd just done. The solution seemed obvious, so I didn't bother with another round of review.
Differential Revision: https://reviews.llvm.org/D62625
llvm-svn: 363883
(Recommit of r363293 which was reverted when a dependent patch was.)
As pointed out by Nikita in D62625, BackedgeTakenCount is generally used to refer to the backedge taken count of the loop. A conditional backedge taken count - one which only applies if a particular exit is taken - is called a ExitCount in SCEV code, so be consistent here.
llvm-svn: 363875
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024
The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:
A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.
In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.
I have set up a separate review D61933 for a fix which is required for this patch.
Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse
Reviewed By: hfinkel, jmorse
Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D60831
> llvm-svn: 363046
llvm-svn: 363786
[SROA] Enhance SROA to handle `addrspacecast`ed allocas
- Fix typo in original change
- Add additional handling to ensure all return pointers are properly
casted.
Summary:
- After `addrspacecast` is allowed to be eliminated in SROA, the
adjusting of storage pointer (from `alloca) needs to handle the
potential different address spaces between the storage pointer (from
alloca) and the pointer being used.
Reviewers: arsenm
Subscribers: wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63501
llvm-svn: 363743
Summary:
CoroSplit depends on CallGraphWrapperPass, but it was not explicitly adding it as a pass dependency.
This missing dependency can trigger errors / assertions / crashes in PMTopLevelManager::schedulePass() under certain configurations.
Author: ben-clayton
Reviewers: GorNishanov
Reviewed By: GorNishanov
Subscribers: capn, EricWF, modocache, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63144
llvm-svn: 363727
Summary:
- After `addrspacecast` is allowed to be eliminated in SROA, the
adjusting of storage pointer (from `alloca) needs to handle the
potential different address spaces between the storage pointer (from
alloca) and the pointer being used.
Reviewers: arsenm
Subscribers: wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63501
llvm-svn: 363711
Using the new SwitchInstProfUpdateWrapper this patch
simplifies 3 places of prof branch_weights handling.
Differential Revision: https://reviews.llvm.org/D62123
llvm-svn: 363652
This saves roughly 32 bytes of instructions per function with stack objects
and causes us to preserve enough information that we can recover the original
tags of all stack variables.
Now that stack tags are deterministic, we no longer need to pass
-hwasan-generate-tags-with-calls during check-hwasan. This also means that
the new stack tag generation mechanism is exercised by check-hwasan.
Differential Revision: https://reviews.llvm.org/D63360
llvm-svn: 363636
The goal is to improve hwasan's error reporting for stack use-after-return by
recording enough information to allow the specific variable that was accessed
to be identified based on the pointer's tag. Currently we record the PC and
lower bits of SP for each stack frame we create (which will eventually be
enough to derive the base tag used by the stack frame) but that's not enough
to determine the specific tag for each variable, which is the stack frame's
base tag XOR a value (the "tag offset") that is unique for each variable in
a function.
In IR, the tag offset is most naturally represented as part of a location
expression on the llvm.dbg.declare instruction. However, the presence of the
tag offset in the variable's actual location expression is likely to confuse
debuggers which won't know about tag offsets, and moreover the tag offset
is not required for a debugger to determine the location of the variable on
the stack, so at the DWARF level it is represented as an attribute so that
it will be ignored by debuggers that don't know about it.
Differential Revision: https://reviews.llvm.org/D63119
llvm-svn: 363635
This patch really contains two pieces:
Teach SCEV how to fold a phi in the header of a loop to the value on the backedge when a) the backedge is known to execute at least once, and b) the value is safe to use globally within the scope dominated by the original phi.
Teach IndVarSimplify's rewriteLoopExitValues to allow loop invariant expressions which already exist (and thus don't need new computation inserted) even in loops where we can't optimize away other uses.
Differential Revision: https://reviews.llvm.org/D63224
llvm-svn: 363619
Recommit r363289 with a bug fix for crash identified in pr42279. Issue was that a loop exit test does not have to be an icmp, leading to a null dereference crash when new logic was exercised for that case. Test case previously committed in r363601.
Original commit comment follows:
This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV.
The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program.
As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well.
(Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.)
Differential Revision: https://reviews.llvm.org/D62939
llvm-svn: 363613
Summary:
Update compare normalization in SimpleValue hashing to break ties (when
the same value is being compared to itself) by switching to the swapped
predicate if it has a lower numerical value. This brings the hashing in
line with isEqual, which already recognizes the self-compares with
swapped predicates as equal.
Fixes PR 42280.
Reviewers: spatel, efriedma, nikic, fhahn, uabelho
Reviewed By: nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63349
llvm-svn: 363598
When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the aligment of the original scaler
memory op is not supported with the nontemporal hint on the target.
This adds two new functions:
bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;
to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.
This fixes https://llvm.org/PR40759
Differential Revision: https://reviews.llvm.org/D61764
llvm-svn: 363581
Summary:
There is PHINode::getBasicBlockIndex() and PHINode::setIncomingValue()
but no function to replace incoming value for a specified BasicBlock*
predecessor.
Clearly, there are a lot of places that could use that functionality.
Reviewer: craig.topper, lebedev.ri, Meinersbur, kbarton, fhahn
Reviewed By: Meinersbur, fhahn
Subscribers: fhahn, hiraditya, zzheng, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63338
llvm-svn: 363566
If an addrspacecast needed to be inserted again, this was creating a
clone of the original cast for each user. Just use the original, which
also saves losing the value name.
llvm-svn: 363562
Summary:
Avoid that loop vectorizer creates loads/stores of vectors
with "irregular" types when interleaving. An example of
an irregular type is x86_fp80 that is 80 bits, but that
may have an allocation size that is 96 bits. So an array
of x86_fp80 is not bitcast compatible with a vector
of the same type.
Not sure if interleavedAccessCanBeWidened is the best
place for this check, but it solves the problem seen
in the added test case. And it is the same kind of check
that already exists in memoryInstructionCanBeWidened.
Reviewers: fhahn, Ayal, craig.topper
Reviewed By: fhahn
Subscribers: hiraditya, rkruppe, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63386
llvm-svn: 363547
Third time's the charm.
This was reverted in r363220 due to being suspected of an internal benchmark
regression and a test failure, none of which turned out to be caused by this.
llvm-svn: 363529
SimplifyCFG has a bug that results in inconsistent prof branch_weights metadata
if unreachable switch cases are removed. This patch fixes this bug by making use
of the newly introduced SwitchInstProfUpdateWrapper class (see patch D62122).
A new test is created.
Differential Revision: https://reviews.llvm.org/D62186
llvm-svn: 363527
If we can detect that saturating math that depends on an IV cannot
overflow, replace it with simple math. This is similar to the CVP
optimization from D62703, just based on a different underlying
analysis (SCEV vs LVI) that catches different cases.
Differential Revision: https://reviews.llvm.org/D62792
llvm-svn: 363489
with 'objc_arc_inert'
Those calls are no-ops, so they can be safely deleted.
rdar://problem/49839633
Differential Revision: https://reviews.llvm.org/D62433
llvm-svn: 363468
There is a circular dependency between SROA and InferAddressSpaces
today that requires running both multiple times in order to be able to
eliminate all simple allocas and addrspacecasts. InferAddressSpaces
can't remove addrspacecasts when written to memory, and SROA helps
move pointers out of memory.
This should avoid inserting new commuting addrspacecasts with GEPs,
since there are unresolved questions about pointer wrapping between
different address spaces.
For now, don't replace volatile operations that don't match the alloca
addrspace, as it would change the address space of the access. It may
be still OK to insert an addrspacecast from the new alloca, but be
more conservative for now.
llvm-svn: 363462
and replace with an equilivent countTrailingZeros.
GCD is much more expensive than this, with repeated division.
This depends on D60823
Differential Revision: https://reviews.llvm.org/D61151
llvm-svn: 363422
I'm not 100% sure about this, since I'm worried about IR transforms
that might end up introducing divergence downstream once replaced with
a constant, but I haven't come up with an example yet.
llvm-svn: 363406
As pointed out by Nikita in D62625, BackedgeTakenCount is generally used to refer to the backedge taken count of the loop. A conditional backedge taken count - one which only applies if a particular exit is taken - is called a ExitCount in SCEV code, so be consistent here.
llvm-svn: 363293
This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV.
The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program.
As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well.
(Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.)
Differential Revision: https://reviews.llvm.org/D62939
llvm-svn: 363289
Add an AssumptionCache callback to the InlineFuntionInfo used for the
AlwaysInlinerPass to match codegen of the AlwaysInlinerLegacyPass to generate
llvm.assume. This fixes CodeGen/builtin-movdir.c when new PM is enabled by
default.
Differential Revision: https://reviews.llvm.org/D63170
llvm-svn: 363287
Summary:
The logic in EarlyCSE that looks through 'not' operations in the
predicate recognizes e.g. that `select (not (cmp sgt X, Y)), X, Y` is
equivalent to `select (cmp sgt X, Y), Y, X`. Without this change,
however, only the latter is recognized as a form of `smin X, Y`, so the
two expressions receive different hash codes. This leads to missed
optimization opportunities when the quadratic probing for the two hashes
doesn't happen to collide, and assertion failures when probing doesn't
collide on insertion but does collide on a subsequent table grow
operation.
This change inverts the order of some of the pattern matching, checking
first for the optional `not` and then for the min/max/abs patterns, so
that e.g. both expressions above are recognized as a form of `smin X, Y`.
It also adds an assertion to isEqual verifying that it implies equal
hash codes; this fires when there's a collision during insertion, not
just grow, and so will make it easier to notice if these functions fall
out of sync again. A new flag --earlycse-debug-hash is added which can
be used when changing the hash function; it forces hash collisions so
that any pair of values inserted which compare as equal but hash
differently will be caught by the isEqual assertion.
Reviewers: spatel, nikic
Reviewed By: spatel, nikic
Subscribers: lebedev.ri, arsenm, craig.topper, efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62644
llvm-svn: 363274
Extend the mechanism to overload intrinsic arguments by using either
backward or forward references to the overloadable arguments.
In for example:
def int_something : Intrinsic<[LLVMPointerToElt<0>],
[llvm_anyvector_ty], []>;
LLVMPointerToElt<0> is a forward reference to the overloadable operand
of type 'llvm_anyvector_ty' and would allow intrinsics such as:
declare i32* @llvm.something.v4i32(<4 x i32>);
declare i64* @llvm.something.v2i64(<2 x i64>);
where the result pointer type is deduced from the element type of the
first argument.
If the returned pointer is not a pointer to the element type, LLVM will
give an error:
Intrinsic has incorrect return type!
i64* (<4 x i32>)* @llvm.something.v4i32
Reviewers: RKSimon, arsenm, rnk, greened
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D62995
llvm-svn: 363233
This reverts 363226 and 363227, both NFC intended
I swear I fixed the test case that is failing, and ran
the tests, but I will look into it again.
llvm-svn: 363229
and replace with an equilivent countTrailingZeros.
GCD is much more expensive than this, with repeated division.
This depends on D60823
Differential Revision: https://reviews.llvm.org/D61151
llvm-svn: 363227
We have observed some failures with internal builds with this revision.
- Performance regressions:
- llvm's SingleSource/Misc evalloop shows performance regressions (although these may be red herrings).
- Benchmarks for Abseil's SwissTable.
- Correctness:
- Failures for particular libicu tests when building the Google AppEngine SDK (for PHP).
hwennborg has already been notified, and is aware of reproducer failures.
llvm-svn: 363220
This changes the standalone pass only. Arguably the utility class
itself should assert there are no convergent calls. However, a target
pass with additional context may still be able to version a loop if
all of the dynamic conditions are sufficiently uniform.
llvm-svn: 363165
This case is slightly tricky, because loop distribution should be
allowed in some cases, and not others. As long as runtime dependency
checks don't need to be introduced, this should be OK. This is further
complicated by the fact that LoopDistribute partially ignores if LAA
says that vectorization is safe, and then does its own runtime pointer
legality checks.
Note this pass still does not handle noduplicate correctly, as this
should always be forbidden with it. I'm not going to bother trying to
fix it, as it would require more effort and I think noduplicate should
be removed.
https://reviews.llvm.org/D62607
llvm-svn: 363160
We were only matching RHS being a loop invariant value, not the inverse. Since there's nothing which appears to canonicalize loop invariant values to RHS, this means we missed cases.
Differential Revision: https://reviews.llvm.org/D63112
llvm-svn: 363108
Summary:
The method `getLoopPassPreservedAnalyses` should not mark MemorySSA as
preserved, because it's being called in a lot of passes that do not
preserve MemorySSA.
Instead, mark the MemorySSA analysis as preserved by each pass that does
preserve it.
These changes only affect the new pass mananger.
Reviewers: chandlerc
Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62536
llvm-svn: 363091
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024
The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:
A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.
In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.
I have set up a separate review D61933 for a fix which is required for this patch.
Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse
Reviewed By: hfinkel, jmorse
Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D60831
llvm-svn: 363046
This patch changes how LLVM handles the accumulator/start value
in the reduction, by never ignoring it regardless of the presence of
fast-math flags on callsites. This change introduces the following
new intrinsics to replace the existing ones:
llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd
llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul
and adds functionality to auto-upgrade existing LLVM IR and bitcode.
Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D60261
llvm-svn: 363035
As shown in PR41279, some basic blocks (such as catchswitch) cannot be
instrumented. This patch filters out these BBs in PGO instrumentation.
It also sets the profile count to the fail-to-instrument edge, so that we
can propagate the counts in the CFG.
Differential Revision: https://reviews.llvm.org/D62700
llvm-svn: 362995
This was discussed as part of D62880. The basic thought is that computing BE taken count after widening should produce (on average) an equally good backedge taken count as the one before widening. Since there's only one test in the suite which is impacted by this change, and it's essentially equivelent codegen, that seems to be a reasonable assertion. This change was separated from r362971 so that if this turns out to be problematic, the triggering piece is obvious and easily revertable.
For the nestedIV example from elim-extend.ll, we end up with the following BE counts:
BEFORE: (-2 + (-1 * %innercount) + %limit)
AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>)
Note that before is an i32 type, and the after is an i64. Truncating the i64 produces the i32.
llvm-svn: 362975
This change does the plumbing to wire an ExitingBB parameter through the LFTR implementation, and reorganizes the code to work in terms of a set of individual loop exits. Most of it is fairly obvious, but there's one key complexity which makes it worthy of consideration. The actual multi-exit LFTR patch is in D62625 for context.
Specifically, it turns out the existing code uses the backedge taken count from before a IV is widened. Oddly, we can end up with a different (more expensive, but semantically equivelent) BE count for the loop when requerying after widening. For the nestedIV example from elim-extend, we end up with the following BE counts:
BEFORE: (-2 + (-1 * %innercount) + %limit)
AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>)
This is the only test in tree which seems sensitive to this difference. The actual result of using the wider BETC on this example is that we actually produce slightly better code. :)
In review, we decided to accept that test change. This patch is structured to preserve the old behavior, but a separate change will immediate follow with the behavior change. (I wanted it separate for problem attribution purposes.)
Differential Revision: https://reviews.llvm.org/D62880
llvm-svn: 362971
Similar to rL362909:
This isn't the ideal fix (use FMF on the select), but it's still an
improvement until we have better FMF propagation to selects and other
FP math operators.
I don't think there's much risk of regression from this change by
not including the FMF on the fcmp any more. The nsz/nnan FMF
should be the same on the fcmp and the fsub because they have the
same operand.
llvm-svn: 362943
This isn't the ideal fix (use FMF on the select), but it's still an
improvement until we have better FMF propagation to selects and other
FP math operators.
I don't think there's much risk of regression from this change by
not including the FMF on the fcmp any more. The nsz/nnan FMF
should be the same on the fcmp and the fneg (fsub) because they
have the same operand.
This works around the most glaring FMF logical inconsistency cited
in PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
llvm-svn: 362909
Summary: Move some code around, in preparation for later fixes
to the non-integral addrspace handling (D59661)
Patch By Jameson Nash <jameson@juliacomputing.com>
Reviewed By: reames, loladiro
Differential Revision: https://reviews.llvm.org/D59729
llvm-svn: 362853
Summary:
The cleanup in D62751 introduced a compile-time regression due to the way DT updates are performed.
Add all insert edges then all delete edges in DTU to match the previous compile time.
Compile time on the test provided by @mstorsjo before and after this patch on my machine:
113.046s vs 35.649s
Repro: clang -target x86_64-w64-mingw32 -c -O3 glew-preproc.c; on https://martin.st/temp/glew-preproc.c.
Reviewers: kuhar, NutshellySima, mstorsjo
Subscribers: jlebar, mstorsjo, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62981
llvm-svn: 362839
A function for loop vectorization illegality reporting has been
introduced:
void LoopVectorizationLegality::reportVectorizationFailure(
const StringRef DebugMsg, const StringRef OREMsg,
const StringRef ORETag, Instruction * const I) const;
The function prints a debug message when the debug for the compilation
unit is enabled as well as invokes the optimization report emitter to
generate a message with a specified tag. The function doesn't cover any
complicated logic when a custom lambda should be passed to the emitter,
only generating a message with a tag is supported.
The function always prints the instruction `I` after the debug message
whenever the instruction is specified, otherwise the debug message
ends with a dot: 'LV: Not vectorizing: Disabled/already vectorized.'
Patch by Pavel Samolysov <samolisov@gmail.com>
llvm-svn: 362736
This is a really silly bug that even a simple test w/an unconditional latch would have caught. I tried to guard against the case, but put it in the wrong if check. Oops.
llvm-svn: 362727
Summary:
This change only unifies the API previous API pair accepting
CallInst and InvokeInst, thus making it easier to refactor
inliner pass ode to CallBase. The implementation of the unified
API still relies on the CallSite implementation.
Reviewers: eraman, chandlerc, jdoerfert
Reviewed By: jdoerfert
Subscribers: jdoerfert, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62283
llvm-svn: 362656
The AllConstant check needs to be moved out of the if/else if chain to
avoid a test regression. The "there is no SimplifyZExt" comment
puzzles me, since there is SimplifyCastInst. Additionally, the
Simplify* calls seem to not see the operand as constant, so this needs
to be tried if the simplify failed.
llvm-svn: 362653
When the byval attribute has a type, it must match the pointee type of
any parameter; but InstCombine was not updating the attribute when
folding casts of various kinds away.
llvm-svn: 362643
This patch fixes a regression caused by the operand reordering refactoring patch https://reviews.llvm.org/D59973 .
The fix changes the strategy to Splat instead of Opcode, if broadcast opportunities are found.
Please see the lit test for some examples.
Committed on behalf of @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D62427
llvm-svn: 362613
Instead of passing around fast-math-flags as a parameter, we can set those
using an IRBuilder guard object. This is no-functional-change-intended.
The motivation is to eventually fix the vectorizers to use and set the
correct fast-math-flags for reductions. Examples of that not behaving as
expected are:
https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast')
https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0)
D61802 (should be able to reduce with IR-level FMF)
Differential Revision: https://reviews.llvm.org/D62272
llvm-svn: 362612
@jdoerfert Looks like these are placeholders for incoming abstract attributes patches so I've just commented the code out, even though this is usually frowned upon.
llvm-svn: 362592
This reverts commit 5b32f60ec3.
The fix is in commit 4f9e68148b.
This patch fixes the CorrelatedValuePropagation pass to keep
prof branch_weights metadata of SwitchInst consistent.
It makes use of SwitchInstProfUpdateWrapper.
New tests are added.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D62126
llvm-svn: 362583
NOTE: Note that no attributes are derived yet. This patch will not go in
alone but only with others that derive attributes. The framework is
split for review purposes.
This commit introduces the Attributor pass infrastructure and fixpoint
iteration framework. Further patches will introduce abstract attributes
into this framework.
In a nutshell, the Attributor will update instances of abstract
arguments until a fixpoint, or a "timeout", is reached. Communication
between the Attributor and the abstract attributes that are derived is
restricted to the AbstractState and AbstractAttribute interfaces.
Please see the file comment in Attributor.h for detailed information
including design decisions and typical use case. Also consider the class
documentation for Attributor, AbstractState, and AbstractAttribute.
Reviewers: chandlerc, homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes, nicholas, reames
Subscribers: mehdi_amini, mgorny, hiraditya, bollu, steven_wu, dexonsmith, dang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59918
llvm-svn: 362578
Summary:
Following the cleanup in D48202, method foldBlockIntoPredecessor has the
same behavior. Replace its uses with MergeBlockIntoPredecessor.
Remove foldBlockIntoPredecessor.
Reviewers: chandlerc, dmgreen
Subscribers: jlebar, javed.absar, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62751
llvm-svn: 362538
This patch fixes a problem that occurs in LowerSwitch when a switch statement has a PHI node as its condition, and the PHI node only has two incoming blocks, and one of those incoming blocks is through an unreachable default in the switch statement. When this condition occurs, LowerSwitch holds a pointer to the condition value, but removes the switch block as a predecessor of the PHI block, causing the PHI node to be replaced. LowerSwitch then tries to use its stale pointer to the original condition value, causing a crash.
Differential Revision: https://reviews.llvm.org/D62560
llvm-svn: 362427
Extract a willNotOverflow() helper function that is shared between
eliminateOverflowIntrinsic() and strengthenOverflowingOperation().
Use WithOverflowInst for the former.
We'll be able to reuse the same code for saturating intrinsics as
well.
llvm-svn: 362305
Fix for https://bugs.llvm.org/show_bug.cgi?id=31181 and partial fix
for LFTR poison handling issues in general.
When LFTR moves a condition from pre-inc to post-inc, it may now
depend on value that is poison due to nowrap flags. To avoid this,
we clear any nowrap flag that SCEV cannot prove for the post-inc
addrec.
Additionally, LFTR may switch to a different IV that is dynamically
dead and as such may be arbitrarily poison. This patch will correct
nowrap flags in some but not all cases where this happens. This is
related to the adoption of IR nowrap flags for the pre-inc addrec.
(See some of the switch_to_different_iv tests, where flags are not
dropped or insufficiently dropped.)
Finally, there are likely similar issues with the handling of GEP
inbounds, but we don't have a test case for this yet.
Differential Revision: https://reviews.llvm.org/D60935
llvm-svn: 362292
At the moment, LoopPredication completely bails out if it sees a latch of the form:
%cmp = icmp ne %iv, %N
br i1 %cmp, label %loop, label %exit
OR
%cmp = icmp ne %iv.next, %NPlus1
br i1 %cmp, label %loop, label %exit
This is unfortunate since this is exactly the form that LFTR likes to produce. So, go ahead and recognize simple cases where we can.
For pre-increment loops, we leverage the fact that LFTR likes canonical counters (i.e. those starting at zero) and a (presumed) range fact on RHS to discharge the check trivially.
For post-increment forms, the key insight is in remembering that LFTR had to insert a (N+1) for the RHS. CVP can hopefully prove that add nsw/nuw (if there's appropriate range on N to start with). This leaves us both with the post-inc IV and the RHS involving an nsw/nuw add, and SCEV can discharge that with no problem.
This does still need to be extended to handle non-one steps, or other harder patterns of variable (but range restricted) starting values. That'll come later.
Differential Revision: https://reviews.llvm.org/D62748
llvm-svn: 362282
When the object size argument is -1, no checking can be done, so calling the
_chk variant is unnecessary. We already did this for a bunch of these
functions.
rdar://50797197
Differential revision: https://reviews.llvm.org/D62358
llvm-svn: 362272
If we can determine that a saturating add/sub will not overflow based
on range analysis, convert it into a simple binary operation. This is
a sibling transform to the existing with.overflow handling.
Reapplying this with an additional check that the saturating intrinsic
has integer type, as LVI currently does not support vector types.
Differential Revision: https://reviews.llvm.org/D62703
llvm-svn: 362263
Noticed on D62703. LVI only handles plain integers, not vectors of
integers. This was previously not an issue, because vector support
for with.overflow is only a relatively recent addition.
llvm-svn: 362261
If we can determine that a saturating add/sub will not overflow
based on range analysis, convert it into a simple binary operation.
This is a sibling transform to the existing with.overflow handling.
Differential Revision: https://reviews.llvm.org/D62703
llvm-svn: 362242
It looks this fold was already partially happening, indirectly
via some other folds, but with one-use limitation.
No other fold here has that restriction.
https://rise4fun.com/Alive/ftR
llvm-svn: 362217
Previously, this used a statement like this:
Map[A] = Map[B];
This is equivalent to the following:
const auto &Src = Map[B];
auto &Dest = Map[A];
Dest = Src;
The second statement, "auto &Dest = Map[A];" can insert a new
element into the DenseMap, which can potentially grow and reallocate
the DenseMap's internal storage, which will invalidate the existing
reference to the source. When doing the actual assignment,
the Src reference is dereferenced, accessing memory that was
freed when the DenseMap grew.
This issue hasn't shown up when LLVM was built with Clang, because
the right hand side ended up dereferenced before evaulating the
left hand side. (If the value type is a larger data type, Clang doesn't
do this but behaves like GCC.)
With GCC, a cast to Value* isn't enough to make it dereference the
right hand side reference before invoking operator[] (while that is
enough to make Clang/LLVM do the right thing for larger types), but
storing it in an intermediate variable in a separate statement works.
This fixes PR42065.
Differential Revision: https://reviews.llvm.org/D62624
llvm-svn: 362150
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.
If present, the type must match the pointee type of the argument.
The original commit did not remap byval types when linking modules, which broke
LTO. This version fixes that.
Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.
llvm-svn: 362128
VPlan.h already contains the declaration of VPlanPtr type alias:
using VPlanPtr = std::unique_ptr<VPlan>;
The LoopVectorizationPlanner class also contains the same declaration
of VPlanPtr and therefore LoopVectorize requires a long wording when
its methods return VPlanPtr:
LoopVectorizationPlanner::VPlanPtr
LoopVectorizationPlanner::buildVPlanWithVPRecipes(...)
but LoopVectorize.cpp includes VPlan.h (via LoopVectorizationPlanner.h)
and can use VPlanPtr from that header.
Patch by Pavel Samolysov.
Reviewers: hsaito, rengolin, fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D62576
llvm-svn: 362126
Summary:
I'm adding ORE to memset/memcpy formation, with tests,
but mainly this is split off from D61144.
Reviewers: reames, anemet, thegameg, craig.topper
Reviewed By: thegameg
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62631
llvm-svn: 362092
Currently, only the following information is provided by LoopVectorizer
in the case when the CF of the loop is not legal for vectorization:
LV: Can't vectorize the instructions or CFG
LV: Not vectorizing: Cannot prove legality.
But this information is not enough for the root cause analysis; what is
exactly wrong with the loop should also be printed:
LV: Not vectorizing: The exiting block is not the loop latch.
Patch by Pavel Samolysov.
Reviewers: mkuper, hsaito, rengolin, fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D62311
llvm-svn: 362056
Based on the overflow direction information added in D62463, we can
now fold always overflowing signed saturating add/sub to signed min/max.
Differential Revision: https://reviews.llvm.org/D62544
llvm-svn: 362006
Summary:
When we import an alias, we do so by making a clone of the aliasee. Just
as this clone uses the original alias name and linkage, it should also
use the same visibility (not the aliasee's visibility). Otherwise,
linker behavior is affected (e.g. if the aliasee was hidden, but the
alias is not, the resulting imported clone should not be hidden,
otherwise the linker will make the final symbol hidden which is
incorrect).
Reviewers: wmi
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62535
llvm-svn: 361989
Fix PR41279 where critical edges to EHPad are not split.
The fix is to not instrument those critical edges. We used to be able to know
the size of counters right after MST is computed. With this, we have to
pre-collect the instrument BBs to know the size, and then instrument them.
Differential Revision: https://reviews.llvm.org/D62439
llvm-svn: 361882
This reverts commit 53f2f32865.
As reported on D62126, this causes assertion failures if the switch
has incorrect branch_weights metadata, which may happen as a result
of other transforms not handling it correctly yet.
llvm-svn: 361881
In order to fold an always overflowing signed saturating add/sub,
we need to know in which direction the always overflow occurs.
This patch splits up AlwaysOverflows into AlwaysOverflowsLow and
AlwaysOverflowsHigh to pass through this information (but it is
not used yet).
Differential Revision: https://reviews.llvm.org/D62463
llvm-svn: 361858
This was reverted in r360086 as it was supected of causing mysterious test
failures internally. However, it was never concluded that this patch was the
root cause.
> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936
llvm-svn: 361811
This patch fixes the CorrelatedValuePropagation pass to keep
prof branch_weights metadata of SwitchInst consistent.
It makes use of SwitchInstProfUpdateWrapper.
New tests are added.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D62126
llvm-svn: 361808
The code to preserve LCSSA PHIs currently only properly supports
reduction PHIs and PHIs for values defined outside the latches.
This patch improves the LCSSA PHI handling to cover PHIs for values
defined in the latches.
Fixes PR41725.
Reviewers: efriedma, mcrosier, davide, jdoerfert
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D61576
llvm-svn: 361743
Rather than gating on "isSwitchDense" (resulting in necessesarily
sparse lookup tables even when they were generated), always run
this quite cheap transform.
This transform is useful not just for generating tables.
LowerSwitch also wants this: read LowerSwitch.cpp:257.
Be careful to not generate worse code, by introducing a
SubThreshold heuristic.
Instead of just sorting by signed, generalize the finding of the
best base.
And now that it is run unconditionally, do not replicate its
functionality in SwitchToLookupTable (which could use a Sub
when having a hole is smaller, hence the SubThreshold
heuristic located in a single place).
This simplifies SwitchToLookupTable, and fixes
some ugly corner cases due to the use of signed numbers,
such as a table containing i16 32768 and 32769, of which
32769 would be interpreted as -32768, and now the code thinks
the table is size 65536.
(We still use unconditional subtraction when building a single-register mask,
but I think this whole block should go when the more general sparse
map is added, which doesn't leave empty holes in the table.)
And the reason test4 and test5 did not trigger was documented wrong:
it was because they were not considered sufficiently "dense".
Also, fix generation of invalid LLVM-IR: shl by bit-width.
llvm-svn: 361727
and replace with an equilivent countTrailingZeros.
GCD is much more expensive than this, with repeated division.
This depends on D60823
llvm-svn: 361726
This matches countLeadingOnes() and countTrailingOnes(), and
APInt's countLeadingZeros() and countTrailingZeros().
(as well as __builtin_clzll())
llvm-svn: 361724
Extract method to compute overflow based on binop and signedness,
and then make the result handling code generic. This extends the
always-overflow handling to signed muls, but has currently no effect,
as we don't compute always overflow for them (thus NFC).
llvm-svn: 361721
The guaranteed no-wrap region is never empty, it always contains at
least zero, so these optimizations don't ever apply.
To make this more obviously true, replace the conversative return
in makeGNWR with an assertion.
llvm-svn: 361698
Just a minor refactoring to use the new helper method
DataLayout::typeSizeEqualsStoreSize(). This is done when
checking if getTypeSizeInBits is equal/non-equal to
getTypeStoreSizeInBits.
llvm-svn: 361613
This change relaxes the checks for hasOnlyUniformBranches such that our
region is uniform if:
1. All conditional branches that are direct children are uniform.
2. And either:
a. All sub-regions are uniform.
b. There is one or less conditional branches among the direct
children.
Differential Revision: https://reviews.llvm.org/D62198
llvm-svn: 361610
Summary:
The DeadStoreElimination pass now skips doing
PartialStoreMerging when stores overlap according to
OW_PartialEarlierWithFullLater and at least one of
the stores is having a store size that is different
from the size of the type being stored.
This solves problems seen in
https://bugs.llvm.org/show_bug.cgi?id=41949
for which we in the past could end up with
mis-compiles or assertions.
The content and location of the padding bits is not
formally described (or undefined) in the LangRef
at the moment. So the solution is chosen based on
that we cannot assume anything about the padding bits
when having a store that clobbers more memory than
indicated by the type of the value that is stored
(such as storing an i6 using an 8-bit store instruction).
Fixes: https://bugs.llvm.org/show_bug.cgi?id=41949
Reviewers: spatel, efriedma, fhahn
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62250
llvm-svn: 361605
This was part of InstCombine, but it's better placed in
InstSimplify. InstCombine also had an unreachable but weaker
fold for insertelement with undef index, so that is deleted.
llvm-svn: 361559
Summary:
The refactoring in r360276 moved the `RunSLPVectorization` flag and added the default explicitly. The default should have been `false`, as before.
The new pass manager used to have SLPVectorization on by default, now it's off in opt, and needs D61617 checked in to enable it in clang.
Reviewers: chandlerc
Subscribers: mehdi_amini, jlebar, eraman, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61955
llvm-svn: 361537
This is reduced from a fuzzer test:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14890
Usually, demanded elements should be able to simplify shuffle
mask elements that are pointing to undef elements of its source
operands, but that doesn't happen in the test case.
llvm-svn: 361533
Summary:
This PR extends the loop object with more utilities to get loop bounds, step, induction variable, and guard branch. There already exists passes which try to obtain the loop induction variable in their own pass, e.g. loop interchange. It would be useful to have a common area to get these information. Moreover, loop fusion (https://reviews.llvm.org/D55851) is planning to use getGuard() to extend the kind of loops it is able to fuse, e.g. rotated loop with non-constant upper bound, which would have a loop guard.
/// Example:
/// for (int i = lb; i < ub; i+=step)
/// <loop body>
/// --- pseudo LLVMIR ---
/// beforeloop:
/// guardcmp = (lb < ub)
/// if (guardcmp) goto preheader; else goto afterloop
/// preheader:
/// loop:
/// i1 = phi[{lb, preheader}, {i2, latch}]
/// <loop body>
/// i2 = i1 + step
/// latch:
/// cmp = (i2 < ub)
/// if (cmp) goto loop
/// exit:
/// afterloop:
///
/// getBounds
/// getInitialIVValue --> lb
/// getStepInst --> i2 = i1 + step
/// getStepValue --> step
/// getFinalIVValue --> ub
/// getCanonicalPredicate --> '<'
/// getDirection --> Increasing
/// getGuard --> if (guardcmp) goto loop; else goto afterloop
/// getInductionVariable --> i1
/// getAuxiliaryInductionVariable --> {i1}
/// isCanonical --> false
Committed on behalf of @Whitney (Whitney Tsang).
Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara, fhahn
Reviewed By: kbarton
Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60565
llvm-svn: 361517
`fadd` and `fsub` have recently (r351850) been added as `atomicrmw`
operations. This diff adds lowering cases for them to the LowerAtomic
transform.
Patch by Josh Berdine!
llvm-svn: 361512
Summary:
Allow struct fields SRA and dead stores. This works by considering fields accesses from getElementPtr to be considered as a possible pointer root that can be cleaned up.
We check that the variable can be SRA by recursively checking the sub expressions with the new isSafeSubSROAGEP function.
basically this allows the array in following C code to be optimized out
struct Expr {
int a[2];
int b;
};
static struct Expr e;
int foo (int i)
{
e.b = 2;
e.a[i] = 1;
return e.b;
}
Reviewers: greened, bkramer, nicholas, jmolloy
Reviewed By: jmolloy
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61911
llvm-svn: 361460
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions. For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.
We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.
Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.
I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.
llvm-svn: 361425
Summary: Avoid visiting an instruction more than once by using a map.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62262
llvm-svn: 361416
The input LoopCost value can be zero, but if so it should be recalculated with the current VF. After that it should always be non-zero.
llvm-svn: 361387
This should be a valid exception to the general rule of not creating new shuffle masks in IR...
because we already do it. :)
Also, DAG combining/legalization will undo this by widening the shuffle back out if needed.
Explanation for how we already do this: SLP or vector source can create chains of insert/extract
as shown in 1 of the examples from PR16739:
https://godbolt.org/z/NlK7rAhttps://bugs.llvm.org/show_bug.cgi?id=16739
And we expect instcombine or DAGCombine to clean that up by creating relatively simple shuffles.
Differential Revision: https://reviews.llvm.org/D62024
llvm-svn: 361338
Summary:
Because the sort order was not strongly stable on the RHS, whether the
chain could merge would depend on the order of the blocks in the Phi.
EXPENSIVE_CHECKS would shuffle the blocks before sorting, resulting in
non-deterministic merging.
Reviewers: gchatelet
Subscribers: hiraditya, llvm-commits, RKSimon
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62193
llvm-svn: 361281
And handle for self-move. This is required so that llvm::sort can work
with EXPENSIVE_CHECKS, as it will do a random shuffle of the input
which can result in self-moves.
llvm-svn: 361257
This reverts commit rr360902. It caused an assertion failure in
lib/IR/DebugInfoMetadata.cpp: Assertion `(OffsetInBits + SizeInBits <=
FragmentSizeInBits) && "new fragment outside of original fragment"'
failed.
PR41931.
llvm-svn: 361246
Also, break out a helper function, namely foldFNegIntoConstant(...), which performs transforms common between visitFNeg(...) and visitFSub(...).
Differential Revision: https://reviews.llvm.org/D61693
llvm-svn: 361188
This reverts commit 95805bc425.
I've squashed the test fix into this commit.
[DebugInfo] Update loop metadata for inlined loops
Currently, when a loop is cloned while inlining function (A) into function (B)
the loop metadata is copied and then not modified at all. The loop metadata can
encode the loop's start and end DILocations. Therefore, the new inlined loop in
function (B) may have loop metadata which shows start and end locations residing
in function (A).
This patch ensures loop metadata is updated while inlining so that the start and
end DILocations are given the "inlinedAt" operand. I've also added a regression
test for this.
This fix is required for D60831 because that patch uses loop metadata to
determine the DILocation for the branches of new loop preheaders.
Reviewers: aprantl, dblaikie, anemet
Reviewed By: aprantl
Subscribers: eraman, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D61933
llvm-svn: 361149
Refactor DIExpression::With* into a flag enum in order to be less
error-prone to use (as discussed on D60866).
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D61943
llvm-svn: 361137
Summary:
Currently, when a loop is cloned while inlining function (A) into function (B) the loop metadata is copied and then not modified at all. The loop metadata can encode the loop's start and end DILocations. Therefore, the new inlined loop in function (B) may have loop metadata which shows start and end locations residing in function (A).
This patch ensures loop metadata is updated while inlining so that the start and end DILocations are given the "inlinedAt" operand. I've also added a regression test for this.
This fix is required for D60831 because that patch uses loop metadata to determine the DILocation for the branches of new loop preheaders.
Reviewers: aprantl, dblaikie, anemet
Reviewed By: aprantl
Subscribers: eraman, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D61933
llvm-svn: 361132
This is a follow-up refactoring patch after the introduction of usable TreeEntry pointers in D61706.
The EdgeInfo struct can now use a TreeEntry pointer instead of an index in VectorizableTree.
Committed on behalf of @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D61795
llvm-svn: 361110
Summary:
In D61918 i was looking at dropping it in DAGCombiner `visitShiftByConstant()`,
but as @craig.topper pointed out, it was copied from here.
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
https://rise4fun.com/Alive/GOH
3. For `ISD::XOR`, there is no restriction on constants:
https://rise4fun.com/Alive/ml6
So, why is it there then?
As far as i can tell, it dates all the way back to original check-in rL7793.
I think we should just drop it.
Reviewers: spatel, craig.topper, efriedma, majnemer
Reviewed By: spatel
Subscribers: llvm-commits, craig.topper
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61938
llvm-svn: 361043
With a fix for PR41917: The predecessor list was changing under our feet.
- for (BasicBlock *Pred : predecessors(EntryBlock_)) {
+ while (!pred_empty(EntryBlock_)) {
+ BasicBlock* const Pred = *pred_begin(EntryBlock_);
llvm-svn: 361009
Using dominance vs a set membership check is indistinguishable from a compile time perspective, and the two queries return equivelent results. Simplify code by using the existing function.
llvm-svn: 360976
Summary:
Adds a call to __hwasan_handle_vfork(SP) at each landingpad entry.
Reusing __hwasan_handle_vfork instead of introducing a new runtime call
in order to be ABI-compatible with old runtime library.
Reviewers: pcc
Subscribers: kubamracek, hiraditya, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D61968
llvm-svn: 360959
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=40645
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop casts
impossible. With the recent addition of DW_OP_LLVM_convert this salvaging is
now possible, and so can be used to fix the attached bug as well as any cases
where SExt instruction results are lost in the debugging metadata. This patch
introduces this fix by expanding the salvage debug info method to cover these
cases using the new operator.
Differential revision: https://reviews.llvm.org/D61184
llvm-svn: 360902
Summary: We should excluded unreachable operands from processing as their DFS visitation order is undefined. When `renameUses` function sorts `OpsToRename` (https://fburl.com/d2wubn60), the comparator assumes that the parent block of the operand has a corresponding dominator tree node. This is not the case for unreachable operands and crashes the compiler.
Reviewers: dberlin, mgrang, davide
Subscribers: efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61154
llvm-svn: 360796
Summary:
The return value of a TryToUnfoldSelect call was not checked, which led to an
incorrectly preserved loop info and some crash.
The original crash was reported on https://reviews.llvm.org/D59514.
Reviewers: davidxl, amehsan
Reviewed By: davidxl
Subscribers: fhahn, brzycki, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61920
llvm-svn: 360780
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=40645
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop
casts impossible. With the recent addition of DW_OP_LLVM_convert this
salvaging is now possible, and so can be used to fix the attached bug as
well as any cases where SExt instruction results are lost in the
debugging metadata. This patch introduces this fix by expanding the
salvage debug info method to cover these cases using the new operator.
Differential revision: https://reviews.llvm.org/D61184
llvm-svn: 360772
Instead of patching the original blocks, we now generate new blocks and
delete the old blocks. This results in simpler code with a less twisted
control flow (see the change in `entry-block-shuffled.ll`).
This will make https://reviews.llvm.org/D60318 simpler by making it more
obvious where control flow created and deleted.
Reviewers: gchatelet
Subscribers: hiraditya, llvm-commits, spatel
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61736
llvm-svn: 360771
This reduces the number of parameters we need to pass in and they seem a
natural fit in LoopVectorizationCostModel. Also simplifies things for
D59995.
As a follow up refactoring, we could only expose a expose a
shouldUseVectorIntrinsic() helper in LoopVectorizationCostModel, instead
of calling getVectorCallCost/getVectorIntrinsicCost in
InnerLoopVectorizer/VPRecipeBuilder.
Reviewers: Ayal, hsaito, dcaballe, rengolin
Reviewed By: rengolin
Differential Revision: https://reviews.llvm.org/D61638
llvm-svn: 360758
The 3-field form was introduced by D3499 in 2014 and the legacy 2-field
form was planned to be removed in LLVM 4.0
For the textual format, this patch migrates the existing 2-field form to
use the 3-field form and deletes the compatibility code.
test/Verifier/global-ctors-2.ll checks we have a friendly error message.
For bitcode, lib/IR/AutoUpgrade UpgradeGlobalVariables will upgrade the
2-field form (add i8* null as the third field).
Reviewed By: rnk, dexonsmith
Differential Revision: https://reviews.llvm.org/D61547
llvm-svn: 360742
Port hardware assisted address sanitizer to new PM following the same guidelines as msan and tsan.
Changes:
- Separate HWAddressSanitizer into a pass class and a sanitizer class.
- Create new PM wrapper pass for the sanitizer class.
- Use the getOrINsert pattern for some module level initialization declarations.
- Also enable kernel-kwasan in new PM
- Update llvm tests and add clang test.
Differential Revision: https://reviews.llvm.org/D61709
llvm-svn: 360707
When an outer loop gets deleted by a different pass, before LICM visits
it, we cannot clean up its sub-loops in AliasSetMap, because at the
point we receive the deleteAnalysisLoop callback for the outer loop, the loop
object is already invalid and we cannot access its sub-loops any longer.
Reviewers: asbirlea, sanjoy, chandlerc
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D61904
llvm-svn: 360704
LoopSimplify can preserve MemorySSA after r360270.
But the MemorySSA analysis is retrieved and preserved only when the
EnableMSSALoopDependency is set to true. Use the same conditional to
mark the pass as preserved, otherwise subsequent passes will get an
invalid analysis.
Resolves PR41853.
llvm-svn: 360697
Some atomic loads are implemented as cmpxchg (particularly if large or
floating), and that usually requires write access to the memory involved
or it will segfault.
We can still propagate the constant value to users we understand though.
llvm-svn: 360662
Summary:
CoroFrame was not considering static array allocas, and was only ever reserving a single element in the coroutine frame.
This meant that stores to the non-zero'th element would corrupt later frame data.
Store static array allocas as field arrays in the coroutine frame.
Added test.
Committed by Gor Nishanov on behalf of ben-clayton
Reviewers: GorNishanov, modocache
Reviewed By: GorNishanov
Subscribers: Orlando, capn, EricWF, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61372
llvm-svn: 360636
We have a similar match for patterns ending in a truncate. This
should be ok for all targets because the default expansion would
still likely be better from replacing 2 'and' ops with 1.
Attempt to show the logic equivalence in Alive (which doesn't
currently have funnel-shift in its vocabulary AFAICT):
%shamt = zext i8 %i to i32
%m = and i32 %shamt, 31
%neg = sub i32 0, %shamt
%and4 = and i32 %neg, 31
%shl = shl i32 %v, %m
%shr = lshr i32 %v, %and4
%or = or i32 %shr, %shl
=>
%a = and i8 %i, 31
%shamt2 = zext i8 %a to i32
%neg2 = sub i32 0, %shamt2
%and4 = and i32 %neg2, 31
%shl = shl i32 %v, %shamt2
%shr = lshr i32 %v, %and4
%or = or i32 %shr, %shl
https://rise4fun.com/Alive/V9r
llvm-svn: 360605
Summary:
We hit undefined references building with ThinLTO when one source file
contained explicit instantiations of a template method (weak_odr) but
there were also implicit instantiations in another file (linkonce_odr),
and the latter was the prevailing copy. In this case the symbol was
marked hidden when the prevailing linkonce_odr copy was promoted to
weak_odr. It led to unsats when the resulting shared library was linked
with other code that contained a reference (expecting to be resolved due
to the explicit instantiation).
Add a CanAutoHide flag to the GV summary to allow the thin link to
identify when all copies are eligible for auto-hiding (because they were
all originally linkonce_odr global unnamed addr), and only do the
auto-hide in that case.
Most of the changes here are due to plumbing the new flag through the
bitcode and llvm assembly, and resulting test changes. I augmented the
existing auto-hide test to check for this situation.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, arphaman, dang, llvm-commits, steven_wu, wmi
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59709
llvm-svn: 360466
This patch fixes the TreeEntry dangling pointer issue caused by reallocations of VectorizableTree.
Committed on behalf of @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D61706
llvm-svn: 360456
The original change introduced a depth limit of 7 which caused a 22% regression
in the Swift MapReduceLazyCollection & Ackermann benchmarks. This new threshold
still ensures that the original test case doesn't hang.
rdar://50359639
llvm-svn: 360444
Summary:
- Constant expressions may not be added in strict postorder as the
forward instruction scan order. Thus, for a constant express (CE0), if
its operand (CE1) is used in an previous instruction, they are not in
postorder. However, different from
`cloneInstructionWithNewAddressSpace`,
`cloneConstantExprWithNewAddressSpace` doesn't bookkeep uninferred
instructions for later resolving. That results in failure of inferring
constant address.
- This patch adds the support to infer constant expression operand
recursively, since there won't be loop, if that operand is another
constant expression.
Reviewers: arsenm
Subscribers: jholewinski, jvesely, wdng, nhaehnle, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61760
llvm-svn: 360431
In certain circumstances, optimizations pick line numbers from debug
intrinsic instructions as the new location for altered instructions. This
is problematic because the line number of a debugging intrinsic is
meaningless (it doesn't produce any machine instruction), only the scope
information is valid. The result can be the line number of a variable
declaration "leaking" into real code from debugging intrinsics, making the
line table un-necessarily jumpy, and potentially different with / without
variable locations.
Fix this by using zero line numbers when promoting dbg.declare intrinsics
into dbg.values: this is safe for debug intrinsics as their line numbers
are meaningless, and reduces the scope for damage / misleading stepping
when optimizations pick locations from the wrong place.
Differential Revision: https://reviews.llvm.org/D59272
llvm-svn: 360415
Summary:
Seeing some issues for windows debug pathological cases with collectBitParts
recursion (1525 levels of recursion!)
Setting the limit to 64 as this should be sufficient - passes all lit cases
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61728
Change-Id: I7f44cdc6c1badf1c2ccbf1b0c4b6afe27ecb39a1
llvm-svn: 360347
The worklist loop that we're returning back to should be able to do the repacement itself. This is how we normally do replacements.
My main motivation was that I observed that we weren't preserving the name of the result when we do this transform. The replacement code in the worklist loop will call takeName as part of the replacement.
Differential Revision: https://reviews.llvm.org/D61695
llvm-svn: 360284
Summary:
Preserve MemorySSA in LoopSimplify, in the old pass manager, if the analysis is available.
Do not preserve it in the new pass manager.
Update tests.
Subscribers: nemanjai, jlebar, javed.absar, Prazek, kbarton, zzheng, jsji, llvm-commits, george.burgess.iv, chandlerc
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60833
llvm-svn: 360270
Reassociation's NegateValue moved instructions to the beginning of
blocks (after PHIs) without checking for exception handling pads.
It's possible for reassociation to move something into an exception
handling block so we need to make sure we don't move things too early
in the block. This change advances the insertion point past any
exception handling pads.
If the block we want to move into contains a catchswitch, we cannot
move into it. In that case just create a new neg as if we had not
found an existing neg to move.
Differential Revision: https://reviews.llvm.org/D61089
llvm-svn: 360262
If we fold a branch/switch to an unconditional branch to another dead block we
replace the branch with unreachable, to avoid attempting to fold the
unconditional branch.
Reviewers: davide, efriedma, mssimpso, jdoerfert
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D61300
llvm-svn: 360232
(X | C1) + C2 --> (X | C1) ^ C1 iff (C1 == -C2)
I verified the correctness using Alive:
https://rise4fun.com/Alive/YNV
This transform enables the following transform that already exists in
instcombine:
(X | Y) ^ Y --> X & ~Y
As a result, the full expected transform is:
(X | C1) + C2 --> X & ~C1 iff (C1 == -C2)
There already exists the transform in the sub case:
(X | Y) - Y --> X & ~Y
However this does not trigger in the case where Y is constant due to an earlier
transform:
X - (-C) --> X + C
With this new add fold, both the add and sub constant cases are handled.
Patch by Chris Dawson.
Differential Revision: https://reviews.llvm.org/D61517
llvm-svn: 360185
Fundamentally/generally, we should not have to rely on bailouts/crippling of
folds. In this particular case, I think we always recognize the inverted
predicate min/max pattern, so there should not be any loss of optimization.
Codegen looks better because we are eliminating an fneg.
llvm-svn: 360180
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024
The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:
A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.
In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.
Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel
Reviewed By: hfinkel
Subscribers: bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D60831
llvm-svn: 360162
Fixes the main issue in PR41693
When both modes are used, two functions are created:
`sancov.module_ctor`, `sancov.module_ctor.$LastUnique`, where
$LastUnique is the current LastUnique counter that may be different in
another module.
`sancov.module_ctor.$LastUnique` belongs to the comdat group of the same
name (due to the non-null third field of the ctor in llvm.global_ctors).
COMDAT group section [ 9] `.group' [sancov.module_ctor] contains 6 sections:
[Index] Name
[ 10] .text.sancov.module_ctor
[ 11] .rela.text.sancov.module_ctor
[ 12] .text.sancov.module_ctor.6
[ 13] .rela.text.sancov.module_ctor.6
[ 23] .init_array.2
[ 24] .rela.init_array.2
# 2 problems:
# 1) If sancov.module_ctor in this module is discarded, this group
# has a relocation to a discarded section. ld.bfd and gold will
# error. (Another issue: it is silently accepted by lld)
# 2) The comdat group has an unstable name that may be different in
# another translation unit. Even if the linker allows the dangling relocation
# (with --noinhibit-exec), there will be many undesired .init_array entries
COMDAT group section [ 25] `.group' [sancov.module_ctor.6] contains 2 sections:
[Index] Name
[ 26] .init_array.2
[ 27] .rela.init_array.2
By using different module ctor names, the associated comdat group names
will also be different and thus stable across modules.
Reviewed By: morehouse, phosek
Differential Revision: https://reviews.llvm.org/D61510
llvm-svn: 360107
This reverts r357452 (git commit 21eb771dcb).
This was causing strange optimization-related test failures on an internal test. Will followup with more details offline.
llvm-svn: 360086
We don't always get this:
Cond ? -X : -Y --> -(Cond ? X : Y)
...even with the legacy IR form of fneg in the case with extra uses,
and we miss matching with the newer 'fneg' instruction because we
are expecting binops through the rest of the path.
Differential Revision: https://reviews.llvm.org/D61604
llvm-svn: 360075
Properly initialize store type to null then ensure we find a real store type in the chain.
Fixes scan-build null dereference warning and makes the code clearer.
llvm-svn: 360031
Optimization pass lib/Transforms/IPO/GlobalOpt.cpp needs to insert
DW_OP_deref_size instead of DW_OP_deref to be compatible with big-endian
targets for same reasons as in D59687.
Differential Revision: https://reviews.llvm.org/D60611
llvm-svn: 360013
Summary:
It is a common thing to loop over every `PHINode` in some `BasicBlock`
and change old `BasicBlock` incoming block to a new `BasicBlock` incoming block.
`replaceSuccessorsPhiUsesWith()` already had code to do that,
it just wasn't a function.
So outline it into a new function, and use it.
Reviewers: chandlerc, craig.topper, spatel, danielcdh
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61013
llvm-svn: 359996
Summary:
There is `PHINode::getBasicBlockIndex()`, `PHINode::setIncomingBlock()`
and `PHINode::getNumOperands()`, but no function to replace every
specified `BasicBlock*` predecessor with some other specified `BasicBlock*`.
Clearly, there are a lot of places that could use that functionality.
Reviewers: chandlerc, craig.topper, spatel, danielcdh
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61011
llvm-svn: 359995
Summary:
There is `Instruction::getNumSuccessors()`, `Instruction::getSuccessor()`
and `Instruction::setSuccessor()`, but no function to replace every
specified `BasicBlock*` successor with some other specified `BasicBlock*`.
I've found one place where it should clearly be used.
Reviewers: chandlerc, craig.topper, spatel, danielcdh
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61010
llvm-svn: 359994
Summary:
If `deleteDeadLoop()` is called on such a loop, that has "bad" exit block,
one that e.g. has no terminator instruction, the `DIBuilder::insertDbgValueIntrinsic()`
will be told to insert the Dbg Value Intrinsic after `nullptr`
(since there is no first non-PHI instruction), which will cause it to not insert
those instructions into any basic block. The instructions will be parent-less,
and IR verifier will complain. It is rather obvious to track down the root cause
when that happens, so let's just assert it never happens.
Reviewers: sanjoy, davide, vsk
Reviewed By: vsk
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61008
llvm-svn: 359993
Summary:
Inalloca parameters require special handling in some optimizations.
This change causes globalopt to strip the inalloca attribute from
function parameters when it is safe to do so, removes the special
handling for inallocas from argpromotion, and replaces it with a
simple check that causes argpromotion to skip functions that receive
inallocas (for when the pass is invoked on code that didn't run
through globalopt first). This also avoids a case where argpromotion
would incorrectly try to pass an inalloca in a register.
Fixes PR41658.
Reviewers: rnk, efriedma
Reviewed By: rnk
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61286
llvm-svn: 359743
Summary: Fix a transformation bug where two scopes share a common instrution to hoist.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61405
llvm-svn: 359736
The demanded elts rules introduced for GEPs in https://reviews.llvm.org/rL356293 replaced vector constants with undefs (by design). It turns out that the LangRef disallows such cases when indexing structs. The right fix is probably to relax the langref requirement, and update other passes to expect the result, but for the moment, limit the transform to avoid compiler crashes.
This should fix https://bugs.llvm.org/show_bug.cgi?id=41624.
llvm-svn: 359633
Summary:
Match NewPassManager behavior: add option for interleaved loops in the
old pass manager, and use that instead of the flag used to disable loop unroll.
No changes in the defaults.
Reviewers: chandlerc
Subscribers: mehdi_amini, jlebar, dmgreen, hsaito, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61030
llvm-svn: 359615
Summary:
When a variable goes into scope several times within a single function
or when two variables from different scopes share a stack slot it may
be incorrect to poison such scoped locals at the beginning of the
function.
In the former case it may lead to false negatives (see
https://github.com/google/sanitizers/issues/590), in the latter - to
incorrect reports (because only one origin remains on the stack).
If Clang emits lifetime intrinsics for such scoped variables we insert
code poisoning them after each call to llvm.lifetime.start().
If for a certain intrinsic we fail to find a corresponding alloca, we
fall back to poisoning allocas for the whole function, as it's now
impossible to tell which alloca was missed.
The new instrumentation may slow down hot loops containing local
variables with lifetime intrinsics, so we allow disabling it with
-mllvm -msan-handle-lifetime-intrinsics=false.
Reviewers: eugenis, pcc
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60617
llvm-svn: 359536
Follow-up to:
rL359482
Avoid this potential problem throughout by giving the type a name
and verifying the assumption that both operands are the same type.
llvm-svn: 359485
PVS Studio's copy+paste recognizer was seeing this as a typo, technically Op0/Op1 in a fcmp should always be the same type, but we might as well avoid the issue.
Reported in https://www.viva64.com/en/b/0629/
llvm-svn: 359482
This change aims at making the file format be compatible with the
way LLVM handles command line options.
Differential Revision: https://reviews.llvm.org/D60970
llvm-svn: 359462
This enables the pass to be used in the absence of
TargetTransformInfo. When the argument isn't passed, the factory
defaults to UninitializedAddressSpace and the flat address space is
obtained from the TargetTransformInfo as before this change. Existing
users won't have to change.
Patch by Kevin Petit.
Differential Revision: https://reviews.llvm.org/D60602
llvm-svn: 359290
isValidCandidateForColdCC is much more expensive than
TTI.useColdCCForColdCall, which by default just returns false. Avoid
doing this work if we're not going to look at the answer anyway.
This change is NFC, but I see significant compile time improvements on
some code with pathologically many functions.
llvm-svn: 359253
I added a diagnostic along the lines of `-Wpessimizing-move` to detect `return x = y` suppressing copy elision, but I don't know if the diagnostic is really worth it. Anyway, here are the places where my diagnostic reported that copy elision would have been possible if not for the assignment.
P1155R1 in the post-San-Diego WG21 (C++ committee) mailing discusses whether WG21 should fix this pitfall by just changing the core language to permit copy elision in cases like these.
(Kona update: The bulk of P1155 is proceeding to CWG review, but specifically *not* the parts that explored the notion of permitting copy-elision in these specific cases.)
Reviewed By: dblaikie
Author: Arthur O'Dwyer
Differential Revision: https://reviews.llvm.org/D54885
llvm-svn: 359236
it keeps track of becomes too large
ARC optimizer does a top-down and a bottom-up traversal of the whole
function to pair up retain and release instructions and remove them.
This can be expensive if the number of instructions in the function and
pointer states it tracks are large since it has to look at each pointer
state and determine whether the instruction being visited can
potentially use the pointer.
This patch adds a command line option that sets a limit to the number of
pointers it tracks.
rdar://problem/49477063
Differential Revision: https://reviews.llvm.org/D61100
llvm-svn: 359226
When evaluating a store through a bitcast, the evaluator tries to move the
bitcast from the pointer onto the stored value. If the cast is invalid, it
tries to "introspect" the type to get a valid cast by obtaining a pointer to
the initial element (if the type is nested, this may require walking several
initial elements).
In some situations it is possible to get a bitcast on a load (e.g. with
unions, where the bitcast may not be the same type as the store). However,
equivalent logic to the store to introspect the type is missing. This patch
add this logic.
Note, when developing the patch I was unhappy with adding similar logic
directly to the load case as it could get out of step. Instead, I have
abstracted the "introspection" into a helper function, with the specifics
being handled by a passed-in lambda function.
Differential Revision: https://reviews.llvm.org/D60793
llvm-svn: 359205
Summary:
When refactoring vectorization flags, vectorization was disabled by default in the new pass manager.
This patch re-enables is for both managers, and changes the assumptions opt makes, based on the new defaults.
Comments in opt.cpp should clarify the intended use of all flags to enable/disable vectorization.
Reviewers: chandlerc, jgorbe
Subscribers: jlebar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61091
llvm-svn: 359167
Do not wrap the contents of printFusionCandidates in the LLVM_DEBUG macro. This
fixes an unused variable warning generated when compiling without asserts but
with -DENABLE_LLVM_DUMP.
Differential Revision: https://reviews.llvm.org/D61035
llvm-svn: 359161
Summary: The code did not check if operand was undef before casting it to Instruction.
Reviewers: RKSimon, ABataev, dtemirbulatov
Reviewed By: ABataev
Subscribers: uabelho
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61024
llvm-svn: 359136
This patch rewrites the existing PACKSS/PACKUS constant folding code to expand as a generic expansion.
This is a first NFCI step toward expanding PACKSS/PACKUS intrinsics which are acting as non-saturating truncations (although technically the expansion could be used in all cases - but we'll probably want to be conservative).
llvm-svn: 359111
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.
It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.
Reviewers: hfinkel, materi, jkorous
Reviewed By: jkorous
Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61038
llvm-svn: 359072
Summary:
Enabling MemorySSA in the old pass manager leads to MemorySSA being run
twice due to the fact that LCSSA and LoopSimplify do not preserve
MemorySSA. This is the first step to address that: target LCSSA.
LCSSA does not make any changes that invalidate MemorySSA, so it
preserves it by design. It must preserve AA as well, for this to hold.
After this patch, MemorySSA is still run twice in the old pass manager.
Step two follows: target LoopSimplify.
Subscribers: mehdi_amini, jlebar, Prazek, llvm-commits, george.burgess.iv, chandlerc
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60832
llvm-svn: 359032
DominatorTree::dominate.
ARC contract pass has an optimization that replaces the uses of the
argument of an ObjC runtime function call with the call result.
For example:
; Before optimization
%1 = tail call i8* @foo1()
%2 = tail call i8* @llvm.objc.retainAutoreleasedReturnValue(i8* %1)
store i8* %1, i8** @g0, align 8
; After optimization
%1 = tail call i8* @foo1()
%2 = tail call i8* @llvm.objc.retainAutoreleasedReturnValue(i8* %1)
store i8* %2, i8** @g0, align 8 // %1 is replaced with %2
Before replacing the argument use, DominatorTree::dominate is called to
determine whether the user instruction is dominated by the ObjC runtime
function call instruction. The call to DominatorTree::dominate can be
expensive if the two instructions belong to the same basic block and the
size of the basic block is large. This patch checks the basic block size
and just bails out if the size exceeds the limit set by command line
option "arc-contract-max-bb-size".
rdar://problem/49477063
Differential Revision: https://reviews.llvm.org/D60900
llvm-svn: 359027
If we have a masked.load from a location we know to be dereferenceable, we can simply issue a speculative unconditional load against that address. The key advantage is that it produces IR which is well understood by the optimizer. The select (cnd, load, passthrough) form produced should be pattern matchable back to hardware predication if profitable.
Differential Revision: https://reviews.llvm.org/D59703
llvm-svn: 359000
Converting InlineCost interface and its internals into CallBase usage.
Inliners themselves are still not converted.
Reviewed By: reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60636
llvm-svn: 358982
In some circumstances we can end up with setup costs that are very complex to
compute, even though the scevs are not very complex to create. This can also
lead to setupcosts that are calculated to be exactly -1, which LSR treats as an
invalid cost. This patch puts a limit on the recursion depth for setup cost to
prevent them taking too long.
Thanks to @reames for the report and test case.
Differential Revision: https://reviews.llvm.org/D60944
llvm-svn: 358958
If we have a store to a piece of memory which is known constant, then we know the store must be storing back the same value. As a result, the store (or memset, or memmove) must either be down a dead path, or a noop. In either case, it is valid to simply remove the store.
The motivating case for this involves a memmove to a buffer which is constant down a path which is dynamically dead.
Note that I'm choosing to implement the less aggressive of two possible semantics here. We could simply say that the store *is undefined*, and prune the path. Consensus in the review was that the more aggressive form might be a good follow on change at a later date.
Differential Revision: https://reviews.llvm.org/D60659
llvm-svn: 358919
In the process, use the existing masked.load combine which is slightly stronger, and handles a mix of zero and undef elements in the mask.
llvm-svn: 358913
Back in August, r340525 introduced a dependency on the assumption
cache tracker in the ipsccp pass, but that commit missed a call to
INITIALIZE_PASS_DEPENDENCY, which leaves the assumption cache
improperly registered if SCCP is the only thing that pulls it in.
llvm-svn: 358903
Currently, we do not expose BPI to loop passes at all. In the old pass manager, we appear to have been ignoring the fact that LCSSA and/or LoopSimplify didn't preserve BPI, and making it available to the following loop passes anyways. In the new one, it's invalidated before running any loop pass if either LCSSA or LoopSimplify actually make changes. If they don't make changes, then BPI is valid and available. So, we go ahead and teach LCSSA and LoopSimplify how to preserve BPI for consistency between old and new pass managers.
This patch avoids an invalidation between the two requires in the following trivial pass pipeline:
opt -passes="requires<branch-prob>,loop(no-op-loop),requires<branch-prob>"
(when the input file is one which requires either LCSSA or LoopSimplify to canonicalize the loops)
Differential Revision: https://reviews.llvm.org/D60790
llvm-svn: 358901
This reverts commit 7bf4d7c07f2fac862ef34c82ad0fef6513452445.
After thinking about this more, this isn't right, the range is not exact
in the same sense as makeExactICmpRegion(). This needs a separate
function.
llvm-svn: 358876
Following D60632 makeGuaranteedNoWrapRegion() always returns an
exact nowrap region. Rename the function accordingly. This is in
line with the naming of makeExactICmpRegion().
llvm-svn: 358875
Summary:
Teach CorrelatedValuePropagation to also handle sub instructions in addition to add. Relatively simple since makeGuaranteedNoWrapRegion already understood sub instructions. Only subtle change is which range is passed as "Other" to that function, since sub isn't commutative.
Note that CorrelatedValuePropagation::processAddSub is still hidden behind a default-off flag as IndVarSimplify hasn't yet been fixed to strip the added nsw/nuw flags and causes a miscompile. (PR31181)
Reviewers: sanjoy, apilipenko, nikic
Reviewed By: nikic
Subscribers: hiraditya, jfb, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60036
llvm-svn: 358816
This is a follow-up to r291037+r291258, which used null debug locations
to prevent jumpy line tables.
Using line 0 locations achieves the same effect, but works better for
crash attribution because it preserves the right inline scope.
Differential Revision: https://reviews.llvm.org/D60913
llvm-svn: 358791
Summary:
Make the flags in LICM + MemorySSA tuning options in the old and new
pass managers.
Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60490
llvm-svn: 358772
Summary:
Trying to add the plumbing necessary to add tuning options to the new pass manager.
Testing with the flags for loop vectorize.
Reviewers: chandlerc
Subscribers: sanjoy, mehdi_amini, jlebar, steven_wu, dexonsmith, dang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59723
llvm-svn: 358763
removeUsers uses a work list to collect indirect users and call remove()
on those functions. However it has a bug (`if (!Visited.insert(UU).second)`).
Actually, we don't have to collect indirect users.
After the merge of F and G, G's callers will be considered (added to
Deferred). If G's callers can be merged, G's callers' callers will be
considered.
Update the test unnamed-addr-reprocessing.ll to make it clear we can
still merge indirect callers.
llvm-svn: 358741
code to `CallBase`.
This patch focuses on the legacy PM, call graph, and some of inliner and legacy
passes interacting with those APIs from `CallSite` to the new `CallBase` class.
No interesting changes.
Differential Revision: https://reviews.llvm.org/D60412
llvm-svn: 358739
We would previously drop the COMDAT on the thunk we generated when replacing a
function body with the forwarding thunk. This would result in a function that
may have been multiply emitted and multiply merged to be emitted with the same
name without the COMDAT. This is a hard error with PE/COFF where the COMDAT is
used for the deduplication of Value Witness functions for Swift.
llvm-svn: 358728
Prior to this patch, each basic block listed in the extrack-blocks-file
would be extracted to a different function.
This patch adds the support for comma separated list of basic blocks
to form group.
When the region formed by a group is not extractable, e.g., not single
entry, all the blocks of that group are left untouched.
Let us see this new format in action (comments are not part of the
file format):
;; funcName bbName[,bbName...]
foo bb1 ;; Extract bb1 in its own function
foo bb2,bb3 ;; Extract bb2,bb3 in their own function
bar bb1,bb4 ;; Extract bb1,bb4 in their own function
bar bb2 ;; Extract bb2 in its own function
Assuming all regions are extractable, this will create one function and
thus one call per region.
Differential Revision: https://reviews.llvm.org/D60746
llvm-svn: 358701
The bug is that I didn't check whether the operand of the invariant_loads were themselves invariant. I don't know how this got missed in the patch and review. I even had an unreduced test case locally, and I remember handling this case, but I must have lost it in one of the rebases. Oops.
llvm-svn: 358688
The purpose of this patch is to eliminate a pass ordering dependence between LoopPredication and LICM. To understand the purpose, consider the following snippet of code inside some loop 'L' with IV 'i'
A = _a.length;
guard (i < A)
a = _a[i]
B = _b.length;
guard (i < B);
b = _b[i];
...
Z = _z.length;
guard (i < Z)
z = _z[i]
accum += a + b + ... + z;
Today, we need LICM to hoist the length loads, LoopPredication to make the guards loop invariant, and TrivialUnswitch to eliminate the loop invariant guard to establish must execute for the next length load. Today, if we can't prove speculation safety, we'd have to iterate these three passes 26 times to reduce this example down to the minimal form.
Using the fact that the array lengths are known to be invariant, we can short circuit this iteration. By forming the loop invariant form of all the guards at once, we remove the need for LoopPredication from the iterative cycle. At the moment, we'd still have to iterate LICM and TrivialUnswitch; we'll leave that part for later.
As a secondary benefit, this allows LoopPred to expose peeling oppurtunities in a much more obvious manner. See the udiv test changes as an example. If the udiv was not hoistable (i.e. we couldn't prove speculation safety) this would be an example where peeling becomes obviously profitable whereas it wasn't before.
A couple of subtleties in the implementation:
- SCEV's isSafeToExpand guarantees speculation safety (i.e. let's us expand at a new point). It is not a precondition for expansion if we know the SCEV corresponds to a Value which dominates the requested expansion point.
- SCEV's isLoopInvariant returns true for expressions which compute the same value across all iterations executed, regardless of where the original Value is located. (i.e. it can be in the loop) This implies we have a speculation burden to prove before expanding them outside loops.
- invariant_loads and AA->pointsToConstantMemory are two cases that SCEV currently does not handle, but meets the SCEV definition of invariance. I plan to sink this part into SCEV once this has baked for a bit.
Differential Revision: https://reviews.llvm.org/D60093
llvm-svn: 358684
Reverse the checking of the domiance order so that when a self compare happens,
it returns false. This makes compare function have strict weak ordering.
llvm-svn: 358636
This patch adds a basic loop fusion pass. It will fuse loops that conform to the
following 4 conditions:
1. Adjacent (no code between them)
2. Control flow equivalent (if one loop executes, the other loop executes)
3. Identical bounds (both loops iterate the same number of iterations)
4. No negative distance dependencies between the loop bodies.
The pass does not make any changes to the IR to create opportunities for fusion.
Instead, it checks if the necessary conditions are met and if so it fuses two
loops together.
The pass has not been added to the pass pipeline yet, and thus is not enabled by
default. It can be run stand alone using the -loop-fusion option.
Differential Revision: https://reviews.llvm.org/D55851
llvm-svn: 358607
In InstCombine, we use an idiom of "store i1 true, i1 undef" to indicate we've found a path which we've proven unreachable. We can't actually insert the unreachable instruction since that would require changing the CFG. We leave that to simplifycfg later.
This just factors out that idiom creation so we don't duplicate the same mostly undocument idiom creation in multiple places.
llvm-svn: 358600
Summary:
In the following cases, unrolling can be beneficial, even when
optimizing for code size:
1) very low trip counts
2) potential to constant fold most instructions after fully unrolling.
We can unroll in those cases, by setting the unrolling threshold to the
loop size. This might highlight some cost modeling issues and fixing
them will have a positive impact in general.
Reviewers: vsk, efriedma, dmgreen, paquette
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D60265
llvm-svn: 358586
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
This patch adds a basic loop fusion pass. It will fuse loops that conform to the
following 4 conditions:
1. Adjacent (no code between them)
2. Control flow equivalent (if one loop executes, the other loop executes)
3. Identical bounds (both loops iterate the same number of iterations)
4. No negative distance dependencies between the loop bodies.
The pass does not make any changes to the IR to create opportunities for fusion.
Instead, it checks if the necessary conditions are met and if so it fuses two
loops together.
The pass has not been added to the pass pipeline yet, and thus is not enabled by
default. It can be run stand alone using the -loop-fusion option.
Phabricator: https://reviews.llvm.org/D55851
llvm-svn: 358543
This is 1 of the problems discussed in the post-commit thread for:
rL355741 / http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190311/635516.html
and filed as:
https://bugs.llvm.org/show_bug.cgi?id=41101
Instcombine tries to canonicalize some of these cases (and there's room for improvement
there independently of this patch), but it can't always do that because of extra uses.
So we need to recognize these commuted operand patterns here in EarlyCSE. This is similar
to how we detect commuted compares and commuted min/max/abs.
Differential Revision: https://reviews.llvm.org/D60723
llvm-svn: 358523
If a umul.with.overflow or smul.with.overflow operation cannot
overflow, simplify it to a simple mul nuw / mul nsw. After the
refactoring in D60668 this is just a matter of removing an
explicit check against multiplications.
Differential Revision: https://reviews.llvm.org/D60791
llvm-svn: 358521
This is a refactoring patch which should have all the functionality of the current code. Its goal is twofold:
i. Cleanup and simplify the reordering code, and
ii. Generalize reordering so that it will work for an arbitrary number of operands, not just 2.
This is the second patch in a series of patches that will enable operand reordering across chains of operations. An example of this was presented in EuroLLVM'18 https://www.youtube.com/watch?v=gIEn34LvyNo .
Committed on behalf of @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D59973
llvm-svn: 358519
If a constant shift amount is used, then only some of the LHS/RHS
operand bits are demanded and we may be able to simplify based on
that. InstCombineSimplifyDemanded already had the necessary support
for that, we just weren't calling it with fshl/fshr as root.
In particular, this allows us to relax some masked funnel shifts
into simple shifts, as shown in the tests.
Patch by Shawn Landden.
Differential Revision: https://reviews.llvm.org/D60660
llvm-svn: 358515
This adds a WithOverflowInst class with a few helper methods to get
the underlying binop, signedness and nowrap type and makes use of it
where sensible. There will be two more uses in D60650/D60656.
The refactorings are all NFC, though I left some TODOs where things
could be improved. In particular we have two places where add/sub are
handled but mul isn't.
Differential Revision: https://reviews.llvm.org/D60668
llvm-svn: 358512
The original commit caused false positives from AddressSanitizer's
use-after-scope checks, which have now been fixed in r358478.
> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936
llvm-svn: 358483
If there are any intrinsics that cannot be traced back to an alloca, we
might have missed the start of a variable's scope, leading to false
error reports if the variable is poisoned at function entry. Instead, if
there are some intrinsics that can't be traced, fail safe and don't
poison the variables in that function.
Differential revision: https://reviews.llvm.org/D60686
llvm-svn: 358478
The CodeExtractor is not smart enough to compute which basic block is
the entry of a region. Instead it relies on the order of the list
of basic blocks that is handed to it and assumes that the entry
is the first block in the list.
Without the additional debug information, it is hard to understand
why a valid region does not get extracted, because we would miss
that the order of in the list just doesn't match what the CodeExtractor
wants.
NFC
llvm-svn: 358471
If LSR split critical edge during rewriting phi operands and
phi node has other pending fixup operands, we need to
update those pending fixups. Otherwise formulae will not be
implemented completely and some instructions will not be eliminated.
llvm.org/PR41445
Differential Revision: https://reviews.llvm.org/D60645
Patch by: Denis Bakhvalov <denis.bakhvalov@intel.com>
llvm-svn: 358457
This is a preparatory patch for D60093. This patch itself is NFC, but while preparing this I noticed and committed a small hoisting change in rL358419.
The basic structure of the new scheme is that we pass around the guard ("the using instruction"), and select an optimal insert point by examining operands at each construction point. This seems conceptually a bit cleaner to start with as it isolates the knowledge about insertion safety at the actual insertion point.
Note that the non-hoisting path is not actually used at the moment. That's not exercised until D60093 is rebased on this one.
Differential Revision: https://reviews.llvm.org/D60718
llvm-svn: 358434
Zexts can be treated like no-op casts when it comes to assessing whether their
removal affects debug info.
Reviewer: aprantl
Differential Revision: https://reviews.llvm.org/D60641
llvm-svn: 358431
Summary:
Enable some of the existing size optimizations for cold code under PGO.
A ~5% code size saving in big internal app under PGO.
The way it gets BFI/PSI is discussed in the RFC thread
http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html
Note it doesn't currently touch loop passes.
Reviewers: davidxl, eraman
Reviewed By: eraman
Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59514
llvm-svn: 358422
If we have multiple range checks which can be predicated, hoist the and of the results outside the loop. This minorly cleans up the resulting IR, but the main motivation is as a building block for D60093.
llvm-svn: 358419
Summary:
Factor out findAllocaForValue() from ASan so that we can use it in
MSan to handle lifetime intrinsics.
Reviewers: eugenis, pcc
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60615
llvm-svn: 358380
* Rearrange continu/break
* BBNumbers.lookup(A) -> BBNumbers.find(A)->second
BBNumbers has been computed, thus we can assume the value exists in the predicate.
llvm-svn: 358351
As pointed out in D60518 folding mulo(%x, undef) to {undef, undef}
isn't correct. As a correct version of this already exists in
InstructionSimplify (bd8056ef32/lib/Analysis/InstructionSimplify.cpp (L4750-L4757)) this is just
dead code though. Drop it together with the mul(%x, 0) -> {0, false}
fold that is also already handled by InstSimplify.
Differential Revision: https://reviews.llvm.org/D60649
llvm-svn: 358339
Summary:
Create a method to forget everything in SCEV.
Add a cl::opt and PassManagerBuilder option to use this in LoopUnroll.
Motivation: Certain Halide applications spend a very long time compiling in forgetLoop, and prefer to forget everything and rebuild SCEV from scratch.
Sample difference in compile time reduction: 21.04 to 14.78 using current ToT release build.
Testcase showcasing this cannot be opensourced and is fairly large.
The option disabled by default, but it may be desirable to enable by
default. Evidence in favor (two difference runs on different days/ToT state):
File Before (s) After (s)
clang-9.bc 7267.91 6639.14
llvm-as.bc 194.12 194.12
llvm-dis.bc 62.50 62.50
opt.bc 1855.85 1857.53
File Before (s) After (s)
clang-9.bc 8588.70 7812.83
llvm-as.bc 196.20 194.78
llvm-dis.bc 61.55 61.97
opt.bc 1739.78 1886.26
Reviewers: sanjoy
Subscribers: mehdi_amini, jlebar, zzheng, javed.absar, dmgreen, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60144
llvm-svn: 358304
This fixes a miscompile which was introduced in r356510 (https://reviews.llvm.org/D57372).
The problem is that the original patch removed pointer operands where the load results we're demanded, but without considering the legality of the load itself. If the masked.gather had active, but undemanded, lanes, then we could end up creating a load which loaded from an undef address. The result could be a segfault, or, in theory, an arbitrary read from a random memory location into an used register.
llvm-svn: 358299
When CVP determines that a with.overflow intrinsic cannot overflow,
it currently inserts a simple add/sub. As we already determined that
there can be no overflow, we should add the appropriate NUW/NSW flag.
Differential Revision: https://reviews.llvm.org/D60585
llvm-svn: 358298
Bug: https://bugs.llvm.org/show_bug.cgi?id=41175
In the bug test case the DSE pass is shortening the range of memory that a
memset is working on. A getelementptr is generated so that the new
starting address can be passed to memset. This instruction was not given
a DebugLoc.
To fix the bug, copy the DebugLoc from the memset instruction.
Patch by Orlando Cazalet-Hyams!
Differential Revision: https://reviews.llvm.org/D60556
llvm-svn: 358270
We currently assume profile hash conflicts will be caught by an upfront
check and we assert for the cases that escape the check. The assumption
is not always true as there are chances of conflict. This patch prints
a warning and skips annotating the function for the escaped cases,.
Differential Revision: https://reviews.llvm.org/D60154
llvm-svn: 358225
Following D60483 and D60497, this adds support for AlwaysOverflows
handling for ssubo. This is the last case we can handle right now.
Differential Revision: https://reviews.llvm.org/D60518
llvm-svn: 358100
ssubo X, C is equivalent to saddo X, -C. Make the transformation in
InstCombine and allow the logic implemented for saddo to fold prior
usages of add nsw or sub nsw with constants.
Patch by Dan Robertson.
Differential Revision: https://reviews.llvm.org/D60061
llvm-svn: 358099
1. Use computed VF for stress testing.
2. If the computed VF does not produce vector code (VF smaller than 2), force VF to be 4.
3. Test vectorization of i64 data on AArch64 to make sure we generate VF != 4 (on X86 that was already tested on AVX).
Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>
Differential Revision: https://reviews.llvm.org/D59952
llvm-svn: 358056
Check AlwaysOverflow condition for usubo. The implementation is the
same as the existing handling for uaddo and umulo. Handling for saddo
and ssubo will follow (smulo doesn't have the necessary ValueTracking
support).
Differential Revision: https://reviews.llvm.org/D60483
llvm-svn: 358052
metadata into a module flag in the auto-upgrader and make the ARC
contract pass read the marker as a module flag.
This is needed to fix a bug where ARC contract wasn't inserting the
retainRV marker when LTO was enabled, which caused objects returned
from a function to be auto-released.
rdar://problem/49464214
Differential Revision: https://reviews.llvm.org/D60303
llvm-svn: 358047
This reverts commit 1383a91689.
sdiv-canonicalize.ll fails after this revision. The fold needs to be
moved outside the branch handling constant operands. However when this
is done there are further test changes, so I'm reverting this in the
meantime.
llvm-svn: 358026
Change the code to always handle the unsigned+signed cases together
with the same basic structure for add/sub/mul. The simple folds are
always handled first and then the ValueTracking overflow checks are
used.
llvm-svn: 358025
Similar to:
rL358005
Forego folding arbitrary vector constants to fix a possible miscompile bug.
We can enhance the transform if we do want to handle the more complicated
vector case.
llvm-svn: 358013
// 0 - (X sdiv C) -> (X sdiv -C) provided the negation doesn't overflow.
This fold has been around for many years and nobody noticed the potential
vector miscompile from overflow until recently...
So it seems unlikely that there's much demand for a vector sdiv optimization
on arbitrary vector constants, so just limit the matching to splat constants
to avoid the possible bug.
Differential Revision: https://reviews.llvm.org/D60426
llvm-svn: 358005
It's been on in Android for a while without causing problems, so it's time
to make it the default and remove the flag.
Differential Revision: https://reviews.llvm.org/D60355
llvm-svn: 357960
A more general canonicalization between fdiv and fmul would not
handle this case because that would have to be limited by uses
to prevent 2 values from becoming 3 values:
(x/y) * (x/y) --> (x*x) / (y*y)
(But we probably should still have that limited -- but more general --
canonicalization independently of this change.)
llvm-svn: 357943
Fixes bug 40992: https://bugs.llvm.org/show_bug.cgi?id=40992
There is potential for miscompiled code emitted from JumpThreading when
analyzing a block with one or more indirectbr or callbr predecessors. The
ProcessThreadableEdges() function incorrectly folds conditional branches
into an unconditional branch.
This patch prevents incorrect branch folding without fully pessimizing
other potential threading opportunities through the same basic block.
This IR shape was manually fed in via opt and is unclear if clang and the
full pass pipeline will ever emit similar code shapes.
Thanks to Matthias Liedtke for the bug report and simplified IR example.
Differential Revision: https://reviews.llvm.org/D60284
llvm-svn: 357930
First step towards removing the MOVMSK intrinsics completely - this patch expands MOVMSK to the pattern:
e.g. PMOVMSKB(v16i8 x):
%cmp = icmp slt <16 x i8> %x, zeroinitializer
%int = bitcast <16 x i8> %cmp to i16
%res = zext i16 %int to i32
Which is correctly handled by ISel and FastIsel (give or take an annoying movzx move....): https://godbolt.org/z/rkrSFW
Differential Revision: https://reviews.llvm.org/D60256
llvm-svn: 357909
This revision causes tests to fail under ASAN. Since the cause of the failures
is not clear (could be ASAN, could be a Clang bug, could be a bug in this
revision), the safest course of action seems to be to revert while investigating.
llvm-svn: 357667
Create method `optForNone()` testing for the function level equivalent of
`-O0` and refactor appropriately.
Differential revision: https://reviews.llvm.org/D59852
llvm-svn: 357638
The Emscripten OS provides a definition of __EMSCRIPTEN__, and also that it
supports iprintf optimizations.
Also define small_printf optimizations, which is a printf with float support
but not long double (which in wasm can be useful since long doubles are 128
bit and force linking of float128 emulation code). This part is based on
sunfish's https://reviews.llvm.org/D57620 (which can't land yet since
the WASI integration isn't ready yet).
Differential Revision: https://reviews.llvm.org/D60167
llvm-svn: 357552
Set the correct debug location on instructions which load arguments in
preparation for a call to an arg-promoted function.
This prevents location cascade from misattributing the line/scope of one
of these loads to the location of the instruction preceding the call.
Differential Revision: https://reviews.llvm.org/D60113
llvm-svn: 357500
Bug: https://bugs.llvm.org/show_bug.cgi?id=41180
In the bug test case the debug location was missing for the cmp instruction in
the "middle block" BB. This patch fixes the bug by copying the debug location
from the cmp of the scalar loop's terminator branch, if it exists.
The patch also fixes the debug location on the subsequent branch instruction.
It was previously using the location of the of the original loop's pre-header
block terminator. Both of these instructions will now map to the source line of
the conditional branch in the original loop.
A regression test has been added that covers these issues.
Patch by Orlando Cazalet-Hyams!
Differential Revision: https://reviews.llvm.org/D59944
llvm-svn: 357499
Summary:
When inserting an `unreachable` after a noreturn call, we must ensure
that it's not a musttail call to avoid breaking the IR invariants for
musttail calls.
Reviewers: fedor.sergeev, majnemer
Reviewed By: majnemer
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60080
llvm-svn: 357485
Summary: It is possible that multiple indirect call targets have been promoted for a single callsite from the profiled binary. Current implementation repeats promotion for all these targets as far as the callsite itself is hot (the callsite is assumed to be hot if any one of these targets was "hot" during the profiling). However, even when one of the ICPed target is hot other targets may not, and we should not repeat promotion for "cold" targets.
Reviewers: danielcdh, wmi
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59940
llvm-svn: 357484
Summary:
When inserting an `unreachable` after a noreturn call, we must ensure
that it's not a musttail call to avoid breaking the IR invariants for
musttail calls.
Reviewers: fedor.sergeev, majnemer
Reviewed By: majnemer
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60079
llvm-svn: 357483
`StoreInst::getValueOperand` is identical to `getOperand(0)`, so the call to
`getOperand(0)` can be replaced. Further, `SI->getValueOperand` is redundantly
called just a few lines down, despite its return value being stored in variable
`DV`. No functional change.
llvm-svn: 357479
The code was previously checking that candidates for sinking had exactly
one use or were a store instruction (which can't have uses). This meant
we could sink call instructions only if they had a use.
That limitation seemed a bit arbitrary, so this patch changes it to
"instruction has zero or one use" which seems more natural and removes
the need to special-case stores.
Differential revision: https://reviews.llvm.org/D59936
llvm-svn: 357452
The code doesn't actually need any of the information about the widenable condition at this level. The only thing we need is to ensure the WC call is the last thing anded in, and even that is a quirk we should really look to remove.
llvm-svn: 357448
We'd been optimizing the case where the predicate was obviously true, do the same for the false case. Mostly just for completeness sake, but also may improve compile time in loops which will exit through the guard. Such loops are presumed rare in fastpath code, but may be present down untaken paths, so optimizing for them is still useful.
llvm-svn: 357408
LoopPredication was replacing the original condition, but leaving the instructions to compute the old conditions around. This would get cleaned up by other passes of course, but we might as well do it eagerly. That also makes the test output less confusing.
llvm-svn: 357406
Summary:
This fixes PR41270.
The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.
Reviewers: reames, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60058
llvm-svn: 357389
This reverts commit 75216a6dbcfe5fb55039ef06a07e419fa875f4a5.
I'll recommit with a better commit message with reference to the
phabricator review.
llvm-svn: 357387
This fixes PR41270.
The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.
llvm-svn: 357385
If we have a commutable vector binop with inverted select-shuffles,
we don't care about the order of the operands in each vector lane:
LHS = shuffle V1, V2, <0, 5, 6, 3>
RHS = shuffle V2, V1, <0, 5, 6, 3>
LHS + RHS --> <V1[0]+V2[0], V2[1]+V1[1], V2[2]+V1[2], V1[3]+V2[3]> --> V1 + V2
PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...is currently titled as an SLP enhancement, but at least for the
given example, we can reduce that in instcombine because we are just
eliminating shuffles.
As noted in the TODO, this could be generalized, but I haven't thought
through those patterns completely, so this is limited to what appears
to be always safe.
Differential Revision: https://reviews.llvm.org/D60048
llvm-svn: 357382
In PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...we have a case where we want to fold a binop of select-shuffle (blended) values.
Rather than try to match commuted variants of the pattern, we can canonicalize the
shuffles and check for mask equality with commuted operands.
We don't produce arbitrary shuffle masks in instcombine, but select-shuffles are a
special case that the backend is required to handle because we already canonicalize
vector select to this shuffle form.
So there should be no codegen difference from this change. It's possible that this
improves CSE in IR though.
Differential Revision: https://reviews.llvm.org/D60016
llvm-svn: 357366
This may not be NFC, but I'm not sure how to expose any diffs in
tests. In theory, it should be slightly more efficient and possibly
more profitable to do the canonicalizations (which can increase the
undef elements in the mask) ahead of SimplifyDemandedVectorElts().
llvm-svn: 357272
For the cases where the icmp/fcmp predicate is commutative, use reorderInputsAccordingToOpcode to collect and commute the operands.
This requires a helper to recognise commutativity in both general Instruction and CmpInstr types - the CmpInst::isCommutative doesn't overload the Instruction::isCommutative method for reasons I'm not clear on (maybe because its based on predicate not opcode?!?).
Differential Revision: https://reviews.llvm.org/D59992
llvm-svn: 357266
Updated to use DenseMap::insert instead of [] operator for insertion, to
avoid a crash caused by epoch checks.
This reverts commit 2b85de4383.
llvm-svn: 357257
We should be able to match elements with the swapped predicate as well - as long as we commute the source operands.
Differential Revision: https://reviews.llvm.org/D59956
llvm-svn: 357243
For the attached test case, unchecked addition of immediate starts and
ends overflows, as they can be arbitrary i64 constants.
Proof: https://rise4fun.com/Alive/Plqc
Reviewers: qcolombet, gilr, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59218
llvm-svn: 357217
By extending OrderedBB to allow removing and replacing cached
instructions, we can preserve OrderedBBs in DSE easily. This eliminates
one source of quadratic compile time in DSE.
Fixes PR38829.
Reviewers: rnk, efriedma, hfinkel
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59789
llvm-svn: 357208
This is in preparation to a driver patch to add gcc 8's -fsanitize=pointer-compare and -fsanitize=pointer-subtract.
Disabled by default as this is still an experimental feature.
Reviewed By: morehouse, vitalybuka
Differential Revision: https://reviews.llvm.org/D59220
llvm-svn: 357157
With this change, the VPlan native path is triggered with the directive:
#pragma clang loop vectorize(enable)
There is no need to specify the vectorize_width(N) clause.
Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>
Differential Revision: https://reviews.llvm.org/D57598
llvm-svn: 357156
Start using the uadd.sat and usub.sat intrinsics for the existing
canonicalizations. These intrinsics should optimize better than
expanded IR, have better handling in the X86 backend and should
be no worse than expanded IR in other backends, as far as we know.
rL357012 already introduced use of uadd.sat for the add+umin pattern.
Differential Revision: https://reviews.llvm.org/D58872
llvm-svn: 357103
This is the last step towards solving the examples shown in:
https://bugs.llvm.org/show_bug.cgi?id=14613
With this change, x86 should end up with psubus instructions
when those are available.
All known codegen issues with expanding the saturating intrinsics
were resolved with:
D59006 / rL356855
We also have some early evidence in D58872 that using the intrinsics
will lead to better perf. If some target regresses from this, custom
lowering of the intrinsics (as in the above for x86) may be needed.
llvm-svn: 357012
As discussed on D59738, this generalizes reorderInputsAccordingToOpcode to handle multiple + non-commutative instructions so we can get rid of reorderAltShuffleOperands and make use of the extra canonicalizations that reorderInputsAccordingToOpcode brings.
Differential Revision: https://reviews.llvm.org/D59784
llvm-svn: 356939
Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops.
This is prep work towards:
(a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands
(b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops.
Differential Revision: https://reviews.llvm.org/D59738
llvm-svn: 356913
Remove the I.getOperand() calls from inside shouldReorderOperands - reorderInputsAccordingToOpcode should handle the creation of the operand lists and shouldReorderOperands should just check to see whether the i'th element should be commuted.
llvm-svn: 356854
This adds ConstantRange::getFull(BitWidth) and
ConstantRange::getEmpty(BitWidth) named constructors as more readable
alternatives to the current ConstantRange(BitWidth, /* full */ false)
and similar. Additionally private getFull() and getEmpty() member
functions are added which return a full/empty range with the same bit
width -- these are commonly needed inside ConstantRange.cpp.
The IsFullSet argument in the ConstantRange(BitWidth, IsFullSet)
constructor is now mandatory for the few usages that still make use of it.
Differential Revision: https://reviews.llvm.org/D59716
llvm-svn: 356852
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.
This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo
Patch by: @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D59059
llvm-svn: 356814
Summary:
Between building the pair map and querying it there are a few places that
erase and create Values. It's rare but the address of these newly created
Values is occasionally the same as a just-erased Value that we already
have in the pair map. These coincidences should be accounted for to avoid
non-determinism.
Thanks to Roman Tereshin for the test case.
Reviewers: rtereshin, bogner
Reviewed By: rtereshin
Subscribers: mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59401
llvm-svn: 356803
This helps to avoid the situation where RA spots that only 3 of the
v4f32 result of a load are used, and immediately reallocates the 4th
register for something else, requiring a stall waiting for the load.
Differential Revision: https://reviews.llvm.org/D58906
Change-Id: I947661edfd5715f62361a02b100f14aeeada29aa
llvm-svn: 356768
are annotated with notail.
r356705 annotated calls to objc_retainAutoreleasedReturnValue with
notail on x86-64. This commit teaches ARC optimizer to check the notail
marker on the call before turning it into a tail call.
rdar://problem/38675807
llvm-svn: 356707
If they have other users we'll just end up increasing the instruction count.
We might be able to weaken this to only one of them having a single use if we can prove that the and will be removed.
Fixes PR41164.
Differential Revision: https://reviews.llvm.org/D59630
llvm-svn: 356690
If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems.
Differential Revision: https://reviews.llvm.org/D57247
llvm-svn: 356590
Summary:
Before this patch, if any Use existed in the loop, with a defining
access in the loop, we conservatively decide to not move the store.
What this approach was missing, is that ordered loads are not Uses, they're Defs
in MemorySSA. So, even when the clobbering walker does not find that
volatile load to interfere, we still cannot hoist a store past a
volatile load.
Resolves PR41140.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59564
llvm-svn: 356588
LTO provides additional opportunities for tailcall elimination due to
link-time inlining and visibility of nocapture attribute. Testing showed
negligible impact on compilation times.
Differential Revision: https://reviews.llvm.org/D58391
llvm-svn: 356511
Teach instcombine to propagate demanded elements through a masked load or masked gather instruction. This is in the broader context of improving vector pointer instcombine under https://reviews.llvm.org/D57140.
Differential Revision: https://reviews.llvm.org/D57372
llvm-svn: 356510
Combine 2 fcmps that are checking for nan-ness:
and (fcmp ord X, 0), (and (fcmp ord Y, 0), Z) --> and (fcmp ord X, Y), Z
or (fcmp uno X, 0), (or (fcmp uno Y, 0), Z) --> or (fcmp uno X, Y), Z
This is an exact match for a minimal reassociation pattern.
If we want to handle this more generally that should go in
the reassociate pass and allow removing this code.
This should fix:
https://bugs.llvm.org/show_bug.cgi?id=41069
llvm-svn: 356471
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.
The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.
For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.
This is a recommit of r356442 with trivial fixes for the failing tests.
Differential Revision: https://reviews.llvm.org/D56587
llvm-svn: 356451
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.
The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.
For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.
Differential Revision: https://reviews.llvm.org/D56587
llvm-svn: 356442
Follow-up to:
rL356338
rL356369
We can calculate an arbitrary vector constant minus the bitwidth, so there's
no need to limit this transform to scalars and splats.
llvm-svn: 356372
Follow-up to:
rL356338
Rotates are a special case of funnel shift where the 2 input operands
are the same value, but that does not need to be a restriction for the
canonicalization when the shift amount is a constant.
llvm-svn: 356369
This was noted as a backend problem:
https://bugs.llvm.org/show_bug.cgi?id=41057
...and subsequently fixed for x86:
rL356121
But we should canonicalize these in IR for the benefit of all targets
and improve IR analysis such as CSE.
llvm-svn: 356338
A change of two parts:
1) A generic enhancement for all callers of SDVE to exploit the fact that if all lanes are undef, the result is undef.
2) A GEP specific piece to strengthen/fix the vector index undef element handling, and call into the generic infrastructure when visiting the GEP.
The result is that we replace a vector gep with at least one undef in each lane with a undef. We can also do the same for vector intrinsics. Once the masked.load patch (D57372) has landed, I'll update to include call tests as well.
Differential Revision: https://reviews.llvm.org/D57468
llvm-svn: 356293
Summary: Add bindings to create a wrapped "Add Discriminators" pass. Now that we have debug info support, this is a handy transform to have.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: dblaikie, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58624
llvm-svn: 356272
Summary:
The AliasSummary previously contained the AliaseeGUID, which was only
populated when reading the summary from bitcode. This patch changes it
to instead hold the ValueInfo of the aliasee, and always populates it.
This enables more efficient access to the ValueInfo (specifically in the
recent patch r352438 which needed to perform an index hash table lookup
using the aliasee GUID).
As noted in the comments in AliasSummary, we no longer technically need
to keep a pointer to the corresponding aliasee summary, since it could
be obtained by walking the list of summaries on the ValueInfo looking
for the summary in the same module. However, I am concerned that this
would be inefficient when walking through the index during the thin
link for various analyses. That can be reevaluated in the future.
By always populating this new field, we can remove the guard and special
handling for a 0 aliasee GUID when dumping the dot graph of the summary.
An additional improvement in this patch is when reading the summaries
from LLVM assembly we now set the AliaseeSummary field to the aliasee
summary in that same module, which makes it consistent with the behavior
when reading the summary from bitcode.
Reviewers: pcc, mehdi_amini
Subscribers: inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D57470
llvm-svn: 356268
We are adding a sign extended IR value to an int64_t, which can cause
signed overflows, as in the attached test case, where we have a formula
with BaseOffset = -1 and a constant with numeric_limits<int64_t>::min().
If the addition would overflow, skip the simplification for this
formula. Note that the target triple is required to trigger the failure.
Reviewers: qcolombet, gilr, kparzysz, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59211
llvm-svn: 356256
Before r355981, this was under LLVM_DEBUG. I don't think the assert is
quite right, but this really should be a verifier check. Instcombine
should not be asserting on this sort of thing.
llvm-svn: 356219
The shift argument is defined to be modulo the bitwidth, so if that argument
is a constant, we can always reduce the constant to its minimal form to allow
better CSE and other follow-on transforms.
We need to be careful to ignore constant expressions here, or we will likely
infinite loop. I'm adding a general vector constant query for that case.
Differential Revision: https://reviews.llvm.org/D59374
llvm-svn: 356192
Create members for Loop, ScalarEvolution, DominatorTree,
TargetTransformInfo and Formula.
Differential Revision: https://reviews.llvm.org/D58389
llvm-svn: 356131
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.
Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.
This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.
llvm-svn: 355981
The included test case currently crashes on tip of tree. Rather than adding a bailout, I chose to restructure the code so that the existing helper function could be used. Given that, the majority of the diff is NFC-ish, but the key difference is that canConvertValue returns false when only one side is a non-integral pointer.
Thanks to Cherry Zhang for the test case.
Differential Revision: https://reviews.llvm.org/D59000
llvm-svn: 355962
This patch adds a new option to SplitAllCriticalEdges and uses it to avoid splitting critical edges when the destination basic block ends with unreachable. Otherwise if we split the critical edge, sanitizer coverage will instrument the new block that gets inserted for the split. But since this block itself shouldn't be reachable this is pointless. These basic blocks will stick around and generate assembly, but they don't end in sane control flow and might get placed at the end of the function. This makes it look like one function has code that flows into the next function.
This showed up while compiling the linux kernel with clang. The kernel has a tool called objtool that detected the code that appeared to flow from one function to the next. https://github.com/ClangBuiltLinux/linux/issues/351#issuecomment-461698884
Differential Revision: https://reviews.llvm.org/D57982
llvm-svn: 355947
The code might intend to replace puts("") with putchar('\n') even if the
return value is used. It failed because use_empty() was used to guard
the whole block. While returning '\n' (putchar('\n')) is technically
correct (puts is only required to return a nonnegative number on
success), doing this looks weird and there is really little benefit to
optimize puts whose return value is used. So don't do that.
llvm-svn: 355921
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.
This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo
Patch by: @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D59059
........
Reverted due to buildbot failures that I don't have time to track down.
llvm-svn: 355913
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.
This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo
Patch by: @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D59059
llvm-svn: 355906
Change from original commit: move test (that uses an X86 triple) into the X86
subdirectory.
Original description:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.
Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal
Reviewed By: sdesmalen
Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57728
llvm-svn: 355889
It hasn't seen active development in years, and it hasn't reached a
state where it was useful.
Remove the code until someone is interested in working on it again.
Differential Revision: https://reviews.llvm.org/D59133
llvm-svn: 355862
Summary:
Depends on https://reviews.llvm.org/D59069.
https://bugs.llvm.org/show_bug.cgi?id=40979 describes a bug in which the
-coro-split pass would assert that a use was across a suspend point from
a definition. Normally this would mean that a value would "spill" across
a suspend point and thus need to be stored in the coroutine frame. However,
in this case the use was unreachable, and so it would not be necessary
to store the definition on the frame.
To prevent the assert, simply remove unreachable basic blocks from a
coroutine function before computing spills. This avoids the assert
reported in PR40979.
Reviewers: GorNishanov, tks2103
Reviewed By: GorNishanov
Subscribers: EricWF, jdoerfert, llvm-commits, lewissbaker
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59068
llvm-svn: 355852
Summary:
Extract the functionality of eliminating unreachable basic blocks
within a function, previously encapsulated within the
-unreachableblockelim pass, and make it available as a function within
BlockUtils.h. No functional change intended other than making the logic
reusable.
Exposing this logic makes it easier to implement
https://reviews.llvm.org/D59068, which fixes coroutines bug
https://bugs.llvm.org/show_bug.cgi?id=40979.
Reviewers: mkazantsev, wmi, davidxl, silvas, davide
Reviewed By: davide
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59069
llvm-svn: 355846
Fixes bug 38023: https://bugs.llvm.org/show_bug.cgi?id=38023
The SimplifyCFG pass will perform jump threading in some cases where
doing so is trivial and would simplify the CFG. When folding a series
of blocks with redundant conditional branches into an unconditional "critical
edge" block, it does not keep the debug location associated with the previous
conditional branch.
This patch fixes the bug described by copying the debug info from the
old conditional branch to the new unconditional branch instruction, and
adds a regression test for the SimplifyCFG pass that covers this case.
Patch by Stephen Tozer!
Differential Revision: https://reviews.llvm.org/D59206
llvm-svn: 355833
Fixes bug 37966: https://bugs.llvm.org/show_bug.cgi?id=37966
The Jump Threading pass will replace certain conditional branch
instructions with unconditional branches when it can prove that only one
branch can occur. Prior to this patch, it would not carry the debug
info from the old instruction to the new one.
This patch fixes the bug described by copying the debug info from the
conditional branch instruction to the new unconditional branch
instruction, and adds a regression test for the Jump Threading pass that
covers this case.
Patch by Stephen Tozer!
Differential Revision: https://reviews.llvm.org/D58963
llvm-svn: 355822
Summary:
Right now, when we encounter a string equality check,
e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a
small compile-time constant, and fall back on calling `memcmp()` else.
This is sub-optimal because memcmp has to compute much more than
equality.
This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms
that support `bcmp`.
`bcmp` can be made much more efficient than `memcmp` because equality
compare is trivially parallel while lexicographic ordering has a chain
dependency.
Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits
Differential Revision: https://reviews.llvm.org/D56593
llvm-svn: 355672
In some loops, we end up generating loop induction variables that look like:
{(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1}
As opposed to the simpler:
{(zext i16 (%i0 * %i1) to i32),+,-1}
i.e we count up from -limit to 0, not the simpler counting down from limit to
0. This is because the scores, as LSR calculates them, are the same and the
second is filtered in place of the first. We end up with a redundant SUB from 0
in the code.
This patch tries to make the calculation of the setup cost a little more
thoroughly, recursing into the scev members to better approximate the setup
required. The cost function for comparing LSR costs is:
return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
C2.ScaleCost, C2.ImmCost, C2.SetupCost);
So this will only alter results if none of the other variables turn out to be
different.
Differential Revision: https://reviews.llvm.org/D58770
llvm-svn: 355597
Summary:
While implementing inlining support for callbr
(https://bugs.llvm.org/show_bug.cgi?id=40722), I hit a crash in Loop
Rotation when trying to build the entire x86 Linux kernel
(drivers/char/random.c). This is a small fix up to r353563.
Test case is drivers/char/random.c (with callbr's inlined), then ran
through creduce, then `opt -opt-bisect-limit=<limit>`, then bugpoint.
Thanks to Craig Topper for immediately spotting the fix, and teaching me
how to fish.
Reviewers: craig.topper, jyknight
Reviewed By: craig.topper
Subscribers: hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58929
llvm-svn: 355564
Summary:
They simply shuffle bits. MSan needs to do the same with shadow bits,
after making sure that the shuffle mask is fully initialized.
Reviewers: pcc, vitalybuka
Subscribers: hiraditya, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D58858
llvm-svn: 355348
I'm not too familiar with this pass, so there might be a better
solution, but this appears to fix the degenerate:
PR40930
PR40931
PR40932
PR40934
...without affecting any real-world code.
As we've seen in several other passes, when we have unreachable blocks,
they can contain semi-bogus IR and/or cause unexpected conditions. We
would not typically expect these patterns to make it this far, but we
have to guard against them anyway.
llvm-svn: 355337
I'm assuming that the nan propogation logic for InstructonSimplify's handling of fadd and fsub is correct, and applying the same to atomicrmw.
Differential Revision: https://reviews.llvm.org/D58836
llvm-svn: 355222
This patch fixes an issue where we would compute an unnecessarily small alignment during scalar promotion when no store is not to be guaranteed to execute, but we've proven load speculation safety. Since speculating a load requires proving the existing alignment is valid at the new location (see Loads.cpp), we can use the alignment fact from the load.
For non-atomics, this is a performance problem. For atomics, this is a correctness issue, though an *incredibly* rare one to see in practice. For atomics, we might not be able to lower an improperly aligned load or store (i.e. i32 align 1). If such an instruction makes it all the way to codegen, we *may* fail to codegen the operation, or we may simply generate a slow call to a library function. The part that makes this super hard to see in practice is that the memory location actually *is* well aligned, and instcombine knows that. So, to see a failure, you have to have a) hit the bug in LICM, b) somehow hit a depth limit in InstCombine/ValueTracking to avoid fixing the alignment, and c) then have generated an instruction which fails codegen rather than simply emitting a slow libcall. All around, pretty hard to hit.
Differential Revision: https://reviews.llvm.org/D58809
llvm-svn: 355217
An idempotent atomicrmw is one that does not change memory in the process of execution. We have already added handling for the various integer operations; this patch extends the same handling to floating point operations which were recently added to IR.
Note: At the moment, we canonicalize idempotent fsub to fadd when ordering requirements prevent us from using a load. As discussed in the review, I will be replacing this with canonicalizing both floating point ops to integer ops in the near future.
Differential Revision: https://reviews.llvm.org/D58251
llvm-svn: 355210
GCC correctly moans that PlainCFGBuilder::isExternalDef(llvm::Value*) and
StackSafetyDataFlowAnalysis::verifyFixedPoint() are defined but not used
in Release builds. Hide them behind 'ifndef NDEBUG'.
llvm-svn: 355205
Summary:
ConstIntInfoVec contains elements extracted from the previous function.
In new PM, releaseMemory() is not called and the dangling elements can
cause segfault in findConstantInsertionPoint.
Rename releaseMemory() to cleanup() to deliver the idea that it is
mandatory and call cleanup() in ConstantHoistingPass::runImpl to fix
this.
Reviewers: ormris, zzheng, dmgreen, wmi
Reviewed By: ormris, wmi
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58589
llvm-svn: 355174
Summary:
These sorts of blocks often contain calls to noreturn functions, like
longjmp, throw, or trap. If they don't end the program, they are
"interesting" from the perspective of sanitizer coverage, so we should
instrument them. This was discussed in https://reviews.llvm.org/D57982.
Reviewers: kcc, vitalybuka
Subscribers: llvm-commits, craig.topper, efriedma, morehouse, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58740
llvm-svn: 355152
The basic idea of the pass is to use a circular buffer to log the execution ordering of the functions. We only log the function when it is first executed. We use a 8-byte hash to log the function symbol name.
In this pass, we add three global variables:
(1) an order file buffer: a circular buffer at its own llvm section.
(2) a bitmap for each module: one byte for each function to say if the function is already executed.
(3) a global index to the order file buffer.
At the function prologue, if the function has not been executed (by checking the bitmap), log the function hash, then atomically increase the index.
Differential Revision: https://reviews.llvm.org/D57463
llvm-svn: 355133
Part 2 of CSPGO changes (mostly related to ProfileSummary).
Note that I use a default parameter in setProfileSummary() and getSummary().
This is to break the dependency in clang. I will make the parameter explicit
after changing clang in a separated patch.
Differential Revision: https://reviews.llvm.org/D54175
llvm-svn: 355131
This is part of a transform that may be done in the backend:
D13757
...but it should always be beneficial to fold this sooner in IR
for all targets.
https://rise4fun.com/Alive/vaiW
Name: sext add nsw
%add = add nsw i8 %i, C0
%ext = sext i8 %add to i32
%r = add i32 %ext, C1
=>
%s = sext i8 %i to i32
%r = add i32 %s, sext(C0)+C1
Name: zext add nuw
%add = add nuw i8 %i, C0
%ext = zext i8 %add to i16
%r = add i16 %ext, C1
=>
%s = zext i8 %i to i16
%r = add i16 %s, zext(C0)+C1
llvm-svn: 355118
Summary:
It is mentioned in the document of DTU that "It is illegal to submit any update that has already been submitted, i.e., you are supposed not to insert an existent edge or delete a nonexistent edge." It is dangerous to violet this rule because DomTree and PostDomTree occasionally crash on this scenario.
This patch fixes `MergeBlockIntoPredecessor`, making it conformant to this precondition.
Reviewers: kuhar, brzycki, chandlerc
Reviewed By: brzycki
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58444
llvm-svn: 355105
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly been telling
that the operation is equivalent to zero extending the
value we're tracking. That has not been true, instead
the user has been forced to explicitly set the extended
bits as known zero afterwards.
This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.
Reviewers: craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58650
llvm-svn: 355099
Summary:
I hadn't realized that instrumentation runs before inlining, so we can't
use the function as the comdat group. Doing so can create relocations
against discarded sections when references to discarded __profc_
variables are inlined into functions outside the function's comdat
group.
In the future, perhaps we should consider standardizing the comdat group
names that ELF and COFF use. It will save object file size, since
__profv_$sym won't appear in the symbol table again.
Reviewers: xur, vsk
Subscribers: eraman, hiraditya, cfe-commits, #sanitizers, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D58737
llvm-svn: 355044
Summary:
The original assumption for the insertDef method was that it would not
materialize Defs out of no-where, hence it will not insert phis needed
after inserting a Def.
However, when cloning an instruction (use case used in LICM), we do
materialize Defs "out of no-where". If the block receiving a Def has at
least one other Def, then no processing is needed. If the block just
received its first Def, we must check where Phi placement is needed.
The only new usage of insertDef is in LICM, hence the trigger for the bug.
But the original goal of the method also fails to apply for the move()
method. If we move a Def from the entry point of a diamond to either the
left or right blocks, then the merge block must add a phi.
While this usecase does not currently occur, or may be viewed as an
incorrect transformation, MSSA must behave corectly given the scenario.
Resolves PR40749 and PR40754.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58652
llvm-svn: 355040
Splitting can make sanitizer errors harder to understand, as the
trapping instruction may not be in the function where the bug was
detected.
rdar://48142697
llvm-svn: 354931
Current PGO profile counts are not context sensitive. The branch probabilities
for the inlined functions are kept the same for all call-sites, and they might
be very different from the actual branch probabilities. These suboptimal
profiles can greatly affect some downstream optimizations, in particular for
the machine basic block placement optimization.
In this patch, we propose to have a post-inline PGO instrumentation/use pass,
which we called Context Sensitive PGO (CSPGO). For the users who want the best
possible performance, they can perform a second round of PGO instrument/use on
the top of the regular PGO. They will have two sets of profile counts. The
first pass profile will be manly for inline, indirect-call promotion, and
CGSCC simplification pass optimizations. The second pass profile is for
post-inline optimizations and code-gen optimizations.
A typical usage:
// Regular PGO instrumentation and generate pass1 profile.
> clang -O2 -fprofile-generate source.c -o gen
> ./gen
> llvm-profdata merge default.*profraw -o pass1.profdata
// CSPGO instrumentation.
> clang -O2 -fprofile-use=pass1.profdata -fcs-profile-generate -o gen2
> ./gen2
// Merge two sets of profiles
> llvm-profdata merge default.*profraw pass1.profdata -o profile.profdata
// Use the combined profile. Pass manager will invoke two PGO use passes.
> clang -O2 -fprofile-use=profile.profdata -o use
This change touches many components in the compiler. The reviewed patch
(D54175) will committed in phrases.
Differential Revision: https://reviews.llvm.org/D54175
llvm-svn: 354930
This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix its the third argument.
Differential Revision: https://reviews.llvm.org/D58616
llvm-svn: 354790
add A, sext(B) --> sub A, zext(B)
We have to choose 1 of these forms, so I'm opting for the
zext because that's easier for value tracking.
The backend should be prepared for this change after:
D57401
rL353433
This is also a preliminary step towards reducing the amount
of bit hackery that we do in IR to optimize icmp/select.
That should be waiting to happen at a later optimization stage.
The seeming regression in the fuzzer test was discussed in:
D58359
We were only managing that fold in instcombine by luck, and
other passes should be able to deal with that better anyway.
llvm-svn: 354748
This patch adds LazyValueInfo to LowerSwitch to compute the range of the
value being switched over and reduce the size of the tree LowerSwitch
builds to lower a switch.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D58096
llvm-svn: 354670
Summary:
This patch separates two semantics of `applyUpdates`:
1. User provides an accurate CFG diff and the dominator tree is updated according to the difference of `the number of edge insertions` and `the number of edge deletions` to infer the status of an edge before and after the update.
2. User provides a sequence of hints. Updates mentioned in this sequence might never happened and even duplicated.
Logic changes:
Previously, removing invalid updates is considered a side-effect of deduplication and is not guaranteed to be reliable. To handle the second semantic, `applyUpdates` does validity checking before deduplication, which can cause updates that have already been applied to be submitted again. Then, different calls to `applyUpdates` might cause unintended consequences, for example,
```
DTU(Lazy) and Edge A->B exists.
1. DTU.applyUpdates({{Delete, A, B}, {Insert, A, B}}) // User expects these 2 updates result in a no-op, but {Insert, A, B} is queued
2. Remove A->B
3. DTU.applyUpdates({{Delete, A, B}}) // DTU cancels this update with {Insert, A, B} mentioned above together (Unintended)
```
But by restricting the precondition that updates of an edge need to be strictly ordered as how CFG changes were made, we can infer the initial status of this edge to resolve this issue.
Interface changes:
The second semantic of `applyUpdates` is separated to `applyUpdatesPermissive`.
These changes enable DTU(Lazy) to use the first semantic if needed, which is quite useful in `transforms/utils`.
Reviewers: kuhar, brzycki, dmgreen, grosser
Reviewed By: brzycki
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58170
llvm-svn: 354669
The correct edge being deleted is not to the unswitched exit block, but to the
original block before it was split. That's the key in the map, not the
value.
The insert is correct. The new edge is to the .split block.
The splitting turns OriginalBB into:
OriginalBB -> OriginalBB.split.
Assuming the orignal CFG edge: ParentBB->OriginalBB, we must now delete
ParentBB->OriginalBB, not ParentBB->OriginalBB.split.
llvm-svn: 354656
Summary:
MemorySSA is not properly updated in LoopSimplifyCFG after recent changes. Use SplitBlock utility to resolve that and clear all updates once handleDeadExits is finished.
All updates that follow are removal of edges which are safe to handle via the removeEdge() API.
Also, deleting dead blocks is done correctly as is, i.e. delete from MemorySSA before updating the CFG and DT.
Reviewers: mkazantsev, rtereshin
Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58524
llvm-svn: 354613
is false.
Right now for inliner and partial inliner, we always pass the address of a
valid ORE object to getInlineCost even if RemarkEnabled is false because of
no -Rpass is specified. Since ComputeFullInlineCost will be set to true if
ORE is non-null in getInlineCost, this introduces the problem that in
getInlineCost we cannot return early even if we already know the cost is
definitely higher than the threshold. It is a general problem for compile
time.
This patch fixes that by pass nullptr as the ORE argument if RemarkEnabled is
false.
Differential Revision: https://reviews.llvm.org/D58399
llvm-svn: 354542
Noticed these while doing a final sweep of the code to make sure I hadn't missed anything in my last couple of patches. The (minor) missed optimization was noticed because of the stylistic fix to avoid an overly specific cast.
llvm-svn: 354412
Same case as for memset and memcpy, but this time for clobbering stores and loads. We still can't allow coercion to or from non-integrals, regardless of the transform.
Now that I'm done the whole little sequence, it seems apparent that we'd entirely missed reasoning about clobbers in the original GVN support for non-integral pointers.
My appologies, I thought we'd upstreamed all of this, but it turns out we were still carrying a downstream hack which hid all of these issues. My chanks to Cherry Zhang for helping debug.
llvm-svn: 354407
Problem is very similiar to the one fixed for memsets in r354399, we try to coerce a value to non-integral type, and then crash while try to do so. Since we shouldn't be doing such coercions to start with, easy fix. From inspection, I see two other cases which look to be similiar and will follow up with most test cases and fixes if confirmed.
llvm-svn: 354403
GVN generally doesn't forward structs or array types, but it *will* forward vector types to non-vectors and vice versa. As demonstrated in tests, we need to inhibit the same set of transforms for vector of non-integral pointers as for non-integral pointers themselves.
llvm-svn: 354401
If we encountered a location where we tried to forward the value of a memset to a load of a non-integral pointer, we crashed. Such a forward is not legal in general, but we can forward null pointers. Test for both cases are included.
llvm-svn: 354399
This is no-functional-change-intended, but that was also
true when it was part of rL354276, and I managed to lose
2 predicates for the fold with constant...causing much bot
distress. So this time I'm adding a couple of negative tests
to avoid that.
llvm-svn: 354384
We are planning to be able to delete the current loop in LoopSimplifyCFG
in the future. Add API to notify the loop pass manager that it happened.
llvm-svn: 354314
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613
(The matching here is incomplete. Trying to take minimal steps
to make sure we don't induce infinite looping from existing
canonicalizations of the 'select'.)
llvm-svn: 354221
Summary:
Unlimitted number of calls to getClobberingAccess can lead to high
compile times in pathological cases.
Limitting getClobberingAccess to a fairly high number. Can be adjusted
based on users/need.
Note: this is the only user of MemorySSA currently enabled by default.
The same handling exists in LICM (disabled atm). As MemorySSA gains more
users, this logic of capping will need to move inside MemorySSA.
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D58248
llvm-svn: 354182
Implement two more transforms of atomicrmw:
1) We can convert an atomicrmw which produces a known value in memory into an xchg instead.
2) We can convert an atomicrmw xchg w/o users into a store for some orderings.
Differential Revision: https://reviews.llvm.org/D58290
llvm-svn: 354170
If a lifetime.end marker occurs along one path through the extraction
region, but not another, then it's still incorrect to lift the marker,
because there is some path through the extracted function which would
ordinarily not reach the marker. If the call to the extracted function
is in a loop, unrolling can cause inputs to the function to become
optimized out as undef after the first iteration.
To prevent incorrect stack slot merging in the calling function, it
should be sufficient to lift lifetime.start markers for region inputs.
I've tested this theory out by doing a stage2 check-all with randomized
splitting enabled.
This is a follow-up to r353973, and there's additional context for this
change in https://reviews.llvm.org/D57834.
rdar://47896986
Differential Revision: https://reviews.llvm.org/D58253
llvm-svn: 354159
With or without PGO data applied, splitting early in the pipeline
(either before the inliner or shortly after it) regresses performance
across SPEC variants. The cause appears to be that splitting hides
context for subsequent optimizations.
Schedule splitting late again, in effect reversing r352080, which
scheduled the splitting pass early for code size benefits (documented in
https://reviews.llvm.org/D57082).
Differential Revision: https://reviews.llvm.org/D58258
llvm-svn: 354158
Summary:
The idea is that we now manipulate bases through a `unsigned BaseID` based on
order of appearance in the comparison chain rather than through the `Value*`.
Fixes 40714.
Reviewers: gchatelet
Subscribers: mgrang, jfb, jdoerfert, llvm-commits, hans
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58274
llvm-svn: 354131
Summary:
The changes to disable LTO unit splitting by default (r350949) and
detect inconsistently split LTO units (r350948) are causing some crashes
when the inconsistency is detected in multiple threads simultaneously.
Fix that by having the code always look for the inconsistently split
LTO units during the thin link, by checking for the presence of type
tests recorded in the summaries.
Modify test added in r350948 to remove single threading required to fix
a bot failure due to this issue (and some debugging options added in the
process of diagnosing it).
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57561
llvm-svn: 354062
For "idempotent" atomicrmw instructions which we can't simply turn into load, canonicalize the operation and constant. This reduces the matching needed elsewhere in the optimizer, but doesn't directly impact codegen.
For any architecture where OR/Zero is not a good default choice, you can extend the AtomicExpand lowerIdempotentRMWIntoFencedLoad mechanism. I reviewed X86 to make sure this works well, haven't audited other backends.
Differential Revision: https://reviews.llvm.org/D58244
llvm-svn: 354058
Expand on Quentin's r353471 patch which converts some atomicrmws into loads. Handle remaining operation types, and fix a slight bug. Atomic loads are required to have alignment. Since this was within the InstCombine fixed point, somewhere else in InstCombine was adding alignment before the verifier saw it, but still, we should fix.
Terminology wise, I'm using the "idempotent" naming that is used for the same operations in AtomicExpand and X86ISelLoweringInfo. Once this lands, I'll add similar tests for AtomicExpand, and move the pattern match function to a common location. In the review, there was seemingly consensus that "idempotent" was slightly incorrect for this context. Once we setle on a better name, I'll update all uses at once.
Differential Revision: https://reviews.llvm.org/D58242
llvm-svn: 354046
Summary:
In r353537 we now copy all metadata to the new function, with the old
being removed when the old function is eliminated. In some cases the old
function is dropped to a declaration (seems to only occur with the old
PM). Go ahead and clear all metadata from the old function to handle that
case, since verification will complain otherwise. This is consistent
with what was being done for debug metadata before r353537.
Reviewers: davidxl, uabelho
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58215
llvm-svn: 354032
The test case requires the peeled loop to be forgotten after peeling,
even though it does not have a parent. When called via the unroller,
SE->forgetTopmostLoop is also called, so the test case would also pass
without any SCEV invalidation, but peelLoop is exposed as utility
function. Also, in the test case, simplifyLoop will make changes,
removing the loop from SCEV, but it is better to not rely on this
behavior.
Reviewers: sanjoy, mkazantsev
Reviewed By: mkazantsev
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58192
llvm-svn: 354031
This is the second attempt to port ASan to new PM after D52739. This takes the
initialization requried by ASan from the Module by moving it into a separate
class with it's own analysis that the new PM ASan can use.
Changes:
- Split AddressSanitizer into 2 passes: 1 for the instrumentation on the
function, and 1 for the pass itself which creates an instance of the first
during it's run. The same is done for AddressSanitizerModule.
- Add new PM AddressSanitizer and AddressSanitizerModule.
- Add legacy and new PM analyses for reading data needed to initialize ASan with.
- Removed DominatorTree dependency from ASan since it was unused.
- Move GlobalsMetadata and ShadowMapping out of anonymous namespace since the
new PM analysis holds these 2 classes and will need to expose them.
Differential Revision: https://reviews.llvm.org/D56470
llvm-svn: 353985
When CodeExtractor finds liftime markers referencing inputs to the
extraction region, it lifts these markers out of the region and inserts
them around the call to the extracted function (see r350420, PR39671).
However, it should *only* lift lifetime markers that are actually
present in the extraction region. I.e., if a start marker is present in
the extraction region but a corresponding end marker isn't (or vice
versa), only the start marker (or end marker, resp.) should be lifted.
Differential Revision: https://reviews.llvm.org/D57834
llvm-svn: 353973
When instcombine sinks an instruction between two basic blocks, it sinks any
dbg.value users in the source block with it, to prevent debug use-before-free.
However we can do better by attempting to salvage the debug users, which would
avoid moving where the variable location changes. If we successfully salvage,
still sink a (cloned) dbg.value with the sunk instruction, as the sunk
instruction is more likely to be "live" later in the compilation process.
If we can't salvage dbg.value users of a sunk instruction, mark the dbg.values
in the original block as being undef. This terminates any earlier variable
location range, and represents the fact that we've optimized out the variable
location for a portion of the program.
Differential Revision: https://reviews.llvm.org/D56788
llvm-svn: 353936
This patch adds support of guards expressed in explicit form via
`widenable_condition` in Guard Widening pass.
Differential Revision: https://reviews.llvm.org/D56075
Reviewed By: reames
llvm-svn: 353932
Known underlying bugs have been fixed, intensive fuzz testing did not
find any new problems. Re-enabling by default. Feel free to revert if
it causes any functional failures.
llvm-svn: 353911
Add plumbing to get MemorySSA in the remaining loop passes.
Also update unit test to add the dependency.
[EnableMSSALoopDependency remains disabled].
llvm-svn: 353901
Summary:
Unlimitted number of calls to getClobberingAccess can lead to high
compile times in pathological cases.
Switching EnableLicmCap flag from bool to int, and enabling to default 100.
(tested to be appropriate for current bechmarks)
We can revisit this value when enabling MemorySSA.
Reviewers: sanjoy, chandlerc, george.burgess.iv
Subscribers: jlebar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57968
llvm-svn: 353897
Salvaging a redundant load instruction into a debug expression hides a
memory read from optimisation passes. Passes that alter memory behaviour
(such as LICM promoting memory to a register) aren't aware of these debug
memory reads and leave them unaltered, making the debug variable location
point somewhere unsafe.
Teaching passes to know about these debug memory reads would be challenging
and probably incomplete. Finding dbg.value instructions that need to be fixed
would likely be computationally expensive too, as more analysis would be
required. It's better to not generate debug-memory-reads instead, alas.
Changed tests:
* DeadStoreElim: test for salvaging of intermediate operations contributing
to the dead store, instead of salvaging of the redundant load,
* GVN: remove debuginfo behaviour checks completely, this behaviour is still
covered by other tests,
* InstCombine: don't test for salvaged loads, we're removing that behaviour.
Differential Revision: https://reviews.llvm.org/D57962
llvm-svn: 353824
Logic in `getInsertPointForUses` doesn't account for a corner case when `Def`
only comes to a Phi user from unreachable blocks. In this case, the incoming
value may be arbitrary (and not even available in the input block) and break
the loop-related invariants that are asserted below.
In fact, if we encounter this situation, no IR modification is needed. This
Phi will be simplified away with nearest cleanup.
Differential Revision: https://reviews.llvm.org/D58045
Reviewed By: spatel
llvm-svn: 353816
The function `LI.erase` has some invariants that need to be preserved when it
tries to remove a loop which is not the top-level loop. In particular, it
requires loop's preheader to be strictly in loop's parent. Our current logic
of deletion of dead blocks may erase the information about preheader before we
handle the loop, and therefore we may hit this assertion.
This patch changes the logic of loop deletion: we make them top-level loops
before we actually erase them. This allows us to trigger the simple branch of
`erase` logic which just detatches blocks from the loop and does not try to do
some complex stuff that need this invariant.
Thanks to @uabelho for reporting this!
Differential Revision: https://reviews.llvm.org/D57221
Reviewed By: fedor.sergeev
llvm-svn: 353813
Utility function that we use for blocks deletion always unconditionally removes
one-input Phis. In LoopSimplifyCFG, it can lead to breach of LCSSA form.
This patch alters this function to keep them if needed.
Differential Revision: https://reviews.llvm.org/D57231
Reviewed By: fedor.sergeev
llvm-svn: 353803
The code checked that the first root was an appropriate distance from
the base value, but skipped checking the other roots. This could lead to
rerolling a loop that can't be legally rerolled (at least, not without
rewriting the loop in a non-trivial way).
Differential Revision: https://reviews.llvm.org/D56812
llvm-svn: 353779
Loop::setAlreadyUnrolled() and
LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that
stops the same loop from being transformed multiple times. This patch
merges both implementations.
In doing so we fix 3 potential issues:
* setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.*
metadata even though it will not be used anymore. This already caused
problems such as http://llvm.org/PR40546. Change the behavior to the
one of setAlreadyUnrolled which deletes this loop metadata.
* setAlreadyUnrolled() used to create a new LoopID by calling
MDNode::get with nullptr as the first operand, then replacing it by
the returned references using replaceOperandWith. It is possible
that MDNode::get would instead return an existing node (due to
de-duplication) that then gets modified. To avoid, use a fresh
TempMDNode that does not get uniqued with anything else before
replacing it with replaceOperandWith.
* LoopVectorizeHints::matchesHintMetadataName() only compares the
suffix of the attribute to set the new value for. That is, when
called with "enable", would erase attributes such as
"llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and
"llvm.loop.distribute.enable" instead of the one to replace.
Fortunately, function was only called with "isvectorized".
Differential Revision: https://reviews.llvm.org/D57566
llvm-svn: 353738
This bug seems to be harmless in release builds, but will cause an error in UBSAN
builds or an assertion failure in debug builds.
When it gets to this opcode comparison, it assumes both of the operands are BinaryOperators,
but the prior m_LogicalShift will also match a ConstantExpr. The cast<BinaryOperator> will
assert in a debug build, or reading an invalid value for BinaryOp from memory with
((BinaryOperator*)constantExpr)->getOpcode() will cause an error in a UBSAN build.
The test I added will fail without this change in debug/UBSAN builds, but not in release.
Patch by: @AndrewScheidecker (Andrew Scheidecker)
Differential Revision: https://reviews.llvm.org/D58049
llvm-svn: 353736
Summary:
If there is no clobbering access for a store inside the loop, that store
can only be hoisted if there are no interfearing loads.
A more general verification introduced here: there are no loads that are
not optimized to an access outside the loop.
Addresses PR40586.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57967
llvm-svn: 353734
`CallBase` class rather than `CallSite` wrappers.
I pushed this change down through most of the statepoint infrastructure,
completely removing the use of CallSite where I could reasonably do so.
I ended up making a couple of cut-points: generic call handling
(instcombine, TLI, SDAG). As soon as it hit truly generic handling with
users outside the immediate code, I simply transitioned into or out of
a `CallSite` to make this a reasonable sized chunk.
Differential Revision: https://reviews.llvm.org/D56122
llvm-svn: 353660
cxxDtorIsEmpty checks callers recursively to determine if the
__cxa_atexit-registered function is empty, and eliminates the
__cxa_atexit call accordingly.
This recursive check is unnecessary as redundant instructions and
function calls can be removed by early-cse and inliner. In addition,
cxxDtorIsEmpty does not mark visited function and it may visit a
function exponential times (multiplication principle).
llvm-svn: 353603
For some specific cases with bitcast A->B->A with intervening PHI nodes InstCombiner::optimizeBitCastFromPhi transformation creates extra PHI nodes, which are actually a copy of already created PHI or in another words, they are redundant. These extra PHI nodes could lead to extra move instructions generated after DeSSA transformation. This happens when several conditions are met
- SROA kicks in and creates new alloca;
- there is a simple assignment L = R, which falls under 'canonicalize loads' done by combineLoadToOperationType (this transformation is by default). Exactly this transformation is the reason of bitcasts generated;
- the alloca is then used in A->B->A + PHI chain;
- there is a loop unrolling.
As a result optimizeBitCastFromPhi creates as many of PHI nodes for each new SROA alloca as loop unrolling factor is. These new extra PHI nodes are redundant actually except of one and should not be created. Moreover the idea of optimizeBitCastFromPhi is to get rid of the cast (when possible) but that doesn't happen in these conditions.
The proposed fix is to do the cast replacement for the whole calculated/accumulated PHI closure not for one cast only, which is an argument to the optimizeBitCastFromPhi. These will help to accomplish several things: 1) avoid extra PHI nodes generated as all casts which may trigger optimizeBitCastFromPhi transformation will be replaced, 3) bitcasts will be replaced, and 3) create more opportunities to remove dead code, which appears after the replacement.
A new test case shows that it's possible to get rid of all bitcasts completely and get quite good code reduction.
Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>
Reviewed By: Carrot
Differential Revision: https://reviews.llvm.org/D57053
llvm-svn: 353595
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
Differential Revision: https://reviews.llvm.org/D53765
llvm-svn: 353563
When CodeExtractor saves the result of InvokeInst at the first insertion
point of the 'normal destination' basic block, this block can be omitted
in the outlined region, so store is placed outside of the function. The
suggested solution is to process saving outputs after creating exit
stubs for new function, and stores will be placed in that blocks before
return in this case.
Patch by Sergei Kachkov!
Fixes llvm.org/PR40455.
Differential Revision: https://reviews.llvm.org/D57919
llvm-svn: 353562
Summary:
The motivating use case is eliminating duplicate profile data registered
for the same inline function in two object files. Before this change,
users would observe multiple symbol definition errors with VC link, but
links with LLD would succeed.
Users (Mozilla) have reported that PGO works well with clang-cl and LLD,
but when using LLD without this static registration, we would get into a
"relocation against a discarded section" situation. I'm not sure what
happens in that situation, but I suspect that duplicate, unused profile
information was retained. If so, this change will reduce the size of
such binaries with LLD.
Now, Windows uses static registration and is in line with all the other
platforms.
Reviewers: davidxl, wmi, inglorion, void, calixte
Subscribers: mgorny, krytarowski, eraman, fedor.sergeev, hiraditya, #sanitizers, dmajor, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D57929
llvm-svn: 353547
Summary:
ArgumentPromotion had code to specifically move the dbg metadata over to
the new function, but other metadata such as the function_entry_count
!prof metadata was not. Replace code that moved dbg metadata with a call
to copyMetadata. The old metadata is automatically removed when the old
Function is removed.
Reviewers: davidxl
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57846
llvm-svn: 353537
Check that when SimplifyCFG is flattening a 'br', all their debug intrinsic instructions are removed, including any dbg.label referencing a label associated with the basic blocks being removed.
Differential Revision: https://reviews.llvm.org/D57444
llvm-svn: 353511
`insert/deleteEdge` methods in DTU can make updates incorrectly in some cases
(see https://bugs.llvm.org/show_bug.cgi?id=40528), and it is recommended to
use `applyUpdates` methods instead when it is needed to make a mass update in CFG.
Differential Revision: https://reviews.llvm.org/D57316
Reviewed By: kuhar
llvm-svn: 353502
Summary: Assumption cache's self-updating mechanism does not correctly handle the case when blocks are extracted from the function by the CodeExtractor. As a result function's assumption cache may have stale references to the llvm.assume calls that were moved to the outlined function. This patch fixes this problem by removing extracted llvm.assume calls from the function’s assumption cache.
Reviewers: hfinkel, vsk, fhahn, davidxl, sanjoy
Reviewed By: hfinkel, vsk
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57215
llvm-svn: 353500
This commit teaches InstCombine how to replace an atomicrmw operation
into a simple load atomic.
For a given `atomicrmw <op>`, this is possible when:
1. The ordering of that operation is compatible with a load (i.e.,
anything that doesn't have a release semantic).
2. <op> does not modify the value being stored
Differential Revision: https://reviews.llvm.org/D57854
llvm-svn: 353471
This fixes a class of bugs introduced by D44367,
which transforms various cases of icmp (bitcast ([su]itofp X)), Y to icmp X, Y.
If the bitcast is between vector types with a different number of elements,
the current code will produce bad IR along the lines of: icmp <N x i32> ..., <M x i32> <...>.
This patch suppresses the transform if the bitcast changes the number of vector elements.
Patch by: @AndrewScheidecker (Andrew Scheidecker)
Differential Revision: https://reviews.llvm.org/D57871
llvm-svn: 353467
Summary:
When compiling with profile data, ensure the split cold function gets
cold function_entry_count metadata (just use 0 since it should be cold).
Otherwise with function sections it will not be placed in the unlikely
text section with other cold code.
Reviewers: vsk
Subscribers: sebpop, hiraditya, davidxl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57900
llvm-svn: 353434
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve performance DSP-like loops.
Differential Revision: https://reviews.llvm.org/D55373
llvm-svn: 353403
Summary:
Experimentally we found that promotion to scalars carries less benefits
than sinking and hoisting in LICM. When using MemorySSA, we build an
AliasSetTracker on demand in order to reuse the current infrastructure.
We only build it if less than AccessCapForMSSAPromotion exist in the
loop, a cap that is by default set to 250. This value ensures there are
no runtime regressions, and there are small compile time gains for
pathological cases. A much lower value (20) was found to yield a single
regression in the llvm-test-suite and much higher benefits for compile
times. Conservatively we set the current cap to a high value, but we will
explore lowering it when MemorySSA is enabled by default.
Reviewers: sanjoy, chandlerc
Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D56625
llvm-svn: 353339
We should canonicalize to one of these forms,
and compare-with-zero could be more conducive
to follow-on transforms. This also leads to
generally better codegen as shown in PR40611:
https://bugs.llvm.org/show_bug.cgi?id=40611
llvm-svn: 353313
Summary:
Follow up to D57082 which moved splitting earlier in the pipeline, in
order to perform it before inlining. However, it was moved too early,
before the IR is annotated with instrumented PGO data. This caused the
splitting to incorrectly determine cold functions.
Move it to just after PGO annotation (still before inlining), in both
pass managers.
Reviewers: vsk, hiraditya, sebpop
Subscribers: mehdi_amini, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57805
llvm-svn: 353270
DomTreeUpdater depends on headers from Analysis, but is in IR. This is a
layering violation since Analysis depends on IR. Relocate this code from IR
to Analysis to fix the layering violation.
llvm-svn: 353265
Resumes that are not reachable from a cleanup landing pad are considered
to be unreachable. It’s not safe to split them out.
rdar://47808235
llvm-svn: 353242
As discussed in D53037, this can lead to worse codegen, and we
don't generally expect the backend to be able to optimize
arbitrary shuffles. If there's only one use of the 1st shuffle,
that means it's getting removed, so that should always be
safe.
llvm-svn: 353235
Some use cases are appearing where salvaging is needed that does not
correspond to an instruction being deleted -- for example an instruction
being sunk, or a Value not being available in a block being isel'd.
Enable more fine grained control over how salavging occurs by splitting
the logic into helper functions, separating things that are specific to
working on DbgVariableIntrinsics from those specific to interpreting IR
and building DIExpressions.
Differential Revision: https://reviews.llvm.org/D57696
llvm-svn: 353156
When LSR first adds SCEVs to BaseRegs, it only does it if `isZero()` has
returned false. In the end, in invocation of `InsertFormula`, it asserts that
all values there are still not zero constants. However between these two
points, it makes some transformations, in particular extends them to wider
type.
SCEV does not give us guarantee that if `S` is not a constant zero, then
`sext(S)` is also not a constant zero. It might have missed some optimizing
transforms when it was calculating `S` and then made them when it took `sext`.
For example, it may happen if previously optimizing transforms were limited
by depth or somehow else.
This patch adds a bailout when we may end up with a zero SCEV after extension.
Differential Revision: https://reviews.llvm.org/D57565
Reviewed By: samparker
llvm-svn: 353136
Summary:
When attaching prof metadata to promoted direct calls in SamplePGO
mode, no need to construct and use a SmallVector to pass a single count
to the ArrayRef parameter, we can simply use a brace-enclosed init list.
This made a small but consistent improvement for a ThinLTO backend
compile I was measuring.
Reviewers: wmi
Subscribers: mehdi_amini, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57706
llvm-svn: 353123
Summary:
If the user declares or defines `__sancov_lowest_stack` with an
unexpected type, then `getOrInsertGlobal` inserts a bitcast and the
following cast fails:
```
Constant *SanCovLowestStackConstant =
M.getOrInsertGlobal(SanCovLowestStackName, IntptrTy);
SanCovLowestStack = cast<GlobalVariable>(SanCovLowestStackConstant);
```
This variable is a SanitizerCoverage implementation detail and the user
should generally never have a need to access it, so we emit an error
now.
rdar://problem/44143130
Reviewers: morehouse
Differential Revision: https://reviews.llvm.org/D57633
llvm-svn: 353100
Summary:
The fix added in r352904 is not quite correct, or rather misleading:
1. When the texfailctrl (TFC) argument was non-constant, the fix assumed
non-TFE/LWE, which is incorrect.
2. Regardless, this code path cannot even be hit for correct
TFE/LWE-enabled calls, because those return a struct. Added
a test case for those for completeness.
Change-Id: I92d314dbc67a2670f6d7adaab765ef45f56a49cf
Reviewers: hliao, dstuttard, arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57681
llvm-svn: 353097
LoopVectorize adds llvm.loop.isvectorized, but leaves
llvm.loop.vectorize.enable. Do not consider such a loop for user-forced
vectorization since vectorization already happened -- by prioritizing
llvm.loop.isvectorized except for TM_SuppressedByUser.
Fixes http://llvm.org/PR40546
Differential Revision: https://reviews.llvm.org/D57542
llvm-svn: 353082
This assertion makes sure all sub-loops are in LCSSA form before
bringing their parent in LCSSA form. This precondition was added to
formLCSSA in D56848.
Reviewers: davide, efriedma, mzolotukhin
Reviewed By: davide
Differential Revision: https://reviews.llvm.org/D56921
llvm-svn: 352958
Summary:
Currently, ASan inserts a call to `__asan_handle_no_return` before every
`noreturn` function call/invoke. This is unnecessary for calls to other
runtime funtions. This patch changes ASan to skip instrumentation for
functions calls marked with `!nosanitize` metadata.
Reviewers: TODO
Differential Revision: https://reviews.llvm.org/D57489
llvm-svn: 352948
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57173
llvm-svn: 352913
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57172
llvm-svn: 352911
This cleans up all InvokeInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57171
llvm-svn: 352910
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57170
llvm-svn: 352909
An unused variable problem was introduced with rL352870
and stubbed out with rL352871, but we can make a better
fix by actually using the local variable in code rather
than just the assert.
llvm-svn: 352873
If we can reduce the x86-specific intrinsic to the generic op, it allows existing
simplifications and value tracking folds. AFAICT, this always results in identical
x86 codegen in the non-reduced case...which should be true because we semi-generically
(too aggressively IMO) convert to llvm.uadd.with.overflow in CGP, so the DAG/isel must
already combine/lower this intrinsic as expected.
This isn't quite what was requested in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but we want to have these kinds of folds early for efficiency and to enable greater
simplifications. For the case in the bug report where we have:
_addcarry_u64(0, ahi, 0, &ahi)
...this gets completely simplified away in IR.
Differential Revision: https://reviews.llvm.org/D57453
llvm-svn: 352870
InlineCost's isInlineViable() is changed to return InlineResult
instead of bool. This provides messages for failure reasons and
allows to get more specific messages for cases where callsites
are not viable for inlining.
Reviewed By: xbolva00, anemet
Differential Revision: https://reviews.llvm.org/D57089
llvm-svn: 352849
Indices are checked as they are generated. No need to fill the whole array of indices.
Differential Revision: https://reviews.llvm.org/D57144
llvm-svn: 352839
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.
Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352827
This reverts commit f47d6b38c7 (r352791).
Seems to run into compilation failures with GCC (but not clang, where
I tested it). Reverting while I investigate.
llvm-svn: 352800
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352791