The default assembly handling mode may introduce false positives in the
cases when MSan doesn't understand that the assembly call initializes
the memory pointed to by one of its arguments.
We introduce the conservative mode, which initializes the first
|sizeof(type)| bytes for every |type*| pointer passed into the
assembly statement.
llvm-svn: 329054
The primary issue here is that using NDEBUG alone isn't enough to guard
debug printing -- instead the DEBUG() macro needs to be used so that the
specific pass debug logging check is employed. Without this, every
asserts-enabled build was printing out information when it hit this.
I also fixed another place where we had multiple statements in a DEBUG
macro to use {}s to be a bit cleaner. And I fixed a place that used
`errs()` rather than `dbgs()`.
llvm-svn: 329046
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.
Differential Revision: https://reviews.llvm.org/D44880
llvm-svn: 329042
We use two approaches for determining the minimum bitwidth.
* Demanded bits
* Value tracking
If demanded bits doesn't result in a narrower type, we then try value tracking.
We need this if we want to root SLP trees with the indices of getelementptr
instructions since all the bits of the indices are demanded.
But there is a missing piece though. We need to be able to distinguish "demanded
and shrinkable" from "demanded and not shrinkable". For example, the bits of %i
in
%i = sext i32 %e1 to i64
%gep = getelementptr inbounds i64, i64* %p, i64 %i
are demanded, but we can shrink %i's type to i32 because it won't change the
result of the getelementptr. On the other hand, in
%tmp15 = sext i32 %tmp14 to i64
%tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0
it doesn't make sense to shrink %tmp15 and we can skip the value tracking.
Ideas are from Matthew Simpson!
Differential Revision: https://reviews.llvm.org/D44868
llvm-svn: 329035
Summary:
When attempting to split a coroutine with 'hidden' visibility (for
example, a C++ coroutine that is inlined when compiled with the option
'-fvisibility-inlines-hidden'), LLVM would hit an assertion in
include/llvm/IR/GlobalValue.h:240: "local linkage requires default
visibility". The issue is that the visibility is copied from the source
of the function split in the `CloneFunctionInto` function, but the linkage
is not. To fix, create the new function first with external linkage,
then copy the linkage from the original function *after* `CloneFunctionInto`
is called.
Since `GlobalValue::setLinkage` in turn calls `maybeSetDsoLocal`, the
explicit call to `setDSOLocal` can be removed in CoroSplit.cpp.
Test Plan: check-llvm
Reviewers: GorNishanov, lewissbaker, EricWF, majnemer, rnk
Reviewed By: rnk
Subscribers: llvm-commits, eric_niebler
Differential Revision: https://reviews.llvm.org/D44185
llvm-svn: 329033
Summary:
The cast simplifications that instcombine does here do not make any
attempt to obey the verifier rules for musttail calls. Therefore we have
to disable them.
Reviewers: efriedma, majnemer, pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45186
llvm-svn: 329027
Some Function level metadatas, such as function entry count, are not cloned in
DeadArgumentElim. This happens a lot in lto/thinlto because of DeadArgumentElim
after internalization.
This patch clones the metadatas in the original function to the new function.
Differential Revision: https://reviews.llvm.org/D44127
llvm-svn: 328991
Summary:
A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined coroutine noop_coroutine that does nothing.
To implement this feature, we implemented an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that does nothing when resumed or destroyed.
Reviewers: EricWF, modocache, rnk, lewissbaker
Reviewed By: modocache
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45114
llvm-svn: 328986
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.
Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43776
llvm-svn: 328980
Summary:
Adds -import-cutoff=N which will stop importing during the thin link
after N imports. Default is -1 (no limit).
Reviewers: wmi
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D45127
llvm-svn: 328934
If a loop has a loop exiting latch, it can be profitable
to rotate the loop if it leads to the simplification of
a phi node. Perform rotation in these cases even if loop
rotate itself didnt simplify the loop to get there.
Differential Revision: https://reviews.llvm.org/D44199
llvm-svn: 328933
This patch resolves link errors when the address of a static function is taken, and that function is uninstrumented by DFSan.
This change resolves bug 36314.
Patch by Sam Kerner!
Differential Revision: https://reviews.llvm.org/D44784
llvm-svn: 328890
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.
Differential Revision: https://reviews.llvm.org/D44880
llvm-svn: 328854
In r312664 (D36404), JumpThreading stopped threading edges into
loop headers. Unfortunately, I observed a significant performance
regression as a result of this change. Upon further investigation,
the problematic pattern looked something like this (after
many high level optimizations):
while (true) {
bool cond = ...;
if (!cond) {
<body>
}
if (cond)
break;
}
Now, naturally we want jump threading to essentially eliminate the
second if check and hook up the edges appropriately. However, the
above mentioned change, prevented it from doing this because it would
have to thread an edge into the loop header.
Upon further investigation, what is happening is that since both branches
are threadable, JumpThreading picks one of them at arbitrarily. In my
case, because of the way that the IR ended up, it tended to pick
the one to the loop header, bailing out immediately after. However,
if it had picked the one to the exit block, everything would have
worked out fine (because the only remaining branch would then be folded,
not thraded which is acceptable).
Thus, to fix this problem, we can simply eliminate loop headers from
consideration as possible threading targets earlier, to make sure that
if there are multiple eligible branches, we can still thread one of
the ones that don't target a loop header.
Patch by Keno Fischer!
Differential Revision: https://reviews.llvm.org/D42260
llvm-svn: 328798
The existing LoopRotation.cpp is implemented as one of loop passes instead of
being a utility. The user cannot easily perform the loop rotation selectively
(or on demand) under different optimization level. For example, the loop
rotation is needed as part of the logic to convert a loop into a loop with
bottom test for a transformation. If the loop rotation is simply added as a
loop pass before the transformation, the pass is skipped if it is compiled at
–O0 or if it is explicitly disabled by the user, causing the compiler to
generate incorrect code. Furthermore, as a loop pass it will rotate all loops
instead of just the relevant loops.
We provide a utility interface for the loop rotation so that the loop rotation
can be called on demand. The changeset is as follows:
- Create a new file lib/Transforms/Utils/LoopRotationUtils.cpp and move the main
implementation of class LoopRotate into this file.
- Create a new file llvm/include/Transform/Utils/LoopRotationUtils.h with the
interface LoopRotation(...).
- Original LoopRotation.cpp is changed to use the utility function LoopRotation
in LoopRotationUtils.cpp. This is done in the same way community did for
mem-to-reg implementation.
Patch by Jin Lin!
Differential Revision: https://reviews.llvm.org/D44595
llvm-svn: 328766
Otherwise the definitions can't see the extern C declarations and get
name mangled, making it impossible for users to call them. This breaks
the Go bindings.
llvm-svn: 328765
This is a step towards the upcoming KMSAN implementation patch.
KMSAN is going to prepend a special basic block containing
tool-specific calls to each function. Because we still want to
instrument the original entry block, we'll need to store it in
ActualFnStart.
For MSan this will still be F.getEntryBlock(), whereas for KMSAN
it'll contain the second BB.
llvm-svn: 328697
This is a step towards the upcoming KMSAN implementation patch.
The isStore argument is to be used by getShadowOriginPtrKernel(),
it is ignored by getShadowOriginPtrUserspace().
Depending on whether a memory access is a load or a store, KMSAN
instruments it with different functions, __msan_metadata_ptr_for_load_X()
and __msan_metadata_ptr_for_store_X().
Those functions may return different values for a single address,
which is necessary in the case the runtime library decides to ignore
particular accesses.
llvm-svn: 328692
Fixed counter/weight overflow that leads to an assertion. Also fixed the help
string for pgo-emit-branch-prob option.
Differential Revision: https://reviews.llvm.org/D44809
llvm-svn: 328653
We check `canPeel` twice: when evaluating the number of iterations to be peeled
and within the method `peelLoop` that performs peeling. This method is only
executed if the calculated peel count is positive. Thus, the check in `peelLoop` can
never fail. This patch replaces this check with an assert.
Differential Revision: https://reviews.llvm.org/D44919
Reviewed By: fhahn
llvm-svn: 328615
As a follow-up to r328480, this updates the logic for the decreasing
safety checks in a similar manner:
- CanBeMax is replaced by CannotBeMaxInLoop which queries
isLoopEntryGuardedByCond on the maximum value.
- SumCanReachMin is replaced by isSafeDecreasingBound which includes
some logic from parseLoopStructure and, again, has been updated to
use isLoopEntryGuardedByCond on the given bounds.
Differential Revision: https://reviews.llvm.org/D44776
llvm-svn: 328613
This change brings performance of zlib up by 10%. The example below is from a
hot loop in longest_match() from zlib.
do.body:
%cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
%idx.ext = zext i32 %cur_match.addr.0 to i64
%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext
%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1
%add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1
In this example %idx.ext1 is a loop invariant. It will be moved above the use of
loop induction variable %idx.ext such that it can be hoisted out of the loop by
LICM. The operands that have dependences carried by the loop will be sinked down
in the GEP chain. This patch will produce the following output:
do.body:
%cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
%idx.ext = zext i32 %cur_match.addr.0 to i64
%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext1
%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 -1
%add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 %idx.ext
llvm-svn: 328539
This replaces a large chunk of code that was looking for compound
patterns that include these sub-patterns. Existing tests ensure that
all of the previous examples are still folded as expected.
We still need to loosen the FMF check.
llvm-svn: 328502
Implement TTI interface for targets to indicate that the LSR should give
priority to post-incrementing addressing modes.
Combination of patches by Sebastian Pop and Brendon Cahoon.
Differential Revision: https://reviews.llvm.org/D44758
llvm-svn: 328490
Current logic of loop SCEV invalidation in Loop Unroller implicitly relies on
fact that exit count of outer loops cannot rely on exiting blocks of
inner loops, which is true in current implementation of backedge taken count
calculation but is wrong in general. As result, when we only forget the loop that
we have just unrolled, we may still have cached data for its outer loops (in particular,
exit counts) which keeps references on blocks of inner loop that could have been
changed or even deleted.
The attached test demonstrates a situaton when after unrolling of innermost loop
the outermost loop contains a dangling pointer on non-existant block. The problem
shows up when we apply patch https://reviews.llvm.org/D44677 that makes SCEV
smarter about exit count calculation. I am not sure if the bug exists without this patch,
it appears that now it is accidentally correct just because in practice exact backedge
taken count for outer loops with complex control flow inside is never calculated.
But when SCEV learns to do so, this problem shows up.
This patch replaces existing logic of SCEV loop invalidation with a correct one, which
happens to be invalidation of outermost loop (which also leads to invalidation of all
loops inside of it). It is the only way to ensure that no outer loop keeps dangling pointers
on removed blocks, or just outdated information that has changed after unrolling.
Differential Revision: https://reviews.llvm.org/D44818
Reviewed By: samparker
llvm-svn: 328483
CanBeMin is currently used which will report true for any unknown
values, but often a check is performed outside the loop which covers
this situation:
for (int i = 0; i < N; ++i)
...
if (N > 0)
for (int i = 0; i < N; ++i)
...
So I've add 'LoopGuardedAgainstMin' which reports whether N is
greater than the minimum value which then allows loop with a variable
loop count to be optimised. I've also moved the increasing bound
checking into its own function and replaced SumCanReachMax is another
isLoopEntryGuardedByCond function.
llvm-svn: 328480
Summary:
This was motivated by absence of PrunEH functionality in new PM.
It was decided that a proper way to do PruneEH is to add NoUnwind inference
into PostOrderFunctionAttrs and then perform normal SimplifyCFG on top.
This change generalizes attribute handling implemented for (a removal of)
Convergent attribute, by introducing a generic builder-like class
AttributeInferer
It registers all the attribute inference requests, storing per-attribute
predicates into a vector, and then goes through an SCC Node, scanning all
the instructions for not breaking attribute assumptions.
The main idea is that as soon all the instructions from all the functions
of SCC Node conform to attribute assumptions then we are free to infer
the attribute as set for all the functions of SCC Node.
It handles two distinct cases of attributes:
- those that might break due to derefinement of the function code
for these attributes we are allowed to apply inference only if all the
functions are "exact definitions". Example - NoUnwind.
- those that do not care about derefinement
for these attributes we are allowed to apply inference as soon as we see
any function definition. Example - removal of Convergent attribute.
Also in this commit:
* Converted all the FunctionAttrs tests to use FileCheck and added new-PM
invocations to them
* FunctionAttrs/convergent.ll test demonstrates a difference in behavior between
new and old PM implementations. Marked with FIXME.
* PruneEH tests were converted to new-PM as well, using function-attrs+simplify-cfg
combo as intended
* some of "other" tests were updated since function-attrs now infers 'nounwind'
even for old PM pipeline
* -disable-nounwind-inference hidden option added as a possible workaround for a supposedly
rare case when nounwind being inferred by default presents a problem
Reviewers: chandlerc, jlebar
Reviewed By: jlebar
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D44415
llvm-svn: 328377
Summary:
Porting HWASan to Linux x86-64, first of the three patches, LLVM part.
The approach is similar to ARM case, trap signal is used to communicate
memory tag check failure. int3 instruction is used to generate a signal,
access parameters are stored in nop [eax + offset] instruction immediately
following the int3 one.
One notable difference is that x86-64 has to untag the pointer before use
due to the lack of feature comparable to ARM's TBI (Top Byte Ignore).
Reviewers: eugenis
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D44699
llvm-svn: 328342
When building the SLP tree, we look for reuse among the vectorized tree
entries. However, each gather sequence is represented by a unique tree entry,
even though the sequence may be identical to another one. This means, for
example, that a gather sequence with two uses will be counted twice when
computing the cost of the tree. We should only count the cost of the definition
of a gather sequence rather than its uses. During code generation, the
redundant gather sequences are emitted, but we optimize them away with CSE. So
it looks like this problem just affects the cost model.
Differential Revision: https://reviews.llvm.org/D44742
llvm-svn: 328316
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.
Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.
Reviewers: dberlin, mssimpso, davide, efriedma
Reviewed By: mssimpso
Differential Revision: https://reviews.llvm.org/D43762
llvm-svn: 328307
Loop peeling also has an impact on the induction variables, so we should
benefit from induction variable simplification after peeling too.
Reviewers: sanjoy, bogner, mzolotukhin, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D43878
llvm-svn: 328301
Transforms/Scalar/SCCP.cpp implemented both the Scalar and IPO SCCP, but
this meant Transforms/Scalar including Transfroms/IPO headers, creating
a circular dependency. (IPO depends on Scalar already) - so move the IPO
SCCP shims out into IPO and the basic library implementation accessible
from Scalar/SCCP.h to be used from the IPO/SCCP.cpp implementation.
llvm-svn: 328250
Summary:
When building with libFuzzer, converting control flow to selects or
obscuring the original operands of CMPs reduces the effectiveness of
libFuzzer's heuristics.
This patch provides an attribute to disable or modify certain optimizations
for optimal fuzzing signal.
Provides a less aggressive alternative to https://reviews.llvm.org/D44057.
Reviewers: vitalybuka, davide, arsenm, hfinkel
Reviewed By: vitalybuka
Subscribers: junbuml, mehdi_amini, wdng, javed.absar, hiraditya, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D44232
llvm-svn: 328214
Summary:
LoopPredication is not profitable when the loop is known to always exit
through some block other than the latch block.
A coarse grained latch check can cause loop predication to predicate the
loop, and unconditionally deoptimize.
However, without predicating the loop, the guard may never fail within the
loop during the dynamic execution because the non-latch loop termination
condition exits the loop before the latch condition causes the loop to
exit.
We teach LP about this using BranchProfileInfo pass.
Reviewers: apilipenko, skatkov, mkazantsev, reames
Reviewed by: skatkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44667
llvm-svn: 328210
The dominator tree analysis can be preserved easily.
Some other kinds of analysis can probably be preserved
too.
Reviewers: junbuml, dberlin
Reviewed By: dberlin
Differential Revision: https://reviews.llvm.org/D43173
llvm-svn: 328206
DuplicateInstructionsInSplitBetween can preserve the DT by passing
through DT to SplitEdge.
Reviewers: sanjoy, junbuml, anna, kuhar
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D44629
llvm-svn: 328189
Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering.
Transforms depends on Transforms/Utils, not the other way around. So
remove the header and the "createStripGCRelocatesPass" function
declaration (& definition) that is unused and motivated this dependency.
Move Transforms/Utils/Local.h into Analysis because it's used by
Analysis/MemoryBuiltins.cpp.
llvm-svn: 328165
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
MemCpyOpt pass to cease using:
1) The old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific
alignments through the new API.
2) The old IRBuilder CreateMemCpy/CreateMemMove single-alignment APIs in favour of the new
API that allows setting source and destination alignments independently.
We also add a few tests to fill gaps in the testing of this pass.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960, rL325816, rL327398, rL327421 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 328097
Summary:
If the operands of a udiv/urem can be proved to fit within a smaller
power-of-two-sized type, reduce the width of the udiv/urem.
Backed out for causing performance regressions. Re-landing
because we've determined that these regressions were noise.
Original Differential Revision: https://reviews.llvm.org/D44102
llvm-svn: 328096
The inline assembly generated for the ARC autorelease elision marker
must have a funclet token if it's emitted inside a funclet, otherwise
the inline assembly (and all subsequent code in the funclet) will be
marked unreachable by WinEHPrepare.
Note that this only applies for the non-O0 case, since at O0, clang
emits the autorelease elision marker itself rather than deferring to the
backend. The fix for clang is handled in a separate change.
Differential Revision: https://reviews.llvm.org/D44641
llvm-svn: 328042
Summary: Fix a bug in entry block shuffled to middle of the chain.
Reviewers: davide, courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44642
llvm-svn: 327971
Summary:
It turned out to be error-prone to expect the callers to handle that - better to
leave the decision to this routine and make the required data to be explicitly
passed to the function.
This handles the case that was missed in the r322473 and fixes the assert
mentioned in PR36524.
Reviewers: dorit, mssimpso, Ayal, dcaballe
Reviewed By: dcaballe
Subscribers: Ka-Ka, hiraditya, dneilson, hsaito, llvm-commits
Differential Revision: https://reviews.llvm.org/D43812
llvm-svn: 327960
This is complicated by -0.0 and nan. This is based on the DAG patterns
as shown in D44091. I'm hoping that we can just remove those DAG folds
and always rely on IR canonicalization to handle the matching to fabs.
We would still need to delete the broken code from DAGCombiner to fix
PR36600:
https://bugs.llvm.org/show_bug.cgi?id=36600
Differential Revision: https://reviews.llvm.org/D44550
llvm-svn: 327858
Despite their names, RegSaveAreaPtrPtr and OverflowArgAreaPtrPtr
used to be i8* instead of i8**.
This is important, because these pointers are dereferenced twice
(first in CreateLoad(), then in getShadowOriginPtr()), but for some
reason MSan allowed this - most certainly because it was possible
to optimize getShadowOriginPtr() away at compile time.
Differential revision: https://reviews.llvm.org/D44520
llvm-svn: 327830
For MSan instrumentation with MS.ParamTLS and MS.ParamOriginTLS being
TLS variables, the CreateAdd() with ArgOffset==0 is a no-op, because
the compiler is able to fold the addition of 0.
But for KMSAN, which receives ParamTLS and ParamOriginTLS from a call
to the runtime library, this introduces a stray instruction which
complicates reading/testing the IR.
Differential revision: https://reviews.llvm.org/D44514
llvm-svn: 327829
This is a step towards the upcoming KMSAN implementation patch.
KMSAN is going to use a different warning function,
__msan_warning_32(uptr origin), so we'd better create the warning
calls in one place.
Differential Revision: https://reviews.llvm.org/D44513
llvm-svn: 327828
LICM deletes trivially dead instructions which it won't attempt to sink.
Attempt to salvage debug values which reference these instructions.
llvm-svn: 327800
X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET).
IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp.
The `nocf_check` attribute has two roles in the context of X86 IBT technology:
1. Appertains to a function - do not add ENDBR instruction at the beginning of the function.
2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction.
This patch implements `nocf_check` context for Indirect Branch Tracking.
It also auto generates `nocf_check` prefixes before indirect branchs to jump tables that are guarded by range checks.
Differential Revision: https://reviews.llvm.org/D41879
llvm-svn: 327767
This builds on the work from https://reviews.llvm.org/D44287. It turned out supporting fcmp was much easier than I realized, so let's do that now.
As an aside, our -O3 handling of a floating point IVs leaves a lot to be desired. We do convert the float IV to an integer IV, but do so late enough that many other optimizations are missed (e.g. we don't vectorize).
Differential Revision: https://reviews.llvm.org/D44542
llvm-svn: 327722
JumpThreading iterates over F until the IR quiesces. Transforming
unreachable BBs increases compile time and it is also possible to
never stabilize causing JumpThreading to hang. An older attempt at
fixing this problem was D3991 where removeUnreachableBlocks(F)
was called before JumpThreading began. This has a few drawbacks:
* expensive - the routine attempts to fix up the IR to identify
additional BBs that can be removed along with unreachable BBs.
* aggressive - does not identify and preserve the shape of the IR.
At a minimum it does not preserve loop hierarchies.
* invasive - altering reachable blocks it may disrupt IR shapes
that could have otherwise been JumpThreaded.
This patch avoids removeUnreachableBlocks(F) and instead tracks
unreachable BBs in a SmallPtrSet using DominatorTree to validate the
initial state of all BBs. We then rely on subsequent passes to identify
and remove these unreachable blocks from F.
Reviewers: dberlin, sebpop, kuhar, dinesh.d
Reviewed by: sebpop, kuhar
Subscribers: hiraditya, uabelho, llvm-commits
Differential Revision: https://reviews.llvm.org/D44177
llvm-svn: 327713
If the loop body contains conditions of the form IndVar < #constant, we
can remove the checks by peeling off #constant iterations.
This improves codegen for PR34364.
Reviewers: mkuper, mkazantsev, efriedma
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D43876
llvm-svn: 327671
It is common to have conditional exits within a loop which are known not to be taken on some iterations, but not necessarily all. This patches extends our reasoning around guaranteed to execute (used when establishing whether it's safe to dereference a location from the preheader) to handle the case where an exit is known not to be taken on the first iteration and the instruction of interest *is* known to be taken on the first iteration.
This case comes up in two major ways:
* If we have a range check which we've been unable to eliminate, we frequently know that it doesn't fail on the first iteration.
* Pass ordering. We may have a check which will be eliminated through some sequence of other passes, but depending on the exact pass sequence we might never actually do so or we might miss other optimizations from passes run before the check is finally eliminated.
The initial version (here) is implemented via InstSimplify. At the moment, it catches a few cases, but misses a lot too. I added test cases for missing cases in InstSimplify which I'll follow up on separately. Longer term, we should probably wire SCEV through to here to get much smarter loop aware simplification of the first iteration predicate.
Differential Revision: https://reviews.llvm.org/D44287
llvm-svn: 327664
If we've already established an invariant scope with an earlier generation, we don't want to hide it in the scoped hash table with one with a later generation. I noticed this when working on the invariant-load handling, but it also applies to the invariant.start case as well.
Without this change, my previous patch for invariant-load regresses some cases, so I'm pushing this without waiting for review. This is why you don't make last minute tweaks to patches to catch "obvious cases" after it's already been reviewed. Bad Philip!
llvm-svn: 327655
This is a follow up to https://reviews.llvm.org/D43716 which rewrites the invariant load handling using the new infrastructure. It's slightly more powerful, but only in somewhat minor ways for the moment. It's not clear that DSE of stores to invariant locations is actually interesting since why would your IR have such a construct to start with?
Note: The submitted version is slightly different than the reviewed one. I realized the scope could start for an invariant load which was proven redundant and removed. Added a test case to illustrate that as well.
Differential Revision: https://reviews.llvm.org/D44497
llvm-svn: 327646
When hoisting common code from the "then" and "else" branches of a condition
to before the "if", the HoistThenElseCodeToIf routine will attempt to merge
the debug location associated with the two original copies of the hoisted
instruction.
This is a problem in the special case where the hoisted instruction is a
debug info intrinsic, since for those the debug location is considered
part of the intrinsic and attempting to modify it may resut in invalid
IR. This is the underlying cause of PR36410.
This patch fixes the problem by handling debug info intrinsics specially:
instead of hoisting one copy and merging the two locations, the code now
simply hoists both copies, each with its original location intact. Note
that this is still only done in the case where both original copies are
otherwise (i.e. apart from location metadata) identical.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D44312
llvm-svn: 327622
There are two nontrivial details here:
* Loop structure update interface is quite different with new pass manager,
so the code to add new loops was factored out
* BranchProbabilityInfo is not a loop analysis, so it can not be just getResult'ed from
within the loop pass. It cant even be queried through getCachedResult as LoopCanonicalization
sequence (e.g. LoopSimplify) might invalidate BPI results.
Complete solution for BPI will likely take some time to discuss and figure out,
so for now this was partially solved by making BPI optional in IRCE
(skipping a couple of profitability checks if it is absent).
Most of the IRCE tests got their corresponding new-pass-manager variant enabled.
Only two of them depend on BPI, both marked with TODO, to be turned on when BPI
starts being available for loop passes.
Reviewers: chandlerc, mkazantsev, sanjoy, asbirlea
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D43795
llvm-svn: 327619
Summary:
Before this patch call graph is like this in the LoopUnrollPass:
tryToUnrollLoop
ApproximateLoopSize
collectEphemeralValues
/* Use collected ephemeral values */
computeUnrollCount
analyzeLoopUnrollCost
/* Bail out from the analysis if loop contains CallInst */
This patch moves collection of the ephemeral values to the tryToUnrollLoop
function and passes the collected values into both ApproximateLoopsize (as
before) and additionally starts using them in analyzeLoopUnrollCost:
tryToUnrollLoop
collectEphemeralValues
ApproximateLoopSize(EphValues)
/* Use EphValues */
computeUnrollCount(EphValues)
analyzeLoopUnrollCost(EphValues)
/* Ignore ephemeral values - they don't contribute to the final cost */
/* Bail out from the analysis if loop contains CallInst */
Reviewers: mzolotukhin, evstupac, sanjoy
Reviewed By: evstupac
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43931
llvm-svn: 327617
Summary:
This variable is largely going unused; aside from reporting number of instructions for in DEBUG builds.
The only use of NumInstructions is in debug output to represent the LoopSize. That value can be can be misleading as it also includes metadata instructions (e.g., DBG_VALUE) which have no real impact. If we do choose to keep this around, we probably should guard it by a DEBUG macro, as it's not used in production builds.
Reviewers: majnemer, congh, rengolin
Reviewed By: rengolin
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D44495
llvm-svn: 327589
If we have an invariant.start with no corresponding invariant.end, then the memory location becomes invariant indefinitely after the invariant.start. As a result, anything dominated by the start is guaranteed to see the value the memory location had when the invariant.start executed.
This patch adds an AvailableInvariants table which tracks the generation a particular memory location became invariant and then uses that information to allow value forwarding that would otherwise be disallowed by potentially aliasing stores. (Reminder: In EarlyCSE everything clobbers everything by default.)
This should be compatible with the MemorySSA variant, but design is generational. We can and should add first class support for invariant.start within MemorySSA at a later time. I took a quick look at doing so, but probably need some input from a MemorySSA expert.
Differential Revision: https://reviews.llvm.org/D43716
llvm-svn: 327577
This is PR36686.
If a user of a library is LTOed with that library we take the
opportunity to set dso_local, but we don't clear dllimport, which
creates an invalid IR.
llvm-svn: 327408
This was supposed to be an NFC refactoring that will eventually allow
eliminating the isFast() predicate, but there's a rare possibility
that we would pessimize the code as shown in the test case because
we failed to check 'hasOneUse()' properly. This version also removes
an inefficiency of the old code; we would look for:
(X * C) * C1 --> X * (C * C1)
...but that pattern is always handled by
SimplifyAssociativeOrCommutative().
llvm-svn: 327404
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
SROA pass to cease using the old getAlignment() & setAlignment() APIs of MemoryIntrinsic in
favour of getting source & dest specific alignments through the new API. This allows us
to enhance visitMemTransferInst to be more aggressive setting the alignment in memcpy
calls that it creates, as well as to only change the alignment of a memcpy/memmove
argument that it replaces.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960, rL325816 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: chandlerc, bollu, efriedma
Reviewed By: efriedma
Subscribers: efriedma, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D42974
llvm-svn: 327398
Summary:
This change fixes PR36483. The bug was originally introduced by a change
that marked non-prevailing symbols dead. This broke LowerTypeTests
handling of available_externally functions, which are non-prevailing.
LowerTypeTests uses liveness information to avoid emitting thunks for
unused functions.
Marking available_externally functions dead is incorrect, the functions
are used though the function definitions are not. This change keeps them
live, and lets the EliminateAvailableExternally/GlobalDCE passes remove
them later instead.
(Reland with a suspected fix for a unit test failure I haven't been able
to reproduce locally)
Reviewers: pcc, tejohnson
Reviewed By: tejohnson
Subscribers: grimar, mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D43690
llvm-svn: 327360
In the case that the CallInst that is being moved has an associated
operand bundle which is a funclet, the move will construct an invalid
instruction. The new site will have a different token and needs to be
reassociated with the new instruction.
Unfortunately, there is no way to alter the bundle after the
construction of the instruction. Replace the call instruction cloning
with a custom helper to clone the instruction and reassociate the
funclet token.
llvm-svn: 327336
LoopInstSimplify is unused and untested. Reading through the commit
history the pass also seems to have a high maintenance burden.
It would be best to retire the pass for now. It should be easy to
recover if we need something similar in the future.
Differential Revision: https://reviews.llvm.org/D44053
llvm-svn: 327329
Summary:
ProvenanceAnalysis::related(A, B) currently memoizes its results, and on big
tests the cache grows too large, and we're spending most of the time
growing/looking through DenseMap.
This patch reduces the size of the cache by normalizing keys first: we do that
by calling GetUnderlyingObjCPtr on the input values. The results of
GetUnderlyingObjCPtr are also memoized in a separate cache.
The patch doesn't bring noticable changes to compile time on CTMark, however
significantly helps one of our internal tests.
Reviewers: gottesmm
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D44270
llvm-svn: 327328
getNumUses is a linear time operation. It traverses the user linked list to the end and counts as it goes. Since we are only interested in small constant counts, we should use hasNUses or hasNUsesMore more that terminate the traversal as soon as it can provide the answer.
There are still two other locations in InstCombine, but changing those would force a rebase of D44266 which if accepted would remove them.
Differential Revision: https://reviews.llvm.org/D44398
llvm-svn: 327315
getNumUses is a linear operation. It walks a linked list to get a count. So in this case its better to just ask if there are any users rather than how many.
llvm-svn: 327314
This reverts r326908, originally landed as D44102.
Reverted for causing performance regressions on x86. (These regressions
are not yet understood.)
llvm-svn: 327252
Use isInlineViable to prevent inlining of functions with non-inlinable
constructs, in case cost analysis is skipped.
Reviewers: efriedma, sfertile, davide, davidxl
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D42846
llvm-svn: 327207
When hoisting common code from the "then" and "else" branches of a condition
to before the "if", there is no need to require that debug intrinsics match
before moving them (and merging them). Instead, we can simply always keep
all debug intrinsics from both sides of the "if".
This fixes PR36410, which describes a problem where as a result of the attempt
to merge debug locations for two debug intrinsics we end up with an invalid
intrinsic, where the scope indicated in the !dbg location no longer matches
the scope of the variable tracked by the intrinsic.
In addition, this has the benefit that we no longer throw away information
that is actually still valid, helping to generate better debug data.
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D44312
llvm-svn: 327175
There are six separate instances of getPointerOperand() utility.
LoopVectorize.cpp has one of them,
and I don't want to create a 7th one while I'm trying to move
LoopVectorizationLegality into a separate file
(eventual objective is to move it to Analysis tree).
See http://lists.llvm.org/pipermail/llvm-dev/2018-February/120999.html
for llvm-dev discussions
Closes D43323.
Patch by Hideki Saito <hideki.saito@intel.com>.
llvm-svn: 327173
The retpoline mitigation for variant 2 of CVE-2017-5715 inhibits the
branch predictor, and as a result it can lead to a measurable loss of
performance. We can reduce the performance impact of retpolined virtual
calls by replacing them with a special construct known as a branch
funnel, which is an instruction sequence that implements virtual calls
to a set of known targets using a binary tree of direct branches. This
allows the processor to speculately execute valid implementations of the
virtual function without allowing for speculative execution of of calls
to arbitrary addresses.
This patch extends the whole-program devirtualization pass to replace
certain virtual calls with calls to branch funnels, which are
represented using a new llvm.icall.jumptable intrinsic. It also extends
the LowerTypeTests pass to recognize the new intrinsic, generate code
for the branch funnels (x86_64 only for now) and lay out virtual tables
as required for each branch funnel.
The implementation supports full LTO as well as ThinLTO, and extends the
ThinLTO summary format used for whole-program devirtualization to
support branch funnels.
For more details see RFC:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120672.html
Differential Revision: https://reviews.llvm.org/D42453
llvm-svn: 327163
In r263618, JumpThreading learned to look trough simple cast instructions, but
only if the source of those cast instructions was a phi/cmp i1 (in an effort to
limit compile time effects). I think this condition is too restrictive. For
switches with limited value range, InstCombine will readily introduce an extra
trunc instruction to a smaller integer type (e.g. from i8 to i2), leaving us in
the somewhat perverse situation that jump-threading would work before running
instcombine, but not after. Since instcombine produces this pattern, I think we
need to consider it canonical and support it in JumpThreading. In general,
for limiting recursion, I think the existing restriction to phi and cmp nodes
should be sufficient to avoid looking through unprofitable chains of
instructions.
Patch by Keno Fischer!
Differential Revision: https://reviews.llvm.org/D42262
llvm-svn: 327150
Fixes PR36311.
See more detailed analysis in
https://bugs.llvm.org/show_bug.cgi?id=36311.
isUniform() information is recomputed after LV started transforming the
underlying IR and that triggered an assert in SCEV.
From vectorizer's architectural perspective, such information, while
still useful in vector code gen, should not be recomputed after the
start of transforming the LLVM IR. Instead, we should collect and cache
such information during the analysis phase of LV and use the cached info
during code gen.
From the symptom perspective, this assert as it stands right now is not
very useful. Legality already rejected loops that would trigger the
assert. As such, commenting out the assert is NFC from vectorizer's
functionality perspective. On top of that, just above the assertion, we
check for unit-strided load/store or
gather scatter. Addresses can't be uniform below that check.
From vectorization theory point of view, we don't have to reject all
cases of stores to uniform addresses. Eventually, we should support
safe/profitable cases.
This patch resolves the issue by removing the useless assertion that is
invoking LAA's isUniform() that requires up-to-date DomTree ---- once
vector code gen starts modifying CFG, we don't have an up-to-date
DomTree.
Patch by Hideki Saito <hideki.saito@intel.com>.
llvm-svn: 327109
There is no point in lowering a dbg.declare describing an alloca that
has volatile loads or stores as users, since the alloca cannot be
elided. Lowering the dbg.declare will result in larger debug info that
may also have worse coverage than just describing the alloca.
rdar://problem/34496278
llvm-svn: 327092
This fixes a false positive ODR violation that is reported by ASan when using LTO. In cases, where two constant globals have the same value, LTO will merge them, which breaks ASan's ODR detection.
Differential Revision: https://reviews.llvm.org/D43959
llvm-svn: 327061
This fixes a false positive ODR violation that is reported by ASan when using LTO. In cases, where two constant globals have the same value, LTO will merge them, which breaks ASan's ODR detection.
Differential Revision: https://reviews.llvm.org/D43959
llvm-svn: 327053
Summary:
This change fixes PR36483. The bug was originally introduced by a change
that marked non-prevailing symbols dead. This broke LowerTypeTests
handling of available_externally functions, which are non-prevailing.
LowerTypeTests uses liveness information to avoid emitting thunks for
unused functions.
Marking available_externally functions dead is incorrect, the functions
are used though the function definitions are not. This change keeps them
live, and lets the EliminateAvailableExternally/GlobalDCE passes remove
them later instead.
I've also enabled EliminateAvailableExternally for all optimization
levels, I believe it being disabled for O1 was an oversight.
Reviewers: pcc, tejohnson
Reviewed By: tejohnson
Subscribers: grimar, mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D43690
llvm-svn: 327041
This fixes a false positive ODR violation that is reported by ASan when using LTO. In cases, where two constant globals have the same value, LTO will merge them, which breaks ASan's ODR detection.
Differential Revision: https://reviews.llvm.org/D43959
llvm-svn: 327029
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache;
loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.
Author: FarhanaAleen
Reviewed By: rampitec
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D44179
llvm-svn: 326910
Summary:
If the operands of a udiv/urem can be proved to fit within a smaller
power-of-two-sized type, reduce the width of the udiv/urem.
Backed out for failing an assert in clang bootstrap builds. Re-landing
with a fix for handling non-power-of-two inputs (e.g. udiv i24).
Original Differential Revision: https://reviews.llvm.org/D44102
llvm-svn: 326908
Breaks bootstrap builds: clang built with this patch asserts while
building MCDwarf.cpp: Assertion `castIsValid(op, S, Ty) && "Invalid
cast!"' failed.
llvm-svn: 326900
Summary:
If the operands of a udiv/urem can be proved to fit within a smaller
power-of-two-sized type, reduce the width of the udiv/urem.
Reviewers: spatel, sanjoy
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D44102
llvm-svn: 326898
The LoadStoreVectorizer thought that <1 x T> and T were the same types
when merging stores, leading to a crash later.
Patch by Erik Hogeman.
Differential Revision: https://reviews.llvm.org/D44014
llvm-svn: 326884
It's been quite some time the Dependence Analysis (DA) is broken,
as it uses the GEP representation to "identify" multi-dimensional arrays.
It even wrongly detects multi-dimensional arrays in single nested loops:
from test/Analysis/DependenceAnalysis/Coupled.ll, example @couple6
;; for (long int i = 0; i < 50; i++) {
;; A[i][3*i - 6] = i;
;; *B++ = A[i][i];
DA used to detect two subscripts, which makes no sense in the LLVM IR
or in C/C++ semantics, as there are no guarantees as in Fortran of
subscripts not overlapping into a next array dimension:
maximum nesting levels = 1
SrcPtrSCEV = %A
DstPtrSCEV = %A
using GEPs
subscript 0
src = {0,+,1}<nuw><nsw><%for.body>
dst = {0,+,1}<nuw><nsw><%for.body>
class = 1
loops = {1}
subscript 1
src = {-6,+,3}<nsw><%for.body>
dst = {0,+,1}<nuw><nsw><%for.body>
class = 1
loops = {1}
Separable = {}
Coupled = {1}
With the current patch, DA will correctly work on only one dimension:
maximum nesting levels = 1
SrcSCEV = {(-2424 + %A)<nsw>,+,1212}<%for.body>
DstSCEV = {%A,+,404}<%for.body>
subscript 0
src = {(-2424 + %A)<nsw>,+,1212}<%for.body>
dst = {%A,+,404}<%for.body>
class = 1
loops = {1}
Separable = {0}
Coupled = {}
This change removes all uses of GEP from DA, and we now only rely
on the SCEV representation.
The patch does not turn on -da-delinearize by default, and so the DA analysis
will be more conservative in the case of multi-dimensional memory accesses in
nested loops.
I disabled some interchange tests, as the DA is not able to disambiguate
the dependence anymore. To make DA stronger, we may need to
compute a bound on the number of iterations based on the access functions
and array dimensions.
The patch cleans up all the CHECKs in test/Transforms/LoopInterchange/*.ll to
avoid checking for snippets of LLVM IR: this form of checking is very hard to
maintain. Instead, we now check for output of the pass that are more meaningful
than dozens of lines of LLVM IR. Some tests now require -debug messages and thus
only enabled with asserts.
Patch written by Sebastian Pop and Aditya Kumar.
Differential Revision: https://reviews.llvm.org/D35430
llvm-svn: 326837
Most of the folds based on SelectPatternResult belong in InstSimplify rather than
InstCombine, so the helper code should be available to other passes/analysis.
llvm-svn: 326812
Change doCallSiteSplitting to iterate until we reach the terminator instruction.
tryToSplitCallSite can replace BB's terminator in case BB is a successor of
itself. Then IE will be invalidated and we also have to check the current
terminator.
Reviewers: junbuml, davidxl, davide, fhahn
Reviewed By: fhahn, junbuml
Differential Revision: https://reviews.llvm.org/D43824
llvm-svn: 326793
In case PredBB == BB and StopAt == BB's terminator, StopAt != &*BI will
fail, because BB's terminator instruction gets replaced.
By using BB.getTerminator() we get the current terminator which we can use
to compare.
Reviewers: sanjoy, anna, reames
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D43822
llvm-svn: 326779
Summary:
RewriteStatepointsForGC collects parse points for further processing.
During the collection if a callsite is found in an unreachable block
(DominatorTree::isReachableFromEntry()) then all unreachable blocks are
removed by removeUnreachableBlocks(). Some of the removed blocks could
have been reachable according to DominatorTree::isReachableFromEntry().
In this case the collected parse points became stale and resulted in a
crash when accessed.
The fix is to unconditionally canonicalize the IR to
removeUnreachableBlocks and then collect the parse points.
The added test crashes with the old version and passes with this patch.
Patch by Yevgeny Rouban!
Reviewed by: Anna
Differential Revision: https://reviews.llvm.org/D43929
llvm-svn: 326748
Summary:
Presently, InstCombiner::foldICmpWithCastAndCast() implicitly assumes that it is
only invoked with icmp instructions of integer type. If that assumption is broken,
and it is called with an icmp of vector type, then it fails (asserts/crashes).
This patch addresses the deficiency. It allows it to simplify
icmp (ptrtoint x), (ptrtoint/c) of vector type into a compare of the inputs,
much as is done when the type is integer.
Reviewers: apilipenko, fedor.sergeev, mkazantsev, anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44063
llvm-svn: 326730
This patch teaches getMinimumFPType to support shrinking a vector of ConstantFPs. This should improve our ability to combine vector fptrunc with fp binops.
Differential Revision: https://reviews.llvm.org/D43774
llvm-svn: 326729
getCompare returns true, false or undef constants if the comparison can
be evaluated, or nullptr if it cannot. This is in line with what
ConstantExpr::getCompare returns. It also allows us to use
ConstantExpr::getCompare for comparing constants.
Reviewers: davide, mssimpso, dberlin, anna
Reviewed By: davide
Differential Revision: https://reviews.llvm.org/D43761
llvm-svn: 326720
Summary:
We can discard initial blocks that do other work
We do not need to limit ourselves to just the first block in the chain.
Reviewers: courbet, davide
Reviewed By: courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44029
llvm-svn: 326698
Iterating through predecessors of `TailBB` while removing their
terminators leads to use after-free, because the predecessor list is
changing on each removal.
llvm-svn: 326668
Summary:
`musttail` calls can't be naively splitted. The split blocks must
include not only the call instruction itself, but also (optional)
`bitcast` and `return` instructions that follow it.
Clone `bitcast` and `ret`, place them into the split blocks, and
remove the tail block when done.
Reviewers: junbuml, mcrosier, davidxl, davide, fhahn
Reviewed By: fhahn
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D43729
llvm-svn: 326666
This caused some links to fail with ThinLTO due to missing symbols as
well as causing some binaries to have failures at runtime. We're working
with the author to get a test case, but want to get the tree green
again.
Further, it appears to introduce a data race. While the test usage of
threads was disabled in r325361 & r325362, that isn't an acceptable fix.
I've reverted both of these as well. This code needs to be thread safe.
Test cases for this are already on the original commit thread.
llvm-svn: 326638
In stage2 -O3 builds of llc, this results in small but measurable
increases in the number of variables with locations, and in the number
of unique source variables overall.
(According to llvm-dwarfdump --statistics, there are 123 additional
variables with locations, which is just a 0.006% improvement).
The size of the .debug_loc section of the llc dsym increases by 0.004%.
llvm-svn: 326629
In stage2 -O3 builds of llc, this results in a 0.3% increase in the
number of variables with locations, and a 0.2% increase in the number of
unique source variables overall.
The size of the .debug_loc section of the llc dsym increases by 0.5%.
llvm-svn: 326621
Instead of returning the smaller FP constant we now return the minimal Type the constant can fit into. We also return the Type of the input to any fp extends. The legality checks are then done on just the size of these Types. If we find something profitable we then emit FPTruncs in front of the smaller binop and assume those FPTruncs will be constant folded or combined with any ConstantFPs or fpextends.
Differential Revision: https://reviews.llvm.org/D44038
llvm-svn: 326617
The code was checking that all of the instructions in the
sequence are 'fast', but that's not necessary. The final
multiply is all that we need to check (tests adjusted).
The fmul doesn't need to be fully 'fast' either, but that
can be another patch.
llvm-svn: 326608
Currently when AllowRemainder is disabled, pragma unroll count is not
respected even though there is no remainder. This bug causes a loop
fully unrolled in many cases even though the user specifies a unroll
count. Especially it affects OpenCL/CUDA since in many cases a loop
contains convergent instructions and currently AllowRemainder is
disabled for such loops.
Differential Revision: https://reviews.llvm.org/D43826
llvm-svn: 326585
This patch adds support for detecting outer loops with irreducible control
flow in LV. Current detection uses SCCs and only works for innermost loops.
This patch adds a utility function that works on any CFG, given its RPO
traversal and its LoopInfoBase. This function is a generalization
of isIrreducibleCFG from lib/CodeGen/ShrinkWrap.cpp. The code in
lib/CodeGen/ShrinkWrap.cpp is also updated to use the new generic utility
function.
Patch by Diego Caballero <diego.caballero@intel.com>
Differential Revision: https://reviews.llvm.org/D40874
llvm-svn: 326568
This is a retry of r326502 with updates to the reassociate
test file that I missed the first time.
@test15_reassoc in the supposed -reassociate test file
(except that it tests 2 other passes too...) shows that
there's no clear responsiblity for reassociation transforms.
Instcombine now gets that case, but only because the
constant values are identical. Otherwise, it would still
miss that pattern.
Reassociate doesn't get that case because it hasn't been
updated to use less than 'fast' FMF.
llvm-svn: 326513
I forgot that I added tests for 'reassoc' to -reassociate, but
suprisingly that file calls -instcombine too, so it is affected.
I'll update that file and try again.
llvm-svn: 326510
Do not replace results of `musttail` calls with a constant if the
call itself can't be removed.
Do not zap returns of `musttail` callees, if the call site can't be
removed and replaced with a constant.
Do not zap returns of `musttail`-calling blocks, this breaks
invariant too.
Patch by Fedor Indutny
Differential Revision: https://reviews.llvm.org/D43695
llvm-svn: 326404
When the function has musttail call - its cc is fixed to be equal to the
cc of the musttail callee. In such case (and in the case of the musttail
callee), GlobalOpt should not change the cc to fastcc as it will break
the invariant.
This fixes PR36546
Patch by: Fedor Indutny (indutny)
Differential revision: https://reviews.llvm.org/D43859
llvm-svn: 326376
Currently this code's control flow very much assumes that there are no meaningful checks after determining that it's a ConstantFP. So whenever it wants to stop it just does "return V". But V is also the variable name it uses when it wants to return a new value. So 'return V' appears multiple times with different meanings.
This patch just moves all the code into a helper function and returns nullptr when it wants to stop.
I've split this from D43774 while I try to figure out how to best handle the vector case there. But this change by itself at least seemed like a readability improvement.
Differential Revision: https://reviews.llvm.org/D43833
llvm-svn: 326361
The API verification tool tapi has difficulty processing frameworks
which enable code coverage, but which have no code. The profile lowering
pass does not emit the runtime hook in this case because no counters are
lowered.
While the hook is not needed for program correctness (the profile
runtime doesn't have to be linked in), it's needed to allow tapi to
validate the exported symbol set of instrumented binaries.
It was not possible to add a workaround in tapi for empty binaries due
to an architectural issue: tapi generates its expected symbol set before
it inspects a binary. Changing that model has a higher cost than simply
forcing llvm to always emit the runtime hook.
rdar://36076904
Differential Revision: https://reviews.llvm.org/D43794
llvm-svn: 326350
Also, rename 'foldOpWithConstantIntoOperand' because that's annoyingly
vague. The constant check is redundant in some cases, but it allows
removing duplication for most of the calls.
llvm-svn: 326329
Summary:
Fix a bug in MergeICmp that can lead to a BCECmp block being processed more than once and eventually lead to a broken LLVM module.
The problem is that if the non-constant value is not produced by the last block, the producer will be processed once when the its parent block
is processed and second time when the last block is processed.
We end up having 2 same BCECmpBlock in the merge queue. And eventually lead to a broken LLVM module.
Reviewers: courbet, davide
Reviewed By: courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43825
llvm-svn: 326318
Removes verifyDomTree, using assert(verify()) everywhere instead, and
changes verify a little to always run IsSameAsFreshTree first in order
to print good output when we find errors. Also adds verifyAnalysis for
PostDomTrees, which will allow checking of PostDomTrees it the same way
we check DomTrees and MachineDomTrees.
Differential Revision: https://reviews.llvm.org/D41298
llvm-svn: 326315
In case we update a ValuePHI node created earlier, we could update it
based on a different OpPHI which could be in a different block.
We need to update the TempToBlock mapping reflecting the new block,
otherwise we would end up placing the new phi node in a wrong block.
This problem is exposed by the test case in
https://bugs.llvm.org/show_bug.cgi?id=36504.
This patch fixes a slightly simpler problem than in the bug report. In
the bug's re-producer, the additional problem is that we are re-using a
ValuePHI node with to few incoming values for the new OpPHI. If this
patch makes sense, I will follow it up with a patch that creates a new
PHI node if the existing PHI node has a different number of incoming
values.
Reviewers: davide, dberlin
Reviewed By: dberlin
Differential Revision: https://reviews.llvm.org/D43770
llvm-svn: 326181
Note: gcc appears to allow this fold with -freciprocal-math alone,
but clang/llvm require more than that with this patch. The wording
in the definitions seems fuzzy enough that it could go either way,
but we'll err on the conservative side of FMF interpretation.
This patch also changes the newly created fmul to have FMF propagated
by the last fdiv rather than intersecting the FMF of the fdivs. This
matches the behavior of other folds near here. The new fmul is only
used to produce an intermediate op for the final fdiv result, so it
shouldn't be any stricter than that result. The previous behavior
could result in dropping FMF via other folds in instcombine or CSE.
Differential Revision: https://reviews.llvm.org/D43398
llvm-svn: 326098
All SIMD architectures can emulate masked load/store/gather/scatter
through element-wise condition check, scalar load/store, and
insert/extract. Therefore, bailing out of vectorization as legality
failure, when they return false, is incorrect. We should proceed to cost
model and determine profitability.
This patch is to address the vectorizer's architectural limitation
described above. As such, I tried to keep the cost model and
vectorize/don't-vectorize behavior nearly unchanged. Cost model tuning
should be done separately.
Please see
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120164.html for
RFC and the discussions.
Closes D43208.
Patch by: Hideki Saito <hideki.saito@intel.com>
llvm-svn: 326079
The dependency matrix is only empty if no conflicting load/store
instructions have been found. In that case, it is safe to interchange.
For the LLVM test-suite, after this change around 1900 loops are
interchanged, whereas it is 15 before this change. On cortex-a57,
this gives an improvement of -0.57% on the geomean execution
time of SPEC2006, SPEC2000 and the test-suite. There are a
few small perf regressions, but I think we can improve on those
by making the cost model better.
Reviewers: karthikthecool, mcrosier
Reviewed by: karthikthecool
Differential Revision: https://reviews.llvm.org/D43236
llvm-svn: 326077
Summary:
This patch is an enhancement to propagate dbg.value information when Phis are created on behalf of LCSSA.
I noticed a case where a value carried across a loop was reported as <optimized out>.
Specifically this case:
```
int bar(int x, int y) {
return x + y;
}
int foo(int size) {
int val = 0;
for (int i = 0; i < size; ++i) {
val = bar(val, i); // Both val and i are correct
}
return val; // <optimized out>
}
```
In the above case, after all of the interesting computation completes our value
is reported as "optimized out." This change will add a dbg.value to correct this.
This patch also moves the dbg.value insertion routine from LoopRotation.cpp
into Local.cpp, so that we can share it in both places (LoopRotation and LCSSA).
Reviewers: mzolotukhin, aprantl, vsk, davide
Reviewed By: aprantl, vsk
Subscribers: dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D42551
llvm-svn: 325926
The existing code was inefficiently looking for 'nsz' variants.
That's unnecessary because we canonicalize those to the expected
form with -0.0.
We may also want to adjust or remove the fold that sinks negation.
We don't do that for fdiv (or integer ops?). That should be uniform?
It may also lead to missed optimization as in PR21914:
https://bugs.llvm.org/show_bug.cgi?id=21914
...or we just have to fix other passes to avoid that problem.
llvm-svn: 325924
Summary:
This fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform.
This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.
Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4
Reviewers: arsenm, rampitec, jlebar
Subscribers: wdng, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D40546
llvm-svn: 325881
Summary:
MemDep caches results that signify that a dependence is non-local, and
there is currently no way to invalidate such cache entries.
Unfortunately, when MLSM sinks a store that can result in a non-local
dependence becoming a local one, and then MemDep gives wrong answers.
The easiest way out here is to just say that MLSM does indeed not
preserve MemDep results.
Reviewers: davide, Gerolf
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43177
llvm-svn: 325880
When DataFlowSanitizer transforms a call to a custom function, the
new call has extra parameters. The attributes on parameters must be
updated to take the new position of each parameter into account.
Patch by Sam Kerner!
Differential Revision: https://reviews.llvm.org/D43132
llvm-svn: 325820
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
AlignmentFromAssumptions pass to cease using the old getAlignment()/setAlignment API of
MemoryIntrinsic in favour of getting/setting source & dest specific alignments through
the new API. This allows us to simplify some of the code in this pass and also be more
aggressive about setting the source and destination alignments separately.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: hfinkel, bollu, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D43081
llvm-svn: 325816
- Fix for bug 36078.
- Prevent the functionattrs, function-attrs, globalopt and argpromotion passes
from changing naked functions.
- These passes can perform some alterations to the functions that should not be
applied. An example is removing parameters that are seemingly not used because
they are only referenced in the inline assembly. Another example is marking
the function as fastcc.
llvm-svn: 325788
Summary:
Exposing getOffset and findFunctionSamples as members of
SampleProfile. They are intimately tied to design choices of the
sample profile format - using offsets instead of line numbers, and
traversing inlined functions stack, respectively.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43605
llvm-svn: 325747
According to the current coverage report salvageDebugInfo() is called
5.12 million times during testing and almost always returns early.
The early return depends on LocalAsMetadata::getIfExists returning null,
which involves a DenseMap lookup in an LLVMContextImpl. We can probably
speed this up by simply checking the IsUsedByMD bit in Value.
llvm-svn: 325738
This patch changes hwasan inline instrumentation:
Fixes address untagging for shadow address calculation (use 0xFF instead of 0x00 for the top byte).
Emits brk instruction instead of hlt for the kernel and user space.
Use 0x900 instead of 0x100 for brk immediate (0x100 - 0x800 are unavailable in the kernel).
Fixes and adds appropriate tests.
Patch by Andrey Konovalov.
Differential Revision: https://reviews.llvm.org/D43135
llvm-svn: 325711
This results in 15 additional unique source variables in a stage2 build
of FileCheck (at '-Os -g'), with a negligible increase in the size of
the .debug_loc section.
llvm-svn: 325660
Summary:
We used to remove the first memmove in cases like this:
memmove(p, p+2, 8);
memmove(p, p+2, 8);
which is incorrect. Fix this by changing isPossibleSelfRead to what was most
likely the intended behavior.
Historical note: the buggy code was added in https://reviews.llvm.org/rL120974
to address PR8728.
Reviewers: rsmith
Subscribers: mcrosier, llvm-commits, jlebar
Differential Revision: https://reviews.llvm.org/D43425
llvm-svn: 325641
These are fdiv-with-constant-divisor, so they already become
reciprocal multiplies. The last gap for vector ops should be
closed with rL325590.
It's possible that we're missing folds for some edge cases
with denormal intermediate constants after deleting these,
but there are no tests for those patterns, and it would be
better to handle denormals more consistently (and less
conservatively) as noted in TODO comments.
llvm-svn: 325595
It's possible that we could allow this either 'arcp' or 'reassoc' alone, but this
should be conservatively better than what we have right now. GCC allows this with
only -freciprocal-math.
The last test is changed to show a case that is expected to fold, but we need D43398.
llvm-svn: 325533
Summary:
Several for loops in PromoteMemoryToRegister.cpp leave their increment
expression empty, instead incrementing the iterator within the for loop
body. I believe this is because these loops were previously implemented
as while loops; see https://reviews.llvm.org/rL188327.
Incrementing the iterator within the body of the for loop instead of
in its increment expression makes it seem like the iterator will be
modified or conditionally incremented within the loop, but that is not
the case in these loops.
Instead, use range loops.
Test Plan: `check-llvm`
Reviewers: davide, bkramer
Reviewed By: davide, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43473
llvm-svn: 325532
The last fold that used to be here was not necessary. That's a
combination of 2 folds (and there's a regression test to show that).
The transforms are guarded by isFast(), but that should be loosened.
llvm-svn: 325531
Summary:
Move a debug statement to above where an assertion is hit, so that the debug
statement can be inspected before a stack trace.
Test Plan: `check-llvm`
llvm-svn: 325529
Add GraphTraits definitions to the FunctionSummary and ModuleSummaryIndex classes. These GraphTraits will be used to construct find SCC's in ThinLTO analysis passes.
Third attempt - moved function from lambda to static function due to build failures.
llvm-svn: 325506
With this patch in place, when a new-format TBAA tag is available
for a memory-transfer intrinsic call, we prefer propagating that
new-format tag. Otherwise, we fallback to the old approach where
we try to construct a proper TBAA access tag from 'tbaa.struct'
metadata.
Differential Revision: https://reviews.llvm.org/D41543
llvm-svn: 325488
Add GraphTraits definitions to the FunctionSummary and ModuleSummaryIndex classes. These GraphTraits will be used to construct find SCC's in ThinLTO analysis passes.
Second attempt, since last patch caused stage2 build to fail (now using function_ref rather than std::function).
Reverted due to buildbot failures
llvm-svn: 325454
Add GraphTraits definitions to the FunctionSummary and ModuleSummaryIndex classes. These GraphTraits will be used to construct find SCC's in ThinLTO analysis passes.
Second attempt, since last patch caused stage2 build to fail (now using function_ref rather than std::function).
llvm-svn: 325448
...and delete the equivalent local functiona from InstCombine.
These might be useful to other InstCombine files or other passes
and makes FP queries more similar to integer constant queries.
llvm-svn: 325398
Summary:
The LazyValueInfo pass caches a copy of the DominatorTree when available.
Whenever there are pending DominatorTree updates within JumpThreading's
DeferredDominance object we cannot use the cached DT for LVI analysis.
This commit adds the new methods enableDT() and disableDT() to LVI.
JumpThreading also sets the appropriate usage model before calling LVI
analysis methods.
Fixes https://bugs.llvm.org/show_bug.cgi?id=36133
Reviewers: sebpop, dberlin, kuhar
Reviewed by: sebpop, kuhar
Subscribers: uabelho, llvm-commits, aprantl, hiraditya, a.elovikov
Differential Revision: https://reviews.llvm.org/D42717
llvm-svn: 325356
Now that we have the new TBAA metadata format that is capable of
representing accesses to aggregates, we can propagate TBAA access
tags from memory setting and transferring intrinsics to load and
store instructions and vice versa.
Since SROA produces lots of new loads and stores on optimized
builds, this change significantly decreases the share of
undecorated memory accesses on such builds.
Differential Revision: https://reviews.llvm.org/D41563
llvm-svn: 325329
In r325063, we salvaged debug values from dying instructions in
GVN::processBlock() and GVN::performScalarPRE().
The change in performScalarPRE(), while correct, is unhelpful. It
introduced a call to salvageDebugInfo() which was immediately followed
by a RAUW, meaning it prevented the RAUW from efficiently updating
dbg.value intrinsics. This commit reverts the mistake and tightens up
the affected test case.
llvm-svn: 325308
This results in small increases in the size of the .debug_loc section
and the number of unique source variables in a stage2 build of opt.
llvm-svn: 325301
Summary:
The behavior described in Coroutines TS `[dcl.fct.def.coroutine]/7`
allows coroutine parameters to be passed into allocator functions.
The instructions to store values into the alloca'd parameters must not
be moved past the frame allocation, otherwise uninitialized values are
passed to the allocator.
Test Plan: `check-llvm`
Reviewers: rsmith, GorNishanov, eric_niebler
Reviewed By: GorNishanov
Subscribers: compnerd, EricWF, llvm-commits
Differential Revision: https://reviews.llvm.org/D43000
llvm-svn: 325285
The variable name 'AllowReassociate' is a lie at this point because
it's set to 'isFast()' which is more than the 'reassoc' FMF after
rL317488.
In D41286, we showed that this transform may be valid even with strict
math by brute force checking every 32-bit float result.
There's a potential problem here because we're replacing with a tan()
libcall rather than a hypothetical LLVM tan intrinsic. So we might
set errno when we should be guaranteed not to do that. But that's
independent of this change.
llvm-svn: 325247
Move computeLoopSafetyInfo, defined in Transforms/Utils/LoopUtils.h,
into the corresponding LoopUtils.cpp, as opposed to LICM where it resides
at the moment. This will allow other functions from Transforms/Utils
to reference it.
llvm-svn: 325151
The select may have been preventing a division by zero or INT_MIN/-1 so removing it might not be safe.
Fixes PR36362.
Differential Revision: https://reviews.llvm.org/D43276
llvm-svn: 325148
This keeps with our current usage of 'match' and is easier to see that
the optional NSW only applies in the non-constant operand case.
llvm-svn: 325140
Summary:
Reversed loads are handled as gathering. But we can just reshuffle
these values. Patch adds support for vectorization of reversed loads.
Reviewers: RKSimon, spatel, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43022
llvm-svn: 325134
For basic blocks with instructions between the beginning of the block
and a call we have to duplicate the instructions before the call in all
split blocks and add PHI nodes for uses of the duplicated instructions
after the call.
Currently, the threshold for the number of instructions before a call
is quite low, to keep the impact on binary size low.
Reviewers: junbuml, mcrosier, davidxl, davide
Reviewed By: junbuml
Differential Revision: https://reviews.llvm.org/D41860
llvm-svn: 325126
We can use incremental dominator tree updates to avoid re-calculating
the dominator tree after interchanging 2 loops.
Reviewers: dmgreen, kuhar
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D43176
llvm-svn: 325122
Preserve debug info from a dead 'and' instruction with a constant.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D43163
llvm-svn: 325119
Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout.
p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits.
The index size parameter is optional, if not specified, it is equal to the pointer size.
Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width.
It works fine if you can convert pointer to integer for address calculation and all registered targets do this.
But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout.
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html
I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account.
This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size.
Differential Revision: https://reviews.llvm.org/D42123
llvm-svn: 325102
This preserves an additional 581 unique source variables in a stage2
build of clang (according to `llvm-dwarfdump --statistics`). It
increases the size of the .debug_loc section by 0.1% (or 87139 bytes).
Differential Revision: https://reviews.llvm.org/D43255
llvm-svn: 325063
This replaces the bit-tracking based fold that did the same thing,
but it only worked for scalars and not directly.
There is no evidence in existing regression tests that the greater
power of bit-tracking was needed here, but we should be aware of
this potential loss of optimization.
llvm-svn: 325062
This is both a functional improvement for vectors and an
efficiency improvement for scalars. The existing code below
the new folds does the same thing for scalars, but in an
indirect and expensive way.
llvm-svn: 325048
According to `llvm-dwarfdump --statistics` this salvages 43 additional
unique source variables in a stage2 build of clang. It increases the
size of the .debug_loc section by 0.002% (or 2864 bytes).
Differential Revision: https://reviews.llvm.org/D43220
llvm-svn: 325035
For basic blocks with instructions between the beginning of the block
and a call we have to duplicate the instructions before the call in all
split blocks and add PHI nodes for uses of the duplicated instructions
after the call.
Currently, the threshold for the number of instructions before a call
is quite low, to keep the impact on binary size low.
Reviewers: junbuml, mcrosier, davidxl, davide
Reviewed By: junbuml
Differential Revision: https://reviews.llvm.org/D41860
llvm-svn: 325001
In cases where the OuterMostLoopLatchBI only has a single successor,
accessing the second successor will fail.
This fixes a failure when building the test-suite with loop-interchange
enabled.
Reviewers: mcrosier, karthikthecool, davide
Reviewed by: karthikthecool
Differential Revision: https://reviews.llvm.org/D42906
llvm-svn: 324994
We already try to salvage debug values from no-op bitcasts and inttoptr
instructions: we should handle ptrtoint instructions as well.
This saves an additional 24,444 debug values in a stage2 build of clang,
and (according to llvm-dwarfdump --statistics) provides an additional
289 unique source variables.
llvm-svn: 324982
Here are the number of additional debug values salvaged in a stage2
build of clang:
63 SALVAGE: MUL
1250 SALVAGE: SDIV
(No values were salvaged from `srem` instructions in this experiment,
but it's a simple case to handle so we might as well.)
llvm-svn: 324976
Here are the number of additional debug values salvaged in a stage2
build of clang:
1912 SALVAGE: ASHR
405 SALVAGE: LSHR
249 SALVAGE: SHL
llvm-svn: 324975
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
InstCombine pass to cease using the deprecated MemoryIntrinsic::getAlignment() method, and
instead we use the separate getSourceAlignment and getDestAlignment APIs to simplify
the source and destination alignment attributes separately.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: majnemer, bollu, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D42871
llvm-svn: 324960
It caused assertion failure
Assertion failed: (!DD.IsLambda && !MergeDD.IsLambda && "faked up lambda definition?"), function MergeDefinitionData, file /Users/buildslave/jenkins/workspace/clang-stage1-configure-RA/llvm/tools/clang/lib/Serialization/ASTReaderDecl.cpp, line 1675.
on the second stage build bots.
llvm-svn: 324932
This is similar to the instsimplify fold added with D42385
( rL323716 )
...but this can't be in instsimplify because we're creating/morphing
a different instruction.
llvm-svn: 324927
Update BlockColors after splitting predecessors. Do not allow splitting
EHPad for sinking when the BlockColors is not empty, so we can
simply assign predecessor's color to the new block.
Fixes PR36184
llvm-svn: 324916
Summary:
For better vectorization result we should take into consideration the
cost of the user insertelement instructions when we try to
vectorize sequences that build the whole vector. I.e. if we have the
following scalar code:
```
<Scalar code>
insertelement <ScalarCode>, ...
```
we should consider the cost of the last `insertelement ` instructions as
the cost of the scalar code.
Reviewers: RKSimon, spatel, hfinkel, mkuper
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D42657
llvm-svn: 324893
Add GraphTraits definitions to the FunctionSummary and ModuleSummaryIndex classes. These GraphTraits will be used to construct find SCC's in ThinLTO analysis passes.
llvm-svn: 324854
The related cases for (X * Y) / X were handled in rL124487.
https://rise4fun.com/Alive/6k9
The division in these tests is subsequently eliminated by existing instcombines
for 1/X.
llvm-svn: 324843
Summary:
If -pass-remarks=loop-vectorize, atomic ops will be seen by
analyzeInterleaving(), even though canVectorizeMemory() == false. This
is because we are requesting extra analysis instead of bailing out.
In such a case, we end up with a Group in both Load- and StoreGroups,
and then we'll try to access freed memory when traversing LoadGroups after having had released the Group when iterating over StoreGroups.
The fix is to include mayWriteToMemory() when validating that two
instructions are the same kind of memory operation.
Reviewers: mssimpso, davidxl
Reviewed By: davidxl
Subscribers: hsaito, fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D43064
llvm-svn: 324786
Extend salvageDebugInfo to preserve the debug info from a dead 'or'
with a constant.
Patch by Ismail Badawi!
Differential Revision: https://reviews.llvm.org/D43129
llvm-svn: 324764
Summary:
For symbols that has linkonce_odr linkage and unnamed_addr, it can be
auto hide by linker to avoid weak external symbols. Teach ThinLTO to
perform auto hide so it can safely promote linkonce_odr to weak symbols
without breaking this nice property.
Reviewers: tejohnson, mehdi_amini
Reviewed By: tejohnson
Subscribers: inglorion, eraman, rnk, pcc, llvm-commits
Differential Revision: https://reviews.llvm.org/D43130
llvm-svn: 324757
Summary:
Kernel addresses have 0xFF in the most significant byte.
A tag can not be pushed there with OR (tag << 56);
use AND ((tag << 56) | 0x00FF..FF) instead.
Reviewers: kcc, andreyknvl
Subscribers: srhines, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D42941
llvm-svn: 324691
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
DataFlowSanitizer pass to cease using the old get/setAlignment() API of MemoryIntrinsic
in favour of getting source & dest specific alignments through the new API.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324654
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
AddressSanitizer pass to cease using The old IRBuilder CreateMemCpy single-alignment API
in favour of the new API that allows setting source and destination alignments independently.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324653
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
MemorySanitizer pass to cease using the old IRBuilder CreateMemCpy single-alignment APIs
in favour of the new API that allows setting source and destination alignments independently.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324642
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
LoopIdiom pass to cease using the old IRBuilder CreateMemCpy single-alignment APIs in
favour of the new API that allows setting source and destination alignments independently.
This allows us to be slightly more aggressive in setting the alignment of memcpy calls that
loop idiom creates.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324626
Refactor getLogBase2Vector into getLogBase2 to accept all scalars/vectors. Generalize from ConstantDataVector to support all constant vectors.
llvm-svn: 324603
Summary:
GVN hoist pass is using PostDominatorTree analysis, therefore the analysis
should be listed in the pass initialization as a dependency.
Reviewed By: sebpop
Differential Revision: https://reviews.llvm.org/D43007
Author: ashlykov <arkady.shlykov@intel.com>
llvm-svn: 324597
Add support of uge and sge latch condition to Loop Prediction for
reverse loops.
Reviewers: apilipenko, mkazantsev, sanjoy, anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42837
llvm-svn: 324589
With fix: reimplemented.
Original commit message:
Recently introduced convertToDeclaration is very similar
to code used in filterModule function.
Patch reuses it to reduce duplication.
Differential revision: https://reviews.llvm.org/D42971
llvm-svn: 324574
The commit rL308422 introduces a restriction for folding unconditional
branches. Specifically if empty block with unconditional branch leads to
header of the loop then elimination of this basic block is prohibited.
However it seems this condition is redundantly strict.
If elimination of this basic block does not introduce more back edges
then we can eliminate this block.
The patch implements this relax of restriction.
The test profile/Linux/counter_promo_nest.c in compiler-rt project
is updated to meet this change.
Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: pacxx
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42691
llvm-svn: 324572
Summary:
Loops with inequality comparers, such as:
// unsigned bound
for (unsigned i = 1; i < bound; ++i) {...}
have getSmallConstantMaxTripCount report a large maximum static
trip count - in this case, 0xffff fffe. However, profiling info
may show that the trip count is much smaller, and thus
counter-recommend vectorization.
This change:
- flips loop-vectorize-with-block-frequency on by default.
- validates profiled loop frequency data supports vectorization,
when static info appears to not counter-recommend it. Absence
of profile data means we rely on static data, just as we've
done so far.
Reviewers: twoh, mkuper, davidxl, tejohnson, Ayal
Reviewed By: davidxl
Subscribers: bkramer, llvm-commits
Differential Revision: https://reviews.llvm.org/D42946
llvm-svn: 324543
Recently introduced convertToDeclaration is very similar
to code used in filterModule function.
Patch reuses it to reduce duplication.
Differential revision: https://reviews.llvm.org/D42971
llvm-svn: 324455
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
DeadStoreElimination pass to cease using the old getAlignment() API of MemoryIntrinsic
in favour of getting dest specific alignments through the new API.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324402
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
InferAddressSpaces pass to cease using:
1) The old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific
alignments through the new API.
2) The old IRBuilder CreateMemCpy/CreateMemMove single-alignment APIs in favour of the new
API that allows setting source and destination alignments independently.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324395
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
InlineFunction pass to ceause using the old IRBuilder CreateMemCpy single-alignment API
in favour of the new API that allows setting source and destination alignments independently.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324384
- Fix condition for detecting that a complex basic block was the first in
the chain.
- Add tests.
This was caught by buildbots when submitting rL324319.
llvm-svn: 324341
It is better to update pointer of the DISuprogram before we call RAUW for
still live arguments of the function, because with the change reviewed in
D42541 in RAUW we compare DISubprograms rather than functions itself.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D42794
llvm-svn: 324335
If the inline asm provides the definition of a symbol, this can result
in duplicate symbol errors.
Differential Revision: https://reviews.llvm.org/D42944
llvm-svn: 324313
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base
address offsets.
SPEC2017 on Ryzen shows no significant perf difference.
Differential Revision: https://reviews.llvm.org/D42607
llvm-svn: 324289
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
LowerMemIntrinsics pass to cease using the old getAlignment() API of MemoryIntrinsic in
favour of getting source & dest specific alignments through the new API.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324278
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
SimplifyLibCalls pass to cease using the old IRBuilder createMemCpy/createMemMove
single-alignment APIs in favour of the new API that allows setting source and destination
alignments independently.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, r3L24148 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324273
This is the instcombine part of unsigned saturation canonicalization.
Backend patches already commited:
https://reviews.llvm.org/D37510https://reviews.llvm.org/D37534
It converts unsigned saturated subtraction patterns to forms recognized
by the backend:
(a > b) ? a - b : 0 -> ((a > b) ? a : b) - b)
(b < a) ? a - b : 0 -> ((a > b) ? a : b) - b)
(b > a) ? 0 : a - b -> ((a > b) ? a : b) - b)
(a < b) ? 0 : a - b -> ((a > b) ? a : b) - b)
((a > b) ? b - a : 0) -> - ((a > b) ? a : b) - b)
((b < a) ? b - a : 0) -> - ((a > b) ? a : b) - b)
((b > a) ? 0 : b - a) -> - ((a > b) ? a : b) - b)
((a < b) ? 0 : b - a) -> - ((a > b) ? a : b) - b)
Patch by Yulia Koval!
Differential Revision: https://reviews.llvm.org/D41480
llvm-svn: 324255
There was a logic hole in D42739 / rL324014 because we're not accounting for select and phi
instructions that might have repeated operands. This is likely a source of an infinite loop.
I haven't manufactured a test case to prove that, but it should be safe to speculatively limit
this transform to binops while we try to create that test.
llvm-svn: 324252
This broke the Chromium build; see PR36238.
> This patch is an enhancement to propagate dbg.value information when
> Phis are created on behalf of LCSSA. I noticed a case where a value
> carried across a loop was reported as <optimized out>.
>
> Specifically this case:
>
> int bar(int x, int y) {
> return x + y;
> }
>
> int foo(int size) {
> int val = 0;
> for (int i = 0; i < size; ++i) {
> val = bar(val, i); // Both val and i are correct
> }
> return val; // <optimized out>
> }
>
> In the above case, after all of the interesting computation completes
> our value is reported as "optimized out." This change will add a
> dbg.value to correct this.
>
> This patch also moves the dbg.value insertion routine from
> LoopRotation.cpp into Local.cpp, so that we can share it in both places
> (LoopRotation and LCSSA).
>
> Patch by Matt Davis!
>
> Differential Revision: https://reviews.llvm.org/D42551
llvm-svn: 324247
Summary:
This complements the fixes in r323633 and r324075 which drop the
definitions of dead functions and variables, respectively.
Fixes PR36208.
Reviewers: grimar, rafael
Subscribers: mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D42856
llvm-svn: 324242
The patch causes the failure of the test
compiler-rt/test/profile/Linux/counter_promo_nest.c
To unblock buildbot, revert the patch while investigation is in progress.
Differential Revision: https://reviews.llvm.org/D42691
llvm-svn: 324214
The commit rL308422 introduces a restriction for folding unconditional
branches. Specifically if empty block with unconditional branch leads to
header of the loop then elimination of this basic block is prohibited.
However it seems this condition is redundantly strict.
If elimination of this basic block does not introduce more back edges
then we can eliminate this block.
The patch implements this relax of restriction.
Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: pacxx
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42691
llvm-svn: 324208
ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check
that SCEV is available at entry point of the loop. It is incorrect and fixed by patch.
To bugs additionally fixed:
assert is moved after the check whether loop is not a nullptr.
Usage of isLoopEntryGuardedByCond in ScalarEvolution::isImpliedCondOperandsViaNoOverflow
is guarded by isAvailableAtLoopEntry.
Reviewers: sanjoy, mkazantsev, anna, dorit, reames
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42417
llvm-svn: 324204
When using the partial inliner, we might have attributes for forwarded
varargs, but the CodeExtractor does not create an empty argument
attribute set for regular arguments in that case, because it does not know
of the additional arguments. So in case we have attributes for VarArgs, we
also have to make sure we create (empty) attributes for all regular arguments.
This fixes PR36210.
llvm-svn: 324197
The type-shrinking logic in reduction detection, although narrow in scope, is
also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies
the approach to rely on the demanded bits and value tracking analyses, if
available. We currently perform type-shrinking separately for reductions and
other instructions in the loop. Long-term, we should probably think about
computing minimal bit widths in a more complete way for the loops we want to
vectorize.
PR35734
Differential Revision: https://reviews.llvm.org/D42309
llvm-svn: 324195
This, in instcombine, allows conversions to i8/i16/i32 (very
common cases) even if the resulting type is not legal according
to the data layout. This can often open up extra combine
opportunities.
Differential Revision: https://reviews.llvm.org/D42424
llvm-svn: 324174
Summary:
When creating the debug fragments for a SRA'd variable, use the types'
allocation sizes. This fixes issues where the pass would emit too small
fragments, placed at the wrong offset, for padded types.
An example of this is long double on x86. The type is represented using
x86_fp80, which is 10 bytes, but the value is aligned to 12/16 bytes.
The padding is included in the type's DW_AT_byte_size attribute;
therefore, the fragments should also include that. Newer GCC releases
(I tested 7.2.0) emit 12/16-byte pieces for long double. Earlier
releases, e.g. GCC 5.5.0, behaved as LLVM did, i.e. by emitting a
10-byte piece, followed by an empty 2/6-byte piece for the padding.
Failing to cover all `DW_AT_byte_size' bytes of a value with non-empty
pieces results in the value being printed as <optimized out> by GDB.
Patch by: David Stenberg
Reviewers: aprantl, JDevlieghere
Reviewed By: aprantl, JDevlieghere
Subscribers: llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D42807
llvm-svn: 324066
This is the enhancement suggested in D42536 to fix a shortcoming in
regular InstCombine's canEvaluate* functionality.
When we have multiple uses of a value, but they're all in one instruction, we can
allow that expression to be narrowed or widened for the same cost as a single-use
value.
AFAICT, this can only matter for multiply: sub/and/or/xor/select would be simplified
away if the operands are the same value; add becomes shl; shifts with a variable shift
amount aren't handled.
Differential Revision: https://reviews.llvm.org/D42739
llvm-svn: 324014
This, in instcombine, allows conversions to i8/i16/i32 (very
common cases) even if the resulting type is not legal according
to the data layout. This can often open up extra combine
opportunities.
Differential Revision: https://reviews.llvm.org/D42424
llvm-svn: 323951
Summary:
Before emitting code for scaled registers, we prevent
SCEVExpander from hoisting any scaled addressing mode
by emitting all the bases first. However, these bases
are being forced to the final type, resulting in some
odd code.
For example, if the type of the base is an integer and
the final type is a pointer, we will emit an inttoptr
for the base, a ptrtoint for the scale, and then a
'reverse' GEP where the GEP pointer is actually the base
integer and the index is the pointer. It's more intuitive
to use the pointer as a pointer and the integer as index.
Patch by: Bevin Hansson
Reviewers: atrick, qcolombet, sanjoy
Reviewed By: qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42103
llvm-svn: 323946
For very, very large global initializers which can be statically evaluated, the
code would create vectors of temporary Constants, modifying them in place,
before committing the resulting Constant aggregate to the global's initializer
value. This had effectively O(n^2) complexity in the size of the global
initializer and would cause memory and non-termination issues compiling some
workloads.
This change performs the static initializer evaluation and creation in batches,
once for each global in the evaluated IR memory. The existing code is maintained
as a last resort when the initializers are more complex than simple values in a
large aggregate. This should theoretically by NFC, no test as the example case
is massive. The existing test cases pass with this, as well as the llvm test
suite.
To give an example, consider the following C++ code adapted from the clang
regression tests:
struct S {
int n = 10;
int m = 2 * n;
S(int a) : n(a) {}
};
template<typename T>
struct U {
T *r = &q;
T q = 42;
U *p = this;
};
U<S> e;
The global static constructor for 'e' will need to initialize 'r' and 'p' of
the outer struct, while also initializing the inner 'q' structs 'n' and 'm'
members. This batch algorithm will simply use general CommitValueTo() method
to handle the complex nested S struct initialization of 'q', before
processing the outermost members in a single batch. Using CommitValueTo() to
handle member in the outer struct is inefficient when the struct/array is
very large as we end up creating and destroy constant arrays for each
initialization.
For the above case, we expect the following IR to be generated:
%struct.U = type { %struct.S*, %struct.S, %struct.U* }
%struct.S = type { i32, i32 }
@e = global %struct.U { %struct.S* gep inbounds (%struct.U, %struct.U* @e,
i64 0, i32 1),
%struct.S { i32 42, i32 84 }, %struct.U* @e }
The %struct.S { i32 42, i32 84 } inner initializer is treated as a complex
constant expression, while the other two elements of @e are "simple".
Differential Revision: https://reviews.llvm.org/D42612
llvm-svn: 323933
This covers the case where TruncInst leaf node is a constant expression.
See PR36121 for more details.
Differential Revision: https://reviews.llvm.org/D42622
llvm-svn: 323926
If you have a long chain of select instructions created from something
like `int* p = &g; if (foo()) p += 4; if (foo2()) p += 4;` etc., a naive
recursive visitor will recursively visit each select twice, which is
O(2^N) in the number of select instructions. Use the visited set to cut
off recursion in this case.
(No testcase because this doesn't actually change the behavior, just the
time.)
Differential Revision: https://reviews.llvm.org/D42451
llvm-svn: 323910
Because dead code may contain non-standard IR that causes infinite looping or crashes in underlying analysis.
See PR36134 for more details.
Differential Revision: https://reviews.llvm.org/D42683
llvm-svn: 323862
Summary:
This is exposed during ThinLTO compilation, when we import an alias by
creating a clone of the aliasee. Without this fix the debug type is
unnecessarily cloned and we get a duplicate, undoing the uniquing.
Fixes PR36089.
Reviewers: mehdi_amini, pcc
Subscribers: eraman, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D41669
llvm-svn: 323813
candidates with coldcc attribute.
This recommits r322721 reverted due to sanitizer memory leak build bot failures.
Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 323778
Summary:
There's an asymmetry in the definitions of findBaseDefiningValueOfVector() and
findBaseDefiningValue() of RS4GC. The later handles call and invoke instructions,
and the former does not. This appears to be simple oversight. This patch remedies
the oversight by adding the call and invoke cases to findBaseDefiningValueOfVector().
Reviewers: DaniilSuchkov, anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42653
llvm-svn: 323764
Summary:
The JumpThreading pass has several locations where to the variable name LI
refers to a LoadInst type. This is confusing and inhibits the ability to use
LI for LoopInfo as a member of the JumpThreading class. Minor formatting
and comments were also altered to reflect this change.
Reviewers: dberlin, kuba, spop, sebpop
Reviewed by: sebpop
Subscribers: sebpop, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D42601
llvm-svn: 323695
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323662
This pretty much reverts r322006, except that we keep the test,
because we work around the issue exposed in a different way (a
recursion limit in value tracking). There's still probably some
sequence that exposes this problem, and the proper way to fix that
for somebody who has time is outlined in the code review.
llvm-svn: 323630
A cast from A to B is eliminable if its result is casted to C, and if
the pair of casts could just be expressed as a single cast. E.g here,
%c1 is eliminable:
%c1 = zext i16 %A to i32
%c2 = sext i32 %c1 to i64
InstCombine optimizes away eliminable casts. This patch teaches it to
insert a dbg.value intrinsic pointing to the final result, so that local
variables pointing to the eliminable result are preserved.
Differential Revision: https://reviews.llvm.org/D42566
llvm-svn: 323570
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323530
- using qualified pointer addrspace in intrinsics class to avoid .f32 mangling
- changed too common atomic mangling to ds
- added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic
Reviewed by: b-sumner
Differential Revision: https://reviews.llvm.org/D42383
llvm-svn: 323516
Inserting a dbg.value instruction at the start of a basic block with a
landingpad instruction triggers a verifier failure. We should be OK if
we insert the instruction a bit later.
Speculative fix for the bot failure described here:
https://reviews.llvm.org/D42551
llvm-svn: 323482
Summary:
The intent of this is to allow the code to be used with ThinLTO. In
Thinlink phase, a traditional Callgraph can not be computed even though
all the necessary information (nodes and edges of a call graph) is
available. This is due to the fact that CallGraph class is closely tied
to the IR. This patch first extends GraphTraits to add a CallGraphTraits
graph. This is then used to implement a version of counts propagation
on a generic callgraph.
Reviewers: davidxl
Subscribers: mehdi_amini, tejohnson, llvm-commits
Differential Revision: https://reviews.llvm.org/D42311
llvm-svn: 323475
This patch is an enhancement to propagate dbg.value information when
Phis are created on behalf of LCSSA. I noticed a case where a value
carried across a loop was reported as <optimized out>.
Specifically this case:
int bar(int x, int y) {
return x + y;
}
int foo(int size) {
int val = 0;
for (int i = 0; i < size; ++i) {
val = bar(val, i); // Both val and i are correct
}
return val; // <optimized out>
}
In the above case, after all of the interesting computation completes
our value is reported as "optimized out." This change will add a
dbg.value to correct this.
This patch also moves the dbg.value insertion routine from
LoopRotation.cpp into Local.cpp, so that we can share it in both places
(LoopRotation and LCSSA).
Patch by Matt Davis!
Differential Revision: https://reviews.llvm.org/D42551
llvm-svn: 323472
Right now clang uses "_n" suffix for some user space callbacks and "N" for the matching kernel ones. There's no need for this and it actually breaks kernel build with inline instrumentation. Use the same callback names for user space and the kernel (and also make them consistent with the names GCC uses).
Patch by Andrey Konovalov.
Differential Revision: https://reviews.llvm.org/D42423
llvm-svn: 323470
It was reverted after buildbot regressions.
Original commit message:
This allows relative block frequency of call edges to be passed
to the thinlink stage where it will be used to compute synthetic
entry counts of functions.
llvm-svn: 323460
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323441
This is guarded by shouldChangeType(), so the tests show that
we don't do the fold if the narrower type is not legal. Note
that there is a proposal (D42424) that would change the results
for the specific cases shown in these tests. That difference is
also discussed in PR35792:
https://bugs.llvm.org/show_bug.cgi?id=35792
Alive proofs for the cases handled here as well as the bitwise
logic binops that we should already do better on:
https://rise4fun.com/Alive/c97https://rise4fun.com/Alive/Lc5Ehttps://rise4fun.com/Alive/kdf
llvm-svn: 323437
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323430
Summary:
When creating the debug fragments for a SRA'd struct, use the fields'
offsets, taken from the struct layout, as the offsets for the resulting
fragments. This fixes an issue where GlobalOpt would emit fragments with
incorrect offsets for padded fields.
This should solve PR36016.
Patch by David Stenberg.
Reviewers: aprantl
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42489
llvm-svn: 323411
It causes regressions in various OpenGL test suites.
Keep the test cases introduced by r321751 as XFAIL, and add a test case
for the regression.
Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015
llvm-svn: 323355
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323348
Combine expression patterns to form expressions with fewer, simple instructions.
This pass does not modify the CFG.
For example, this pass reduce width of expressions post-dominated by TruncInst
into smaller width when applicable.
It differs from instcombine pass in that it contains pattern optimization that
requires higher complexity than the O(1), thus, it should run fewer times than
instcombine pass.
Differential Revision: https://reviews.llvm.org/D38313
llvm-svn: 323321
This patch removes assert that SCEV is able to prove that a value is
non-negative. In fact, SCEV can sometimes be unable to do this because
its cache does not update properly. This assert will be returned once this
problem is resolved.
llvm-svn: 323309
Summary:
Currently, there is no way to extract a basic block from a function easily. This patch
extends llvm-extract to extract the specified basic block(s).
Reviewers: loladiro, rafael, bogner
Reviewed By: bogner
Subscribers: hintonda, mgorny, qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D41638
llvm-svn: 323266
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323246
Summary:
This patch is adding remark messages to the LoopVersioning LICM pass,
which will be useful for optimization remark emitter (ORE) infrastructure.
Patch by: Deepak Porwal
Reviewers: anemet, ashutosh.nema, eastig
Subscribers: eastig, vivekvpandya, fhahn, llvm-commits
llvm-svn: 323183
Currently ASan instrumentation pass forces callback
instrumentation when applied to the kernel.
This patch changes the current behavior to allow
using inline instrumentation in this case.
Authored by andreyknvl. Reviewed in:
https://reviews.llvm.org/D42384
llvm-svn: 323140
ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check
that SCEV is available at entry point of the loop. It is incorrect and fixed by patch.
Reviewers: sanjoy, mkazantsev, anna, dorit
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42165
llvm-svn: 323077
...when the shift is known to not overflow with the matching
signed-ness of the division.
This closes an optimization gap caused by canonicalizing mul
by power-of-2 to shl as shown in PR35709:
https://bugs.llvm.org/show_bug.cgi?id=35709
Patch by Anton Bikineev!
Differential Revision: https://reviews.llvm.org/D42032
llvm-svn: 323068
We already had the pointer being stored to in the MemLoc, reuse that code. In merging cases, it turned out the interface of the getLocForWrite had become inconsitent with other related utilities. Fix that by making sure the input passes hasAnalyzableWrite as well.
llvm-svn: 323056
to @objc_autorelease if its operand is a PHI and the PHI has an
equivalent value that is used by a return instruction.
For example, ARC optimizer shouldn't replace the call in the following
example, as doing so breaks the AutoreleaseRV/RetainRV optimization:
%v1 = bitcast i32* %v0 to i8*
br label %bb3
bb2:
%v3 = bitcast i32* %v2 to i8*
br label %bb3
bb3:
%p = phi i8* [ %v1, %bb1 ], [ %v3, %bb2 ]
%retval = phi i32* [ %v0, %bb1 ], [ %v2, %bb2 ] ; equivalent to %p
%v4 = tail call i8* @objc_autoreleaseReturnValue(i8* %p)
ret i32* %retval
Also, make sure ObjCARCContract replaces @objc_autoreleaseReturnValue's
operand uses with its value so that the call gets tail-called.
rdar://problem/15894705
llvm-svn: 323009
Summary:
If the vectorized tree has truncate to minimum required bit width and
the vector type of the cast operation after the truncation is the same
as the vector type of the cast operands, count cost of the vector cast
operation as 0, because this cast will be later removed.
Also, if the vectorization tree root operations are integer cast operations, do not consider them as candidates for truncation. It will just create extra number of the same vector/scalar operations, which will be removed by instcombiner.
Reviewers: RKSimon, spatel, mkuper, hfinkel, mssimpso
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41948
llvm-svn: 322946
Three (or more) operand getelementptrs could plausibly also be handled, but
handling only two-operand fits in easily with the existing BinaryOperator
handling.
Differential Revision: https://reviews.llvm.org/D39958
llvm-svn: 322930
Summary:
-hwasan-mapping-offset defines the non-zero shadow base address.
-hwasan-kernel disables calls to __hwasan_init in module constructors.
Unlike ASan, -hwasan-kernel does not force callback instrumentation.
This is controlled separately with -hwasan-instrument-with-calls.
Reviewers: kcc
Subscribers: srhines, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D42141
llvm-svn: 322785
Summary:
The class wraps a uint64_t and an enum to represent the type of profile
count (real and synthetic) with some helper methods.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41883
llvm-svn: 322771
candidates with coldcc attribute.
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 322721
I was comparing the demanded-bits implementations between InstCombine
and TargetLowering as part of investigating questions in D42088 and
noticed that this was wrong in IR. We were losing all of the prior
known bits when we got back to the 'zext'.
llvm-svn: 322662
This removes some duplication from splitCallSite and makes it easier to
add additional code dealing with each predecessor. It also allows us to
split for more than 2 predecessors, although that is not enabled for
now.
Reviewers: junbuml, mcrosier, davidxl, davide
Reviewed By: junbuml
Differential Revision: https://reviews.llvm.org/D41858
llvm-svn: 322599
Summary: Sometimes vectorization of insertelement instructions with extractelement operands may produce an extra shuffle operation, if these operands are in the reverse order. Patch tries to improve this situation by the reordering of the operands to remove this extra shuffle operation.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D33954
llvm-svn: 322579
This patch fixes the assertion failure in SROA reported in PR35657.
PR35657 reports the assertion failure due to r319522 (splitting for non-whole-alloca slices), but this problem can happen even without r319522.
The problem exists in a check for reusing an existing alloca when rewriting partitions. As the original comment said, we can reuse the existing alloca if the new alloca has the same type and offset with the existing one. But the code checks only type of the alloca and then check the offset using an assert.
In a corner case with out-of-bounds access (e.g. @PR35657 function added in unit test), it is possible that the two allocas have the same type but different offsets.
This patch makes the check of the offset in the if condition, and re-enables the splitting for non-whole-alloca slices.
Differential Revision: https://reviews.llvm.org/D41981
llvm-svn: 322533
Summary:
This method is supposed to be called for IVs that have casts in their use-def
chains that are completely ignored after vectorization under PSE. However, for
truncates of such IVs the same InductionDescriptor is used during
creation/widening of both original IV based on PHINode and new IV based on
TruncInst.
This leads to unintended second call to recordVectorLoopValueForInductionCast
with a VectorLoopVal set to the newly created IV for a trunc and causes an
assert due to attempt to store new information for already existing entry in the
map. This is wrong and should not be done.
Fixes PR35773.
Reviewers: dorit, Ayal, mssimpso
Reviewed By: dorit
Subscribers: RKSimon, dim, dcaballe, hsaito, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D41913
llvm-svn: 322473
Summary:
In preparation for https://reviews.llvm.org/D41675 this NFC changes this
prototype of MemIntrinsicInst::setAlignment() to accept an unsigned instead
of a Constant.
llvm-svn: 322403
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.
Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.
LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perform the
preversation was minimally altered and simply marked as
preserved for the PassManager to be informed.
This extends the analysis available to JumpThreading for
future enhancements such as threading across loop headers.
Reviewers: dberlin, kuhar, sebpop
Reviewed By: kuhar, sebpop
Subscribers: mgorny, dmgreen, kuba, rnk, rsmith, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40146
llvm-svn: 322401
Currently, IRC contains `Begin` and `Step` as SCEVs and `End` as value.
Aside from that, `End` can also be `nullptr` which can be later conditionally
converted into a non-null SCEV.
To make this logic more transparent, this patch makes `End` a SCEV and
calculates it early, so that it is never a null.
Differential Revision: https://reviews.llvm.org/D39590
llvm-svn: 322364
This is a fix for PR35884.
When we want to delete dead loop we must clean uses in unreachable blocks
otherwise we'll get an assert during deletion of instructions from the loop.
Reviewers: anna, davide
Reviewed By: anna
Subscribers: llvm-commits, lebedev.ri
Differential Revision: https://reviews.llvm.org/D41943
llvm-svn: 322357
Summary:
Very basic stack instrumentation using tagged pointers.
Tag for N'th alloca in a function is built as XOR of:
* base tag for the function, which is just some bits of SP (poor
man's random)
* small constant which is a function of N.
Allocas are aligned to 16 bytes. On every ReturnInst allocas are
re-tagged to catch use-after-return.
This implementation has a bunch of issues that will be taken care of
later:
1. lifetime intrinsics referring to tagged pointers are not
recognized in SDAG. This effectively disables stack coloring.
2. Generated code is quite inefficient. There is one extra
instruction at each memory access that adds the base tag to the
untagged alloca address. It would be better to keep tagged SP in a
callee-saved register and address allocas as an offset of that XOR
retag, but that needs better coordination between hwasan
instrumentation pass and prologue/epilogue insertion.
3. Lifetime instrinsics are ignored and use-after-scope is not
implemented. This would be harder to do than in ASan, because we
need to use a differently tagged pointer depending on which
lifetime.start / lifetime.end the current instruction is dominated
/ post-dominated.
Reviewers: kcc, alekseyshl
Subscribers: srhines, kubamracek, javed.absar, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41602
llvm-svn: 322324
While updating clang tests for having clang set dso_local I noticed
that:
- There are *a lot* of tests to update.
- Many of the updates are redundant.
They are redundant because a GV is "obviously dso_local". This patch
starts formalizing that a bit by requiring that internal and private
GVs be dso_local too. Since they all are, we don't have to print
dso_local to the textual representation, making it a bit more compact
and easier to read.
llvm-svn: 322317
LoadInst isn't enough; we need to include intrinsics that perform loads too.
All side-effecting intrinsics and such are already covered by the isSafe
check, so we just need to care about things that read from memory.
D41960, originally from D33179.
llvm-svn: 322311
parent function
Ideally we should merge the attributes from the functions somehow, but
this is obviously an improvement over taking random attributes from the
caller which will trip up the verifier if they're nonsensical for an
unary intrinsic call.
llvm-svn: 322284
The function can take a significant amount of time on some
complicated test cases, but for the currently only use of
the function we can stop the initialization much earlier
when we find out we are going to discard the result anyway
in the caller of the function.
Adding configurable cut-off points so that we avoid wasting time.
NFCI.
llvm-svn: 322248
Summary:
LowerTypeTests moves some function definitions from individual object
files to the merged module, leaving a stub to be called in the merged
module's jump table. If an alias was pointing to such a function
definition LowerTypeTests would fail because the alias would be left
without a definition to point to.
This change 1) emits information about aliases to the ThinLTO summary,
2) replaces aliases pointing to function definitions that are moved to
the merged module with function declarations, and 3) re-emits those
aliases in the merged module pointing to the correct function
definitions.
The patch does not correctly fix all possible mis-uses of aliases in
LowerTypeTests. For example, it does not handle aliases with a different
type from the pointed to function.
The addition of alias data increases the size of Chrome build artifacts
by less than 1%.
Reviewers: pcc
Reviewed By: pcc
Subscribers: mehdi_amini, eraman, mgrang, llvm-commits, eugenis, kcc
Differential Revision: https://reviews.llvm.org/D41741
llvm-svn: 322139
Summary:
When performing constant propagation for call instructions we have historically replaced all uses of the return from a call, but not removed the call itself. This is required for correctness if the calls have side effects, however the compiler should be able to safely remove calls that don't have side effects.
This allows the compiler to completely fold away calls to functions that have no side effects if the inputs are constant and the output can be determined at compile time.
Reviewers: davide, sanjoy, bruno, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38856
llvm-svn: 322125
Summary:
This pass synthesizes function entry counts by traversing the callgraph
and using the relative block frequencies of the callsites. The intended
use of these counts is in inlining to determine hot/cold callsites in
the absence of profile information.
The pass is split into two files with the code that propagates the
counts in a callgraph in a Utils file. I plan to add support for
propagation in the thinlto link phase and the propagation code will be
shared and hence this split. I did not add support to the old PM since
hot callsite determination in inlining is not possible in old PM
(although we could use hot callee heuristic with synthetic counts in the
old PM it is not worth the effort tuning it)
Reviewers: davidxl, silvas
Subscribers: mgorny, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D41604
llvm-svn: 322110
Because of potential UB (known bits conflicts with an llvm.assume),
we have to check rather than assert here because InstSimplify doesn't
kill the compare:
https://bugs.llvm.org/show_bug.cgi?id=35846
llvm-svn: 322104
EarlyCSE did not try to salvage debug info during erasing of instructions.
This change fixes it.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D41496
llvm-svn: 322083
This is an attempt of fixing PR35807.
Due to the non-standard definition of dominance in LLVM, where uses in
unreachable blocks are dominated by anything, you can have, in an
unreachable block:
%patatino = OP1 %patatino, CONSTANT
When `SimplifyInstruction` receives a PHI where an incoming value is of
the aforementioned form, in some cases, loops indefinitely.
What I propose here instead is keeping track of the incoming values
from unreachable blocks, and replacing them with undef. It fixes this
case, and it seems to be good regardless (even if we can't prove that
the value is constant, as it's coming from an unreachable block, we
can ignore it).
Differential Revision: https://reviews.llvm.org/D41812
llvm-svn: 322006
There is precedence for factorization transforms in instcombine for FP ops with fast-math.
We also have similar logic in foldSPFofSPF().
It would take more work to add this to reassociate because that's specialized for binops,
and min/max are not binops (or even single instructions). Also, I don't have evidence that
larger min/max trees than this exist in real code, but if we find that's true, we might
want to reorganize where/how we do this optimization.
In the motivating example from https://bugs.llvm.org/show_bug.cgi?id=35717 , we have:
int test(int xc, int xm, int xy) {
int xk;
if (xc < xm)
xk = xc < xy ? xc : xy;
else
xk = xm < xy ? xm : xy;
return xk;
}
This patch solves that problem because we recognize more min/max patterns after rL321672
https://rise4fun.com/Alive/Qjnehttps://rise4fun.com/Alive/3yg
Differential Revision: https://reviews.llvm.org/D41603
llvm-svn: 321998
Summary:
Fixes the bug with incorrect handling of InsertValue|InsertElement
instrucions in SLP vectorizer. Currently, we may use incorrect
ExtractElement instructions as the operands of the original
InsertValue|InsertElement instructions.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41767
llvm-svn: 321994
Summary:
If the vectorized value is marked as extra reduction argument, its users
are not considered as external users. Patch fixes this.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41786
llvm-svn: 321993
The approach was never discussed, I wasn't able to reproduce this
non-determinism, and the original author went AWOL.
After a discussion on the ML, Philip suggested to revert this.
llvm-svn: 321974
Another small step forward to move VPlan stuff outside of LoopVectorize.cpp.
VPlanBuilder.h is renamed to LoopVectorizationPlanner.h
LoopVectorizationPlanner class is moved from LoopVectorize.cpp to
LoopVectorizationPlanner.h LoopVectorizationCostModel::VectorizationFactor
class is moved to LoopVectorizationPlanner.h (used by the planner class) ---
this needs further streamlining work in later patches and thus all I did was
take it out of the CostModel class and moved to the header file. The callback
function had to stay inside LoopVectorize.cpp since it calls an
InnerLoopVectorizer member function declared in it. Next Steps: Make
InnerLoopVectorizer, LoopVectorizationCostModel, and other classes more modular
and more aligned with VPlan direction, in small increments.
Previous step was: r320900 (https://reviews.llvm.org/D41045)
Patch by Hideki Saito, thanks!
Differential Revision: https://reviews.llvm.org/D41420
llvm-svn: 321962
In addition to target-dependent attributes, we can also preserve a
white-listed subset of target independent function attributes. The white-list
excludes problematic attributes, most prominently:
* attributes related to memory accesses, as alloca instructions
could be moved in/out of the extracted block
* control-flow dependent attributes, like no_return or thunk, as the
relerelevant instructions might or might not get extracted.
Thanks @efriedma and @aemerson for providing a set of attributes that cannot be
propagated.
Reviewers: efriedma, davidxl, davide, silvas
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D41334
llvm-svn: 321961
If the varargs are not accessed by a function, we can inline the
function.
Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D41335
llvm-svn: 321940
In the minimal case, this won't remove instructions, but it still improves
uses of existing values.
In the motivating example from PR35834, it does remove instructions, and
sets that case up to be optimized by something like D41603:
https://reviews.llvm.org/D41603
llvm-svn: 321936
Having a single call to findDbgUsers() allows salvageDebugInfo() to
return earlier.
Differential Revision: https://reviews.llvm.org/D41787
llvm-svn: 321915
Besides the bug of omitting the inverse transform of max(~a, ~b) --> ~min(a, b),
the use checking and operand creation were off. We were potentially creating
repeated identical instructions of existing values. This led to infinite
looping after I added the extra folds.
By using the simpler m_Not matcher and not creating new 'not' ops for a and b,
we avoid that problem. It's possible that not using IsFreeToInvert() here is
more limiting than the simpler matcher, but there are no tests for anything
more exotic. It's also possible that we should relax the use checking further
to handle a case like PR35834:
https://bugs.llvm.org/show_bug.cgi?id=35834
...but we can make that a follow-up if it is needed.
llvm-svn: 321882
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.
Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.
LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perfom the
preversation was minimally altered and was simply marked
as preserved for the PassManager to be informed.
This extends the analysis available to JumpThreading for
future enhancements. One example is loop boundary threading.
Reviewers: dberlin, kuhar, sebpop
Reviewed By: kuhar, sebpop
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40146
llvm-svn: 321825
This came up during discussions in llvm-commits for
rL321653: Check for unreachable preds before updating LI in
UpdateAnalysisInformation
The assert provides hints to passes to require both DT and LI if we plan on
updating LI through this function.
Tests run: make check
llvm-svn: 321805
The work order was changed in r228186 from SCC order
to RPO with an arbitrary sorting function. The sorting
function attempted to move inner loop nodes earlier. This
was was apparently relying on an assumption that every block
in a given loop / the same loop depth would be seen before
visiting another loop. In the broken testcase, a block
outside of the loop was encountered before moving onto
another block in the same loop. The testcase would then
structurize such that one blocks unconditional successor
could never be reached.
Revert to plain RPO for the analysis phase. This fixes
detecting edges as backedges that aren't really.
The processing phase does use another visited set, and
I'm unclear on whether the order there is as important.
An arbitrary order doesn't work, and triggers some infinite
loops. The reversed RPO list seems to work and is closer
to the order that was used before, minus the arbitary
custom sorting.
A few of the changed tests now produce smaller code,
and a few are slightly worse looking.
llvm-svn: 321751
Summary:
We are incorrectly updating the LI when loop-simplify generates
dedicated exit blocks for a loop. The issue is that there's an implicit
assumption that the Preds passed into UpdateAnalysisInformation are
reachable. However, this is not true and breaks LI by incorrectly
updating the header of a loop.
One such case is when we generate dedicated exits when the exit block is
a landing pad (through SplitLandingPadPredecessors). There maybe other
cases as well, since we do not guarantee that Preds passed in are
reachable basic blocks.
The added test case shows how loop-simplify breaks LI for the outer loop (and DT in turn)
after we try to generate the LoopSimplifyForm.
Reviewers: davide, chandlerc, sanjoy
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41519
llvm-svn: 321653
`RewriteStatepointsForGC` iterates over function blocks and their predecessors
in order of declaration. One of outcomes of this is that callsites are placed in
arbitrary order which has nothing to do with travelsar order.
On the other hand, function `recomputeLiveInValues` asserts that bases are
added to `Info.PointerToBase` before their deried pointers are updated. But
if call sites are processed in order different from RPOT, this is not necessarily
true. We cannot guarantee that the base was placed there before every
pointer derived from it. All we can guarantee is that this base was marked as
known base by this point.
This patch replaces the fact that we assert from checking that the base was
added to the map with assert that the base was marked as known base.
Differential Revision: https://reviews.llvm.org/D41593
llvm-svn: 321517
This reverts r321138. It seems there are still underlying issues with
memdep. PR35519 seems to still be present if debug info is enabled. We
end up losing a memcpy. Somehow during store to memset merging, we
insert the memset after the memcpy or fail to update the memdep analysis
to account for the newly inserted memset of a pair.
Reduced test case:
#include <assert.h>
#include <stdio.h>
#include <string>
#include <utility>
#include <vector>
void do_push_back(
std::vector<std::pair<std::string, std::vector<std::string>>>* crls) {
crls->push_back(std::make_pair(std::string(), std::vector<std::string>()));
}
int __attribute__((optnone)) main() {
// Put some data in the vector and then remove it so we take the push_back
// fast path.
std::vector<std::pair<std::string, std::vector<std::string>>> crl_set;
crl_set.push_back({"asdf", {}});
crl_set.pop_back();
printf("first word in vector storage: %p\n", *(void**)crl_set.data());
// Do the push_back which may fail to initialize the data.
do_push_back(&crl_set);
auto* first = &crl_set.back().first;
printf("first word in vector storage (should be zero): %p\n",
*(void**)crl_set.data());
assert(first->empty());
puts("ok");
}
Compile with libc++, enable optimizations, and enable debug info:
$ clang++ -stdlib=libc++ -g -O2 t.cpp -o t.exe -Wl,-rpath=llvm/build/lib
This program will assert with this change.
llvm-svn: 321510
By following the single predecessors of the predecessors of the call
site, we do not need to restrict the control flow.
Reviewed By: junbuml, davide
Differential Revision: https://reviews.llvm.org/D40729
llvm-svn: 321413
This code was originally removed and replace with an assertion
because believed unnecessary. It turns out there was simply
no test coverage for this case, and the constant folder doesn't
yet know about patterns like `br undef %label1, %label2`.
Presumably at some point the constant folder might learn about
these patterns, but it's a broader change.
A testcase will be added to make sure this doesn't regress again
in the future.
Fixes PR35723.
llvm-svn: 321402
If after if-conversion, most of the instructions in this new BB construct a long and slow dependence chain, it may be slower than cmp/branch, even if the branch has a high miss rate, because the control dependence is transformed into data dependence, and control dependence can be speculated, and thus, the second part can execute in parallel with the first part on modern OOO processor.
This patch checks for the long dependence chain, and give up if-conversion if find one.
Differential Revision: https://reviews.llvm.org/D39352
llvm-svn: 321377
Summary:
This replaces calls to getEntryCount().hasValue() with hasProfileData
that does the same thing. This refactoring is useful to do before adding
synthetic function entry counts but also a useful cleanup IMO even
otherwise. I have used hasProfileData instead of hasRealProfileData as
David had earlier suggested since I think profile implies "real" and I
use the phrase "synthetic entry count" and not "synthetic profile count"
but I am fine calling it hasRealProfileData if you prefer.
Reviewers: davidxl, silvas
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41461
llvm-svn: 321331
If a block has N predecessors, then the current algorithm will try to
sink common code to this block N times (whenever we visit a
predecessor). Every attempt to sink the common code includes going
through all predecessors, so the complexity of the algorithm becomes
O(N^2).
With this patch we try to sink common code only when we visit the block
itself. With this, the complexity goes down to O(N).
As a side effect, the moment the code is sunk is slightly different than
before (the order of simplifications has been changed), that's why I had
to adjust two tests (note that neither of the tests is supposed to test
SimplifyCFG):
* test/CodeGen/AArch64/arm64-jumptable.ll - changes in this test mimic
the changes that previous implementation of SimplifyCFG would do.
* test/CodeGen/ARM/avoid-cpsr-rmw.ll - in this test I disabled common
code sinking by a command line flag.
llvm-svn: 321236
This patch modifies the indirect call promotion utilities by exposing and using
an unconditional call promotion interface. The unconditional promotion
interface (i.e., call promotion without creating an if-then-else) can be used
if it's known that an indirect call has only one possible callee. The existing
conditional promotion interface uses this unconditional interface to promote an
indirect call after it has been versioned and placed within the "then" block.
A consequence of unconditional promotion is that the fix-up operations for phi
nodes in the normal destination of invoke instructions are changed. This is
necessary because the existing implementation assumed that an invoke had been
versioned, creating a "merge" block where a return value bitcast could be
placed. In the new implementation, the edge between a promoted invoke's parent
block and its normal destination is split if needed to add a bitcast for the
return value. If the invoke is also versioned, the phi node merging the return
value of the promoted and original invoke instructions is placed in the "merge"
block.
Differential Revision: https://reviews.llvm.org/D40751
llvm-svn: 321210
Summary: Very similar to AddressSanitizer, with the exception of the error type encoding.
Reviewers: kcc, alekseyshl
Subscribers: cfe-commits, kubamracek, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D41417
llvm-svn: 321203
canVectorize is only checking if the loop has a normalized pre-header if DoExtraAnalysis is true.
This doesn't make sense to me because reporting analysis information shouldn't alter legality
checks. This is probably the result of a last minute minor change before committing (?).
Patch by Diego Caballero.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D40973
llvm-svn: 321172
This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
This is r319482 and r319483, along with fixes for PR35519: fix the
optimization that merges stores into memsets to preserve cached memdep
info, and fix memdep's non-local caching strategy to not assume that larger
queries are always more conservative than smaller ones.
Fixes PR28958 and PR35519.
Differential Revision: https://reviews.llvm.org/D40802
llvm-svn: 321138
PRE in JumpThreading should not be able to hoist copy of non-speculable loads across
instructions that don't always transfer execution to their successors, otherwise they may
introduce an unsafe load which otherwise would not be executed.
The same problem for GVN was fixed as rL316975.
Differential Revision: https://reviews.llvm.org/D40347
llvm-svn: 321063
Summary:
In r277849, getEntryCount was changed to return None when the entry
count was 0, specifically for SamplePGO where it means no samples were
recorded. However, for instrumentation PGO a 0 entry count should be
returned directly, since it does mean that the function was completely
cold. Otherwise we end up treating these functions conservatively
in isFunctionEntryCold() and isColdBB().
Instead, for SamplePGO use -1 when there are no samples, and change
getEntryCount to return None when the value is -1.
Reviewers: danielcdh, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41307
llvm-svn: 321018
This patch introduce a switch to control splitting of non-whole-alloca slices with default off.
The switch will be default on again after fixing an issue reported in PR35657.
llvm-svn: 320958
If the loop operand type is int8 then there will be no residual loop for the
unknown size expansion. Dont create the residual-size and bytes-copied values
when they are not needed.
llvm-svn: 320929
We want to do this for 2 reasons:
1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.
More detail about what happens in the backend:
1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs
into the shift variant. That is the opposite of this IR canonicalization.
2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs
into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2
into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node
when that's legal/custom.
4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a
variety of ways.
a. For #2, the vector path is missing the case for setlt with a '1' constant.
b. For #3, we are missing a match for commuted versions of the shift variants.
5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel
produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the
shift sequence when not.
6. In the following examples with this patch applied, we may get conditional moves rather than the shift
produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific
decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.
define i32 @abs_shifty(i32 %x) {
%signbit = ashr i32 %x, 31
%add = add i32 %signbit, %x
%abs = xor i32 %signbit, %add
ret i32 %abs
}
define i32 @abs_cmpsubsel(i32 %x) {
%cmp = icmp slt i32 %x, zeroinitializer
%sub = sub i32 zeroinitializer, %x
%abs = select i1 %cmp, i32 %sub, i32 %x
ret i32 %abs
}
define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
%signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
%add = add <4 x i32> %signbit, %x
%abs = xor <4 x i32> %signbit, %add
ret <4 x i32> %abs
}
define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
%cmp = icmp slt <4 x i32> %x, zeroinitializer
%sub = sub <4 x i32> zeroinitializer, %x
%abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
ret <4 x i32> %abs
}
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx
> abs_shifty:
> movl %edi, %eax
> negl %eax
> cmovll %edi, %eax
> retq
>
> abs_cmpsubsel:
> movl %edi, %eax
> negl %eax
> cmovll %edi, %eax
> retq
>
> abs_shifty_vec:
> vpabsd %xmm0, %xmm0
> retq
>
> abs_cmpsubsel_vec:
> vpabsd %xmm0, %xmm0
> retq
>
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
> abs_shifty:
> cmp w0, #0 // =0
> cneg w0, w0, mi
> ret
>
> abs_cmpsubsel:
> cmp w0, #0 // =0
> cneg w0, w0, mi
> ret
>
> abs_shifty_vec:
> abs v0.4s, v0.4s
> ret
>
> abs_cmpsubsel_vec:
> abs v0.4s, v0.4s
> ret
>
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le
> abs_shifty:
> srawi 4, 3, 31
> add 3, 3, 4
> xor 3, 3, 4
> blr
>
> abs_cmpsubsel:
> srawi 4, 3, 31
> add 3, 3, 4
> xor 3, 3, 4
> blr
>
> abs_shifty_vec:
> vspltisw 3, -16
> vspltisw 4, 15
> vsubuwm 3, 4, 3
> vsraw 3, 2, 3
> vadduwm 2, 2, 3
> xxlxor 34, 34, 35
> blr
>
> abs_cmpsubsel_vec:
> vspltisw 3, -16
> vspltisw 4, 15
> vsubuwm 3, 4, 3
> vsraw 3, 2, 3
> vadduwm 2, 2, 3
> xxlxor 34, 34, 35
> blr
>
Differential Revision: https://reviews.llvm.org/D40984
llvm-svn: 320921
Changes to the original scalar loop during LV code gen cause the return value
of Legal->isConsecutivePtr() to be inconsistent with the return value during
legal/cost phases (further analysis and information of the bug is in D39346).
This patch is an alternative fix to PR34965 following the CM_Widen approach
proposed by Ayal and Gil in D39346. It extends InstWidening enum with
CM_Widen_Reverse to properly record the widening decision for consecutive
reverse memory accesses and, consequently, get rid of the
Legal->isConsetuviePtr() call in LV code gen. I think this is a simpler/cleaner
solution to PR34965 than the one in D39346.
Fixes PR34965.
Patch by Diego Caballero, thanks!
Differential Revision: https://reviews.llvm.org/D40742
llvm-svn: 320913
When unsafe algerbra is allowed calls to cabs(r) can be replaced by:
sqrt(creal(r)*creal(r) + cimag(r)*cimag(r))
Patch by Paul Walker, thanks!
Differential Revision: https://reviews.llvm.org/D40069
llvm-svn: 320901
This is a small step forward to move VPlan stuff to where it should belong (i.e., VPlan.*):
1. VP*Recipe classes in LoopVectorize.cpp are moved to VPlan.h.
2. Many of VP*Recipe::print() and execute() definitions are still left in
LoopVectorize.cpp since they refer to things declared in LoopVectorize.cpp. To
be moved to VPlan.cpp at a later time.
3. InterleaveGroup class is moved from anonymous namespace to llvm namespace.
Referencing it in anonymous namespace from VPlan.h ended up in warning.
Patch by Hideki Saito, thanks!
Differential Revision: https://reviews.llvm.org/D41045
llvm-svn: 320900
Summary:
This implements a missing feature to allow importing of aliases, which
was previously disabled because alias cannot be available_externally.
We instead import an alias as a copy of its aliasee.
Some additional work was required in the IndexBitcodeWriter for the
distributed build case, to ensure that the aliasee has a value id
in the distributed index file (i.e. even when it is not being
imported directly).
This is a performance win in codes that have many aliases, e.g. C++
applications that have many constructor and destructor aliases.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D40747
llvm-svn: 320895
This recommits r320823 reverted due to the test failure in sink-foldable.ll and
an unused variable. Added "REQUIRES: aarch64-registered-target" in the test
and removed unused variable.
Original commit message:
Continue trying to sink an instruction if its users in the loop is foldable.
This will allow the instruction to be folded in the loop by decoupling it from
the user outside of the loop.
Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier
Reviewed By: hfinkel
Subscribers: javed.absar, bmakam, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37076
llvm-svn: 320858
The original memcpy expansion inserted the loop basic block inbetween
the 2 new basic blocks created by splitting the original block the memcpy
call was in. This commit makes the new memcpy expansion do the same to keep the
layout of the IR matching between the old and new implementations.
Differential Review: https://reviews.llvm.org/D41197
llvm-svn: 320848
This recommit r320823 after fixing a test failure.
Original commit message:
Continue trying to sink an instruction if its users in the loop is foldable.
This will allow the instruction to be folded in the loop by decoupling it from
the user outside of the loop.
Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier
Reviewed By: hfinkel
Subscribers: javed.absar, bmakam, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37076
llvm-svn: 320833
Summary:
Continue trying to sink an instruction if its users in the loop is foldable.
This will allow the instruction to be folded in the loop by decoupling it from
the user outside of the loop.
Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier
Reviewed By: hfinkel
Subscribers: javed.absar, bmakam, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37076
llvm-svn: 320823
Summary:
The port is nearly straightforward.
The only complication is related to the analyses handling,
since one of the analyses used in this module pass is domtree,
which is a function analysis. That requires asking for the results
of each function and disallows a single interface for run-on-module
pass action.
Decided to copy-paste the main body of this pass.
Most of its code is requesting analyses anyway, so not that much
of a copy-paste.
The rest of the code movement is to transform all the implementation
helper functions like stripNonValidData into non-member statics.
Extended all the related LLVM tests with new-pass-manager use.
No failures.
Reviewers: sanjoy, anna, reames
Reviewed By: anna
Subscribers: skatkov, llvm-commits
Differential Revision: https://reviews.llvm.org/D41162
llvm-svn: 320796
This should solve:
https://bugs.llvm.org/show_bug.cgi?id=34603
...by preventing SimplifyCFG from altering redundant instructions before early-cse has a chance to run.
It changes the default (canonical-forming) behavior of SimplifyCFG, so we're only doing the
sinking transform later in the optimization pipeline.
Differential Revision: https://reviews.llvm.org/D38566
llvm-svn: 320749
In SLPVectorizer, the vector build instructions (insertvalue for aggregate type) is passed to BoUpSLP.buildTree, it is treated as UserIgnoreList, so later in cost estimation, the cost of these instructions are not counted.
For aggregate value, later usage are more likely to be done in scalar registers, either used as individual scalars or used as a whole for function call or return value. Ignore scalar extraction instructions may cause too aggressive vectorization for aggregate values, and slow down performance. So for vectorization of aggregate value, the scalar extraction instructions are required in cost estimation.
Differential Revision: https://reviews.llvm.org/D41139
llvm-svn: 320736
Summary:
Passing AliasAnalysis results instead of nullptr appears to work just fine.
A couple new-pass-manager tests updated to align with new order of analyses.
Reviewers: chandlerc, spatel, craig.topper
Reviewed By: chandlerc
Subscribers: mehdi_amini, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D41203
llvm-svn: 320687
D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose
update chain involves casts; PSCEV can now build an AddRecurrence for some
forms of such phi nodes, under the proper runtime overflow test. This means
that we can identify such phi nodes as an induction, and the loop-vectorizer
can now vectorize such inductions, however inefficiently. The vectorizer
doesn't know that it can ignore the casts, and so it vectorizes them.
This patch records the casts in the InductionDescriptor, so that they could
be marked to be ignored for cost calculation (we use VecValuesToIgnore for
that) and ignored for vectorization/widening/scalarization (i.e. treated as
TriviallyDead).
In addition to marking all these casts to be ignored, we also need to make
sure that each cast is mapped to the right vector value in the vector loop body
(be it a widened, vectorized, or scalarized induction). So whenever an
induction phi is mapped to a vector value (during vectorization/widening/
scalarization), we also map the respective cast instruction (if exists) to that
vector value. (If the phi-update sequence of an induction involves more than one
cast, then the above mapping to vector value is relevant only for the last cast
of the sequence as we allow only the "last cast" to be used outside the
induction update chain itself).
This is the last step in addressing PR30654.
llvm-svn: 320672
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.
Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.
LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perfom the
preversation was minimally altered and was simply marked
as preserved for the PassManager to be informed.
This extends the analysis available to JumpThreading for
future enhancements. One example is loop boundary threading.
Reviewers: dberlin, kuhar, sebpop
Reviewed By: kuhar, sebpop
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40146
llvm-svn: 320612
w.r.t. the paper
"A Practical Improvement to the Partial Redundancy Elimination in SSA Form"
(https://sites.google.com/site/jongsoopark/home/ssapre.pdf)
Proper dominance check was missing here, so having a loopinfo should not be required.
Committing this diff as this fixes the bug, if there are
further concerns, I'll be happy to work on them.
Differential Revision: https://reviews.llvm.org/D39781
llvm-svn: 320607
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: mgrang, dcaballe, hans, mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 320548
Summary:
This change makes the call site creation more general if any of the
arguments is predicated on a condition in the call site's predecessors.
If we find a callsite, that potentially can be split, we collect the set
of conditions for the call site's predecessors (currently only 2
predecessors are allowed). To do that, we traverse each predecessor's
predecessors as long as it only has single predecessors and record the
condition, if it is relevant to the call site. For each condition, we
also check if the condition is taken or not. In case it is not taken,
we record the inverse predicate.
We use the recorded conditions to create the new call sites and split
the basic block.
This has 2 benefits: (1) it is slightly easier to see what is going on
(IMO) and (2) we can easily extend it to handle more complex control
flow.
Reviewers: davidxl, junbuml
Reviewed By: junbuml
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40728
llvm-svn: 320547
Summary: This brings CPU overhead on bzip2 down from 5.5x to 2x.
Reviewers: kcc, alekseyshl
Subscribers: kubamracek, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41137
llvm-svn: 320538
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320525
This algorithm (explained more in the source code) takes into account
global redundancies by building a "pair map" to find common subexprs.
The primary motivation of this is to handle situations like
foo = (a * b) * c
bar = (a * d) * c
where we currently don't identify that "a * c" is redundant.
Accordingly, it prioritizes the emission of a * c so that CSE
can remove the redundant calculation later.
Does not change the actual reassociation algorithm -- only the
order in which the reassociated operand chain is reconstructed.
Gives ~1.5% floating point math instruction count reduction on
a large offline suite of graphics shaders.
llvm-svn: 320515
Summary:
The PGO gen/use passes currently fail with an assert failure if there's a
critical edge whose source is an IndirectBr instruction and that edge
needs to be instrumented.
To avoid this in certain cases, split IndirectBr critical edges in the PGO
gen/use passes. This works for blocks with single indirectbr predecessors,
but not for those with multiple indirectbr predecessors (splitting an
IndirectBr critical edge isn't always possible.)
Reviewers: davidxl, xur
Reviewed By: davidxl
Subscribers: efriedma, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D40699
llvm-svn: 320511
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320510
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320499
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320496
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320488
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320483
Summary:
Currently, in InstCombineLoadStoreAlloca, we have simplification
rules for the following cases:
1. load off a null
2. load off a GEP with null base
3. store to a null
This patch adds support for the fourth case which is store into a
GEP with null base. Since this is UB as well (and directly analogous to
the load off a GEP with null base), we can substitute the stored val
with undef in instcombine, so that SimplifyCFG can optimize this code
into unreachable code.
Note: Right now, simplifyCFG hasn't been taught about optimizing
this to unreachable and adding an llvm.trap (this is already done for
the above 3 cases).
Reviewers: majnemer, hfinkel, sanjoy, davide
Reviewed by: sanjoy, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41026
llvm-svn: 320480
VecValuesToIgnore holds values that will not appear in the vectorized loop.
We should therefore ignore their cost when VF > 1.
Differential Revision: https://reviews.llvm.org/D40883
llvm-svn: 320463
Summary:
This solves PR35616.
We don't want the compiler to generate different code when we compile
with/without -g, so we now ignore debug intrinsics when determining if
the optimization can trigger or not.
Reviewers: junbuml
Subscribers: davide, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D41068
llvm-svn: 320460
The tests fail (opt asserts) on Windows.
> Summary:
> If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
> &V2)))), bitcast)`, but the load is used in other instructions, it leads
> to looping in InstCombiner. Patch adds additional check that all users
> of the load instructions are stores and then replaces all uses of load
> instruction by the new one with new type.
>
> Reviewers: RKSimon, spatel, majnemer
>
> Subscribers: llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320421
The function stack poisioner conditionally stores local variables
either in an alloca or in malloc'ated memory, which has the
unfortunate side-effect, that the actual address of the variable is
only materialized when the variable is accessed, which means that
those variables are mostly invisible to the debugger even when
compiling without optimizations.
This patch stores the address of the local stack base into an alloca,
which can be referred to by the debug info and is available throughout
the function. This adds one extra pointer-sized alloca to each stack
frame (but mem2reg can optimize it away again when optimizations are
enabled, yielding roughly the same debug info quality as before in
optimized code).
rdar://problem/30433661
Differential Revision: https://reviews.llvm.org/D41034
llvm-svn: 320415
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320407
This patch introduces getShadowOriginPtr(), a method that obtains both the shadow and origin pointers for an address as a Value pair.
The existing callers of getShadowPtr() and getOriginPtr() are updated to use getShadowOriginPtr().
The rationale for this change is to simplify KMSAN instrumentation implementation.
In KMSAN origins tracking is always enabled, and there's no direct mapping between the app memory and the shadow/origin pages.
Both the shadow and the origin pointer for a given address are obtained by calling a single runtime hook from the instrumentation,
therefore it's easier to work with those pointers together.
Reviewed at https://reviews.llvm.org/D40835.
llvm-svn: 320373
Summary:
Reuse the Linux new mapping as it is.
Sponsored by <The NetBSD Foundation>
Reviewers: joerg, eugenis, vitalybuka
Reviewed By: vitalybuka
Subscribers: llvm-commits, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D41022
llvm-svn: 320219
Summary:
This is LLVM instrumentation for the new HWASan tool. It is basically
a stripped down copy of ASan at this point, w/o stack or global
support. Instrumenation adds a global constructor + runtime callbacks
for every load and store.
HWASan comes with its own IR attribute.
A brief design document can be found in
clang/docs/HardwareAssistedAddressSanitizerDesign.rst (submitted earlier).
Reviewers: kcc, pcc, alekseyshl
Subscribers: srhines, mehdi_amini, mgorny, javed.absar, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D40932
llvm-svn: 320217
Summary:
If a partially inlined function has debug info, we have to add debug
locations to the call instruction calling the outlined function.
We use the debug location of the first instruction in the outlined
function, as the introduced call transfers control to this statement and
there is no other equivalent line in the source code.
We also use the same debug location for the branch instruction added
to jump from artificial entry block for the outlined function, which just
jumps to the first actual basic block of the outlined function.
Reviewers: davide, aprantl, rriddle, dblaikie, danielcdh, wmi
Reviewed By: aprantl, rriddle, danielcdh
Subscribers: eraman, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D40413
llvm-svn: 320199
Causes unexpected memory issue with New PM this time.
The new PM invalidates BPI but not BFI, leaving the
reference to BPI from BFI invalid.
Abandon this patch. There is a more general solution
which also handles runtime infinite loop (but not statically).
llvm-svn: 320180
Summary:
If we have the code like this:
```
float a, b;
a = std::max(a ,b);
```
it is converted into something like this:
```
%call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float* nonnull dereferenceable(4) %a.addr, float* nonnull dereferenceable(4) %b.addr)
%1 = bitcast float* %call to i32*
%2 = load i32, i32* %1, align 4
%3 = bitcast float* %a.addr to i32*
store i32 %2, i32* %3, align 4
```
After inlinning this code is converted to the next:
```
%1 = load float, float* %a.addr
%2 = load float, float* %b.addr
%cmp.i = fcmp fast olt float %1, %2
%__b.__a.i = select i1 %cmp.i, float* %a.addr, float* %b.addr
%3 = bitcast float* %__b.__a.i to i32*
%4 = load i32, i32* %3, align 4
%5 = bitcast float* %arrayidx to i32*
store i32 %4, i32* %5, align 4
```
This pattern is not recognized as minmax pattern.
Patch solves this problem by converting sequence
```
store (bitcast, (load bitcast (select ((cmp V1, V2), &V1, &V2))))
```
to a sequence
```
store (,load (select((cmp V1, V2), &V1, &V2)))
```
After this the code is recognized as minmax pattern.
Reviewers: RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40304
llvm-svn: 320157
In more recent Linux kernels with 47 bit VMAs the layout of virtual memory
for powerpc64 changed causing the address sanitizer to not work properly. This
patch adds support for 47 bit VMA kernels for powerpc64 and fixes up test
cases.
https://reviews.llvm.org/D40907
There is an associated patch for compiler-rt.
Tested on several 4.x and 3.x kernel releases.
llvm-svn: 320109
Summary:
Make enum ModRefInfo an enum class. Changes to ModRefInfo values should
be done using inline wrappers.
This should prevent future bit-wise opearations from being added, which can be more error-prone.
Reviewers: sanjoy, dberlin, hfinkel, george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40933
llvm-svn: 320107
As a new access is generated spanning across multiple fields, we need to
propagate alias info from all the fields to form the most generic alias info.
rdar://35602528
Differential Revision: https://reviews.llvm.org/D40617
llvm-svn: 319979
This patch factors out the main code transformation utilities in the pgo-driven
indirect call promotion pass and places them in Transforms/Utils. The change is
intended to be a non-functional change, letting non-pgo-driven passes share a
common implementation with the existing pgo-driven pass.
The common utilities are used to conditionally promote indirect call sites to
direct call sites. They perform the underlying transformation, and do not
consider profile information. The pgo-specific details (e.g., the computation
of branch weight metadata) have been left in the indirect call promotion pass.
Differential Revision: https://reviews.llvm.org/D40658
llvm-svn: 319963
Summary:
There is no need to replace the original call instruction if no
VarArgs need to be forwarded.
Reviewers: davide, rnk, majnemer, efriedma
Reviewed By: efriedma
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D40412
llvm-svn: 319947
This caused PR35519.
> [memcpyopt] Teach memcpyopt to optimize across basic blocks
>
> This teaches memcpyopt to make a non-local memdep query when a local query
> indicates that the dependency is non-local. This notably allows it to
> eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
>
> Fixes PR28958.
>
> Differential Revision: https://reviews.llvm.org/D38374
>
> [memcpyopt] Commit file missed in r319482.
>
> This change was meant to be included with r319482 but was accidentally
> omitted.
llvm-svn: 319873
Summary:
The aim is to make ModRefInfo checks and changes more intuitive
and less error prone using inline methods that abstract the bit operations.
Ideally ModRefInfo would become an enum class, but that change will require
a wider set of changes into FunctionModRefBehavior.
Reviewers: sanjoy, george.burgess.iv, dberlin, hfinkel
Subscribers: nlopes, llvm-commits
Differential Revision: https://reviews.llvm.org/D40749
llvm-svn: 319821
This uses ConstantRange::makeGuaranteedNoWrapRegion's newly-added handling for subtraction to allow CVP to remove some subtraction overflow checks.
Differential Revision: https://reviews.llvm.org/D40039
llvm-svn: 319807
Summary:
A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This is fixed by this patch by bailing out if we encounter undef.
The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization.
Patch by JesperAntonsson.
Reviewers: spatel, eeckstein, hans
Reviewed By: hans
Subscribers: uabelho, llvm-commits
Differential Revision: https://reviews.llvm.org/D40639
llvm-svn: 319768
Summary:
Move splitIndirectCriticalEdges() from CodeGenPrepare to BasicBlockUtils.h so
that it can be called from other places.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40750
llvm-svn: 319689
(This reapplies r314253. r314253 was reverted on r314482 because of a
correctness regression on P100, but that regression was identified to be
something else.)
Summary:
Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow . This gives us a 32 bit multiply instead of an
emulated 64 bit multiply in the generated PTX assembly.
Reviewers: jlebar
Subscribers: jholewinski, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38265
llvm-svn: 319677
Summary:
Currently, we only support predication for forward loops with step
of 1. This patch enables loop predication for reverse or
countdownLoops, which satisfy the following conditions:
1. The step of the IV is -1.
2. The loop has a singe latch as B(X) = X <pred>
latchLimit with pred as s> or u>
3. The IV of the guard is the decrement
IV of the latch condition (Guard is: G(X) = X-1 u< guardLimit).
This patch was downstream for a while and is the last series of patches
that's from our LP implementation downstream.
Reviewers: apilipenko, mkazantsev, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40353
llvm-svn: 319659
Turns out we can have comparisons which are indirect users of the induction variable that we can make invariant. In this case, there is no loop invariant value contributing and we'd fail an assert.
The test case was found by a java fuzzer and reduced. It's a real cornercase. You have to have a static loop which we've already proven only executes once, but haven't broken the backedge on, and an inner phi whose result can be constant folded by SCEV using exit count reasoning but not proven by isKnownPredicate. To my knowledge, only the fuzzer has hit this case.
llvm-svn: 319583
It causes builds to fail with "Instruction does not dominate all uses" (PR35497).
> Patch tries to improve vectorization of the following code:
>
> void add1(int * __restrict dst, const int * __restrict src) {
> *dst++ = *src++;
> *dst++ = *src++ + 1;
> *dst++ = *src++ + 2;
> *dst++ = *src++ + 3;
> }
> Allows to vectorize even if the very first operation is not a binary add, but just a load.
>
> Fixed issues related to previous commit.
>
> Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
>
> Reviewed By: ABataev, RKSimon
>
> Subscribers: llvm-commits, RKSimon
>
> Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 319550
Summary:
A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This is fixed by this patch by bailing out if we encounter undef.
The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization.
Patch by: JesperAntonsson
Reviewers: spatel, eeckstein, hans
Reviewed By: hans
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40639
llvm-svn: 319537
Patch tries to improve vectorization of the following code:
void add1(int * __restrict dst, const int * __restrict src) {
*dst++ = *src++;
*dst++ = *src++ + 1;
*dst++ = *src++ + 2;
*dst++ = *src++ + 3;
}
Allows to vectorize even if the very first operation is not a binary add, but just a load.
Fixed issues related to previous commit.
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
Reviewed By: ABataev, RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 319531
These command line options are not intended for public use, and often
don't even make sense in the context of a particular tool anyway. About
90% of them are already hidden, but when people add new options they
forget to hide them, so if you were to make a brand new tool today, link
against one of LLVM's libraries, and run tool -help you would get a
bunch of junk that doesn't make sense for the tool you're writing.
This patch hides these options. The real solution is to not have
libraries defining command line options, but that's a much larger effort
and not something I'm prepared to take on.
Differential Revision: https://reviews.llvm.org/D40674
llvm-svn: 319505
If the thin module has no references to an internal global in the
merged module, we need to make sure to preserve that property if the
global is a member of a comdat group, as otherwise promotion can end
up adding global symbols to the comdat, which is not allowed.
This situation can arise if the external global in the thin module
has dead constant users, which would cause use_empty() to return
false and would cause us to try to promote it. To prevent this from
happening, discard the dead constant users before asking whether a
global is empty.
Differential Revision: https://reviews.llvm.org/D40593
llvm-svn: 319494
This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
Fixes PR28958.
Differential Revision: https://reviews.llvm.org/D38374
llvm-svn: 319482
Currently, SROA splits loads and stores only when they are accessing the whole alloca.
This patch relaxes this limitation to allow splitting a load/store if all other loads and stores to the alloca are disjoint to or fully included in the current load/store. If there is no other load or store that crosses the boundary of the current load/store, the current splitting implementation works as is.
The whole-alloca loads and stores meet this new condition and so they are still splittable.
Here is a simplified motivating example.
struct record {
long long a;
int b;
int c;
};
int func(struct record r) {
for (int i = 0; i < r.c; i++)
r.b++;
return r.b;
}
When updating r.b (or r.c as well), LLVM generates redundant instructions on some platforms (such as x86_64, ppc64); here, r.b and r.c are packed into one 64-bit GPR when the struct is passed as a method argument.
With this patch, the above example is compiled into only few instructions without loop.
Without the patch, unnecessary loop-carried dependency is introduced by SROA and the loop cannot be eliminated by the later optimizers.
Differential Revision: https://reviews.llvm.org/D32998
llvm-svn: 319407
Support for outlining multiple regions of each function is added, as well as some basic heuristics to determine which regions are good to outline. Outline candidates limited to regions that are single-entry & single-exit. We also avoid outlining regions that produce live-exit variables, which may inhibit some forms of code motion (like commoning).
Fallback to the regular partial inlining scheme is retained when either i) no regions are identified for outlining in the function, or ii) the outlined function could not be inlined in any of its callers.
Differential Revision: https://reviews.llvm.org/D38190
llvm-svn: 319398
An alloca may be larger than a variable that is described to be stored
there. Don't create a dbg.value for fragments that are outside of the
variable.
This fixes PR35447.
https://bugs.llvm.org/show_bug.cgi?id=35447
llvm-svn: 319230
This is needed for cases when the memory access is not as big as the width of
the data type. For instance, storing i1 (1 bit) would be done in a byte (8
bits).
Using 'BitSize >> 3' (or '/ 8') would e.g. give the memory access of an i1 a
size of 0, which for instance makes alias analysis return NoAlias even when
it shouldn't.
There are no tests as this was done as a follow-up to the bugfix for the case
where this was discovered (r318824). This handles more similar cases.
Review: Björn Petterson
https://reviews.llvm.org/D40339
llvm-svn: 319173
The core idea is to (re-)introduce some redundancies where their cost is
hidden by the cost of materializing immediates for constant operands of
PHI nodes. When the cost of the redundancies is covered by this,
avoiding materializing the immediate has numerous benefits:
1) Less register pressure
2) Potential for further folding / combining
3) Potential for more efficient instructions due to immediate operand
As a motivating example, consider the remarkably different cost on x86
of a SHL instruction with an immediate operand versus a register
operand.
This pattern turns up surprisingly frequently, but is somewhat rarely
obvious as a significant performance problem.
The pass is entirely target independent, but it does rely on the target
cost model in TTI to decide when to speculate things around the PHI
node. I've included x86-focused tests, but any target that sets up its
immediate cost model should benefit from this pass.
There is probably more that can be done in this space, but the pass
as-is is enough to get some important performance on our internal
benchmarks, and should be generally performance neutral, but help with
more extensive benchmarking is always welcome.
One awkward part is that this pass has to be scheduled after
*everything* that can eliminate these kinds of redundancies. This
includes SimplifyCFG, GVN, etc. I'm open to suggestions about better
places to put this. We could in theory make it part of the codegen pass
pipeline, but there doesn't really seem to be a good reason for that --
it isn't "lowering" in any sense and only relies on pretty standard cost
model based TTI queries, so it seems to fit well with the "optimization"
pipeline model. Still, further thoughts on the pipeline position are
welcome.
I've also only implemented this in the new pass manager. If folks are
very interested, I can try to add it to the old PM as well, but I didn't
really see much point (my use case is already switched over to the new
PM).
I've tested this pretty heavily without issue. A wide range of
benchmarks internally show no change outside the noise, and I don't see
any significant changes in SPEC either. However, the size class
computation in tcmalloc is substantially improved by this, which turns
into a 2% to 4% win on the hottest path through tcmalloc for us, so
there are definitely important cases where this is going to make
a substantial difference.
Differential revision: https://reviews.llvm.org/D37467
llvm-svn: 319164
Summary:
I think we do not need to analyze debug intrinsics here, as they should
not impact codegen. This has 2 benefits: 1) slightly less work to do and
2) avoiding generating optimization remarks for converting calls to
debug intrinsics to tail calls, which are not really helpful for users.
Based on work by Sander de Smalen.
Reviewers: davide, trentxintong, aprantl
Reviewed By: aprantl
Subscribers: llvm-commits, JDevlieghere
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D40440
llvm-svn: 319158
This is to address a problem similar to those in D37460 for Scalar PRE. We should not
PRE across an instruction that may not pass execution to its successor unless it is safe
to speculatively execute it.
Differential Revision: https://reviews.llvm.org/D38619
llvm-svn: 319147
Revert "[SROA] Propagate !range metadata when moving loads."
Revert "[Mem2Reg] Clang-format unformatted parts of this file. NFCI."
Davide says they broke a bot.
llvm-svn: 319131
This tries to propagate !range metadata to a pre-existing load
when a load is optimized out. This is done instead of adding an
assume because converting loads to and from assumes creates a
lot of IR.
Patch by Ariel Ben-Yehuda.
Differential Revision: https://reviews.llvm.org/D37216
llvm-svn: 319096
enum TailCallKind { TCK_None = 0, TCK_Tail = 1, TCK_MustTail = 2,
TCK_NoTail = 3 };
TCK_NoTail is greater than TCK_Tail so taking the min does not do the
correct thing.
rdar://35639547
llvm-svn: 319075
MSan used to insert the shadow check of the store pointer operand
_after_ the shadow of the value operand has been written.
This happens to work in the userspace, as the whole shadow range is
always mapped. However in the kernel the shadow page may not exist, so
the bug may cause a crash.
This patch moves the address check in front of the shadow access.
llvm-svn: 318901
In a lambda where we expect to have result within bounds, add respective `nsw/nuw` flags to
help SCEV just in case if it fails to figure them out on its own.
Differential Revision: https://reviews.llvm.org/D40168
llvm-svn: 318898
After the dataflow algorithm proves that an argument is constant,
it replaces it value with the integer constant and drops the lattice
value associated to the DEF.
e.g. in the example we have @f() that's called twice:
call @f(undef, ...)
call @f(2, ...)
`undef` MEET 2 = 2 so we replace the argument and all its uses with
the constant 2.
Shortly after, tryToReplaceWithConstantRange() tries to get the lattice
value for the argument we just replaced, causing an assertion.
This function is a little peculiar as it runs when we're doing replacement
and not as part of the solver but still queries the solver.
The fix is that of checking whether we replaced the value already and
get a temporary lattice value for the constant.
Thanks to Zhendong Su for the report!
Fixes PR35357.
llvm-svn: 318817
Summary:
First step in adding MemorySSA as dependency for loop pass manager.
Adding the dependency under a flag.
New pass manager: MSSA pointer in LoopStandardAnalysisResults can be null.
Legacy and new pass manager: Use cl::opt EnableMSSALoopDependency. Disabled by default.
Reviewers: sanjoy, davide, gberry
Subscribers: mehdi_amini, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D40274
llvm-svn: 318772
properlyDominates() shouldn't be used as sort key. It causes different output between stdlibc++ and libc++.
Instead, I introduced RPOT. In most cases, it works for CSE.
llvm-svn: 318743
Summary:
Add the following heuristics for irreducible loop metadata:
- When an irreducible loop header is missing the loop header weight metadata,
give it the minimum weight seen among other headers.
- Annotate indirectbr targets with the loop header weight metadata (as they are
likely to become irreducible loop headers after indirectbr tail duplication.)
These greatly improve the accuracy of the block frequency info of the Python
interpreter loop (eg. from ~3-16x off down to ~40-55% off) and the Python
performance (eg. unpack_sequence from ~50% slower to ~8% faster than GCC) due to
better register allocation under PGO.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39980
llvm-svn: 318693
Summary:
SROA can fail in rewriting alloca but still rewrite a phi resulting
in dead instruction elimination. The Changed flag was not being set
correctly, resulting in downstream passes using stale analyses.
The included test case will assert during the second BDCE pass as a
result.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39921
llvm-svn: 318677
Summary:
This change reverts r318575 and changes FindDynamicShadowStart() to
keep the memory range it found mapped PROT_NONE to make sure it is
not reused. We also skip MemoryRangeIsAvailable() check, because it
is (a) unnecessary, and (b) would fail anyway.
Reviewers: pcc, vitalybuka, kcc
Subscribers: srhines, kubamracek, mgorny, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D40203
llvm-svn: 318666
This patch adds a new abstraction layer to VPlan and leverages it to model the planned
instructions that manipulate masks (AND, OR, NOT), introduced during predication.
The new VPValue and VPUser classes model how data flows into, through and out
of a VPlan, forming the vertices of a planned Def-Use graph. The new
VPInstruction class is a generic single-instruction Recipe that models a
planned instruction along with its opcode, operands and users. See
VectorizationPlan.rst for more details.
Differential Revision: https://reviews.llvm.org/D38676
llvm-svn: 318645
In rL316552, we ban intersection of unsigned latch range with signed range check and vice
versa, unless the entire range check iteration space is known positive. It was a correct
functional fix that saved us from dealing with ambiguous values, but it also appeared
to be a very restrictive limitation. In particular, in the following case:
loop:
%iv = phi i32 [ 0, %preheader ], [ %iv.next, %latch]
%iv.offset = add i32 %iv, 10
%rc = icmp slt i32 %iv.offset, %len
br i1 %rc, label %latch, label %deopt
latch:
%iv.next = add i32 %iv, 11
%cond = icmp i32 ult %iv.next, 100
br it %cond, label %loop, label %exit
Here, the unsigned iteration range is `[0, 100)`, and the safe range for range
check is `[-10, %len - 10)`. For unsigned iteration spaces, we use unsigned
min/max functions for range intersection. Given this, we wanted to avoid dealing
with `-10` because it is interpreted as a very big unsigned value. Semantically, range
check's safe range goes through unsigned border, so in fact it is two disjoint
ranges in IV's iteration space. Intersection of such ranges is not trivial, so we prohibited
this case saying that we are not allowed to intersect such ranges.
What semantics of this safe range actually means is that we can start from `-10` and go
up increasing the `%iv` by one until we reach `%len - 10` (for simplicity let's assume that
`%len - 10` is a reasonably big positive value).
In particular, this safe iteration space includes `0, 1, 2, ..., %len - 11`. So if we were able to return
safe iteration space `[0, %len - 10)`, we could safely intersect it with IV's iteration space. All
values in this range are non-negative, so using signed/unsigned min/max for them is unambiguous.
In this patch, we alter the algorithm of safe range calculation so that it returnes a subset of the
original safe space which is represented by one continuous range that does not go through wrap.
In order to reach this, we use modified SCEV substraction function. It can be imagined as a function
that substracts by `1` (or `-1`) as long as the further substraction does not cause a wrap in IV iteration
space. This allows us to perform IRCE in many situations when we deal with IV space and range check
of different types (in terms of signed/unsigned).
We apply this approach for both matching and not matching types of IV iteration space and the
range check. One implication of this is that now IRCE became smarter in detection of empty safe
ranges. For example, in this case:
loop:
%iv = phi i32 [ %begin, %preheader ], [ %iv.next, %latch]
%iv.offset = sub i32 %iv, 10
%rc = icmp ult i32 %iv.offset, %len
br i1 %rc, label %latch, label %deopt
latch:
%iv.next = add i32 %iv, 11
%cond = icmp i32 ult %iv.next, 100
br it %cond, label %loop, label %exit
If `%len` was less than 10 but SCEV failed to trivially prove that `%begin - 10 >u %len- 10`,
we could end up executing entire loop in safe preloop while the main loop was still generated,
but never executed. Now, cutting the ranges so that if both `begin - 10` and `%len - 10` overflow,
we have a trivially empty range of `[0, 0)`. This in some cases prevents us from meaningless optimization.
Differential Revision: https://reviews.llvm.org/D39954
llvm-svn: 318639
As the first test shows, we could transform an llvm intrinsic which never sets errno
into a libcall which could set errno (even though it's marked readnone?), so that's
not ideal.
It's possible that we can also transform a libcall which could set errno to an
intrinsic given the fast-math-flags constraint, but that's deferred to determine
exactly which set of FMF are needed.
Differential Revision: https://reviews.llvm.org/D40150
llvm-svn: 318628
Summary:
With this patch I tried to reduce the complexity of the code sightly, by
removing some indirection. Please let me know what you think.
Reviewers: junbuml, mcrosier, davidxl
Reviewed By: junbuml
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40037
llvm-svn: 318593
We were not doing that for large shadow granularity. Also add more
stack frame layout tests for large shadow granularity.
Differential Revision: https://reviews.llvm.org/D39475
llvm-svn: 318581
Revert the following commits:
r318369 [asan] Fallback to non-ifunc dynamic shadow on android<22.
r318235 [asan] Prevent rematerialization of &__asan_shadow.
r317948 [sanitizer] Remove unnecessary attribute hidden.
r317943 [asan] Use dynamic shadow on 32-bit Android.
MemoryRangeIsAvailable() reads /proc/$PID/maps into an mmap-ed buffer
that may overlap with the address range that we plan to use for the
dynamic shadow mapping. This is causing random startup crashes.
llvm-svn: 318575
Summary: This change fix PR35342 by replacing only the current use with undef in unreachable blocks.
Reviewers: efriedma, mcrosier, igor-laevsky
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40184
llvm-svn: 318551
making it no longer even remotely simple.
The pass will now be more of a "full loop unswitching" pass rather than
anything substantively simpler than any other approach. I plan to rename
it accordingly once the dust settles.
The key ideas of the new loop unswitcher are carried over for
non-trivial unswitching:
1) Fully unswitch a branch or switch instruction from inside of a loop to
outside of it.
2) Update the CFG and IR. This avoids needing to "remember" the
unswitched branches as well as avoiding excessively cloning and
reliance on complex parts of simplify-cfg to cleanup the cfg.
3) Update the analyses (where we can) rather than just blowing them away
or relying on something else updating them.
Sadly, #3 is somewhat compromised here as the dominator tree updates
were too complex for me to want to reason about. I will need to make
another attempt to do this now that we have a nice dynamic update API
for dominators. However, we do adhere to #3 w.r.t. LoopInfo.
This approach also adds an important principls specific to non-trivial
unswitching: not *all* of the loop will be duplicated when unswitching.
This fact allows us to compute the cost in terms of how much *duplicate*
code is inserted rather than just on raw size. Unswitching conditions
which essentialy partition loops will work regardless of the total loop
size.
Some remaining issues that I will be addressing in subsequent commits:
- Handling unstructured control flow.
- Unswitching 'switch' cases instead of just branches.
- Moving to the dynamic update API for dominators.
Some high-level, interesting limitationsV that folks might want to push
on as follow-ups but that I don't have any immediate plans around:
- We could be much more clever about not cloning things that will be
deleted. In fact, we should be able to delete *nothing* and do
a minimal number of clones.
- There are many more interesting selection criteria for which branch to
unswitch that we might want to look at. One that I'm interested in
particularly are a set of conditions which all exit the loop and which
can be merged into a single unswitched test of them.
Differential revision: https://reviews.llvm.org/D34200
llvm-svn: 318549
The logic of replacing of a couple `RANGE_CHECK_LOWER + RANGE_CHECK_UPPER`
into `RANGE_CHECK_BOTH` in fact duplicates the logic of range intersection which
happens when we calculate safe iteration space. Effectively, the result of intersection of
these ranges doesn't differ from the range of merged range check.
We chose to remove duplicating logic in favor of code simplicity.
Differential Revision: https://reviews.llvm.org/D39589
llvm-svn: 318508
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
The requirement is that shadow memory must be aligned to page
boundaries (4k in this case). Use a closed form equation that always
satisfies this requirement.
Differential Revision: https://reviews.llvm.org/D39471
llvm-svn: 318421
// trunc (binop X, C) --> binop (trunc X, C')
// trunc (binop (ext X), Y) --> binop X, (trunc Y)
I'm grouping sub with the other binops because that makes the code simpler
and the transforms are valid:
https://rise4fun.com/Alive/UeF
...so even though we don't expect a sub with constant Op1 or any of the
other opcodes with constant Op0 due to canonicalization rules, we might as
well handle those situations if non-canonical code somehow reaches this
point (it should just make instcombine more efficient in reaching its
end goal).
This should solve the problem that later manifests in the vectorizers in
PR35295:
https://bugs.llvm.org/show_bug.cgi?id=35295
llvm-svn: 318404
Fix a couple places where the minimum alignment/size should be a
function of the shadow granularity:
- alignment of AllGlobals
- the minimum left redzone size on the stack
Added a test to verify that the metadata_array is properly aligned
for shadow scale of 5, to be enabled when we add build support
for testing shadow scale of 5.
Differential Revision: https://reviews.llvm.org/D39470
llvm-svn: 318395
When expanding exit conditions for pre- and postloops, we may end up expanding a
recurrency from the loop to in its loop's preheader. This produces incorrect IR.
This patch ensures that IRCE uses SCEVExpander correctly and only expands code which
is safe to expand in this particular location.
Differentian Revision: https://reviews.llvm.org/D39234
llvm-svn: 318381
Note that one-use and shouldChangeType() are checked ahead of the switch.
Without the narrowing folds, we can produce inferior vector code as shown in PR35299:
https://bugs.llvm.org/show_bug.cgi?id=35299
llvm-svn: 318323
InstCombine salvages debug info for every instruction it erases from its
worklist, but it wasn't doing it during its initial DCE when populating
its worklist. This fixes that.
This should help improve availability of 'this' in optimized debug info
when casts are necessary.
llvm-svn: 318320
Summary:
Added more remarks to SLP pass, in particular "missed" optimization remarks.
Also proposed several tests for new functionality.
Patch by Vladimir Miloserdov!
For reference you may look at: https://reviews.llvm.org/rL302811
Reviewers: anemet, fhahn
Reviewed By: anemet
Subscribers: javed.absar, lattner, petecoup, yakush, llvm-commits
Differential Revision: https://reviews.llvm.org/D38367
llvm-svn: 318307
Summary:
This patch optimizes a binop sandwiched between 2 selects with the same condition. Since we know its only used by the select we can propagate the appropriate input value from the earlier select.
As I'm writing this I realize I may need to avoid doing this for division in case the select was protecting a divide by zero?
Reviewers: spatel, majnemer
Reviewed By: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39999
llvm-svn: 318267
It crashes building sqlite; see reply on the llvm-commits thread.
> [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops.
>
> Patch tries to improve vectorization of the following code:
>
> void add1(int * __restrict dst, const int * __restrict src) {
> *dst++ = *src++;
> *dst++ = *src++ + 1;
> *dst++ = *src++ + 2;
> *dst++ = *src++ + 3;
> }
> Allows to vectorize even if the very first operation is not a binary add, but just a load.
>
> Fixed issues related to previous commit.
>
> Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
>
> Reviewed By: ABataev, RKSimon
>
> Subscribers: llvm-commits, RKSimon
>
> Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 318239
Simplifying a loop latch changes the IR and we need to make sure the pass manager knows to invalidate analysis passes if that happened.
PR35210 discovered a case where we failed to invalidate the post dominator tree after this simplification because we no changes other than simplifying the loop latch.
Fixes PR35210.
Differential Revision: https://reviews.llvm.org/D40035
llvm-svn: 318237
Summary:
In the mode when ASan shadow base is computed as the address of an
external global (__asan_shadow, currently on android/arm32 only),
regalloc prefers to rematerialize this value to save register spills.
Even in -Os. On arm32 it is rather expensive (2 loads + 1 constant
pool entry).
This changes adds an inline asm in the function prologue to suppress
this behavior. It reduces AsanTest binary size by 7%.
Reviewers: pcc, vitalybuka
Subscribers: aemerson, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40048
llvm-svn: 318235
Summary:
Instcombine (and probably other passes) sometimes want to change the
type of an alloca. To do this, they generally create a new alloca with
the desired type, create a bitcast to make the new pointer type match
the old pointer type, replace all uses with the cast, and then simplify
the casts. We already knew how to salvage dbg.value instructions when
removing casts, but we can extend it to cover dbg.addr and dbg.declare.
Fixes a debug info quality issue uncovered in Chromium in
http://crbug.com/784609
Reviewers: aprantl, vsk
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40042
llvm-svn: 318203
Clang implements the -finstrument-functions flag inherited from GCC, which
inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit.
This is useful for getting a trace of how the functions in a program are
executed. Normally, the calls remain even if a function is inlined into another
function, but it is useful to be able to turn this off for users who are
interested in a lower-level trace, i.e. one that reflects what functions are
called post-inlining. (We use this to generate link order files for Chromium.)
LLVM already has a pass for inserting similar instrumentation calls to
mcount(), which it does after inlining. This patch renames and extends that
pass to handle calls both to mcount and the cygprofile functions, before and/or
after inlining as controlled by function attributes.
Differential Revision: https://reviews.llvm.org/D39287
llvm-svn: 318195
Patch tries to improve vectorization of the following code:
void add1(int * __restrict dst, const int * __restrict src) {
*dst++ = *src++;
*dst++ = *src++ + 1;
*dst++ = *src++ + 2;
*dst++ = *src++ + 3;
}
Allows to vectorize even if the very first operation is not a binary add, but just a load.
Fixed issues related to previous commit.
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
Reviewed By: ABataev, RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 318193
This patch is part of D38676.
The patch introduces two new Recipes to handle instructions whose vectorization
involves masking. These Recipes take VPlan-level masks in D38676, but still rely
on ILV's existing createEdgeMask(), createBlockInMask() in this patch.
VPBlendRecipe handles intra-loop phi nodes, which are vectorized as a sequence
of SELECTs. Its execute() code is refactored out of ILV::widenPHIInstruction(),
which now handles only loop-header phi nodes.
VPWidenMemoryInstructionRecipe handles load/store which are to be widened
(but are not part of an Interleave Group). In this patch it simply calls
ILV::vectorizeMemoryInstruction on execute().
Differential Revision: https://reviews.llvm.org/D39068
llvm-svn: 318149
Registers it and everything, updates all the references, etc.
Next patch will add support to Clang's `-fexperimental-new-pass-manager`
path to actually enable BoundsChecking correctly.
Differential Revision: https://reviews.llvm.org/D39084
llvm-svn: 318128
a legacy and new PM pass.
This essentially moves the class state to parameters and re-shuffles the
code to make that reasonable. It also does some minor cleanups along the
way and leaves some comments.
Differential Revision: https://reviews.llvm.org/D39081
llvm-svn: 318124
Summary:
If a compare instruction is same or inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
This shall facilitate computations of loop exit counts in cases
where compare appears in the evolution chain of induction variables.
Will fix PR 34538
Reviewers: sanjoy, hfinkel, junryoungju
Reviewed By: sanjoy, junryoungju
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38494
llvm-svn: 318050
In more recent Linux kernels (including those with 47 bit VMAs) the layout of
virtual memory for powerpc64 changed causing the memory sanitizer to not
work properly. This patch adjusts a bit mask in the memory sanitizer to work
on the newer kernels while continuing to work on the older ones as well.
This is the non-runtime part of the patch and finishes it. ref: r317802
Tested on several 4.x and 3.x kernel releases.
llvm-svn: 318045
Summary:
This patch extends the partial inliner to support inlining parts of
vararg functions, if the vararg handling is done in the outlined part.
It adds a `ForwardVarArgsTo` argument to InlineFunction. If it is
non-null, all varargs passed to the inlined function will be added to
all calls to `ForwardVarArgsTo`.
The partial inliner takes care to only pass `ForwardVarArgsTo` if the
varargs handing is done in the outlined function. It checks that vastart
is not part of the function to be inlined.
`test/Transforms/CodeExtractor/PartialInlineNoInline.ll` (already part
of the repo) checks we do not do partial inlining if vastart is used in
a basic block that will be inlined.
Reviewers: davide, davidxl, grosser
Reviewed By: davide, davidxl, grosser
Subscribers: gyiu, grosser, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D39607
llvm-svn: 318028
Summary:
This patch adds an early out to visitICmpInst if we are looking at a compare as part of an integer absolute value idiom. Similar is already done for min/max.
In the particular case I observed in a benchmark we had an absolute value of a load from an indexed global. We simplified the compare using foldCmpLoadFromIndexedGlobal into a magic bit vector, a shift, and an and. But the load result was still used for the select and the negate part of the absolute valute idiom. So we overcomplicated the code and lost the ability to recognize it as an absolute value.
I've chosen a simpler case for the test here.
Reviewers: spatel, davide, majnemer
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39766
llvm-svn: 317994
Summary:
The following kernel change has moved ET_DYN base to 0x4000000 on arm32:
https://marc.info/?l=linux-kernel&m=149825162606848&w=2
Switch to dynamic shadow base to avoid such conflicts in the future.
Reserve shadow memory in an ifunc resolver, but don't use it in the instrumentation
until PR35221 is fixed. This will eventually let use save one load per function.
Reviewers: kcc
Subscribers: aemerson, srhines, kubamracek, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D39393
llvm-svn: 317943
Summary:
The specification of the @llvm.memcpy.element.unordered.atomic intrinsic requires
that the pointer arguments have alignments of at least the element size. The existing
IRBuilder interface to create a call to this intrinsic does not allow for providing
the alignment of these pointer args. Having an interface that makes it easy to
construct invalid intrinsic calls doesn't seem sensible, so this patch simply
adds the requirement that one provide the argument alignments when using IRBuilder
to create atomic memcpy calls.
llvm-svn: 317918
Summary:
This adds logic to CVP to remove some overflow checks. It uses LVI to remove
operations with at least one constant. Specifically, this can remove many
overflow intrinsics immediately following an overflow check in the source code,
such as:
if (x < INT_MAX)
... x + 1 ...
Patch by Joel Galenson!
Reviewers: sanjoy, regehr
Reviewed By: sanjoy
Subscribers: fhahn, pirama, srhines, llvm-commits
Differential Revision: https://reviews.llvm.org/D39483
llvm-svn: 317911
Summary:
This wrapper checks if there is at least one non-zero weight before
setting the metadata.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39872
llvm-svn: 317845
When the Constant Hoisting pass moves expensive constants into a
common block, it would assign a debug location equal to the last use
of that constant. While this is certainly intuitive, it places the
constant in an out-of-order location, according to the debug location
information. This produces out-of-order stepping when debugging
programs affected by this pass.
This patch creates in-order stepping behavior by merging the debug
locations for hoisted constants, and the new insertion point.
Patch by Matthew Voss!
Differential Revision: https://reviews.llvm.org/D38088
llvm-svn: 317827
Summary:
The analysis of the store sequence goes in straight order - from the
first store to the last. Bu the best opportunity for vectorization will
happen if we're going to use reverse order - from last store to the
first. It may be best because usually users have some initialization
part + further processing and this first initialization may confuse
SLP vectorizer.
Reviewers: RKSimon, hfinkel, mkuper, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39606
llvm-svn: 317821
The toxic stew of created values named 'tmp' and tests that already have
values named 'tmp' and CHECK lines looking for values named 'tmp' causes
bad things to happen in our test line auto-generation scripts because it
wants to use 'TMP' as a prefix for unnamed values. Use less 'tmp' to
avoid that.
llvm-svn: 317818
We must patch all existing incoming values of Phi node,
otherwise it is possible that we can see poison
where program does not expect to see it.
This is the similar what GVN does.
The added test test/Transforms/GVN/PRE/pre-jt-add.ll shows an
example of wrong optimization done by jump threading due to
GVN PRE did not patch existing incoming value.
Reviewers: mkazantsev, wmi, dberlin, davide
Reviewed By: dberlin
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D39637
llvm-svn: 317768
This patch implements Chandler's idea [0] for supporting languages that
require support for infinite loops with side effects, such as Rust, providing
part of a solution to bug 965 [1].
Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual
effect, but which appears to optimization passes to have obscure side effects,
such that they don't optimize away loops containing it. It also teaches
several optimization passes to ignore this intrinsic, so that it doesn't
significantly impact optimization in most cases.
As discussed on llvm-dev [2], this patch is the first of two major parts.
The second part, to change LLVM's semantics to have defined behavior
on infinite loops by default, with a function attribute for opting into
potential-undefined-behavior, will be implemented and posted for review in
a separate patch.
[0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html
[1] https://bugs.llvm.org/show_bug.cgi?id=965
[2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html
Differential Revision: https://reviews.llvm.org/D38336
llvm-svn: 317729
Summary:
In ThinLTO compilation, we exit populateModulePassManager early and
were not adding PM extension passes meant to run at the end of the
pipeline. This includes sanitizer passes. Add these passes before
the early exit.
A test will be added to projects/compiler-rt.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D39565
llvm-svn: 317714