This patch introduces a switch to control splitting of non-whole-alloca slices, defaulting to off.
The switch will be turned back on by default once the issue reported in PR35657 is fixed.
llvm-svn: 320958
If the loop operand type is i8 then there will be no residual loop for the
unknown-size expansion. Don't create the residual-size and bytes-copied values
when they are not needed.
llvm-svn: 320929
We want to do this for 2 reasons:
1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.
More detail about what happens in the backend:
1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs
into the shift variant. That is the opposite of this IR canonicalization.
2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs
into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2
into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node
when that's legal/custom.
4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a
variety of ways.
a. For #2, the vector path is missing the case for setlt with a '1' constant.
b. For #3, we are missing a match for commuted versions of the shift variants.
5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel
produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the
shift sequence when not.
6. In the following examples with this patch applied, we may get conditional moves rather than the shift
produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific
decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.
define i32 @abs_shifty(i32 %x) {
%signbit = ashr i32 %x, 31
%add = add i32 %signbit, %x
%abs = xor i32 %signbit, %add
ret i32 %abs
}
define i32 @abs_cmpsubsel(i32 %x) {
%cmp = icmp slt i32 %x, 0
%sub = sub i32 0, %x
%abs = select i1 %cmp, i32 %sub, i32 %x
ret i32 %abs
}
define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
%signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
%add = add <4 x i32> %signbit, %x
%abs = xor <4 x i32> %signbit, %add
ret <4 x i32> %abs
}
define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
%cmp = icmp slt <4 x i32> %x, zeroinitializer
%sub = sub <4 x i32> zeroinitializer, %x
%abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
ret <4 x i32> %abs
}
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx
> abs_shifty:
> movl %edi, %eax
> negl %eax
> cmovll %edi, %eax
> retq
>
> abs_cmpsubsel:
> movl %edi, %eax
> negl %eax
> cmovll %edi, %eax
> retq
>
> abs_shifty_vec:
> vpabsd %xmm0, %xmm0
> retq
>
> abs_cmpsubsel_vec:
> vpabsd %xmm0, %xmm0
> retq
>
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
> abs_shifty:
> cmp w0, #0 // =0
> cneg w0, w0, mi
> ret
>
> abs_cmpsubsel:
> cmp w0, #0 // =0
> cneg w0, w0, mi
> ret
>
> abs_shifty_vec:
> abs v0.4s, v0.4s
> ret
>
> abs_cmpsubsel_vec:
> abs v0.4s, v0.4s
> ret
>
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le
> abs_shifty:
> srawi 4, 3, 31
> add 3, 3, 4
> xor 3, 3, 4
> blr
>
> abs_cmpsubsel:
> srawi 4, 3, 31
> add 3, 3, 4
> xor 3, 3, 4
> blr
>
> abs_shifty_vec:
> vspltisw 3, -16
> vspltisw 4, 15
> vsubuwm 3, 4, 3
> vsraw 3, 2, 3
> vadduwm 2, 2, 3
> xxlxor 34, 34, 35
> blr
>
> abs_cmpsubsel_vec:
> vspltisw 3, -16
> vspltisw 4, 15
> vsubuwm 3, 4, 3
> vsraw 3, 2, 3
> vadduwm 2, 2, 3
> xxlxor 34, 34, 35
> blr
>
Differential Revision: https://reviews.llvm.org/D40984
llvm-svn: 320921
Changes to the original scalar loop during LV code gen cause the return value
of Legal->isConsecutivePtr() to be inconsistent with the return value during
legal/cost phases (further analysis and information of the bug is in D39346).
This patch is an alternative fix to PR34965 following the CM_Widen approach
proposed by Ayal and Gil in D39346. It extends InstWidening enum with
CM_Widen_Reverse to properly record the widening decision for consecutive
reverse memory accesses and, consequently, get rid of the
Legal->isConsecutivePtr() call in LV code gen. I think this is a simpler/cleaner
solution to PR34965 than the one in D39346.
Fixes PR34965.
Patch by Diego Caballero, thanks!
Differential Revision: https://reviews.llvm.org/D40742
llvm-svn: 320913
When unsafe algebra is allowed, calls to cabs(r) can be replaced by:
sqrt(creal(r)*creal(r) + cimag(r)*cimag(r))
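A minimal IR sketch of the rewrite (the value names and the way the real/imaginary parts are extracted are hypothetical, assuming a double-precision cabs):
```
; before (shown schematically):
;   %mag = call fast double @cabs(...)
; after, with %re/%im the extracted real and imaginary parts:
%re2 = fmul fast double %re, %re
%im2 = fmul fast double %im, %im
%sum = fadd fast double %re2, %im2
%mag = call fast double @llvm.sqrt.f64(double %sum)
```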
Patch by Paul Walker, thanks!
Differential Revision: https://reviews.llvm.org/D40069
llvm-svn: 320901
This is a small step forward to move VPlan stuff to where it should belong (i.e., VPlan.*):
1. VP*Recipe classes in LoopVectorize.cpp are moved to VPlan.h.
2. Many of VP*Recipe::print() and execute() definitions are still left in
LoopVectorize.cpp since they refer to things declared in LoopVectorize.cpp. To
be moved to VPlan.cpp at a later time.
3. InterleaveGroup class is moved from anonymous namespace to llvm namespace.
Referencing it in an anonymous namespace from VPlan.h resulted in a warning.
Patch by Hideki Saito, thanks!
Differential Revision: https://reviews.llvm.org/D41045
llvm-svn: 320900
Summary:
This implements a missing feature to allow importing of aliases, which
was previously disabled because an alias cannot be available_externally.
We instead import an alias as a copy of its aliasee.
Some additional work was required in the IndexBitcodeWriter for the
distributed build case, to ensure that the aliasee has a value id
in the distributed index file (i.e. even when it is not being
imported directly).
This is a performance win in codes that have many aliases, e.g. C++
applications that have many constructor and destructor aliases.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D40747
llvm-svn: 320895
This recommits r320823, which was reverted due to a test failure in sink-foldable.ll and
an unused variable. Added "REQUIRES: aarch64-registered-target" to the test
and removed the unused variable.
Original commit message:
Continue trying to sink an instruction if its users in the loop are foldable.
This will allow the instruction to be folded in the loop by decoupling it from
the user outside of the loop.
Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier
Reviewed By: hfinkel
Subscribers: javed.absar, bmakam, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37076
llvm-svn: 320858
The original memcpy expansion inserted the loop basic block in between
the two new basic blocks created by splitting the original block the memcpy
call was in. This commit makes the new memcpy expansion do the same to keep the
layout of the IR matching between the old and new implementations.
Differential Revision: https://reviews.llvm.org/D41197
llvm-svn: 320848
This recommits r320823 after fixing a test failure.
Original commit message:
Continue trying to sink an instruction if its users in the loop are foldable.
This will allow the instruction to be folded in the loop by decoupling it from
the user outside of the loop.
Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier
Reviewed By: hfinkel
Subscribers: javed.absar, bmakam, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37076
llvm-svn: 320833
Summary:
Continue trying to sink an instruction if its users in the loop are foldable.
This will allow the instruction to be folded in the loop by decoupling it from
the user outside of the loop.
Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier
Reviewed By: hfinkel
Subscribers: javed.absar, bmakam, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37076
llvm-svn: 320823
Summary:
The port is nearly straightforward.
The only complication is related to the analyses handling,
since one of the analyses used in this module pass is domtree,
which is a function analysis. That requires asking for the results
of each function and disallows a single interface for run-on-module
pass action.
Decided to copy-paste the main body of this pass.
Most of its code is requesting analyses anyway, so not that much
of a copy-paste.
The rest of the code movement is to transform all the implementation
helper functions like stripNonValidData into non-member statics.
Extended all the related LLVM tests with new-pass-manager use.
No failures.
Reviewers: sanjoy, anna, reames
Reviewed By: anna
Subscribers: skatkov, llvm-commits
Differential Revision: https://reviews.llvm.org/D41162
llvm-svn: 320796
This should solve:
https://bugs.llvm.org/show_bug.cgi?id=34603
...by preventing SimplifyCFG from altering redundant instructions before early-cse has a chance to run.
It changes the default (canonical-forming) behavior of SimplifyCFG, so we're only doing the
sinking transform later in the optimization pipeline.
Differential Revision: https://reviews.llvm.org/D38566
llvm-svn: 320749
In SLPVectorizer, the vector build instructions (insertvalue for aggregate types) are passed to BoUpSLP.buildTree and treated as the UserIgnoreList, so later in cost estimation the cost of these instructions is not counted.
For an aggregate value, later uses are more likely to be done in scalar registers, either as individual scalars or as a whole for a function call or return value. Ignoring the scalar extraction instructions may cause overly aggressive vectorization of aggregate values and slow down performance. So for vectorization of aggregate values, the scalar extraction instructions are required in cost estimation.
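As a hedged illustration (names hypothetical), a vectorized build of a two-element aggregate needs extractelement instructions to reconstruct the scalars, and those now have to be counted:
```
%v = fadd <2 x float> %a, %b
%e0 = extractelement <2 x float> %v, i32 0
%e1 = extractelement <2 x float> %v, i32 1
%agg0 = insertvalue { float, float } undef, float %e0, 0
%agg1 = insertvalue { float, float } %agg0, float %e1, 1
ret { float, float } %agg1
```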
Differential Revision: https://reviews.llvm.org/D41139
llvm-svn: 320736
Summary:
Passing AliasAnalysis results instead of nullptr appears to work just fine.
A couple new-pass-manager tests updated to align with new order of analyses.
Reviewers: chandlerc, spatel, craig.topper
Reviewed By: chandlerc
Subscribers: mehdi_amini, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D41203
llvm-svn: 320687
D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose
update chain involves casts; PSCEV can now build an AddRecurrence for some
forms of such phi nodes, under the proper runtime overflow test. This means
that we can identify such phi nodes as an induction, and the loop-vectorizer
can now vectorize such inductions, however inefficiently. The vectorizer
doesn't know that it can ignore the casts, and so it vectorizes them.
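For illustration, a hypothetical sketch of such an induction, where the phi's update chain goes through a trunc/sext pair that PSCEV can prove is a no-op under a runtime overflow test:
```
loop:
  %iv = phi i32 [ 0, %preheader ], [ %iv.next, %loop ]
  %iv.trunc = trunc i32 %iv to i16
  %iv.ext = sext i16 %iv.trunc to i32   ; equal to %iv while %iv fits in i16
  %iv.next = add i32 %iv.ext, 1
  ...
```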
This patch records the casts in the InductionDescriptor, so that they could
be marked to be ignored for cost calculation (we use VecValuesToIgnore for
that) and ignored for vectorization/widening/scalarization (i.e. treated as
TriviallyDead).
In addition to marking all these casts to be ignored, we also need to make
sure that each cast is mapped to the right vector value in the vector loop body
(be it a widened, vectorized, or scalarized induction). So whenever an
induction phi is mapped to a vector value (during vectorization/widening/
scalarization), we also map the respective cast instruction (if it exists) to that
vector value. (If the phi-update sequence of an induction involves more than one
cast, then the above mapping to vector value is relevant only for the last cast
of the sequence as we allow only the "last cast" to be used outside the
induction update chain itself).
This is the last step in addressing PR30654.
llvm-svn: 320672
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.
Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.
LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perfom the
preversation was minimally altered and was simply marked
as preserved for the PassManager to be informed.
This extends the analysis available to JumpThreading for
future enhancements. One example is loop boundary threading.
Reviewers: dberlin, kuhar, sebpop
Reviewed By: kuhar, sebpop
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40146
llvm-svn: 320612
w.r.t. the paper
"A Practical Improvement to the Partial Redundancy Elimination in SSA Form"
(https://sites.google.com/site/jongsoopark/home/ssapre.pdf)
A proper dominance check was missing here; with it in place, having LoopInfo should not be required.
Committing this diff as this fixes the bug, if there are
further concerns, I'll be happy to work on them.
Differential Revision: https://reviews.llvm.org/D39781
llvm-svn: 320607
Summary:
This patch tries to vectorize loads of consecutive memory accesses that are
accessed in a non-consecutive (jumbled) way. An earlier attempt was made with patch D26905,
which was reverted due to a basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
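For example (a hypothetical sketch), loads of consecutive memory in a jumbled order such as a[1], a[2], a[0], a[3] can now be vectorized as one wide load plus a shuffle driven by the recorded use mask:
```
%p1 = getelementptr inbounds i32, i32* %a, i64 1
%l1 = load i32, i32* %p1
%p2 = getelementptr inbounds i32, i32* %a, i64 2
%l2 = load i32, i32* %p2
%p0 = getelementptr inbounds i32, i32* %a, i64 0
%l0 = load i32, i32* %p0
%p3 = getelementptr inbounds i32, i32* %a, i64 3
%l3 = load i32, i32* %p3
```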
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: mgrang, dcaballe, hans, mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 320548
Summary:
This change makes the call site creation more general if any of the
arguments is predicated on a condition in the call site's predecessors.
If we find a call site that can potentially be split, we collect the set
of conditions for the call site's predecessors (currently only 2
predecessors are allowed). To do that, we traverse each predecessor's
predecessors as long as it only has a single predecessor, and record the
condition if it is relevant to the call site. For each condition, we
also check if the condition is taken or not. In case it is not taken,
we record the inverse predicate.
We use the recorded conditions to create the new call sites and split
the basic block.
This has 2 benefits: (1) it is slightly easier to see what is going on
(IMO) and (2) we can easily extend it to handle more complex control
flow.
Reviewers: davidxl, junbuml
Reviewed By: junbuml
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40728
llvm-svn: 320547
Summary: This brings CPU overhead on bzip2 down from 5.5x to 2x.
Reviewers: kcc, alekseyshl
Subscribers: kubamracek, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41137
llvm-svn: 320538
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. The patch adds an additional check that all users
of the load instruction are stores and then replaces all uses of the load
instruction with a new one of the new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320525
This algorithm (explained more in the source code) takes into account
global redundancies by building a "pair map" to find common subexprs.
The primary motivation of this is to handle situations like
foo = (a * b) * c
bar = (a * d) * c
where we currently don't identify that "a * c" is redundant.
Accordingly, it prioritizes the emission of a * c so that CSE
can remove the redundant calculation later.
Does not change the actual reassociation algorithm -- only the
order in which the reassociated operand chain is reconstructed.
Gives ~1.5% floating point math instruction count reduction on
a large offline suite of graphics shaders.
llvm-svn: 320515
Summary:
The PGO gen/use passes currently fail with an assert failure if there's a
critical edge whose source is an IndirectBr instruction and that edge
needs to be instrumented.
To avoid this in certain cases, split IndirectBr critical edges in the PGO
gen/use passes. This works for blocks with single indirectbr predecessors,
but not for those with multiple indirectbr predecessors (splitting an
IndirectBr critical edge isn't always possible.)
Reviewers: davidxl, xur
Reviewed By: davidxl
Subscribers: efriedma, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D40699
llvm-svn: 320511
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. The patch adds an additional check that all users
of the load instruction are stores and then replaces all uses of the load
instruction with a new one of the new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320510
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. The patch adds an additional check that all users
of the load instruction are stores and then replaces all uses of the load
instruction with a new one of the new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320499
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. The patch adds an additional check that all users
of the load instruction are stores and then replaces all uses of the load
instruction with a new one of the new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320496
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. The patch adds an additional check that all users
of the load instruction are stores and then replaces all uses of the load
instruction with a new one of the new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320488
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. The patch adds an additional check that all users
of the load instruction are stores and then replaces all uses of the load
instruction with a new one of the new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320483
Summary:
Currently, in InstCombineLoadStoreAlloca, we have simplification
rules for the following cases:
1. load off a null
2. load off a GEP with null base
3. store to a null
This patch adds support for the fourth case, which is a store into a
GEP with a null base. Since this is UB as well (and directly analogous to
the load off a GEP with a null base), we can substitute the stored value
with undef in instcombine, so that SimplifyCFG can optimize this code
into unreachable code.
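A minimal sketch of the new case (hypothetical IR):
```
%p = getelementptr i32, i32* null, i64 %idx
store i32 %v, i32* %p   ; UB: the stored value can be replaced with undef
```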
Note: right now, SimplifyCFG hasn't been taught about optimizing
this to unreachable and adding an llvm.trap (this is already done for
the above 3 cases).
Reviewers: majnemer, hfinkel, sanjoy, davide
Reviewed by: sanjoy, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41026
llvm-svn: 320480
VecValuesToIgnore holds values that will not appear in the vectorized loop.
We should therefore ignore their cost when VF > 1.
Differential Revision: https://reviews.llvm.org/D40883
llvm-svn: 320463
Summary:
This solves PR35616.
We don't want the compiler to generate different code when we compile
with/without -g, so we now ignore debug intrinsics when determining if
the optimization can trigger or not.
Reviewers: junbuml
Subscribers: davide, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D41068
llvm-svn: 320460
The tests fail (opt asserts) on Windows.
> Summary:
> If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
> &V2)))), bitcast)`, but the load is used in other instructions, it leads
> to looping in InstCombiner. Patch adds additional check that all users
> of the load instructions are stores and then replaces all uses of load
> instruction by the new one with new type.
>
> Reviewers: RKSimon, spatel, majnemer
>
> Subscribers: llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320421
The function stack poisoner conditionally stores local variables
either in an alloca or in malloc'ated memory, which has the
unfortunate side effect that the actual address of the variable is
only materialized when the variable is accessed, which means that
those variables are mostly invisible to the debugger even when
compiling without optimizations.
This patch stores the address of the local stack base into an alloca,
which can be referred to by the debug info and is available throughout
the function. This adds one extra pointer-sized alloca to each stack
frame (but mem2reg can optimize it away again when optimizations are
enabled, yielding roughly the same debug info quality as before in
optimized code).
rdar://problem/30433661
Differential Revision: https://reviews.llvm.org/D41034
llvm-svn: 320415
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. The patch adds an additional check that all users
of the load instruction are stores and then replaces all uses of the load
instruction with a new one of the new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320407
This patch introduces getShadowOriginPtr(), a method that obtains both the shadow and origin pointers for an address as a Value pair.
The existing callers of getShadowPtr() and getOriginPtr() are updated to use getShadowOriginPtr().
The rationale for this change is to simplify KMSAN instrumentation implementation.
In KMSAN, origin tracking is always enabled, and there's no direct mapping between the app memory and the shadow/origin pages.
Both the shadow and the origin pointer for a given address are obtained by calling a single runtime hook from the instrumentation,
therefore it's easier to work with those pointers together.
Reviewed at https://reviews.llvm.org/D40835.
llvm-svn: 320373
Summary:
Reuse the Linux new mapping as it is.
Sponsored by <The NetBSD Foundation>
Reviewers: joerg, eugenis, vitalybuka
Reviewed By: vitalybuka
Subscribers: llvm-commits, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D41022
llvm-svn: 320219
Summary:
This is LLVM instrumentation for the new HWASan tool. It is basically
a stripped down copy of ASan at this point, w/o stack or global
support. Instrumentation adds a global constructor + runtime callbacks
for every load and store.
HWASan comes with its own IR attribute.
A brief design document can be found in
clang/docs/HardwareAssistedAddressSanitizerDesign.rst (submitted earlier).
Reviewers: kcc, pcc, alekseyshl
Subscribers: srhines, mehdi_amini, mgorny, javed.absar, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D40932
llvm-svn: 320217
Summary:
If a partially inlined function has debug info, we have to add debug
locations to the call instruction calling the outlined function.
We use the debug location of the first instruction in the outlined
function, as the introduced call transfers control to this statement and
there is no other equivalent line in the source code.
We also use the same debug location for the branch instruction added
to jump from the artificial entry block for the outlined function, which just
jumps to the first actual basic block of the outlined function.
Reviewers: davide, aprantl, rriddle, dblaikie, danielcdh, wmi
Reviewed By: aprantl, rriddle, danielcdh
Subscribers: eraman, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D40413
llvm-svn: 320199
Causes an unexpected memory issue with the new PM this time.
The new PM invalidates BPI but not BFI, leaving the
reference to BPI from BFI invalid.
Abandon this patch. There is a more general solution
which also handles runtime infinite loops (but not statically).
llvm-svn: 320180
Summary:
If we have the code like this:
```
float a, b;
a = std::max(a ,b);
```
it is converted into something like this:
```
%call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float* nonnull dereferenceable(4) %a.addr, float* nonnull dereferenceable(4) %b.addr)
%1 = bitcast float* %call to i32*
%2 = load i32, i32* %1, align 4
%3 = bitcast float* %a.addr to i32*
store i32 %2, i32* %3, align 4
```
After inlining, this code is converted to the following:
```
%1 = load float, float* %a.addr
%2 = load float, float* %b.addr
%cmp.i = fcmp fast olt float %1, %2
%__b.__a.i = select i1 %cmp.i, float* %a.addr, float* %b.addr
%3 = bitcast float* %__b.__a.i to i32*
%4 = load i32, i32* %3, align 4
%5 = bitcast float* %arrayidx to i32*
store i32 %4, i32* %5, align 4
```
This pattern is not recognized as a minmax pattern.
The patch solves this problem by converting the sequence
```
store (bitcast, (load bitcast (select ((cmp V1, V2), &V1, &V2))))
```
to a sequence
```
store (,load (select((cmp V1, V2), &V1, &V2)))
```
After this the code is recognized as minmax pattern.
Reviewers: RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40304
llvm-svn: 320157
In more recent Linux kernels with 47-bit VMAs, the layout of virtual memory
for powerpc64 changed, causing the address sanitizer to not work properly. This
patch adds support for 47-bit VMA kernels for powerpc64 and fixes up test
cases.
https://reviews.llvm.org/D40907
There is an associated patch for compiler-rt.
Tested on several 4.x and 3.x kernel releases.
llvm-svn: 320109
Summary:
Make enum ModRefInfo an enum class. Changes to ModRefInfo values should
be done using inline wrappers.
This should prevent future bit-wise operations from being added, which can be more error-prone.
Reviewers: sanjoy, dberlin, hfinkel, george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40933
llvm-svn: 320107
As a new access is generated spanning multiple fields, we need to
propagate alias info from all the fields to form the most generic alias info.
rdar://35602528
Differential Revision: https://reviews.llvm.org/D40617
llvm-svn: 319979
This patch factors out the main code transformation utilities in the pgo-driven
indirect call promotion pass and places them in Transforms/Utils. The change is
intended to be a non-functional change, letting non-pgo-driven passes share a
common implementation with the existing pgo-driven pass.
The common utilities are used to conditionally promote indirect call sites to
direct call sites. They perform the underlying transformation, and do not
consider profile information. The pgo-specific details (e.g., the computation
of branch weight metadata) have been left in the indirect call promotion pass.
Differential Revision: https://reviews.llvm.org/D40658
llvm-svn: 319963
Summary:
There is no need to replace the original call instruction if no
VarArgs need to be forwarded.
Reviewers: davide, rnk, majnemer, efriedma
Reviewed By: efriedma
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D40412
llvm-svn: 319947
This caused PR35519.
> [memcpyopt] Teach memcpyopt to optimize across basic blocks
>
> This teaches memcpyopt to make a non-local memdep query when a local query
> indicates that the dependency is non-local. This notably allows it to
> eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
>
> Fixes PR28958.
>
> Differential Revision: https://reviews.llvm.org/D38374
>
> [memcpyopt] Commit file missed in r319482.
>
> This change was meant to be included with r319482 but was accidentally
> omitted.
llvm-svn: 319873
Summary:
The aim is to make ModRefInfo checks and changes more intuitive
and less error prone using inline methods that abstract the bit operations.
Ideally ModRefInfo would become an enum class, but that change will require
a wider set of changes into FunctionModRefBehavior.
Reviewers: sanjoy, george.burgess.iv, dberlin, hfinkel
Subscribers: nlopes, llvm-commits
Differential Revision: https://reviews.llvm.org/D40749
llvm-svn: 319821
This uses ConstantRange::makeGuaranteedNoWrapRegion's newly-added handling for subtraction to allow CVP to remove some subtraction overflow checks.
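A hedged sketch of the idea: if LVI proves, say, %x in [100, 200) and %y in [0, 100), the subtraction lies entirely in the no-wrap region computed by makeGuaranteedNoWrapRegion, so it cannot wrap:
```
%d = sub i32 %x, %y   ; provably nuw/nsw under the ranges above
```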
Differential Revision: https://reviews.llvm.org/D40039
llvm-svn: 319807
Summary:
A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This patch fixes that by bailing out if we encounter undef.
The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization.
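For instance (illustrative only), a branch condition that constant-folds to undef is neither true nor false:
```
%c = icmp eq i32 undef, 0   ; folds to i1 undef
br i1 %c, label %a, label %b
```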
Patch by JesperAntonsson.
Reviewers: spatel, eeckstein, hans
Reviewed By: hans
Subscribers: uabelho, llvm-commits
Differential Revision: https://reviews.llvm.org/D40639
llvm-svn: 319768
Summary:
Move splitIndirectCriticalEdges() from CodeGenPrepare to BasicBlockUtils.h so
that it can be called from other places.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40750
llvm-svn: 319689
(This reapplies r314253. r314253 was reverted on r314482 because of a
correctness regression on P100, but that regression was identified to be
something else.)
Summary:
Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow. This gives us a 32 bit multiply instead of an
emulated 64 bit multiply in the generated PTX assembly.
Reviewers: jlebar
Subscribers: jholewinski, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38265
llvm-svn: 319677
Summary:
Currently, we only support predication for forward loops with a step
of 1. This patch enables loop predication for reverse (countdown)
loops, which satisfy the following conditions (see the sketch after this list):
1. The step of the IV is -1.
2. The loop has a single latch with condition B(X) = X <pred> latchLimit,
where pred is s> or u>.
3. The IV of the guard is the decremented IV of the latch condition
(the guard is: G(X) = X-1 u< guardLimit).
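A hedged sketch of such a loop (names hypothetical):
```
loop:
  %iv = phi i32 [ %start, %preheader ], [ %iv.next, %loop ]
  %iv.next = add i32 %iv, -1                      ; step of -1
  %g = icmp ult i32 %iv.next, %guardLimit         ; G(X) = X-1 u< guardLimit
  call void (i1, ...) @llvm.experimental.guard(i1 %g) [ "deopt"() ]
  ...
  %cond = icmp ugt i32 %iv, %latchLimit           ; B(X) = X u> latchLimit
  br i1 %cond, label %loop, label %exit
```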
This patch was downstream for a while and is the last series of patches
that's from our LP implementation downstream.
Reviewers: apilipenko, mkazantsev, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40353
llvm-svn: 319659
Turns out we can have comparisons which are indirect users of the induction variable that we can make invariant. In this case, there is no loop invariant value contributing and we'd fail an assert.
The test case was found by a java fuzzer and reduced. It's a real corner case. You have to have a static loop which we've already proven only executes once, but haven't broken the backedge on, and an inner phi whose result can be constant folded by SCEV using exit count reasoning but not proven by isKnownPredicate. To my knowledge, only the fuzzer has hit this case.
llvm-svn: 319583
It causes builds to fail with "Instruction does not dominate all uses" (PR35497).
> Patch tries to improve vectorization of the following code:
>
> void add1(int * __restrict dst, const int * __restrict src) {
> *dst++ = *src++;
> *dst++ = *src++ + 1;
> *dst++ = *src++ + 2;
> *dst++ = *src++ + 3;
> }
> Allows to vectorize even if the very first operation is not a binary add, but just a load.
>
> Fixed issues related to previous commit.
>
> Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
>
> Reviewed By: ABataev, RKSimon
>
> Subscribers: llvm-commits, RKSimon
>
> Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 319550
Summary:
A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This patch fixes that by bailing out if we encounter undef.
The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization.
Patch by: JesperAntonsson
Reviewers: spatel, eeckstein, hans
Reviewed By: hans
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40639
llvm-svn: 319537
Patch tries to improve vectorization of the following code:
void add1(int * __restrict dst, const int * __restrict src) {
*dst++ = *src++;
*dst++ = *src++ + 1;
*dst++ = *src++ + 2;
*dst++ = *src++ + 3;
}
Allows to vectorize even if the very first operation is not a binary add, but just a load.
Fixed issues related to previous commit.
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
Reviewed By: ABataev, RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 319531
These command line options are not intended for public use, and often
don't even make sense in the context of a particular tool anyway. About
90% of them are already hidden, but when people add new options they
forget to hide them, so if you were to make a brand new tool today, link
against one of LLVM's libraries, and run tool -help you would get a
bunch of junk that doesn't make sense for the tool you're writing.
This patch hides these options. The real solution is to not have
libraries defining command line options, but that's a much larger effort
and not something I'm prepared to take on.
Differential Revision: https://reviews.llvm.org/D40674
llvm-svn: 319505
If the thin module has no references to an internal global in the
merged module, we need to make sure to preserve that property if the
global is a member of a comdat group, as otherwise promotion can end
up adding global symbols to the comdat, which is not allowed.
This situation can arise if the external global in the thin module
has dead constant users, which would cause use_empty() to return
false and would cause us to try to promote it. To prevent this from
happening, discard the dead constant users before asking whether a
global is empty.
Differential Revision: https://reviews.llvm.org/D40593
llvm-svn: 319494
This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
Fixes PR28958.
Differential Revision: https://reviews.llvm.org/D38374
llvm-svn: 319482
Currently, SROA splits loads and stores only when they are accessing the whole alloca.
This patch relaxes this limitation to allow splitting a load/store if all other loads and stores to the alloca are disjoint to or fully included in the current load/store. If there is no other load or store that crosses the boundary of the current load/store, the current splitting implementation works as is.
The whole-alloca loads and stores meet this new condition and so they are still splittable.
Here is a simplified motivating example.
struct record {
long long a;
int b;
int c;
};
int func(struct record r) {
for (int i = 0; i < r.c; i++)
r.b++;
return r.b;
}
When updating r.b (or r.c as well), LLVM generates redundant instructions on some platforms (such as x86_64, ppc64); here, r.b and r.c are packed into one 64-bit GPR when the struct is passed as a method argument.
With this patch, the above example is compiled into only a few instructions, without a loop.
Without the patch, an unnecessary loop-carried dependency is introduced by SROA and the loop cannot be eliminated by the later optimizers.
Differential Revision: https://reviews.llvm.org/D32998
llvm-svn: 319407
Support for outlining multiple regions of each function is added, as well as some basic heuristics to determine which regions are good to outline. Outlining candidates are limited to regions that are single-entry and single-exit. We also avoid outlining regions that produce live-exit variables, which may inhibit some forms of code motion (like commoning).
Fallback to the regular partial inlining scheme is retained when either i) no regions are identified for outlining in the function, or ii) the outlined function could not be inlined in any of its callers.
Differential Revision: https://reviews.llvm.org/D38190
llvm-svn: 319398
An alloca may be larger than a variable that is described to be stored
there. Don't create a dbg.value for fragments that are outside of the
variable.
This fixes PR35447.
https://bugs.llvm.org/show_bug.cgi?id=35447
llvm-svn: 319230
This is needed for cases when the memory access is not as big as the width of
the data type. For instance, storing i1 (1 bit) would be done in a byte (8
bits).
Using 'BitSize >> 3' (or '/ 8') would e.g. give the memory access of an i1 a
size of 0, which for instance makes alias analysis return NoAlias even when
it shouldn't.
There are no tests as this was done as a follow-up to the bugfix for the case
where this was discovered (r318824). This handles more similar cases.
Review: Björn Petterson
https://reviews.llvm.org/D40339
llvm-svn: 319173
The core idea is to (re-)introduce some redundancies where their cost is
hidden by the cost of materializing immediates for constant operands of
PHI nodes. When the cost of the redundancies is covered by this,
avoiding materializing the immediate has numerous benefits:
1) Less register pressure
2) Potential for further folding / combining
3) Potential for more efficient instructions due to immediate operand
As a motivating example, consider the remarkably different cost on x86
of a SHL instruction with an immediate operand versus a register
operand.
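A hedged sketch of the kind of IR this pass targets (names hypothetical):
```
merge:
  %amt = phi i32 [ 7, %bb1 ], [ 11, %bb2 ]   ; constants must be materialized
  %r = shl i32 %x, %amt
```
Speculating the shift into each predecessor lets both copies use an immediate operand:
```
bb1:
  %r1 = shl i32 %x, 7
  br label %merge
bb2:
  %r2 = shl i32 %x, 11
  br label %merge
merge:
  %r = phi i32 [ %r1, %bb1 ], [ %r2, %bb2 ]
```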
This pattern turns up surprisingly frequently, but is somewhat rarely
obvious as a significant performance problem.
The pass is entirely target independent, but it does rely on the target
cost model in TTI to decide when to speculate things around the PHI
node. I've included x86-focused tests, but any target that sets up its
immediate cost model should benefit from this pass.
There is probably more that can be done in this space, but the pass
as-is is enough to get some important performance on our internal
benchmarks, and should be generally performance neutral, but help with
more extensive benchmarking is always welcome.
One awkward part is that this pass has to be scheduled after
*everything* that can eliminate these kinds of redundancies. This
includes SimplifyCFG, GVN, etc. I'm open to suggestions about better
places to put this. We could in theory make it part of the codegen pass
pipeline, but there doesn't really seem to be a good reason for that --
it isn't "lowering" in any sense and only relies on pretty standard cost
model based TTI queries, so it seems to fit well with the "optimization"
pipeline model. Still, further thoughts on the pipeline position are
welcome.
I've also only implemented this in the new pass manager. If folks are
very interested, I can try to add it to the old PM as well, but I didn't
really see much point (my use case is already switched over to the new
PM).
I've tested this pretty heavily without issue. A wide range of
benchmarks internally show no change outside the noise, and I don't see
any significant changes in SPEC either. However, the size class
computation in tcmalloc is substantially improved by this, which turns
into a 2% to 4% win on the hottest path through tcmalloc for us, so
there are definitely important cases where this is going to make
a substantial difference.
Differential revision: https://reviews.llvm.org/D37467
llvm-svn: 319164
Summary:
I think we do not need to analyze debug intrinsics here, as they should
not impact codegen. This has 2 benefits: 1) slightly less work to do and
2) avoiding generating optimization remarks for converting calls to
debug intrinsics to tail calls, which are not really helpful for users.
Based on work by Sander de Smalen.
Reviewers: davide, trentxintong, aprantl
Reviewed By: aprantl
Subscribers: llvm-commits, JDevlieghere
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D40440
llvm-svn: 319158
This is to address a problem similar to those in D37460 for Scalar PRE. We should not
PRE across an instruction that may not pass execution to its successor unless it is safe
to speculatively execute it.
Differential Revision: https://reviews.llvm.org/D38619
llvm-svn: 319147
Revert "[SROA] Propagate !range metadata when moving loads."
Revert "[Mem2Reg] Clang-format unformatted parts of this file. NFCI."
Davide says they broke a bot.
llvm-svn: 319131
This tries to propagate !range metadata to a pre-existing load
when a load is optimized out. This is done instead of adding an
assume because converting loads to and from assumes creates a
lot of IR.
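A small illustration (hypothetical IR): when the second load is forwarded from the first, its !range metadata now moves to the surviving load instead of becoming an assume:
```
%a = load i32, i32* %p
%b = load i32, i32* %p, !range !0   ; eliminated; !0 is propagated to %a
...
!0 = !{i32 0, i32 10}
```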
Patch by Ariel Ben-Yehuda.
Differential Revision: https://reviews.llvm.org/D37216
llvm-svn: 319096
enum TailCallKind { TCK_None = 0, TCK_Tail = 1, TCK_MustTail = 2,
TCK_NoTail = 3 };
TCK_NoTail is greater than TCK_Tail, so taking the min does not do the
correct thing: min(TCK_NoTail, TCK_Tail) yields TCK_Tail, silently dropping
the notail marker.
rdar://35639547
llvm-svn: 319075
MSan used to insert the shadow check of the store pointer operand
_after_ the shadow of the value operand has been written.
This happens to work in the userspace, as the whole shadow range is
always mapped. However in the kernel the shadow page may not exist, so
the bug may cause a crash.
This patch moves the address check in front of the shadow access.
llvm-svn: 318901
In a lambda where we expect the result to be within bounds, add the respective `nsw/nuw` flags to
help SCEV in case it fails to figure them out on its own.
Differential Revision: https://reviews.llvm.org/D40168
llvm-svn: 318898
After the dataflow algorithm proves that an argument is constant,
it replaces its value with the integer constant and drops the lattice
value associated with the DEF.
e.g. in the example we have @f() that's called twice:
call @f(undef, ...)
call @f(2, ...)
`undef` MEET 2 = 2 so we replace the argument and all its uses with
the constant 2.
Shortly after, tryToReplaceWithConstantRange() tries to get the lattice
value for the argument we just replaced, causing an assertion.
This function is a little peculiar as it runs when we're doing replacement
and not as part of the solver but still queries the solver.
The fix is to check whether we already replaced the value and, if so,
get a temporary lattice value for the constant.
Thanks to Zhendong Su for the report!
Fixes PR35357.
llvm-svn: 318817
Summary:
First step in adding MemorySSA as dependency for loop pass manager.
Adding the dependency under a flag.
New pass manager: MSSA pointer in LoopStandardAnalysisResults can be null.
Legacy and new pass manager: Use cl::opt EnableMSSALoopDependency. Disabled by default.
Reviewers: sanjoy, davide, gberry
Subscribers: mehdi_amini, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D40274
llvm-svn: 318772
properlyDominates() shouldn't be used as a sort key. It causes different output between stdlibc++ and libc++.
Instead, I introduced RPOT. In most cases, it works for CSE.
llvm-svn: 318743
Summary:
Add the following heuristics for irreducible loop metadata:
- When an irreducible loop header is missing the loop header weight metadata,
give it the minimum weight seen among other headers.
- Annotate indirectbr targets with the loop header weight metadata (as they are
likely to become irreducible loop headers after indirectbr tail duplication.)
These greatly improve the accuracy of the block frequency info of the Python
interpreter loop (eg. from ~3-16x off down to ~40-55% off) and the Python
performance (eg. unpack_sequence from ~50% slower to ~8% faster than GCC) due to
better register allocation under PGO.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39980
llvm-svn: 318693
Summary:
SROA can fail to rewrite an alloca but still rewrite a phi, resulting
in dead instruction elimination. The Changed flag was not being set
correctly, resulting in downstream passes using stale analyses.
The included test case will assert during the second BDCE pass as a
result.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39921
llvm-svn: 318677
Summary:
This change reverts r318575 and changes FindDynamicShadowStart() to
keep the memory range it found mapped PROT_NONE to make sure it is
not reused. We also skip MemoryRangeIsAvailable() check, because it
is (a) unnecessary, and (b) would fail anyway.
Reviewers: pcc, vitalybuka, kcc
Subscribers: srhines, kubamracek, mgorny, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D40203
llvm-svn: 318666
This patch adds a new abstraction layer to VPlan and leverages it to model the planned
instructions that manipulate masks (AND, OR, NOT), introduced during predication.
The new VPValue and VPUser classes model how data flows into, through and out
of a VPlan, forming the vertices of a planned Def-Use graph. The new
VPInstruction class is a generic single-instruction Recipe that models a
planned instruction along with its opcode, operands and users. See
VectorizationPlan.rst for more details.
Differential Revision: https://reviews.llvm.org/D38676
llvm-svn: 318645
In rL316552, we ban intersection of unsigned latch range with signed range check and vice
versa, unless the entire range check iteration space is known positive. It was a correct
functional fix that saved us from dealing with ambiguous values, but it also appeared
to be a very restrictive limitation. In particular, in the following case:
loop:
%iv = phi i32 [ 0, %preheader ], [ %iv.next, %latch]
%iv.offset = add i32 %iv, 10
%rc = icmp slt i32 %iv.offset, %len
br i1 %rc, label %latch, label %deopt
latch:
%iv.next = add i32 %iv, 11
%cond = icmp ult i32 %iv.next, 100
br i1 %cond, label %loop, label %exit
Here, the unsigned iteration range is `[0, 100)`, and the safe range for range
check is `[-10, %len - 10)`. For unsigned iteration spaces, we use unsigned
min/max functions for range intersection. Given this, we wanted to avoid dealing
with `-10` because it is interpreted as a very big unsigned value. Semantically, range
check's safe range goes through unsigned border, so in fact it is two disjoint
ranges in IV's iteration space. Intersection of such ranges is not trivial, so we prohibited
this case saying that we are not allowed to intersect such ranges.
What the semantics of this safe range actually mean is that we can start from `-10` and go
up increasing the `%iv` by one until we reach `%len - 10` (for simplicity let's assume that
`%len - 10` is a reasonably big positive value).
In particular, this safe iteration space includes `0, 1, 2, ..., %len - 11`. So if we were able to return
safe iteration space `[0, %len - 10)`, we could safely intersect it with IV's iteration space. All
values in this range are non-negative, so using signed/unsigned min/max for them is unambiguous.
In this patch, we alter the algorithm of safe range calculation so that it returns a subset of the
original safe space which is represented by one continuous range that does not go through a wrap.
In order to reach this, we use a modified SCEV subtraction function. It can be imagined as a function
that subtracts by `1` (or `-1`) as long as the further subtraction does not cause a wrap in the IV iteration
space. This allows us to perform IRCE in many situations when we deal with IV spaces and range checks
of different types (in terms of signed/unsigned).
We apply this approach for both matching and non-matching types of the IV iteration space and the
range check. One implication of this is that IRCE has become smarter in detecting empty safe
ranges. For example, in this case:
loop:
%iv = phi i32 [ %begin, %preheader ], [ %iv.next, %latch]
%iv.offset = sub i32 %iv, 10
%rc = icmp ult i32 %iv.offset, %len
br i1 %rc, label %latch, label %deopt
latch:
%iv.next = add i32 %iv, 11
%cond = icmp ult i32 %iv.next, 100
br i1 %cond, label %loop, label %exit
If `%len` was less than 10 but SCEV failed to trivially prove that `%begin - 10 >u %len - 10`,
we could end up executing the entire loop in the safe preloop while the main loop was still generated,
but never executed. Now we cut the ranges so that if both `%begin - 10` and `%len - 10` overflow,
we have a trivially empty safe range of `[0, 0)`. In some cases this saves us from meaningless optimization.
Differential Revision: https://reviews.llvm.org/D39954
llvm-svn: 318639
As the first test shows, we could transform an llvm intrinsic which never sets errno
into a libcall which could set errno (even though it's marked readnone?), so that's
not ideal.
It's possible that we can also transform a libcall which could set errno to an
intrinsic given the fast-math-flags constraint, but that's deferred to determine
exactly which set of FMF are needed.
Differential Revision: https://reviews.llvm.org/D40150
llvm-svn: 318628
Summary:
With this patch I tried to reduce the complexity of the code slightly, by
removing some indirection. Please let me know what you think.
Reviewers: junbuml, mcrosier, davidxl
Reviewed By: junbuml
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40037
llvm-svn: 318593
We were not doing that for large shadow granularity. Also add more
stack frame layout tests for large shadow granularity.
Differential Revision: https://reviews.llvm.org/D39475
llvm-svn: 318581
Revert the following commits:
r318369 [asan] Fallback to non-ifunc dynamic shadow on android<22.
r318235 [asan] Prevent rematerialization of &__asan_shadow.
r317948 [sanitizer] Remove unnecessary attribute hidden.
r317943 [asan] Use dynamic shadow on 32-bit Android.
MemoryRangeIsAvailable() reads /proc/$PID/maps into an mmap-ed buffer
that may overlap with the address range that we plan to use for the
dynamic shadow mapping. This is causing random startup crashes.
llvm-svn: 318575
Summary: This change fixes PR35342 by replacing only the current use with undef in unreachable blocks.
Reviewers: efriedma, mcrosier, igor-laevsky
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40184
llvm-svn: 318551
making it no longer even remotely simple.
The pass will now be more of a "full loop unswitching" pass rather than
anything substantively simpler than any other approach. I plan to rename
it accordingly once the dust settles.
The key ideas of the new loop unswitcher are carried over for
non-trivial unswitching:
1) Fully unswitch a branch or switch instruction from inside of a loop to
outside of it.
2) Update the CFG and IR. This avoids needing to "remember" the
unswitched branches as well as avoiding excessive cloning and
reliance on complex parts of simplify-cfg to clean up the CFG.
3) Update the analyses (where we can) rather than just blowing them away
or relying on something else updating them.
Sadly, #3 is somewhat compromised here as the dominator tree updates
were too complex for me to want to reason about. I will need to make
another attempt to do this now that we have a nice dynamic update API
for dominators. However, we do adhere to #3 w.r.t. LoopInfo.
This approach also adds an important principle specific to non-trivial
unswitching: not *all* of the loop will be duplicated when unswitching.
This fact allows us to compute the cost in terms of how much *duplicate*
code is inserted rather than just on raw size. Unswitching conditions
which essentially partition loops will work regardless of the total loop
size.
Some remaining issues that I will be addressing in subsequent commits:
- Handling unstructured control flow.
- Unswitching 'switch' cases instead of just branches.
- Moving to the dynamic update API for dominators.
Some high-level, interesting limitations that folks might want to push
on as follow-ups but that I don't have any immediate plans around:
- We could be much more clever about not cloning things that will be
deleted. In fact, we should be able to delete *nothing* and do
a minimal number of clones.
- There are many more interesting selection criteria for which branch to
unswitch that we might want to look at. One that I'm interested in
particularly are a set of conditions which all exit the loop and which
can be merged into a single unswitched test of them.
Differential revision: https://reviews.llvm.org/D34200
llvm-svn: 318549
The logic of replacing a pair `RANGE_CHECK_LOWER + RANGE_CHECK_UPPER`
with `RANGE_CHECK_BOTH` in fact duplicates the logic of range intersection which
happens when we calculate safe iteration space. Effectively, the result of intersection of
these ranges doesn't differ from the range of merged range check.
We chose to remove the duplicated logic in favor of code simplicity.
Differential Revision: https://reviews.llvm.org/D39589
llvm-svn: 318508
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
The requirement is that shadow memory must be aligned to page
boundaries (4k in this case). Use a closed form equation that always
satisfies this requirement.
Differential Revision: https://reviews.llvm.org/D39471
llvm-svn: 318421
// trunc (binop X, C) --> binop (trunc X, C')
// trunc (binop (ext X), Y) --> binop X, (trunc Y)
I'm grouping sub with the other binops because that makes the code simpler
and the transforms are valid:
https://rise4fun.com/Alive/UeF
...so even though we don't expect a sub with constant Op1 or any of the
other opcodes with constant Op0 due to canonicalization rules, we might as
well handle those situations if non-canonical code somehow reaches this
point (it should just make instcombine more efficient in reaching its
end goal).
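As an illustration of the first pattern (hypothetical values):
```
%b = add i64 %x, 15
%t = trunc i64 %b to i32
; -->
%x.tr = trunc i64 %x to i32
%t = add i32 %x.tr, 15
```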
This should solve the problem that later manifests in the vectorizers in
PR35295:
https://bugs.llvm.org/show_bug.cgi?id=35295
llvm-svn: 318404
Fix a couple places where the minimum alignment/size should be a
function of the shadow granularity:
- alignment of AllGlobals
- the minimum left redzone size on the stack
Added a test to verify that the metadata_array is properly aligned
for shadow scale of 5, to be enabled when we add build support
for testing shadow scale of 5.
Differential Revision: https://reviews.llvm.org/D39470
llvm-svn: 318395
When expanding exit conditions for pre- and postloops, we may end up expanding a
recurrence of the loop in that loop's preheader. This produces incorrect IR.
This patch ensures that IRCE uses SCEVExpander correctly and only expands code which
is safe to expand in this particular location.
Differential Revision: https://reviews.llvm.org/D39234
llvm-svn: 318381
Note that one-use and shouldChangeType() are checked ahead of the switch.
Without the narrowing folds, we can produce inferior vector code as shown in PR35299:
https://bugs.llvm.org/show_bug.cgi?id=35299
llvm-svn: 318323
InstCombine salvages debug info for every instruction it erases from its
worklist, but it wasn't doing it during its initial DCE when populating
its worklist. This fixes that.
This should help improve availability of 'this' in optimized debug info
when casts are necessary.
llvm-svn: 318320
Summary:
Added more remarks to SLP pass, in particular "missed" optimization remarks.
Also proposed several tests for new functionality.
Patch by Vladimir Miloserdov!
For reference you may look at: https://reviews.llvm.org/rL302811
Reviewers: anemet, fhahn
Reviewed By: anemet
Subscribers: javed.absar, lattner, petecoup, yakush, llvm-commits
Differential Revision: https://reviews.llvm.org/D38367
llvm-svn: 318307
Summary:
This patch optimizes a binop sandwiched between two selects with the same condition. Since we know it is only used by the later select, we can propagate the appropriate input value from the earlier select.
As I'm writing this I realize I may need to avoid doing this for division in case the select was protecting a divide by zero?
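A hedged sketch of the pattern (hypothetical IR):
```
%s = select i1 %cond, i32 %a, i32 %b
%op = add i32 %s, %x
%r = select i1 %cond, i32 %op, i32 %y
; %r only uses %op when %cond is true, so %op can use %a directly:
%op = add i32 %a, %x
```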
Reviewers: spatel, majnemer
Reviewed By: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39999
llvm-svn: 318267
It crashes building sqlite; see reply on the llvm-commits thread.
> [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops.
>
> Patch tries to improve vectorization of the following code:
>
> void add1(int * __restrict dst, const int * __restrict src) {
> *dst++ = *src++;
> *dst++ = *src++ + 1;
> *dst++ = *src++ + 2;
> *dst++ = *src++ + 3;
> }
> Allows to vectorize even if the very first operation is not a binary add, but just a load.
>
> Fixed issues related to previous commit.
>
> Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
>
> Reviewed By: ABataev, RKSimon
>
> Subscribers: llvm-commits, RKSimon
>
> Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 318239
Simplifying a loop latch changes the IR and we need to make sure the pass manager knows to invalidate analysis passes if that happened.
PR35210 discovered a case where we failed to invalidate the post dominator tree after this simplification because we made no changes other than simplifying the loop latch.
Fixes PR35210.
Differential Revision: https://reviews.llvm.org/D40035
llvm-svn: 318237
Summary:
In the mode when ASan shadow base is computed as the address of an
external global (__asan_shadow, currently on android/arm32 only),
regalloc prefers to rematerialize this value to save register spills.
Even in -Os. On arm32 it is rather expensive (2 loads + 1 constant
pool entry).
This change adds an inline asm in the function prologue to suppress
this behavior. It reduces AsanTest binary size by 7%.
Reviewers: pcc, vitalybuka
Subscribers: aemerson, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40048
llvm-svn: 318235
Summary:
Instcombine (and probably other passes) sometimes want to change the
type of an alloca. To do this, they generally create a new alloca with
the desired type, create a bitcast to make the new pointer type match
the old pointer type, replace all uses with the cast, and then simplify
the casts. We already knew how to salvage dbg.value instructions when
removing casts, but we can extend it to cover dbg.addr and dbg.declare.
Fixes a debug info quality issue uncovered in Chromium in
http://crbug.com/784609
Reviewers: aprantl, vsk
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40042
llvm-svn: 318203
Clang implements the -finstrument-functions flag inherited from GCC, which
inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit.
This is useful for getting a trace of how the functions in a program are
executed. Normally, the calls remain even if a function is inlined into another
function, but it is useful to be able to turn this off for users who are
interested in a lower-level trace, i.e. one that reflects what functions are
called post-inlining. (We use this to generate link order files for Chromium.)
LLVM already has a pass for inserting similar instrumentation calls to
mcount(), which it does after inlining. This patch renames and extends that
pass to handle calls both to mcount and the cygprofile functions, before and/or
after inlining as controlled by function attributes.
Differential Revision: https://reviews.llvm.org/D39287
llvm-svn: 318195
Patch tries to improve vectorization of the following code:
void add1(int * __restrict dst, const int * __restrict src) {
*dst++ = *src++;
*dst++ = *src++ + 1;
*dst++ = *src++ + 2;
*dst++ = *src++ + 3;
}
This allows vectorization even if the very first operation is not a binary add, but just a load.
Fixed issues related to previous commit.
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
Reviewed By: ABataev, RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 318193
This patch is part of D38676.
The patch introduces two new Recipes to handle instructions whose vectorization
involves masking. These Recipes take VPlan-level masks in D38676, but still rely
on ILV's existing createEdgeMask(), createBlockInMask() in this patch.
VPBlendRecipe handles intra-loop phi nodes, which are vectorized as a sequence
of SELECTs. Its execute() code is refactored out of ILV::widenPHIInstruction(),
which now handles only loop-header phi nodes.
VPWidenMemoryInstructionRecipe handles load/store which are to be widened
(but are not part of an Interleave Group). In this patch it simply calls
ILV::vectorizeMemoryInstruction on execute().
Differential Revision: https://reviews.llvm.org/D39068
llvm-svn: 318149
Registers it and everything, updates all the references, etc.
Next patch will add support to Clang's `-fexperimental-new-pass-manager`
path to actually enable BoundsChecking correctly.
Differential Revision: https://reviews.llvm.org/D39084
llvm-svn: 318128
a legacy and new PM pass.
This essentially moves the class state to parameters and re-shuffles the
code to make that reasonable. It also does some minor cleanups along the
way and leaves some comments.
Differential Revision: https://reviews.llvm.org/D39081
llvm-svn: 318124
Summary:
If a compare instruction is the same as or the inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
This shall facilitate computation of loop exit counts in cases
where the compare appears in the evolution chain of induction variables.
Will fix PR34538.
Reviewers: sanjoy, hfinkel, junryoungju
Reviewed By: sanjoy, junryoungju
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38494
llvm-svn: 318050
In more recent Linux kernels (including those with 47 bit VMAs) the layout of
virtual memory for powerpc64 changed causing the memory sanitizer to not
work properly. This patch adjusts a bit mask in the memory sanitizer to work
on the newer kernels while continuing to work on the older ones as well.
This is the non-runtime part of the patch and finishes it. ref: r317802
Tested on several 4.x and 3.x kernel releases.
llvm-svn: 318045
Summary:
This patch extends the partial inliner to support inlining parts of
vararg functions, if the vararg handling is done in the outlined part.
It adds a `ForwardVarArgsTo` argument to InlineFunction. If it is
non-null, all varargs passed to the inlined function will be added to
all calls to `ForwardVarArgsTo`.
The partial inliner takes care to only pass `ForwardVarArgsTo` if the
varargs handling is done in the outlined function. It checks that vastart
is not part of the function to be inlined.
`test/Transforms/CodeExtractor/PartialInlineNoInline.ll` (already part
of the repo) checks we do not do partial inlining if vastart is used in
a basic block that will be inlined.
Reviewers: davide, davidxl, grosser
Reviewed By: davide, davidxl, grosser
Subscribers: gyiu, grosser, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D39607
llvm-svn: 318028
Summary:
This patch adds an early out to visitICmpInst if we are looking at a compare as part of an integer absolute value idiom. Similar is already done for min/max.
In the particular case I observed in a benchmark, we had an absolute value of a load from an indexed global. We simplified the compare using foldCmpLoadFromIndexedGlobal into a magic bit vector, a shift, and an and. But the load result was still used for the select and the negate part of the absolute value idiom. So we overcomplicated the code and lost the ability to recognize it as an absolute value.
I've chosen a simpler case for the test here.
Reviewers: spatel, davide, majnemer
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39766
llvm-svn: 317994
Summary:
The following kernel change has moved ET_DYN base to 0x4000000 on arm32:
https://marc.info/?l=linux-kernel&m=149825162606848&w=2
Switch to dynamic shadow base to avoid such conflicts in the future.
Reserve shadow memory in an ifunc resolver, but don't use it in the instrumentation
until PR35221 is fixed. This will eventually let us save one load per function.
Reviewers: kcc
Subscribers: aemerson, srhines, kubamracek, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D39393
llvm-svn: 317943
Summary:
The specification of the @llvm.memcpy.element.unordered.atomic intrinsic requires
that the pointer arguments have alignments of at least the element size. The existing
IRBuilder interface to create a call to this intrinsic does not allow for providing
the alignment of these pointer args. Having an interface that makes it easy to
construct invalid intrinsic calls doesn't seem sensible, so this patch simply
adds the requirement that one provide the argument alignments when using IRBuilder
to create atomic memcpy calls.
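A minimal sketch of a well-formed call, with both pointer alignments (4) at
least the element size (4); the function name is illustrative:
declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture writeonly, i8* nocapture readonly, i32, i32)
define void @copy16(i8* %dst, i8* %src) {
call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %dst, i8* align 4 %src, i32 16, i32 4)
ret void
}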
llvm-svn: 317918
Summary:
This adds logic to CVP to remove some overflow checks. It uses LVI to remove
operations with at least one constant. Specifically, this can remove many
overflow intrinsics immediately following an overflow check in the source code,
such as:
if (x < INT_MAX)
... x + 1 ...
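A hypothetical IR sketch of the pattern: inside the guarded block, LVI knows
%x < INT_MAX, so the overflow bit of the intrinsic is provably false and the
call can be folded to a plain add nsw:
declare { i32, i1 } @llvm.sadd.with.overflow.i32(i32, i32)
define i32 @inc(i32 %x) {
entry:
%cmp = icmp slt i32 %x, 2147483647
br i1 %cmp, label %in, label %out
in:
%res = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 1)
%val = extractvalue { i32, i1 } %res, 0
ret i32 %val
out:
ret i32 0
}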
Patch by Joel Galenson!
Reviewers: sanjoy, regehr
Reviewed By: sanjoy
Subscribers: fhahn, pirama, srhines, llvm-commits
Differential Revision: https://reviews.llvm.org/D39483
llvm-svn: 317911
Summary:
This wrapper checks if there is at least one non-zero weight before
setting the metadata.
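For reference, the branch weight metadata being set looks like this
(weights are hypothetical):
br i1 %c, label %then, label %else, !prof !0
!0 = !{!"branch_weights", i32 1, i32 2000}
The wrapper skips emitting such metadata when every weight would be zero.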
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39872
llvm-svn: 317845
When the Constant Hoisting pass moves expensive constants into a
common block, it would assign a debug location equal to the last use
of that constant. While this is certainly intuitive, it places the
constant in an out-of-order location, according to the debug location
information. This produces out-of-order stepping when debugging
programs affected by this pass.
This patch creates in-order stepping behavior by merging the debug
locations for hoisted constants, and the new insertion point.
Patch by Matthew Voss!
Differential Revision: https://reviews.llvm.org/D38088
llvm-svn: 317827
Summary:
The analysis of the store sequence goes in straight order - from the
first store to the last. But the best opportunity for vectorization
arises if we use reverse order - from the last store to the first. This
may be better because users usually have some initialization part
followed by further processing, and this initialization part may
confuse the SLP vectorizer.
Reviewers: RKSimon, hfinkel, mkuper, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39606
llvm-svn: 317821
The toxic stew of created values named 'tmp' and tests that already have
values named 'tmp' and CHECK lines looking for values named 'tmp' causes
bad things to happen in our test line auto-generation scripts because it
wants to use 'TMP' as a prefix for unnamed values. Use less 'tmp' to
avoid that.
llvm-svn: 317818
We must patch all existing incoming values of the Phi node,
otherwise it is possible that we see poison
where the program does not expect to see it.
This is similar to what GVN does.
The added test test/Transforms/GVN/PRE/pre-jt-add.ll shows an
example of a wrong optimization done by jump threading because
GVN PRE did not patch an existing incoming value.
Reviewers: mkazantsev, wmi, dberlin, davide
Reviewed By: dberlin
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D39637
llvm-svn: 317768
This patch implements Chandler's idea [0] for supporting languages that
require support for infinite loops with side effects, such as Rust, providing
part of a solution to bug 965 [1].
Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual
effect, but which appears to optimization passes to have obscure side effects,
such that they don't optimize away loops containing it. It also teaches
several optimization passes to ignore this intrinsic, so that it doesn't
significantly impact optimization in most cases.
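A minimal sketch of how a front end for such a language might use the
intrinsic so an intentionally infinite loop is not deleted (function name
is illustrative):
declare void @llvm.sideeffect()
define void @spin() {
entry:
br label %loop
loop:
call void @llvm.sideeffect()
br label %loop
}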
As discussed on llvm-dev [2], this patch is the first of two major parts.
The second part, to change LLVM's semantics to have defined behavior
on infinite loops by default, with a function attribute for opting into
potential-undefined-behavior, will be implemented and posted for review in
a separate patch.
[0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html
[1] https://bugs.llvm.org/show_bug.cgi?id=965
[2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html
Differential Revision: https://reviews.llvm.org/D38336
llvm-svn: 317729
Summary:
In ThinLTO compilation, we exited populateModulePassManager early and so
were not adding PM extension passes meant to run at the end of the
pipeline. This includes sanitizer passes. Add these passes before
the early exit.
A test will be added to projects/compiler-rt.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D39565
llvm-svn: 317714
Patch tries to improve vectorization of the following code:
void add1(int * __restrict dst, const int * __restrict src) {
*dst++ = *src++;
*dst++ = *src++ + 1;
*dst++ = *src++ + 2;
*dst++ = *src++ + 3;
}
This allows vectorization even if the very first operation is not a binary add, but just a load.
Fixed PR34619 and other issues related to previous commit.
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
Reviewed By: ABataev, RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 317618
The hexagon test should be fixed now.
Original commit message:
This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select.
This can allow us to get the select closer to other selects to enable removing one.
Differential Revision: https://reviews.llvm.org/D39222
llvm-svn: 317600
Blockaddresses refer to the function itself; therefore replacing them
would cause an assertion in doRAUW.
Fixes https://bugs.llvm.org/show_bug.cgi?id=35201
This was found when trying CFI on a proprietary kernel by Dmitry Mikulin.
Differential Revision: https://reviews.llvm.org/D39695
llvm-svn: 317527
This broke the CodeGen/Hexagon/loop-idiom/pmpy-mod.ll test on a bunch of buildbots.
> This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select.
>
> This can allow us to get the select closer to other selects to enable removing one.
>
> Differential Revision: https://reviews.llvm.org/D39222
>
> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317510 91177308-0d34-0410-b5e6-96231b3b80d8
llvm-svn: 317518
This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select.
This can allow us to get the select closer to other selects to enable removing one.
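A sketch of the kind of fold described (names and constants illustrative):
define i32 @shifty(i1 %c, i32 %x) {
%b = add i32 %x, 5
%s = select i1 %c, i32 %b, i32 %x
%r = shl i32 %s, 2
ret i32 %r
}
...can become:
define i32 @shifty(i1 %c, i32 %x) {
%x.sh = shl i32 %x, 2
%b.sh = add i32 %x.sh, 20
%r = select i1 %c, i32 %b.sh, i32 %x.sh
ret i32 %r
}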
Differential Revision: https://reviews.llvm.org/D39222
llvm-svn: 317510
Summary: When computing the SUM for indirect call promotion, if the callsite is already promoted in the profile, it will be promoted before ICP. In the current implementation, ICP only sees the remaining counts in SUM. This may cause extra indirect call targets to be promoted. This patch updates the SUM to include the counts already promoted earlier. This way we do not end up promoting too many indirect call targets.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D38763
llvm-svn: 317502
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more recently:
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html
...this is a step in cleaning up our fast-math-flags implementation in IR to better match
the capabilities of both clang's user-visible flags and the backend's flags for SDNode.
As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the
'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic
reassociation - 'AllowReassoc'.
We're also adding a bit to allow approximations for library functions called 'ApproxFunc'
(this was initially proposed as 'libm' or similar).
...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did
look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits),
but that's apparently already used for other purposes. Also, I don't think we can just
add a field to FPMathOperator because Operator is not intended to be instantiated.
We'll defer movement of FMF to another day.
We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
...made me think we want to keep the shortcut synonym.
Finally, this change is binary incompatible with existing IR as seen in the
compatibility tests. This statement:
"Newer releases can ignore features from older releases, but they cannot miscompile
them. For example, if nsw is ever replaced with something else, dropping it would be
a valid way to upgrade the IR."
( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR
version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will
fail to optimize some previously 'fast' code because it's no longer recognized as
'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.
Note: an inter-dependent clang commit to use the new API name should closely
follow this commit.
Differential Revision: https://reviews.llvm.org/D39304
llvm-svn: 317488
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compiler's view on what symbols will end up being local.
Originally committed as r317374, but reverted in r317395 to update some missed
tests.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317408
This preserves the debug info for the cast operation in the original location.
rdar://problem/33460652
Reapplied r317340 with the test moved into an ARM-specific directory.
llvm-svn: 317375
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compiler's view on what symbols will end up being local.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317374
Merging conditional stores tries to check whether the code is if-convertible after the store is moved. But the store hasn't been moved yet, so it's being counted against the threshold.
The patch adds 1 to the threshold comparison to make sure we don't count the store. I've adjusted a test to use a lower threshold to ensure we still do that conversion with the lower threshold.
Differential Revision: https://reviews.llvm.org/D39570
llvm-svn: 317368
This recommits r317351 after fixing a buildbot failure.
Original commit message:
Summary:
This change adds a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :
1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.
Split from :
if (!ptr || c)
callee(ptr);
to :
if (!ptr)
callee(null ptr) // set the known constant value
else if (c)
callee(nonnull ptr) // set non-null attribute in the argument
2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2, label %BB1
BB1:
br label %BB2
BB2:
%p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
call void @bar(i32 %p)
to
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2-split0, label %BB1
BB1:
br label %BB2-split1
BB2-split0:
call void @bar(i32 0)
br label %BB2
BB2-split1:
call void @bar(i32 1)
br label %BB2
BB2:
%p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
llvm-svn: 317362
Summary:
This change adds a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :
1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.
Split from :
if (!ptr || c)
callee(ptr);
to :
if (!ptr)
callee(null ptr) // set the known constant value
else if (c)
callee(nonnull ptr) // set non-null attribute in the argument
2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2, label %BB1
BB1:
br label %BB2
BB2:
%p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
call void @bar(i32 %p)
to
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2-split0, label %BB1
BB1:
br label %BB2-split1
BB2-split0:
call void @bar(i32 0)
br label %BB2
BB2-split1:
call void @bar(i32 1)
br label %BB2
BB2:
%p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
Reviewers: davidxl, huntergr, chandlerc, mcrosier, eraman, davide
Reviewed By: davidxl
Subscribers: sdesmalen, ashutosh.nema, fhahn, mssimpso, aemerson, mgorny, mehdi_amini, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D39137
llvm-svn: 317351
Summary:
The current LICM allows sinking an instruction only when it is exposed to exit
blocks through a trivially replaceable PHI of which all incoming values are the
same instruction. This change enhances LICM to sink a sinkable instruction
through non-trivially replaceable PHIs by splitting predecessors of loop
exits.
Reviewers: hfinkel, majnemer, davidxl, bmakam, mcrosier, danielcdh, efriedma, jtony
Reviewed By: efriedma
Subscribers: nemanjai, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D37163
llvm-svn: 317335
Summary:
Refactored the code to separate out common functions that are being
reused.
This is to reduce the diff for upcoming changes to loop predication
with reverse loops.
This refactoring is what we have in our downstream code.
llvm-svn: 317324
Summary:
Also added a reserve() method to MapVector since we want to use that from
ADCE.
DenseMap does not provide deterministic iteration order so with that
we will handle the members of BlockInfo in random order, eventually
leading to random order of the blocks in the predecessor lists.
Without this change, I get the same predecessor order about 90% of the time
when I compile a certain reproducer, and a different one about 10% of the time.
No idea how to make a proper test case for this.
Reviewers: kuhar, david2050
Reviewed By: kuhar
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39593
llvm-svn: 317323
Summary:
InlineFunction can fail, for example when trying to inline vararg
functions. In those cases, we do not want to bump partial inlining
counters or set AnyInlined to true, because this could leave an unused
function hanging around.
Reviewers: davidxl, davide, gyiu
Reviewed By: davide
Subscribers: llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D39581
llvm-svn: 317314
Summary:
Currently the block frequency analysis is an approximation for irreducible
loops.
The new irreducible loop metadata is used to annotate the irreducible loop
headers with their header weights based on the PGO profile (currently this is
approximated to be evenly weighted) and to help improve the accuracy of the
block frequency analysis for irreducible loops.
This patch is a basic support for this.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: mehdi_amini, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D39028
llvm-svn: 317278
Summary:
This patch allows us to predicate range checks that have a type narrower than
the latch check type. We leverage SCEV analysis to identify a truncate for the
latchLimit and latchStart.
There are also safety checks in place which require the start and limit to be
known at compile time. We require this to make sure that the SCEV truncate expr
for the IV corresponding to the latch does not cause us to lose information
about the IV range.
Added tests show the loop predication over range checks that are of various
types and are narrower than the latch type.
This enhancement has been in our downstream tree for a while.
Reviewers: apilipenko, sanjoy, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39500
llvm-svn: 317269
The original change was reverted in rL317217 because of the failure in
the RS4GC testcase. I couldn't reproduce the failure on my local machine
(macbook) but could reproduce it on a linux box.
The failure was around removing the uses of invariant.start. The fix
here is to just RAUW undef (which was the first implementation in D39388).
This is perfectly valid IR as discussed in the review.
llvm-svn: 317225
Summary:
Invariant.start on memory locations has the property that the memory
location is unchanging. However, this is not true in the face of
rewriting statepoints for GC.
Teach RS4GC to remove invariant.start so that optimizations after
RS4GC do not incorrectly sink a load from the memory location past a
statepoint.
Added test showcasing the issue.
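A minimal sketch of the hazard (function names are hypothetical): with the
invariant.start kept around, a later pass could sink the load below the call,
but after RS4GC the call is a statepoint and %p may have been relocated there:
declare {}* @llvm.invariant.start.p0i8(i64, i8* nocapture)
declare void @do_safepoint()
define i8 @read(i8* %p) gc "statepoint-example" {
entry:
%inv = call {}* @llvm.invariant.start.p0i8(i64 1, i8* %p)
%v = load i8, i8* %p
call void @do_safepoint() ; becomes a statepoint; %p may be relocated here
ret i8 %v
}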
Reviewers: reames, apilipenko, dneilson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39388
llvm-svn: 317215
undefined reference to `llvm::TargetPassConfig::ID' on
clang-ppc64le-linux-multistage
This reverts commit eea333c33fa73ad225ef28607795984829f65688.
llvm-svn: 317213
Summary:
This is mostly a noop (most of the test diffs are renamed blocks).
There are a few temporary register renames (eax<->ecx) and a few blocks are
shuffled around.
See the discussion in PR33325 for more details.
Reviewers: spatel
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D39456
llvm-svn: 317211
Summary:
SpeculativelyExecuteBB can flatten the CFG by doing
speculative execution followed by a select instruction.
When the speculatively executed BB contained dbg intrinsics
the result could be a little bit weird, since those dbg
intrinsics were inserted before the select in the flattened
CFG. So when single stepping in the debugger and printing the
value of the variable referenced in the dbg intrinsic, it
could look like the variable had values
that were never actually assigned to it.
This patch simply discards all dbg intrinsics that were found
in the speculatively executed BB.
Reviewers: aprantl, chandlerc, craig.topper
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39494
llvm-svn: 317198
Summary: In the compile phase of SamplePGO+ThinLTO, ICP is not invoked. Instead, indirect call targets will be included as function metadata for ThinIndex to build the call graph. This should not only include functions defined in other modules, but also functions defined in the same module; otherwise ThinIndex may find the callee dead and eliminate it, while ICP in the backend will revive the symbol, which leads to an undefined symbol.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D39480
llvm-svn: 317118
This is necessary because DCE is applied to full LTO modules. Without
this change, a reference from a dead ThinLTO global to a dead full
LTO global will result in an undefined reference at link time.
This problem is only observable when --gc-sections is disabled, or
when targeting COFF, as the COFF port of lld requires all symbols to
have a definition even if all references are dead (this is consistent
with link.exe).
This change also adds an EliminateAvailableExternally pass at -O0. This
is necessary to handle the situation on Windows where a non-prevailing
copy of a linkonce_odr function has an SEH filter function; any
such filters must be DCE'd because they will contain a call to the
llvm.localrecover intrinsic, passing as an argument the address of the
function that the filter belongs to, and llvm.localrecover requires
this function to be defined locally.
Fixes PR35142.
Differential Revision: https://reviews.llvm.org/D39484
llvm-svn: 317108
This patch reverts rL311205, which was initially a wrong fix. The real problem
was in the intersection of signed and unsigned ranges (see rL316552), and the
patch being reverted masked the problem instead of fixing it.
By now, the test against which rL311205 was made works OK even without this
code. This revert patch also contains a test case that demonstrates incorrect
behavior caused by rL311205: it is caused by an incorrect choice of signed max
instead of unsigned.
llvm-svn: 317088
Summary:
By replacing branches to CommonExitBlock, we remove the node from
CommonExitBlock's predecessors, invalidating the iterator. The problem
is exposed when the common exit block has multiple predecessors and
needs to sink lifetime info. The modification in the test case trigger
the issue.
Reviewers: davidxl, davide, wmi
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39112
llvm-svn: 317084
This formulation might be slightly slower since I eagerly compute the cheap replacements. If anyone sees this having a compile time impact, let me know and I'll use lazy population instead.
llvm-svn: 317048
Currently the selects are created with the names of their inputs concatenated together. It's possible to get cases that chain these selects together resulting in long names due to multiple levels of concatenation. Our internal branch of llvm managed to generate names over 100000 characters in length on a particular test due to an extreme compounding of the names.
This patch changes the name to a generic name that is not dependent on its inputs.
Differential Revision: https://reviews.llvm.org/D39440
llvm-svn: 317024
If a select instruction tests the returned flag of a cmpxchg instruction and
selects between the returned value of the cmpxchg instruction and its compare
operand, the result of the select will always be equal to its false value.
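A minimal sketch of the fold (names are illustrative): on success the
returned value equals the compare operand, so %r below is always %cmp:
define i32 @fold(i32* %ptr, i32 %cmp, i32 %new) {
%pair = cmpxchg i32* %ptr, i32 %cmp, i32 %new seq_cst seq_cst
%val = extractvalue { i32, i1 } %pair, 0
%ok = extractvalue { i32, i1 } %pair, 1
%r = select i1 %ok, i32 %val, i32 %cmp
ret i32 %r
}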
Differential Revision: https://reviews.llvm.org/D39383
llvm-svn: 316994
The optimisation remarks for loop unrolling with an unrolled remainder look something like:
test.c:7:18: remark: completely unrolled loop with 3 iterations [-Rpass=loop-unroll]
C[i] += A[i*N+j];
^
test.c:6:9: remark: unrolled loop by a factor of 4 with run-time trip count [-Rpass=loop-unroll]
for(int j = 0; j < N; j++)
^
This removes the first of the two messages.
Differential revision: https://reviews.llvm.org/D38725
llvm-svn: 316986
Rename `Offset`, `Scale`, `Length` into `Begin`, `Step`, `End` respectively
to make naming of similar entities for Ranges and Range Checks more
consistent.
Differential Revision: https://reviews.llvm.org/D39414
llvm-svn: 316979
As noted in the nice block comment, the previous code didn't actually handle multi-entry loops correctly; it just assumed SCEV didn't analyze such loops. Given SCEV has comments to the contrary, that seems a bit suspect. More importantly, the pass actually requires loop-simplify form, which ensures a loop preheader is available. Remove the excessive generality and shorten the code greatly.
Note that we do successfully analyze many multi-entry loops, but we do so by converting them to single entry loops. See the added test case.
llvm-svn: 316976
This patch fixes the miscompile that happens when PRE hoists loads across guards and
other instructions that don't always pass control flow to their successors. PRE is now prohibited
from hoisting across such instructions because there is no guarantee that the load standing after such
instruction is still valid before such instruction. For example, a load from under a guard may be
invalid before the guard in the following case:
int array[LEN];
...
guard(0 <= index && index < LEN);
use(array[index]);
Differential Revision: https://reviews.llvm.org/D37460
llvm-svn: 316975
Previously, the code returned early from the *function* when it couldn't find a free expansion; it should be returning from the *transform*. I don't have a test case; I noticed this via inspection.
As a follow up, I'm going to revisit the logic in the extract function. I think that essentially the whole helper routine can be replaced with SCEVExpander, but I wanted to do that in a series of separate commits.
llvm-svn: 316974
Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725
If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area. There's a bunch of obviously wrong code in the same function. I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like.
llvm-svn: 316967
InferAddressSpaces assumes the pointee type of addrspacecast
is the same as that of its operand, which is not always true and causes
invalid IR.
This bug caused a build failure in HCC.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D39432
llvm-svn: 316957
It's not guaranteed. There's a bug open to sort them in predecessor
order, but it won't happen anytime soon. In the meanwhile, passes
will have to do an O(#preds) scan. Such is life.
llvm-svn: 316953
Summary:
For reference, see: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116589.html
This patch fleshes out the instruction class hierarchy with respect to atomic and
non-atomic memory intrinsics. With this change, the relevant part of the class
hierarchy becomes:
IntrinsicInst
-> MemIntrinsicBase (methods-only class)
-> MemIntrinsic (non-atomic intrinsics)
-> MemSetInst
-> MemTransferInst
-> MemCpyInst
-> MemMoveInst
-> AtomicMemIntrinsic (atomic intrinsics)
-> AtomicMemSetInst
-> AtomicMemTransferInst
-> AtomicMemCpyInst
-> AtomicMemMoveInst
-> AnyMemIntrinsic (both atomicities)
-> AnyMemSetInst
-> AnyMemTransferInst
-> AnyMemCpyInst
-> AnyMemMoveInst
This involves some class renaming:
ElementUnorderedAtomicMemCpyInst -> AtomicMemCpyInst
ElementUnorderedAtomicMemMoveInst -> AtomicMemMoveInst
ElementUnorderedAtomicMemSetInst -> AtomicMemSetInst
A script for doing this renaming in downstream trees is included below.
An example of where the Any* classes should be used in LLVM is when reasoning
about the effects of an instruction (ex: aliasing).
---
Script for renaming AtomicMem* classes:
PREFIXES="[<,([:space:]]"
CLASSES="MemIntrinsic|MemTransferInst|MemSetInst|MemMoveInst|MemCpyInst"
SUFFIXES="[;)>,[:space:]]"
REGEX="(${PREFIXES})ElementUnorderedAtomic(${CLASSES})(${SUFFIXES})"
REGEX2="visitElementUnorderedAtomic(${CLASSES})"
FILES=$( grep -E "(${REGEX}|${REGEX2})" -r . | tr ':' ' ' | awk '{print $1}' | sort | uniq )
SED_SCRIPT="s~${REGEX}~\1Atomic\2\3~g"
SED_SCRIPT2="s~${REGEX2}~visitAtomic\1~g"
for f in $FILES; do
echo "Processing: $f"
sed -i ".bak" -E "${SED_SCRIPT};${SED_SCRIPT2}" $f
done
Reviewers: sanjoy, deadalnix, apilipenko, anna, skatkov, mkazantsev
Reviewed By: sanjoy
Subscribers: hfinkel, jholewinski, arsenm, sdardis, nhaehnle, JDevlieghere, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38419
llvm-svn: 316950
- Targets that want to support memcmp expansions now return the list of
supported load sizes.
- Expansion codegen does not assume that all power-of-two load sizes
smaller than the max load size are valid. For examples, this is not the
case for x86(32bit)+sse2.
Fixes PR34887.
llvm-svn: 316905
This version of the patch includes a fix addressing a stage2 LTO buildbot
failure and addresses some additional nits.
Original commit message:
This updates the SCCP solver to use the ValueElement lattice for
parameters, which provides integer range information. The range
information is used to remove unneeded icmp instructions.
For the following function, f() can be optimized to ret i32 2 with
this change
source_filename = "sccp.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind readnone uwtable
define i32 @main() local_unnamed_addr #0 {
entry:
%call = tail call fastcc i32 @f(i32 1)
%call1 = tail call fastcc i32 @f(i32 47)
%add3 = add nsw i32 %call, %call1
ret i32 %add3
}
; Function Attrs: noinline norecurse nounwind readnone uwtable
define internal fastcc i32 @f(i32 %x) unnamed_addr #1 {
entry:
%c1 = icmp sle i32 %x, 100
%cmp = icmp sgt i32 %x, 300
%. = select i1 %cmp, i32 1, i32 2
ret i32 %.
}
attributes #1 = { noinline }
Reviewers: davide, sanjoy, efriedma, dberlin
Reviewed By: davide, dberlin
Subscribers: mcrosier, gberry, mssimpso, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D36656
llvm-svn: 316891
This version of the patch includes a fix addressing a stage2 LTO buildbot
failure and addresses some additional nits.
Original commit message:
This updates the SCCP solver to use the ValueElement lattice for
parameters, which provides integer range information. The range
information is used to remove unneeded icmp instructions.
For the following function, f() can be optimized to ret i32 2 with
this change
source_filename = "sccp.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind readnone uwtable
define i32 @main() local_unnamed_addr #0 {
entry:
%call = tail call fastcc i32 @f(i32 1)
%call1 = tail call fastcc i32 @f(i32 47)
%add3 = add nsw i32 %call, %call1
ret i32 %add3
}
; Function Attrs: noinline norecurse nounwind readnone uwtable
define internal fastcc i32 @f(i32 %x) unnamed_addr #1 {
entry:
%c1 = icmp sle i32 %x, 100
%cmp = icmp sgt i32 %x, 300
%. = select i1 %cmp, i32 1, i32 2
ret i32 %.
}
attributes #1 = { noinline }
Reviewers: davide, sanjoy, efriedma, dberlin
Reviewed By: davide, dberlin
Subscribers: mcrosier, gberry, mssimpso, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D36656
llvm-svn: 316887
This is no-functional-change-intended.
This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and
D35411 (defer folding unconditional branches) with pass parameters rather than a named
"latesimplifycfg" pass. Now that we have individual options to control the functionality,
we could decouple when these fire (but that's an independent patch if desired).
The next planned step would be to add another option bit to disable the sinking transform
mentioned in D38566. This should also make it clear that the new pass manager needs to
be updated to limit simplifycfg in the same way as the old pass manager.
Differential Revision: https://reviews.llvm.org/D38631
llvm-svn: 316835
Summary:
We shouldn't do this transformation if the function is marked nobuiltin.
We were only checking that the return type is floating point; we really should be checking the argument types and argument count as well. This can be accomplished by using the other version of getLibFunc that takes the Function and not just the name.
We should also be checking TLI::has since sqrtf is a macro on Windows.
Fixes PR32559.
Reviewers: hfinkel, spatel, davide, efriedma
Reviewed By: davide, efriedma
Subscribers: efriedma, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D39381
llvm-svn: 316819
This is a follow up change for D37569.
Currently the transformation is limited to the case when:
* The loop has a single latch with the condition of the form: ++i <pred> latchLimit, where <pred> is u<, u<=, s<, or s<=.
* The step of the IV used in the latch condition is 1.
* The IV of the latch condition is the same as the post increment IV of the guard condition.
* The guard condition is of the form i u< guardLimit.
This patch enables the transform in the case when the latch is
latchStart + i <pred> latchLimit, where <pred> is u<, u<=, s<, or s<=.
And the guard is
guardStart + i u< guardLimit
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D39097
llvm-svn: 316768
Summary: There are certain requirements for debug location of debug intrinsics, e.g. the scope of the DILocalVariable should be the same as the scope of its debug location. As a result, we should not add discriminator encoding for debug intrinsics.
Reviewers: dblaikie, aprantl
Reviewed By: aprantl
Subscribers: JDevlieghere, aprantl, bjope, sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D39343
llvm-svn: 316703
When trying to explain this to someone else, I got tripped up by the complicated meaning of IsKnownNonEscapingObject in load-store promotion. Extract a helper routine and clarify naming/scopes to make this a bit more obvious.
llvm-svn: 316699
Summary:
We no longer add vectors of pointers as candidates for
load/store vectorization. It does not seem to work anyway,
but without this patch we can end up in asserts when trying
to create casts between an integer type and a vector-of-pointers type.
The test case I've added used to assert like this when trying to
cast between i64 and <2 x i16*>:
opt: ../lib/IR/Instructions.cpp:2565: Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
#0 PrintStackTraceSignalHandler(void*)
#1 SignalHandler(int)
#2 __restore_rt
#3 __GI_raise
#4 __GI_abort
#5 __GI___assert_fail
#6 llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::Instruction*)
#7 llvm::IRBuilder<llvm::ConstantFolder, llvm::IRBuilderDefaultInserter>::CreateBitOrPointerCast(llvm::Value*, llvm::Type*, llvm::Twine const&)
#8 Vectorizer::vectorizeStoreChain(llvm::ArrayRef<llvm::Instruction*>, llvm::SmallPtrSet<llvm::Instruction*, 16u>*)
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D39296
llvm-svn: 316665
Summary:
The code comments indicate that no effort has been spent on
handling load/stores when the size isn't a multiple of the
byte size correctly. However, the code only avoided types
smaller than 8 bits. So for example a load of an i28 could
still be considered as a candidate for vectorization.
This patch adjusts the code to behave according to the code
comment.
The test case used to hit the following assert when trying
to cast an i32 to i28 using CreateBitOrPointerCast:
opt: ../lib/IR/Instructions.cpp:2565: Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
#0 PrintStackTraceSignalHandler(void*)
#1 SignalHandler(int)
#2 __restore_rt
#3 __GI_raise
#4 __GI_abort
#5 __GI___assert_fail
#6 llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::Instruction*)
#7 llvm::IRBuilder<llvm::ConstantFolder, llvm::IRBuilderDefaultInserter>::CreateBitOrPointerCast(llvm::Value*, llvm::Type*, llvm::Twine const&)
#8 (anonymous namespace)::Vectorizer::vectorizeLoadChain(llvm::ArrayRef<llvm::Instruction*>, llvm::SmallPtrSet<llvm::Instruction*, 16u>*)
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39295
llvm-svn: 316663
Summary: For some irreducible CFGs the domtree nodes might be dead; do not update the domtree for dead nodes.
Reviewers: kuhar, dberlin, hfinkel
Reviewed By: kuhar
Subscribers: llvm-commits, mcrosier
Differential Revision: https://reviews.llvm.org/D38960
llvm-svn: 316582
This patch adds a new pass for attaching !callees metadata to indirect call
sites. The pass propagates values to call sites by performing an IPSCCP-like
analysis using the generic sparse propagation solver. For indirect call sites
having a small set of possible callees, the attached metadata indicates what
those callees are. The metadata can be used to facilitate optimizations like
intersecting the function attributes of the possible callees, refining the call
graph, performing indirect call promotion, etc.
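For illustration, attached !callees metadata on an indirect call looks like
this (callee names are hypothetical):
declare void @f()
declare void @g()
define void @dispatch(void ()* %fp) {
call void %fp(), !callees !0
ret void
}
!0 = !{void ()* @f, void ()* @g}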
Differential Revision: https://reviews.llvm.org/D37355
llvm-svn: 316576
IRCE for unsigned latch conditions was temporarily disabled by rL314881. The motivating
example contained an unsigned latch condition and a signed range check. One of the safe
iteration ranges was `[1, SINT_MAX + 1]`. Its right border was incorrectly interpreted as a negative
value in the `IntersectRange` function; this led to a miscompile under which we deleted a range check
without inserting a postloop where it was needed.
This patch brings back IRCE for unsigned latch conditions. Now we treat range intersection more
carefully. If the latch condition was unsigned, we only try to consider a range check for deletion if:
1. The range check is also unsigned, or
2. Safe iteration range of the range check lies within `[0, SINT_MAX]`.
The same is done for signed latch.
Values from `[0, SINT_MAX]` are unambiguous, these values are non-negative under any interpretation,
and all values of a range intersected with such range are also non-negative.
We also use signed/unsigned min/max functions for range intersection depending on type of the
latch condition.
Differential Revision: https://reviews.llvm.org/D38581
llvm-svn: 316552
For a SCEV range, this patch replaces the naive emptiness check that
looks like `Begin == End` with a SCEV check. The range is guaranteed to be
empty if `Begin >= End`. We should filter such ranges out and not try to perform
IRCE for them.
For example, we can get such range when intersecting range `[A, B)` and `[C, D)`
where `A < B < C < D`. The resulting range is `[max(A, C), min(B, D)) = [C, B)`.
This range is empty, but its `Begin` does not match with `End`.
Performing IRCE on an empty range is basically safe but unprofitable because we
never actually get into the main loop where the range checks are supposed to
be eliminated. This patch uses SCEV mechanisms to treat loops with proved
`Begin >= End` as empty.
Differential Revision: https://reviews.llvm.org/D39082
llvm-svn: 316550
If a particular target supports volatile memory access operations, we can
avoid AS casting to generic AS. Currently it's only enabled in NVPTX for
loads and stores that access global & shared AS.
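A minimal sketch of the rewrite this enables on such a target (illustrative
IR): a volatile access through a generic-AS cast like
%gen = addrspacecast i32 addrspace(1)* %p to i32*
%v = load volatile i32, i32* %gen
...can instead access the global AS directly:
%v = load volatile i32, i32 addrspace(1)* %p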
Differential Revision: https://reviews.llvm.org/D39026
llvm-svn: 316495
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.
Also allow kill in all shader stages.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D38544
llvm-svn: 316427
The `BasicBlock::getFirstInsertionPt` call may return `std::end` for the
BB. Dereferencing the end iterator results in an assertion failure
"(!NodePtr->isKnownSentinel()), function operator*". Ensure that the
returned iterator is valid before dereferencing it. If the end is
returned, move one position backward to get a valid insertion point.
llvm-svn: 316401
Summary:
The elts of ActivePreds which is defined as a SmallPtrSet are copied
into Blocks using std::copy. This makes the resultant order of Blocks
non-deterministic. We cannot simply sort Blocks as they need to match
the corresponding Values. So a better approach is to define ActivePreds
as SmallSetVector.
This fixes the following failures in
http://lab.llvm.org:8011/builders/reverse-iteration:
LLVM :: Transforms/GVNSink/indirect-call.ll
LLVM :: Transforms/GVNSink/sink-common-code.ll
LLVM :: Transforms/GVNSink/struct.ll
Reviewers: dberlin, jmolloy, bkramer, efriedma
Reviewed By: dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39025
llvm-svn: 316369
As discussed in D39011:
https://reviews.llvm.org/D39011
...replacing constants with a variable is inverting the transform done
by other IR passes, so we definitely don't want to do this early.
In fact, it's questionable whether this transform belongs in SimplifyCFG
at all. I'll look at moving this to codegen as a follow-up step.
llvm-svn: 316298
The missed canonicalization/optimization in the motivating test from PR34471 leads to very different codegen:
int switcher(int x) {
switch(x) {
case 17: return 17;
case 19: return 19;
case 42: return 42;
default: break;
}
return 0;
}
int comparator(int x) {
if (x == 17) return 17;
if (x == 19) return 19;
if (x == 42) return 42;
return 0;
}
For the first example, we use a bit-test optimization to avoid a series of compare-and-branch:
https://godbolt.org/g/BivDsw
Differential Revision: https://reviews.llvm.org/D39011
llvm-svn: 316293
The way that splitInnerLoopHeader splits blocks requires
the induction PHI to be the first PHI in the inner loop
header. This makes sure that is actually the case when there
are both IV and reduction phis.
Differential Revision: https://reviews.llvm.org/D38682
llvm-svn: 316261
MergeFunctions uses (through FunctionComparator) a map of GlobalValues
to identifiers because it needs to compare functions and globals
do not have an inherent total order. Thus, FunctionComparator
(through GlobalNumberState) has a ValueMap<GlobalValue *>.
r315852 added a RAUW on globals that may have been previously
encountered by the FunctionComparator, which would replace
a GlobalValue * key with a ConstantExpr *, which is illegal.
This commit adjusts that code path to remove the function being
replaced from the ValueMap as well.
llvm-svn: 316145
Summary:
If a compare instruction is the same as or the inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
Currently the scope of evaluation is limited to SCEV computation for
PHI nodes.
This shall facilitate computation of loop exit counts in cases
where the compare appears in the evolution chain of induction variables.
Will fix PR34538.
Reviewers: sanjoy, hfinkel, junryoungju
Reviewed By: junryoungju
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38494
llvm-svn: 316054
Summary:
std::unordered_multimap happens to be very slow when the number of elements
grows large. On one of our internal applications we observed a 17x compile time
improvement from changing it to DenseMap.
Reviewers: mehdi_amini, serge-sans-paille, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38916
llvm-svn: 316045
This patch reverts rL315440 because of the bug described at
https://bugs.llvm.org/show_bug.cgi?id=34937
The fix for the bug is under review as D38944, but not yet ready. Given this is a regression, reverting until a fix is ready is called for.
Max would have done the revert himself, but is having trouble doing a build of fresh LLVM for some reason. I did the build and test to ensure the revert worked as expected on his behalf.
llvm-svn: 315974
above PHI instructions.
ARC optimizer has an optimization that moves a call to an ObjC runtime
function above a phi instruction when the phi has a null operand and is
an argument passed to the function call. This optimization should not
kick in when the runtime function is an objc_release that releases an
object with precise lifetime semantics.
rdar://problem/34959669
llvm-svn: 315914
Summary:
The following transformation for cmp instruction:
icmp smin(x, PositiveValue), 0 -> icmp x, 0
should only be done after checking for min/max to prevent infinite
looping caused by a reverse canonicalization. That is why this
transformation was moved to a point after the mentioned check.
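A minimal sketch (names and constants illustrative): since 42 is positive,
smin(%x, 42) is negative exactly when %x is, so in
%c = icmp slt i32 %x, 42
%min = select i1 %c, i32 %x, i32 42
%r = icmp slt i32 %min, 0
...%r can fold to icmp slt i32 %x, 0 - but only after the min/max check above.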
Reviewers: spatel, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38934
Patch by: Artur Gainullin <artur.gainullin@intel.com>
llvm-svn: 315895
This can result in significant code size savings in some cases,
e.g. an interrupt table all filled with the same assembly stub
in a certain Cortex-M BSP results in code blowup by a factor of 2.5.
Differential Revision: https://reviews.llvm.org/D34806
llvm-svn: 315853
This reduces code size for constructs like vtables or interrupt
tables that refer to functions in global initializers.
Differential Revision: https://reviews.llvm.org/D34805
llvm-svn: 315852
This avoids code duplication and allows us to add the disable-unroll metadata elsewhere.
Differential Revision: https://reviews.llvm.org/D38928
llvm-svn: 315850
It is possible for both a base and a derived class to be satisfied
with a unique vtable. If a program contains casts of the same pointer
to both of those types, the CFI checks will be lowered to this
(with ThinLTO):
if (p != &__typeid_base_global_addr)
trap();
if (p != &__typeid_derived_global_addr)
trap();
The optimizer may then use the first condition combined
with the assumption that __typeid_base_global_addr and
__typeid_derived_global_addr may not alias to optimize away the second
comparison, resulting in an unconditional trap.
This patch fixes the bug by giving imported globals the type [0 x i8]*,
which prevents the optimizer from assuming that they do not alias.
Differential Revision: https://reviews.llvm.org/D38873
llvm-svn: 315753
This patch moves some common utility functions out of IPSCCP and makes them
available globally. The functions determine if interprocedural data-flow
analyses can propagate information through function returns, arguments, and
global variables.
Differential Revision: https://reviews.llvm.org/D37638
llvm-svn: 315719
Summary:
In RS4GC it is possible that a base pointer is contained in a vector that
has undergone a bitcast from one element pointer type to another. We teach
RS4GC how to look through bitcasts of vector types when looking for a base
pointer.
Reviewers: anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38849
llvm-svn: 315694
Significantly reduces performance (~30%) of gipfeli
(https://github.com/google/gipfeli)
I have not yet managed to reproduce this regression with the open-source
version of the benchmark on github, but will work with others to get a
reproducer to you later today.
llvm-svn: 315680
Summary:
This patch adds processing of binary operations when the defs of the operands are in
the same block (i.e. local processing).
Earlier we bailed out in such cases (the bail out was introduced in rL252032)
because LVI at that time was more precise about context at the end of basic
blocks, which implied local def and use analysis didn't benefit CVP.
Since then we've added support for LVI in presence of assumes and guards. The
test cases added show how local def processing in CVP helps adding more
information to the ashr, sdiv, srem and add operators.
Note: processCmp which suffers from the same problem will
be handled in a later patch.
Reviewers: philip, apilipenko, SjoerdMeijer, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38766
llvm-svn: 315634
This is a follow up for the loop predication change 313981 to support ule, sle latch predicates.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D38177
llvm-svn: 315616
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.
Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.
Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.
Differential Revision: https://reviews.llvm.org/D38406
llvm-svn: 315590
This reverts commit 4e4ee1c507e2707bb3c208e1e1b6551c3015cbf5.
This is failing due to some code that isn't built on MSVC
so I didn't catch. Not immediately obvious how to fix this
at first glance, so I'm reverting for now.
llvm-svn: 315536
There's a lot of misuse of Twine scattered around LLVM. This
ranges in severity from benign (returning a Twine from a function
by value that is just a string literal) to pretty sketchy (storing
a Twine by value in a class). While there are some uses for
copying Twines, most of the very compelling ones are confined
to the Twine class implementation itself, and other uses are
either dubious or easily worked around.
This patch makes Twine's copy constructor private, and fixes up
all callsites.
Differential Revision: https://reviews.llvm.org/D38767
llvm-svn: 315530
parameterized emit() calls
Summary: This is a non-functional change to adopt the new emit() API added in r313691.
Reviewed By: anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38285
llvm-svn: 315476
This patch fixes the miscompile that happens when PRE hoists loads across guards and
other instructions that don't always pass control flow to their successors. PRE is now prohibited
from hoisting across such instructions because there is no guarantee that the load standing after such
instruction is still valid before such instruction. For example, a load from under a guard may be
invalid before the guard in the following case:
int array[LEN];
...
guard(0 <= index && index < LEN);
use(array[index]);
Differential Revision: https://reviews.llvm.org/D37460
llvm-svn: 315440
Sinking of unordered atomic load into loop must be disallowed because it turns
a single load into multiple loads. The relevant section of the documentation
is: http://llvm.org/docs/Atomics.html#unordered, specifically the Notes for
Optimizers section. Here is the full text of this section:
> Notes for optimizers
> In terms of the optimizer, this **prohibits any transformation that
> transforms a single load into multiple loads**, transforms a store into
> multiple stores, narrows a store, or stores a value which would not be
> stored otherwise. Some examples of unsafe optimizations are narrowing
> an assignment into a bitfield, rematerializing a load, and turning loads
> and stores into a memcpy call. Reordering unordered operations is safe,
> though, and optimizers should take advantage of that because unordered
> operations are common in languages that need them.
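A hedged sketch of the problem (hypothetical IR): the only use of %v is inside the loop, but sinking the load to that use would execute it once per iteration, turning the single unordered load into many:
define i32 @sum(i32* %p, i32 %n) {
entry:
  %v = load atomic i32, i32* %p unordered, align 4
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %acc = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
  %acc.next = add i32 %acc, %v          ; sole use of %v
  %i.next = add i32 %i, 1
  %cont = icmp slt i32 %i.next, %n
  br i1 %cont, label %loop, label %exit
exit:
  ret i32 %acc.next
}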
Patch by Daniil Suchkov!
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D38392
llvm-svn: 315438
IRCE should not apply when the safe iteration range is proved to be empty.
In this case we do unneeded work creating pre/post loops, and then never
reach the main loop.
This patch makes IRCE not apply to empty safe ranges, adds a test for this
situation, and also slightly modifies one of the existing tests where this
used to happen.
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D38577
llvm-svn: 315437
Summary: In the current implementation, we only have accurate profile count for standalone symbols. For inlined functions, we do not have entry count data because it's not available in LBR. In this patch, we use the first instruction's frequency to estimate the function's entry count, especially for inlined functions. This may be inaccurate due to debug info in optimized code. However, this is a better estimate than the static 80/20 estimation we have in the current implementation.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D38478
llvm-svn: 315369
Eliminate inttype phi with inttoptr/ptrtoint.
This version fixes a bug in finding the matching
phi -- the order of the incoming blocks may be
different (triggered in a self-build on Windows).
A new test case is added.
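A hedged sketch of the eliminated pattern (hypothetical names):
then:
  %a.int = ptrtoint i8* %a to i64
  br label %merge
else:
  %b.int = ptrtoint i8* %b to i64
  br label %merge
merge:
  %v = phi i64 [ %a.int, %then ], [ %b.int, %else ]
  %p = inttoptr i64 %v to i8*
; --> the integer phi and both casts collapse to a pointer phi:
merge:
  %p = phi i8* [ %a, %then ], [ %b, %else ]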
llvm-svn: 315272
There's at least one bug here - this code can fail with vector types (PR34856).
It's also being called for FREM; I'm still trying to understand how that is valid.
llvm-svn: 315127
This appears to be miscompiling Clang, as shown on two Windows bootstrap
bots:
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/7611
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/6870
Nothing else is in the blame list. Both emit errors on this valid code
in the Windows ucrt headers:
C:\...\ucrt\malloc.h:95:32: error: invalid operands to binary expression ('char *' and 'int')
_Ptr = (char*)_Ptr + _ALLOCA_S_MARKER_SIZE;
~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~
I am attempting to reproduce this now.
This reverts r315044
llvm-svn: 315108
Summary: stripPointerCast does not reliably return the value that's being type-cast; it may look further at function attributes to propagate the value. Instead of relying on stripPointerCast, the more reliable solution is to directly use the pointer to the promoted direct call.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D38603
llvm-svn: 315077
This is a vestige from the GCC-3 days, which disables IPO passes
when set. I don't think anybody actually uses it as there are
several IPO passes which still run with this flag set and
nobody complained/noticed. This reduces the delta between
current and new pass manager and allows us to easily review
the difference when we decide to flip the switch (or audit
which passes should run, FWIW).
llvm-svn: 315043
Summary:
If the extracted region has multiple exported data flows toward the same BB that is not included in the region, correct restore instructions and PHI nodes won't be generated inside the exitStub. The solution is to simply put the restore instructions right after the definition of the output values instead of in the exitStub.
A unit test for this bug is included.
Author: myhsu
Reviewers: chandlerc, davide, lattner, silvas, davidxl, wmi, kuhar
Subscribers: dberlin, kuhar, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D37902
llvm-svn: 315041
It is possible for two modules to define the same set of external
symbols without causing a duplicate symbol error at link time,
as long as each of the symbols is a comdat member. So we cannot
use them as part of a unique id for the module.
Differential Revision: https://reviews.llvm.org/D38602
llvm-svn: 315026
Summary: In SamplePGO, when an indirect call is promoted in the profiled binary, before profile annotation, it will be promoted and inlined. For the original indirect call, the current implementation will not mark VP profile on it. This is an issue when profile becomes stale. This patch annotates VP prof on indirect calls during annotation.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D38477
llvm-svn: 315016
The inliner performs some kind of dead code elimination as it goes,
but there are cases that are not really caught by it. We might
at some point consider teaching the inliner about them, but it
is OK for now to run GlobalOpt + GlobalDCE in tandem as their
benefits generally outweigh the cost, making the whole pipeline
faster.
This fixes PR34652.
Differential Revision: https://reviews.llvm.org/D38154
llvm-svn: 314997
When ignoring a load that participates in an interleaved group, make sure to
move a cast that needs to sink after it.
Testcase derived from reproducer of PR34743.
Differential Revision: https://reviews.llvm.org/D38338
llvm-svn: 314986
Instead of trying to keep LastWidenRecipe updated after creating each recipe,
have tryToWiden() retrieve the last recipe of the current VPBasicBlock and check
if it's a VPWidenRecipe when attempting to extend its range. This ensures that
such extensions, which assume the original instruction order, apply only when
the instructions actually maintain their relative order. The latter does not
always hold, e.g., when a cast needs to sink to unravel a first-order
recurrence (r306884).
Testcase derived from reproducer of PR34711.
Differential Revision: https://reviews.llvm.org/D38339
llvm-svn: 314981
We were using an i1 type and then zero extending to a vector. Instead just create the 0/1 directly as a ConstantInt with the correct type. No need to ask ConstantExpr to zero extend for us.
This bug is a bit tricky to hit because it requires us to visit a zext of an icmp that would normally be simplified to true/false, but that icmp hasn't been visited yet. In the test case this zext and icmp were created by visiting a udiv, and due to worklist ordering we got to the zext first.
Fixes PR34841.
llvm-svn: 314971
Summary: This is to avoid e.g. merging two cheap icmps if the target is not going to expand to something nice later.
Reviewers: dberlin, spatel
Subscribers: davide, nemanjai
Differential Revision: https://reviews.llvm.org/D38232
llvm-svn: 314970
We can support ashr similarly to lshr if we know that none of the shifted-in bits are used. In that case SimplifyDemandedBits would normally convert it to lshr. But that conversion doesn't happen if the shift has additional users.
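A generic illustration of the demanded-bits reasoning (not the patch's exact pattern):
%s = ashr i32 %x, 4
%m = and i32 %s, 255     ; demands only bits 4..11 of %x, none of the
                         ; shifted-in sign bits, so %s acts like lshr
call void @use(i32 %s)   ; the extra use blocks SimplifyDemandedBits
                         ; from rewriting the ashr itself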
Differential Revision: https://reviews.llvm.org/D38521
llvm-svn: 314945
This is a follow-up to https://reviews.llvm.org/D38138.
I fixed the capitalization of some functions because we're changing those
lines anyway and that helped verify that we weren't accidentally dropping
any options by using default param values.
llvm-svn: 314930
Recommitting r314517 with the fix for handling ConstantExpr.
Original commit message:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing
mode in the target. However, since it doesn't check its actual users, it will
return FREE even in cases where the GEP cannot be folded away as a part of
actual addressing mode. For example, if a user of the GEP is a call
instruction taking the GEP as a parameter, then the GEP may not be folded in
isel.
llvm-svn: 314923
We have found some corner cases connected to range intersection where IRCE does
the wrong thing when the latch condition is unsigned. The fix for that will go in as a follow-up.
This patch temporarily disables IRCE for unsigned latch conditions until the issue is fixed.
The unsigned latch conditions were introduced to IRCE by rL310027.
Differential Revision: https://reviews.llvm.org/D38529
llvm-svn: 314881
All the buildbots are red, e.g.
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-lld/builds/2436/
> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask' of
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
>
> Reviewed By: Ayal
>
> Subscribers: hans, mzolotukhin
>
> Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 314824
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in a non-consecutive or jumbled way. An earlier attempt was made with patch D26905,
which was reverted due to a basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: hans, mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 314806
Apparently this works by virtue of the fact that the pointers are pointers to the APInts stored inside of the ConstantInt objects. But I really don't think we should be relying on that.
llvm-svn: 314761
Summary: If the merged instruction is a call instruction, we need to set the scope to the closest common scope between the 2 locations, otherwise it will cause trouble when the call is getting inlined.
Reviewers: dblaikie, aprantl
Reviewed By: dblaikie, aprantl
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D37877
llvm-svn: 314694
Summary: This currently uses ConstantExpr to do its math, but as noted in a TODO it can all be done directly on APInt.
Reviewers: spatel, majnemer
Reviewed By: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38440
llvm-svn: 314640
And follow-up r314585.
Leads to segfaults. I'll forward reproduction instructions to the patch
author.
Also, for a recommit, still add the original patch description.
Otherwise, it becomes really tedious to find out what a patch actually
does. The fact that it is a recommit with a fix is somewhat secondary.
llvm-svn: 314622
Summary: In the SamplePGO ThinLTO compile phase, we will not invoke ICP as it may introduce confusion to the 2nd annotation. This patch extracts that logic and makes it clearer before profile annotation. In the meantime, we need to make function importing process both inlined callsites as well as not-promoted indirect callsites.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D38094
llvm-svn: 314619
This patch will eliminate redundant inttoptr/ptrtoint casts that pessimize
analyses such as SCEV and AA, and will make optimization passes such
as auto-vectorization more powerful.
Differential revision: http://reviews.llvm.org/D37832
llvm-svn: 314561
When type shrinking reductions, we should insert the truncations and extends at
the end of the loop latch block. Previously, these instructions were inserted
at the end of the loop header block. The difference is only a problem for loops
with predicated instructions (e.g., conditional stores and instructions that
may divide by zero). For these instructions, we create new basic blocks inside
the vectorized loop, which cause the loop header and latch to no longer be the
same block. This should fix PR34687.
Reference: https://bugs.llvm.org/show_bug.cgi?id=34687
llvm-svn: 314542
The type of a SCEVConstant may not match the corresponding LLVM Value.
In this case, we skip the constant folding for now.
TODO: Replace ConstantInt Zero by ConstantPointerNull
llvm-svn: 314531
Summary:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check its actual users, it will return FREE even in cases
where the GEP cannot be folded away as a part of actual addressing mode.
For example, if a user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.
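For instance (a hedged sketch): even if reg+imm addressing would make the GEP free, passing it to a call forces it to be materialized:
%p = getelementptr inbounds i32, i32* %base, i64 4
call void @use(i32* %p)   ; the GEP must be computed into a register;
                          ; it cannot fold into an addressing mode here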
Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng
Reviewed By: hfinkel
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38085
llvm-svn: 314517
JumpThreading now preserves dominance and lazy value information across the
entire pass. The pass manager is also informed of this preservation with
the goal of DT and LVI being recalculated fewer times overall during
compilation.
This change prepares JumpThreading for enhanced opportunities, particularly
those across loop boundaries.
Patch by: Brian Rzycki <b.rzycki@samsung.com>,
Sebastian Pop <s.pop@samsung.com>
Differential revision: https://reviews.llvm.org/D37528
llvm-svn: 314435
Summary:
And now that we no longer have to explicitly free() the Loop instances, we can
(with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
doing.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38201
llvm-svn: 314375
This reverts r314017 and similar code added in later commits. It seems to not work for pointer compares and is causing a bot failure for the last several days.
llvm-svn: 314360
reductions.
If both operands of the newly created SelectInst are Undefs the
resulting operation is also Undef, not a SelectInst. This may cause crashes
when trying to propagate IR flags because the function expects exactly a
SelectInst instruction and nothing else.
llvm-svn: 314323
These changes facilitate positive behavior for arithmetic-based select
expressions that match the translation criteria, keeping code size gated to
neutral or improved scenarios.
Patch by Michael Berg <michael_c_berg@apple.com>!
Differential Revision: https://reviews.llvm.org/D38263
llvm-svn: 314320
This was intended to be no-functional-change, but it's not - there's a test diff.
So I thought I should stop here and post it as-is to see if this looks like what was expected
based on the discussion in PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603
Notes:
1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried
through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'.
The parameter isn't passed down, so we pick up the default value from the function signature
after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the
'SimplifyCFG' calls.
2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops.
This would theoretically allow us to differentiate the transforms controlled by those params
independently.
3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too.
I just stopped here to minimize the diffs.
4. Similarly, I stopped short of messing with the pass manager layer. I have another question that
could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG
set to true no matter where in the pipeline it's creating SimplifyCFG passes?
// Create an early function pass manager to cleanup the output of the
// frontend.
EarlyFPM.addPass(SimplifyCFGPass());
-->
/// \brief Construct a pass with the default thresholds
/// and switch optimizations.
SimplifyCFGPass::SimplifyCFGPass()
: BonusInstThreshold(UserBonusInstThreshold),
LateSimplifyCFG(true) {} <-- switches get converted to lookup tables and loops may not be in canonical form
If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG'
setting via recursion was masking this bug.
Differential Revision: https://reviews.llvm.org/D38138
llvm-svn: 314308
This patch tries to transform cases like:
for (unsigned i = 0; i < N; i += 2) {
  bool c0 = (i & 0x1) == 0;
  bool c1 = ((i + 1) & 0x1) == 1;
}
To
for (unsigned i = 0; i < N; i += 2) {
  bool c0 = true;
  bool c1 = true;
}
This commit also updates test/Transforms/IndVarSimplify/replace-srem-by-urem.ll to prevent constant folding.
Differential Revision: https://reviews.llvm.org/D38272
llvm-svn: 314266
Summary:
Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow. This gives us a 32-bit multiply instead of an
emulated 64-bit multiply in the generated PTX assembly.
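A hedged sketch of the narrowing, assuming both operands are known to fit in 32 bits:
%q = udiv i64 %x, 7
; --> no runtime check (control flow) is needed:
%x.32 = trunc i64 %x to i32
%q.32 = udiv i32 %x.32, 7
%q    = zext i32 %q.32 to i64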
Reviewers: jlebar
Subscribers: jholewinski, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38265
llvm-svn: 314253
If this transformation succeeds, we're going to remove our dependency on the shift by rewriting the and. So it doesn't matter how many uses the shift has.
This distributes the one use check to other transforms in foldICmpAndConstConst that do need it.
Differential Revision: https://reviews.llvm.org/D38206
llvm-svn: 314233
This is a 2nd attempt at:
https://reviews.llvm.org/rL310055
...which was reverted at rL310123 because of PR34074:
https://bugs.llvm.org/show_bug.cgi?id=34074
In this version, we break out of the inner loop after we successfully merge and kill a pair of stores. In the
earlier rev, we were continuing instead, which meant we could process the invalid info from a now dead store.
Original commit message (authored by Filipe Cabecinhas):
This fixes PR31777.
If both stores' values are ConstantInt, we merge the two stores
(shifting the smaller store appropriately) and replace the earlier (and
larger) store with an updated constant.
In the future we should also support vectors of integers. And maybe
float/double if we can.
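A hedged sketch of the merge, assuming a little-endian target:
store i32 305419896, i32* %p   ; 0x12345678
%b = bitcast i32* %p to i8*
store i8 0, i8* %b             ; overwrites the low byte
; --> the earlier store gets the updated constant and the
;     later, smaller store is removed:
store i32 305419776, i32* %p   ; 0x12345600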
Differential Revision: https://reviews.llvm.org/D30703
llvm-svn: 314206
Usually the frontend communicates the size of wchar_t via metadata and
we can optimize wcslen (and possibly other calls in the future). In
cases without the wchar_size metadata we would previously try to guess
the correct size based on the target triple; however this is fragile to
keep up to date and may miss users manually changing the size via flags.
Better to be safe and stop guessing and optimizing if the frontend didn't
communicate the size.
Differential Revision: https://reviews.llvm.org/D38106
llvm-svn: 314185
Summary:
Sanitizer blacklist entries currently apply to all sanitizers--there
is no way to specify that an entry should only apply to a specific
sanitizer. This is important for Control Flow Integrity since there are
several different CFI modes that can be enabled at once. For maximum
security, CFI blacklist entries should be scoped to only the specific
CFI mode(s) that entry applies to.
Adding section headers to SpecialCaseLists allows users to specify more
information about list entries, like sanitizer names or other metadata,
like so:
[section1]
fun:*fun1*
[section2|section3]
fun:*fun23*
The section headers are regular expressions. For backwards compatibility,
blacklist entries entered before a section header are put into the '[*]'
section so that blacklists without sections retain the same behavior.
SpecialCaseList has been modified to also accept a section name when
matching against the blacklist. It has also been modified so the
follow-up change to clang can define a derived class that allows
matching sections by SectionMask instead of by string.
Reviewers: pcc, kcc, eugenis, vsk
Reviewed By: eugenis, vsk
Subscribers: vitalybuka, llvm-commits
Differential Revision: https://reviews.llvm.org/D37924
llvm-svn: 314170
All this optimization cares about is knowing how many low bits of the LHS are known to be zero and whether that means that the result is 0 or greater than the RHS constant. It doesn't matter where the zeros in the low bits came from. So we don't need to specifically look for an AND. Instead we can use known bits.
Differential Revision: https://reviews.llvm.org/D38195
llvm-svn: 314153
The 1st attempt at this:
https://reviews.llvm.org/rL314117
was reverted at:
https://reviews.llvm.org/rL314118
because of bot fails for clang tests that were checking optimized IR. That should be fixed with:
https://reviews.llvm.org/rL314144
...so try again.
Original commit message:
The transform to convert an extract-of-a-select-of-vectors was added at:
https://reviews.llvm.org/rL194013
And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT.
Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.
The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.
The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.
Differential Revision: https://reviews.llvm.org/D38006
llvm-svn: 314147
Now that SCEV can handle 'urem', an 'urem' is a better canonical form than an 'srem' because it has well-defined behavior.
This is a follow-up to D34598.
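A sketch of the canonicalization for an induction variable known to be non-negative:
%r = srem i32 %i, 9   ; %i is a canonical (non-negative) IV
; -->
%r = urem i32 %i, 9   ; identical result when both operands are non-negative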
Differential Revision: https://reviews.llvm.org/D38072
llvm-svn: 314125
The transform to convert an extract-of-a-select-of-vectors was added at:
rL194013
And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT.
Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.
The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.
The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.
Differential Revision: https://reviews.llvm.org/D38006
llvm-svn: 314117
The result of the isSignBitCheck isn't used anywhere else and this allows us to share the m_APInt call in the likely case that it isn't a sign bit check.
llvm-svn: 314018
We've found a serious issue with the current implementation of loop predication.
The current implementation relies on SCEV and this turned out to be problematic.
To fix the problem we had to rework the pass substantially. We have had the
reworked implementation in our downstream tree for a while. This is the initial
patch of the series of changes to upstream the new implementation.
For now the transformation is limited to the following case:
* The loop has a single latch with either ult or slt icmp condition.
* The step of the IV used in the latch condition is 1.
* The IV of the latch condition is the same as the post increment IV of the guard condition.
* The guard condition is ult.
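A hedged IR sketch of the supported shape (names hypothetical):
loop:
  %iv = phi i32 [ 0, %preheader ], [ %iv.next, %loop ]
  %guard.cond = icmp ult i32 %iv, %len
  call void (i1, ...) @llvm.experimental.guard(i1 %guard.cond) [ "deopt"() ]
  ; ... loop body ...
  %iv.next = add nuw i32 %iv, 1             ; step of 1
  %latch.cond = icmp ult i32 %iv.next, %n   ; ult (or slt) latch condition
  br i1 %latch.cond, label %loop, label %exit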
See the review or the LoopPredication.cpp header for the details about the
problem and the new implementation.
Reviewed By: sanjoy, mkazantsev
Differential Revision: https://reviews.llvm.org/D37569
llvm-svn: 313981
The fix is to avoid invalidating our insertion point in
replaceDbgDeclare:
Builder.insertDeclare(NewAddress, DIVar, DIExpr, Loc, InsertBefore);
+ if (DII == InsertBefore)
+ InsertBefore = &*std::next(InsertBefore->getIterator());
DII->eraseFromParent();
I had to write a unit test for this instead of a lit test because the
use list order matters in order to trigger the bug.
The reduced C test case for this was:
void useit(int*);
static inline void inlineme() {
  int x[2];
  useit(x);
}
void f() {
  inlineme();
  inlineme();
}
llvm-svn: 313905
.. as well as the two subsequent changes r313826 and r313875.
This leads to segfaults in combination with ASAN. Will forward repro
instructions to the original author (rnk).
llvm-svn: 313876
Summary:
There already was code that tried to remove the dbg.declare, but that code
was placed after we had called
I->replaceAllUsesWith(UndefValue::get(I->getType()));
on the alloca, so when we searched for the relevant dbg.declare, we
couldn't find it.
Now we do the search before we call RAUW so there is a chance to find it.
An existing testcase needed an update due to this. Two dbg.declares with undef
were removed, and then suddenly one of the two CHECKs failed.
Before this patch we got
call void @llvm.dbg.declare(metadata i24* undef, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15
call void @llvm.dbg.declare(metadata %struct.prog_src_register* undef, metadata !14, metadata !DIExpression()), !dbg !15
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !15
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15
and with it we get
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !15
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15
However, the CHECKs in the testcase checked things in a silly order, so
they only passed since they found things in the first dbg.declare. Now
we changed the order of the checks and the test passes.
Reviewers: rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37900
llvm-svn: 313875
This patch contains a fix for the reverted commit
rL312318, which was causing a failure due to the use
of an unchecked dyn_cast of CIInit.
Patch by: Nikola Prica.
llvm-svn: 313870
Revert the patch causing the functional failures.
The patch owner has been notified of the failing test cases.
A test case has been provided to Maxim offline.
llvm-svn: 313857
I noticed this inefficiency while investigating PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603
This fix will likely push another bug (we don't maintain state of 'LateSimplifyCFG')
into hiding, but I'll try to clean that up with a follow-up patch anyway.
llvm-svn: 313829
Summary:
This implements the design discussed on llvm-dev for better tracking of
variables that live in memory through optimizations:
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117222.html
This is tracked as PR34136
llvm.dbg.addr is intended to be produced and used in almost precisely
the same way as llvm.dbg.declare is today, with the exception that it is
control-dependent. That means that dbg.addr should always have a
position in the instruction stream, and it will allow passes that
optimize memory operations on local variables to insert llvm.dbg.value
calls to reflect deleted stores. See SourceLevelDebugging.rst for more
details.
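A hedged sketch of intended use (metadata numbers hypothetical):
%x.addr = alloca i32
; Like dbg.declare, but with a position in the instruction stream, so
; passes that delete stores can replace it with dbg.value calls.
call void @llvm.dbg.addr(metadata i32* %x.addr, metadata !10, metadata !DIExpression()), !dbg !13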
The main drawback to generating DBG_VALUE machine instrs is that they
usually cause LLVM to emit a location list for DW_AT_location. The next
step will be to teach DwarfDebug.cpp how to recognize more DBG_VALUE
ranges as not needing a location list, and possibly start setting
DW_AT_start_offset for variables whose lifetimes begin mid-scope.
Reviewers: aprantl, dblaikie, probinson
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37768
llvm-svn: 313825
We already did (X & C2) > C1 --> (X & C2) != 0, if any bit set in (X & C2) will produce a result greater than C1. But there is an equivalent inverse condition with <= C1 (which will be canonicalized to < C1+1).
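For example (a sketch): every set bit of the mask 48 contributes at least 16, so a compare below 16 can only be satisfied by zero:
%a = and i32 %x, 48        ; possible values: 0, 16, 32, 48
%c = icmp ult i32 %a, 16   ; canonical form of (X & 48) <= 15
; -->
%c = icmp eq i32 %a, 0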
Differential Revision: https://reviews.llvm.org/D38065
llvm-svn: 313819
This broke the buildbots, e.g.
http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/391
> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask'
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Subscribers: mzolotukhin
>
> Reviewed By: ayal
>
> Differential Revision: https://reviews.llvm.org/D36130
>
> Review comments updated accordingly
>
> Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
>
> Added a TODO for sortLoadAccesses API
>
> Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
>
> Modified the TODO for sortLoadAccesses API
>
> Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
>
> Review comment update for using OpdNum to insert the mask in respective location
>
> Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
>
> Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
>
> Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313781
In these cases, two selects have constant selectable operands for
both the true and false components and have the same conditional
expression.
We then create two arithmetic operations of the same type and feed a
final select operation with them: the result of the true arithmetic as the true
operand and the result of the false arithmetic as the false operand, reusing
the original conditional expression.
The arithmetic operations are naturally folded as a consequence, leaving
only the newly formed select to replace the old arithmetic operation.
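A hedged sketch of the fold (hypothetical constants):
%s1 = select i1 %cond, i32 2, i32 4
%s2 = select i1 %cond, i32 1, i32 8
%r  = add i32 %s1, %s2
; --> the true and false adds fold to constants, leaving one select:
%r  = select i1 %cond, i32 3, i32 12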
Patch by: Michael Berg <michael_c_berg@apple.com>
Differential Revision: https://reviews.llvm.org/D37019
llvm-svn: 313774
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in a non-consecutive or jumbled way. An earlier attempt was made with patch D26905,
which was reverted due to a basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Subscribers: mzolotukhin
Reviewed By: ayal
Differential Revision: https://reviews.llvm.org/D36130
Review comments updated accordingly
Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
Added a TODO for sortLoadAccesses API
Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
Modified the TODO for sortLoadAccesses API
Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
Review comment update for using OpdNum to insert the mask in respective location
Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313771
Summary:
The fix for dead stripping analysis in the case of SamplePGO indirect
calls to local functions (r313151) introduced the possibility of an
infinite loop.
Make sure we check for the value being already live after we update it
for SamplePGO indirect call handling.
Reviewers: danielcdh
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D38086
llvm-svn: 313766
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in a non-consecutive or jumbled way. An earlier attempt was made with patch D26905,
which was reverted due to a basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
Commit after rebase for patch D36130
Change-Id: I8add1c265455669ef288d880f870a9522c8c08ab
llvm-svn: 313736
Summary:
With this change:
- Methods in LoopBase trip an assert if the receiver has been invalidated
- LoopBase::clear frees up the memory held by the LoopBase instance
This change also shuffles things around as necessary to work with this stricter invariant.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38055
llvm-svn: 313708
Summary:
See comment for why I think this is a good idea.
This change also:
- Removes an SCEV test case. The SCEV test was not testing anything useful (most of it was `#if 0` ed out) and it would need to be updated to deal with a private ~Loop::Loop.
- Updates the loop pass manager test case to deal with a private ~Loop::Loop.
- Renames markAsRemoved to markAsErased to contrast with removeLoop, via the usual remove vs. erase idiom we already have for instructions and basic blocks.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37996
llvm-svn: 313695
In the lambda we are now returning the remark by value so we need to preserve
its type in the insertion operator. This requires making the insertion
operator generic.
I've also converted a few cases to use the new API. It seems to work pretty
well. See the LoopUnroller for a slightly more interesting case.
llvm-svn: 313691
Summary: In the ThinLTO compilation, if a function is inlined in the profiling binary, we need to inline it before annotation. If the callee is not available in the primary module, a first step is needed to import that callee function. In the current implementation, if the call is an indirect call that has been promoted to >1 targets and inlined, SamplePGO will only import the one target with the largest sample count. This patch fixes the bug to import all targets instead.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D36637
llvm-svn: 313678
Summary: Fix the bug that when the promoted call's return type mismatches the promoted function's, we should not try to inline it. Otherwise it may lead to a compiler crash.
Reviewers: davidxl, tejohnson, eraman
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D38018
llvm-svn: 313658
I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates.
llvm-svn: 313450
CostModel.
The original patch added support for horizontal min/max reductions to
the SLP vectorizer.
This patch causes LLVM to miscompile fairly simple signed min
reductions. I have attached a test program to http://llvm.org/PR34635
that shows the behavior change after this patch. We found this in a test
for the open source Eigen library, but also in other code.
Unfortunately, the revert is moderately challenging. It required
reverting:
r313042: [SLP] Test with multiple uses of conditional op and wrong parent.
r312853: [SLP] Fix buildbots, NFC.
r312793: [SLP] Fix the warning about paths not returning the value, NFC.
r312791: [SLP] Support for horizontal min/max reduction.
And even then, I had to completely skip reverting the changes to TTI and
CostModel because r312832 rewrote so much of this code. Plus, the cost
modeling changes aren't implicated in the miscompile, so they should be
fine and will just not be used until this gets re-introduced.
llvm-svn: 313409
It enables OptimizationRemarkEmitter::allowExtraAnalysis and MachineOptimizationRemarkEmitter::allowExtraAnalysis to return true not only for -fsave-optimization-record but when specific remarks are requested with
command line options.
The diagnostic handler used to be a callback; this patch adds a
DiagnosticHandler class instead. It has a virtual method to provide a custom
diagnostic handler and methods to control which particular remarks are enabled.
However, LLVM-C API users can still provide a callback function for the diagnostic handler.
llvm-svn: 313390
It enables OptimizationRemarkEmitter::allowExtraAnalysis and MachineOptimizationRemarkEmitter::allowExtraAnalysis to return true not only for -fsave-optimization-record but when specific remarks are requested with
command line options.
The diagnostic handler used to be a callback; this patch adds a
DiagnosticHandler class instead. It has a virtual method to provide a custom
diagnostic handler and methods to control which particular remarks are enabled.
However, LLVM-C API users can still provide a callback function for the diagnostic handler.
llvm-svn: 313382
Add a profitability heuristic to enable runtime unrolling of multi-exit
loops: there can be at most two unique exit blocks for the loop, and the
second exit block should be a deoptimizing block. Also, there can be one
other exiting block other than the latch exiting block. The reason for
the latter is so that we limit the number of branches in the unrolled
code to being at most the unroll factor. Deoptimizing blocks are rarely
taken so these additional number of branches created due to the
unrolling are predictable, since one of their target is the deopt block.
Reviewers: apilipenko, reames, evstupac, mkuper
Subscribers: llvm-commits
Reviewed by: reames
Differential Revision: https://reviews.llvm.org/D35380
llvm-svn: 313363
During runtime unrolling on loops with multiple exits, we update the
exit blocks with the correct phi values from both original and remainder
loop.
In this process, we lookup the VMap for the mapped incoming phi values,
but did not update the VMap if a default entry was generated in the VMap
during the lookup. This default value is generated when constants or
values outside the current loop are looked up.
This patch fixes the assertion failure when null entries are present in
the VMap because of this lookup. Added a testcase that showcases the
problem.
llvm-svn: 313358
Patch tries to improve vectorization of the following code:
void add1(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}
This allows vectorization even if the very first operation is not a binary add but just a load.
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev, davide
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 313348
Summary: Move to LoopUtils a method that collects all children of a node inside a loop.
Reviewers: majnemer, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37870
llvm-svn: 313322
Summary: SampleProfileLoader inlines hot functions if they were inlined in the profiled binary. However, the inlining needs to be guarded by a legality check, otherwise it could lead to correctness issues.
Reviewers: eraman, davidxl
Reviewed By: eraman
Subscribers: vitalybuka, sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D37779
llvm-svn: 313277
This patch fixes pr34283, which exposed that the computation of
maximum legal width for vectorization was wrong, because it relied
on MaxInterleaveFactor to obtain the maximum stride used in the loop;
however, not all strided accesses in the loop have an interleave group
associated with them.
Instead of recording the maximum stride in the loop, which can be over
conservative (e.g. if the access with the maximum stride is not involved
in the dependence limitation), this patch tracks the actual maximum legal
width imposed by accesses that are involved in dependencies.
Differential Revision: https://reviews.llvm.org/D37507
llvm-svn: 313237
This reland includes a fix for the LowerTypeTests pass so that it
looks past aliases when determining which type identifiers are live.
Differential Revision: https://reviews.llvm.org/D37842
llvm-svn: 313229
This broke Chromium's CFI build; see crbug.com/765004.
> We were previously handling aliases during dead stripping by adding
> the aliased global's "original name" GUID to the worklist. This will
> lead to incorrect behaviour if the global has local linkage because
> the original name GUID will not correspond to the global's GUID in
> the summary.
>
> Because an alias is just another name for the global that it
> references, there is no need to mark the referenced global as used,
> or to follow references from any other copies of the global. So all
> we need to do is to follow references from the aliasee's summary
> instead of the alias.
>
> Differential Revision: https://reviews.llvm.org/D37789
llvm-svn: 313222
Summary: SampleProfileLoader inlines hot functions if they were inlined in the profiled binary. However, the inlining needs to be guarded by a legality check, otherwise it could lead to correctness issues.
Reviewers: eraman, davidxl
Reviewed By: eraman
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D37779
llvm-svn: 313195
These are changes to reduce redundant computations when calculating a
feasible vectorization factor:
1. early return when target has no vector registers
2. don't compute register usage for the default VF.
Suggested during review for D37702.
llvm-svn: 313176
Summary:
Added text options to -pgo-view-counts and -pgo-view-raw-counts that dump block frequency and branch probability info in text.
This is useful when the graph is very large and complex (the dot command crashes, lines/edges too close to tell apart, hard to navigate without textual search) or simply when text is preferred.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37776
llvm-svn: 313159
We were previously handling aliases during dead stripping by adding
the aliased global's "original name" GUID to the worklist. This will
lead to incorrect behaviour if the global has local linkage because
the original name GUID will not correspond to the global's GUID in
the summary.
Because an alias is just another name for the global that it
references, there is no need to mark the referenced global as used,
or to follow references from any other copies of the global. So all
we need to do is to follow references from the aliasee's summary
instead of the alias.
Differential Revision: https://reviews.llvm.org/D37789
llvm-svn: 313157
Summary:
SamplePGO indirect call profiles record the target as the original GUID
for statics. The importer had special handling to map to the normal GUID
in that case. The dead global analysis needs the same treatment or
inconsistencies arise, resulting in linker unsats due to some dead
symbols being exported and kept, leaving in references to other dead
symbols that are removed.
This can happen when a SamplePGO profile collected by one binary is used
for a different binary, so the indirect call profiles may not accurately
reflect live targets.
Reviewers: danielcdh
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D37783
llvm-svn: 313151
When converting a PHI into a series of 'select' instructions to combine the
incoming values together according to their edge masks, initialize the first
value to the incoming value In0 of the first predecessor, instead of
generating a redundant assignment 'select(Cond[0], In0, In0)'. The latter
fails when the Cond[0] mask is null, representing a full mask, which can
happen only when there's a single incoming value.
No functional changes intended nor expected other than surviving null Cond[0]'s.
This fix follows D35725, which introduced using null to represent full masks.
Differential Revision: https://reviews.llvm.org/D37619
llvm-svn: 313119
Factor out the reachability check such that multiple queries to find reachability of values are fast. This is based on finding the ANTIC points in the CFG, which do not change during hoisting. The ANTIC points are basically the dominance frontiers in the inverse graph. So we introduce a data structure (CHI nodes) to keep track of values flowing out of a basic block. We only do this for values with multiple occurrences in the function, as they are the potential hoistable candidates.
This patch allows us to hoist instructions to a basic block with >2 successors, as well as deal with infinite loops in a trivial way.
Relevant test cases are added to show the functionality as well as regression fixes from PR32821.
Regression from previous GVNHoist: we do not hoist fully redundant expressions because fully redundant expressions are already handled by NewGVN.
Differential Revision: https://reviews.llvm.org/D35918
Reviewers: dberlin, sebpop, gberry
llvm-svn: 313116
Summary:
This should improve optimized debug info for address-taken variables at
the cost of inaccurate debug info in some situations.
We patched this into clang and deployed this change to Chromium
developers, and this significantly improved debuggability of optimized
code. The long-term solution to PR34136 seems more and more like it's
going to take a while, so I would like to commit this change under a
flag so that it can be used as a stop-gap measure.
This flag should really help for C++ aggregates like std::string and
std::vector, which are typically address-taken, even after inlining, and
cannot be SROA-ed.
Reviewers: aprantl, dblaikie, probinson, dberlin
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D36596
llvm-svn: 313108
Summary: This change passes down ACT to SampleProfileLoader for the new PM. Also remove the default value for SampleProfileLoader class as it is not used.
Reviewers: eraman, davidxl
Reviewed By: eraman
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D37773
llvm-svn: 313080
Summary:
The current promoteLoopAccessesToScalars method receives an AliasSet, but
the information used is in fact a list of Value*, known to must alias.
Create the list ahead of time to make this method independent of the AliasSet class.
While there is no functionality change, this adds overhead for creating
a set of Value*, when promotion would normally exit earlier.
This is meant to be a first refactoring step in order to start replacing
AliasSetTracker with MemorySSA.
And while the end goal is to redesign LICM, the first few steps will focus on
adding MemorySSA as an alternative to the AliasSetTracker using most of the
existing functionality.
Reviewers: mkuper, danielcdh, dberlin
Subscribers: sanjoy, chandlerc, gberry, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D35439
llvm-svn: 313075
Summary:
When the MaxVectorSize > ConstantTripCount, we should just clamp the
vectorization factor to be the ConstantTripCount.
This vectorizes loops where TinyTripCountThreshold >= TripCount and TripCount < MaxVF.
Earlier we were finding the maximum vector width, which could be greater than
the trip count itself. The Loop vectorizer does all the work for generating a
vectorizable loop, but in the end we would always choose the scalar loop (since
the VF > trip count). This allows us to choose the VF keeping in mind the trip
count if available.
This is a fix on top of rL312472.
Reviewers: Ayal, zvi, hfinkel, dneilson
Reviewed by: Ayal
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37702
llvm-svn: 313046
Not all targets support the use of absolute symbols to export
constants. In particular, ARM has a wide variety of constant encodings
that cannot currently be relocated by linkers. So instead of exporting
the constants using symbols, export them directly in the summary.
The values of the constants are left as zeroes on targets that support
symbolic exports.
This may result in more cache misses when targeting those architectures
as a result of arbitrary changes in constant values, but this seems
somewhat unavoidable for now.
Differential Revision: https://reviews.llvm.org/D37407
llvm-svn: 312967
It now knows the tricks of both functions.
Also, fix a bug that considered allocas in a non-zero address space to be always non-null.
Differential Revision: https://reviews.llvm.org/D37628
llvm-svn: 312869
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented
as an independent pass, so there's no stretching of scope and feature creep for an existing pass.
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028
The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.
Decomposing remainder may allow removing some code from the backend (PPC and possibly others).
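A hedged sketch of the remainder decomposition mentioned above:
%div = sdiv i32 %a, %b
%rem = srem i32 %a, %b
; --> the rem is rebuilt from the div, so only one division remains:
%div = sdiv i32 %a, %b
%t   = mul i32 %div, %b
%rem = sub i32 %a, %t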
Differential Revision: https://reviews.llvm.org/D37121
llvm-svn: 312862
SLP vectorizer supports horizontal reductions for Add/FAdd binary
operations. Patch adds support for horizontal min/max reductions.
Function getReductionCost() is split to getArithmeticReductionCost() for
binary operation reductions and getMinMaxReductionCost() for min/max
reductions.
Patch fixes PR26956.
Differential revision: https://reviews.llvm.org/D27846
llvm-svn: 312791
This is required when targeting COFF, as the comdat name must match
one of the names of the symbols in the comdat.
Differential Revision: https://reviews.llvm.org/D37550
llvm-svn: 312767
r312318 - Debug info for variables whose type is shrinked to bool
r312325, r312424, r312489 - Test case for r312318
Revision 312318 introduced a null dereference bug.
Details in https://bugs.llvm.org/show_bug.cgi?id=34490
llvm-svn: 312758
Consider this type of loop:
for (...) {
...
if (...) continue;
...
}
Normally, the "continue" would branch to the loop control code that
checks whether the loop should continue iterating and which contains
the (often) unique loop latch branch. In certain cases jump threading
can "thread" the inner branch directly to the loop header, creating
a second loop latch. Loop canonicalization would then transform this
loop into a loop nest. The problem with this is that in such a loop
nest neither loop is countable even if the original loop was. This
may inhibit subsequent loop optimizations and be detrimental to
performance.
Differential Revision: https://reviews.llvm.org/D36404
llvm-svn: 312664
This is a preliminary step towards solving the remaining part of PR27145 - IR for isfinite():
https://bugs.llvm.org/show_bug.cgi?id=27145
In order to solve that one more generally, we need to add matching for and/or of fcmp ord/uno
with a constant operand.
But while looking at those patterns, I realized we were missing a canonicalization for nonzero
constants. Rather than limiting to just folds for constants, we're adding a general value
tracking method for this based on an existing DAG helper.
By transforming everything to 0.0, we can simplify the existing code in foldLogicOfFCmps()
and pick up missing vector folds.
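A sketch of the new canonicalization: any non-NaN constant only tests the variable operand for NaN, so the constant becomes 0.0:
%c = fcmp ord double %x, 1.25
; -->
%c = fcmp ord double %x, 0.0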
Differential Revision: https://reviews.llvm.org/D37427
llvm-svn: 312591
Instead of creating a Constant and then calling m_APInt with it (which will always return true), just create an APInt initially and use that for the checks in the isSelect01 function. If it turns out we do need the Constant, create it from the APInt.
This is a refactor for a future patch that will do some more checks of the constant values here.
llvm-svn: 312517
Summary:
Improve how MaxVF is computed while taking into account that MaxVF should not be larger than the loop's trip count.
Other than saving on compile-time by pruning the possible MaxVF candidates, this patch fixes pr34438 which exposed the following flow:
1. Short trip count identified -> Don't bail out, set OptForSize:=True to avoid tail-loop and runtime checks.
2. Compute MaxVF returned 16 on a target supporting AVX512.
3. OptForSize -> choose VF:=MaxVF.
4. Bail out because TripCount = 8, VF = 16, TripCount % VF !=0 means we need a tail loop.
With this patch step 2. will choose MaxVF=8 based on TripCount.
Reviewers: Ayal, dorit, mkuper, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D37425
llvm-svn: 312472
Debug information can be, and was, corrupted when the runtime
remainder loop was fully unrolled. This is because a !null node can
be created instead of a unique one describing the loop. In this case,
the original node gets incorrectly updated with the NewLoopID
metadata.
In the case when the remainder loop is going to be quickly fully
unrolled, there is no need to add loop metadata for it anyway.
Differential Revision: https://reviews.llvm.org/D37338
llvm-svn: 312471
In addition to removing chunks of duplicated code, we don't
want these to diverge. If there's a fold for one, there
should be a fold of the other via DeMorgan's Laws.
llvm-svn: 312420
We had these locals:
Value *Op0RHS = LHS->getOperand(1);
Value *Op1LHS = RHS->getOperand(0);
...so we confusingly transposed the meaning of left/right and op0/op1.
llvm-svn: 312418
This makes it easier to see that they're almost duplicates.
As with the similar icmp functions, there should be identical
folds for both logic ops because those are DeMorganized variants.
llvm-svn: 312415
Summary:
After a discussion with Rekka, I believe this (or a small variant)
should fix the remaining phi-of-ops problems.
Rekka's algorithm for completeness relies on looking up expressions
that should have no leader, and expecting it to fail (IE looking up
expressions that can't exist in a predecessor, and expecting it to
find nothing).
Unfortunately, sometimes these expressions can be simplified to
constants, but we need the lookup to fail anyway. Additionally, our
simplifier outsmarts this by taking these "not quite right"
expressions, and simplifying them into other expressions or walking
through phis, etc. In the past, we've sometimes been able to find
leaders for these expressions, incorrectly.
This change causes us not to try phi-of-ops on such expressions.
We determine safety by seeing if they depend on a phi node in our
block.
This is not perfect, we can do a bit better, but this should be a
"correctness start" that we can then improve. It also requires a
bunch of caching that i'll eventually like to eliminate.
The right solution, longer term, to the simplifier issues, is to make
the query interface for the instruction simplifier/constant folder
have the flags we need, so that we can keep most things going, but
turn off the possibly-invalid parts (threading through phis, etc).
This is an issue in another wrong code bug as well.
Reviewers: davide, mcrosier
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D37175
llvm-svn: 312401
This patch teaches decomposeBitTestICmp to look through truncate instructions on the input to the compare. If a truncate is found it will now return the pre-truncated Value and appropriately extend the APInt mask.
This allows some code to be removed from InstSimplify that was doing this functionality.
This allows InstCombine's bit test combining code to match a pre-truncate Value with the same Value appear with an 'and' on another icmp. Or it allows us to combine a truncate to i16 and a truncate to i8. This also required removing the type check from the beginning of getMaskedTypeForICmpPair, but I believe that's ok because we still have to find two values from the input to each icmp that are equal before we'll do any transformation. So the type check was really just serving as an early out.
There was one user of decomposeBitTestICmp that didn't want to look through truncates, so I've added a flag to prevent that behavior when necessary.
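A hedged sketch: a sign-bit test of a truncated value decomposes into a bit test on the wider, pre-truncated value with an extended mask:
%t = trunc i32 %x to i8
%c = icmp slt i8 %t, 0    ; sign-bit test of the low byte
; -->
%m = and i32 %x, 128      ; the i8 mask 0x80 extended to i32
%c = icmp ne i32 %m, 0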
Differential Revision: https://reviews.llvm.org/D37158
llvm-svn: 312382
A future patch will make the code look through truncates feeding the compare. So the compares might be different types but the pretruncated types might be the same.
This should be safe because we still require the same Value* to be used truncated or not in both compares. So that serves to ensure the types are the same.
llvm-svn: 312381
Previously we used the type from the LHS of the compare, but a future patch will change decomposeBitTestICmp to look through truncates so it will return a pretruncated Value* and the type needs to match that.
llvm-svn: 312380
Summary: When we backtranslate expressions, we can't use the predicateinfo, since we are evaluating them in a different context.
Reviewers: davide, mcrosier
Subscribers: sanjoy, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D37174
llvm-svn: 312352
Summary:
LoopVectorizer is creating casts between vec<ptr> and vec<float> types
on ARM when compiling OpenCV. Since it is illegal to directly cast a
floating point type to a pointer type even if the types have the same
size, this caused a crash. Fix the crash by using a two-step cast:
bitcast to integer, then integer to pointer/float.
Fixes PR33804.
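A minimal sketch of the two-step cast (hypothetical values, assuming 32-bit pointers so the vector element sizes match):
%ints = bitcast <4 x float> %v to <4 x i32>
%ptrs = inttoptr <4 x i32> %ints to <4 x float*>
; a direct bitcast from <4 x float> to <4 x float*> would be rejected by the verifier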
Reviewers: mkuper, Ayal, dlj, rengolin, srhines
Reviewed By: rengolin
Subscribers: aemerson, kristof.beyls, mkazantsev, Meinersbur, rengolin, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35498
llvm-svn: 312331
This patch provides such debug information for integer
variables whose type is shrunk to bool, by providing a
DWARF expression which returns either the constant initial
value or the other value.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D35994
llvm-svn: 312318
comparisons into memcmp.
Thanks to recent improvements in the LLVM codegen, the memcmp is typically
inlined as a chain of efficient hardware comparisons.
This typically benefits C++ member or nonmember operator==().
For now this is disabled by default until:
- https://bugs.llvm.org/show_bug.cgi?id=33329 is complete
- Benchmarks show that this is always useful.
Differential Revision:
https://reviews.llvm.org/D33987
llvm-svn: 312315
The BasicBlock passed to FindPredecessorRetainWithSafePath should be the
parent block of Autorelease. This fixes a crash that occurs in
FindDependencies when StartInst is not in StartBB.
rdar://problem/33866381
llvm-svn: 312266
Recurse instead of returning on the first found optimization. Also, return early in the caller
instead of continuing because that allows another round of simplification before we might
potentially lose undef information from a shuffle mask by eliminating the shuffle.
As noted in the review, we could probably do better and be more efficient by moving all of
the demanded-elements logic into a separate pass, but this is yet another quick fix to instcombine.
Differential Revision: https://reviews.llvm.org/D37236
llvm-svn: 312248
The current implementation of parseLoopStructure interprets the latch comparison as a
comparison against `iv.next`. If the actual comparison is made against the current value
of `iv`, then the loop may be rejected, because this misinterpretation leads to incorrect
evaluation of the latch start value.
This patch teaches the IRCE to distinguish this kind of loops and perform the optimization
for them. Now we use `IndVarBase` variable which can be either next or current value of the
induction variable (previously we used `IndVarNext` which was always the value on next iteration).
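A sketch of the kind of loop that was previously rejected (hypothetical function; the latch compares the current IV value rather than the incremented one):
define void @f(i32* %arr, i32 %n) {
entry:
  br label %loop
loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
  %el = getelementptr inbounds i32, i32* %arr, i32 %iv
  store i32 0, i32* %el
  %iv.next = add nsw i32 %iv, 1
  %cond = icmp slt i32 %iv, %n   ; compares %iv, not %iv.next
  br i1 %cond, label %loop, label %exit
exit:
  ret void
}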
Differential Revision: https://reviews.llvm.org/D36215
llvm-svn: 312221
Renaming as a preparation step to generalizing IRCE for comparison not only against
the next value of an indvar, but also against the current.
Differential Revision: https://reviews.llvm.org/D36509
llvm-svn: 312215
This code is double-dead:
1. We simplify all selects with constant true/false condition in InstSimplify.
I've minimized/moved the tests to show that works as expected.
2. All remaining vector selects with a constant condition are canonicalized to
shufflevector, so we really can't see this pattern.
llvm-svn: 312123
Summary:
If the first insertelement instruction has multiple users and inserts at
position 0, we can re-use this instruction when folding a chain of
insertelement instructions. As we need to generate the first
insertelement instruction anyways, this should be a strict improvement.
We could get rid of the restriction of inserting at position 0 by
creating a different shuffle mask, but it is probably worth keeping the
first insertelement instruction at position 0, as I think this is easier
to do efficiently than at other positions.
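For instance, for a splat-like chain the shape is roughly (a sketch, hypothetical values):
%i0 = insertelement <4 x float> undef, float %x, i32 0 ; %i0 has other users
%i1 = insertelement <4 x float> %i0, float %x, i32 1
%i2 = insertelement <4 x float> %i1, float %x, i32 2
%i3 = insertelement <4 x float> %i2, float %x, i32 3
=>
%i0 = insertelement <4 x float> undef, float %x, i32 0 ; kept for its other users
%i3 = shufflevector <4 x float> %i0, <4 x float> undef, <4 x i32> zeroinitializer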
Reviewers: grosser, mkuper, fpetrogalli, efriedma
Reviewed By: fpetrogalli
Subscribers: gareevroman, llvm-commits
Differential Revision: https://reviews.llvm.org/D37064
llvm-svn: 312110
Summary:
When jumptable encoding does not match target code encoding (arm vs
thumb), a veneer is inserted by the linker. We can not avoid this
in all cases, because entries within one jumptable must have the same
encoding, but we can make it less common by selecting the jumptable
encoding to match the majority of its targets.
This change only covers FullLTO, and not ThinLTO.
Reviewers: pcc
Subscribers: aemerson, mehdi_amini, javed.absar, kristof.beyls, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D37171
llvm-svn: 312054
Summary:
Cross-DSO CFI needs all __cfi_check exports to use the same encoding
(ARM vs Thumb).
Reviewers: pcc
Subscribers: aemerson, srhines, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37243
llvm-svn: 312052
This is to fix PR34257. rL309059 takes an early return when FindLIVLoopCondition
fails to find a loop invariant condition. This is wrong and it will disable loop
unswitch for select. The patch fixes the bug.
Differential Revision: https://reviews.llvm.org/D36985
llvm-svn: 312045
Summary:
If SimplifyCFG pass is able to merge conditional stores into single one,
it loses the alignment. This may lead to incorrect codegen. The patch
sets the alignment of the new instruction if it is set in the original
one.
Reviewers: jmolloy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36841
llvm-svn: 312030
This patch adds splat support to transformZExtICmp. The test cases are vector versions of tests that failed when commenting out parts of the existing scalar code.
One test didn't optimize properly due to another bug, so a TODO has been added.
Differential Revision: https://reviews.llvm.org/D37253
llvm-svn: 312023
Summary:
Remove some code that was no longer needed. The first FIXME is
stale since we long ago started using the index to drive importing,
rather than doing force importing based on linkage type. And
now with r309278, we no longer import any aliases.
Reviewers: dblaikie
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D37266
llvm-svn: 312019
Summary: We originally assumed that in pgo-icp, the promoted direct call will never be null after stripping pointer casts. However, stripPointerCasts is so smart that it can return the value of the function call if it knows that the return value is always an argument. In this case, the returned value cannot be cast to Instruction. In this patch, a null check is added to ensure a null pointer will not be accessed.
Reviewers: tejohnson, xur, davidxl, djasper
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D37252
llvm-svn: 312005
As suggested in D37121, here's a wrapper for removeFromParent() + insertAfter(),
but implemented using moveBefore() for symmetry/efficiency.
Differential Revision: https://reviews.llvm.org/D37239
llvm-svn: 312001
When LSR processes code like
int accumulator = 0;
for (int i = 0; i < N; i++) {
  accumulator += i;
  use((double) accumulator);
}
It may decide to replace integer `accumulator` with a double Shadow IV to get rid
of casts. The problem with that is that the `accumulator`'s value may overflow.
Starting from this moment, the behavior of integer and double accumulators
will differ.
This patch strengthens the applicability conditions of the Shadow IV mechanism.
We only allow it for IVs that are proved to be `AddRec`s with `nsw`/`nuw` flag.
Differential Revision: https://reviews.llvm.org/D37209
llvm-svn: 311986
This was pretty close to working already. While I was here I went ahead and passed the ICmpInst pointer from the caller instead of doing a dyn_cast that can never fail.
Differential Revision: https://reviews.llvm.org/D37237
llvm-svn: 311960
In r311742 we marked the PCs array as used so it wouldn't be dead
stripped, but left the guard and 8-bit counters arrays alone since
these are referenced by the coverage instrumentation. This doesn't
quite work if we want the indices of the PCs array to match the other
arrays though, since elements can still end up being dead and
disappear.
Instead, we mark all three of these arrays as used so that they'll be
consistent with one another.
llvm-svn: 311959
Be more consistent with CreateFunctionLocalArrayInSection in the API
of CreatePCArray, and assign the member variable in the caller like we
do for the guard and 8-bit counter arrays.
This also tweaks the order of method declarations to match the order
of definitions in the file.
llvm-svn: 311955
We were handling some vectors in foldSelectIntoOp, but not if the operand of the bin op was any kind of vector constant. This patch fixes it to treat vector splats the same as scalars.
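A sketch of the pattern with a splat constant (hypothetical values; the untaken side of the new select gets the binop's identity):
%add = add <2 x i32> %x, <i32 1, i32 1>
%sel = select i1 %cond, <2 x i32> %add, <2 x i32> %x
=>
%splat = select i1 %cond, <2 x i32> <i32 1, i32 1>, <2 x i32> zeroinitializer
%sel = add <2 x i32> %x, %splat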
Differential Revision: https://reviews.llvm.org/D37232
llvm-svn: 311940
When peeling kicks in, it updates the loop preheader.
Later, a successful full unroll of the loop needs to update a PHI
whose i-th argument comes from the loop preheader, so it'd better look
at the correct block. Fixes PR33437.
Differential Revision: https://reviews.llvm.org/D37153
llvm-svn: 311922
Summary:
Currently, a phi node is created in the normal destination to unify the return values from promoted calls and the original indirect call. This patch makes this phi node be created only when the return value has uses.
This patch is necessary to generate valid code, as the compiler crashes with the attached test case without this patch. Without this patch, an illegal phi node that has no incoming value from `entry`/`catch` is created in the `cleanup` block.
I think the existing implementation is fine as long as there is at least one use of the original indirect call. `insertCallRetPHI` creates a new phi node in the normal destination block only when the original indirect call dominates its use and the normal destination block. Otherwise, `fixupPHINodeForNormalDest` will handle the unification of return values naturally without creating a new phi node. However, if there's no use, `insertCallRetPHI` still creates a new phi node even when the original indirect call does not dominate the normal destination block, because `getCallRetPHINode` returns false.
Reviewers: xur, davidxl, danielcdh
Reviewed By: xur
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37176
llvm-svn: 311906
Original commit r311077 of D32871 was reverted in r311304 due to failures
reported in PR34248.
This recommit fixes PR34248 by restricting the packing of predicated scalars
into vectors only when vectorizing, avoiding doing so when unrolling w/o
vectorizing. Added a test derived from the reproducer of PR34248.
llvm-svn: 311849
Prior to this change (and after r311371), we computed it
unconditionally, causing severe compile time regressions (in some
cases, 5 to 10x).
llvm-svn: 311804
Just create an all 1s demanded mask and continue recursing like normal. The recursive calls should be able to handle an all 1s mask and do the right thing.
The only time we should care about knowing whether the upper bit was demanded is when we need to know if we should clear the NSW/NUW flags.
Now that we have a consistent path through the code for all cases, use KnownBits::computeForAddSub to compute the known bits at the end since we already have the LHS and RHS.
My larger goal here is to move the code that turns add into xor if only 1 bit is demanded and no bits below it are non-zero from InstCombiner::OptAndOp to here. This will allow it to be more general instead of just looking for 'add' and 'and' with constant RHS.
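To make that intended fold concrete, a sketch with hypothetical values: only bit 3 of the result is demanded and the bits below it are known zero, so no carry can reach bit 3 and the add behaves like xor there:
%masked = and i32 %x, 8       ; bits 0-2 are known zero
%add = add i32 %masked, 8
%res = and i32 %add, 8        ; only bit 3 is demanded
=>
%masked = and i32 %x, 8
%xor = xor i32 %masked, 8
%res = and i32 %xor, 8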
Differential Revision: https://reviews.llvm.org/D36486
llvm-svn: 311789
Summary:
SimplifyIndVar may introduce zext instructions to widen arguments of the
loop exit check. They should not prevent us from splitting the loop at
the induction variable, but maybe the check should be more conservative,
e.g. making sure it only extends arguments used by a comparison?
Reviewers: karthikthecool, mcrosier, mzolotukhin
Reviewed By: mcrosier
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D34879
llvm-svn: 311783
There are cases where AShr has a better chance of being optimized than LShr, especially when the demanded bits are not known to be zero but are known to be the same as the sign bit.
Differential Revision: https://reviews.llvm.org/D36936
llvm-svn: 311773
Summary:
Add musttail to any resume instruction that is immediately followed by a
suspend (i.e. ret). We do this even in -O0 to support guaranteed tail call
for symmetrical coroutine control transfer (C++ Coroutines TS extension).
This transformation is done only in the resume part of the coroutine, which has
an identical signature and calling convention to the coro.resume call.
Reviewers: GorNishanov
Reviewed By: GorNishanov
Subscribers: EricWF, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D37125
llvm-svn: 311751
There are 3 small independent changes here:
1. Account for multiple uses in the pattern matching: avoid the transform if it increases the instruction count.
2. Add a missing fold for the case where the numerator is the constant: http://rise4fun.com/Alive/E2p
3. Enable all folds for vector types.
There's still one more potential change - use "shouldChangeType()" to keep from transforming to an illegal integer type.
Differential Revision: https://reviews.llvm.org/D36988
llvm-svn: 311726
Summary:
When reassociating an expression, do not drop the instruction's
original debug location in case the replacement location is
missing.
The debug location must at least not be dropped for inlinable
callsites of debug-info-bearing functions in debug-info-bearing
functions. Failing to do so would result in an "inlinable function
call in a function with debug info must have a !dbg location"
error in the verifier.
As preserving the original debug location is not expected
to result in overly jumpy debug line information, it is
preserved for all other cases too.
This fixes PR34231:
https://bugs.llvm.org/show_bug.cgi?id=34231
Original patch by David Stenberg
Reviewers: davide, craig.topper, mcrosier, dblaikie, aprantl
Reviewed By: davide, aprantl
Subscribers: aprantl
Differential Revision: https://reviews.llvm.org/D36865
llvm-svn: 311642
Current PGO only annotates the edge weight for branch and switch instructions
with profile counts. We should also annotate the indirectbr instruction as
all the information is there. This patch enables this annotation for indirectbr
instructions. It also uses this annotation in branch probability analysis.
Differential Revision: https://reviews.llvm.org/D37074
llvm-svn: 311604
The lowering isn't really an optimization, so optnone shouldn't make a
difference. ARM relies on the pass running when using "-mthread-model
single", because in that mode, it doesn't run AtomicExpand. See bug for
more details.
Differential Revision: https://reviews.llvm.org/D37040
llvm-svn: 311565
Summary:
If a coroutine "outer" calls another coroutine "inner" and the inner coroutine body is inlined into the outer, the coro.begin from the inner coroutine should be considered for spilling if accessed across suspends.
Prior to this change, the coroutine frame building code was not considering any coro.begins for spilling.
With this change, we only ignore the coro.begin for the current coroutine, but any coro.begins that were inlined into the current coroutine are eligible for spills.
Fixes PR34267
Reviewers: GorNishanov
Subscribers: qcolombet, llvm-commits, EricWF
Differential Revision: https://reviews.llvm.org/D37062
llvm-svn: 311556
...if the resulting subtract will be broken up later. This can cause us to get
into an infinite loop.
x + (-5.0 * y) -> x - (5.0 * y) ; Canonicalize neg const
x - (5.0 * y) -> x + (0 - (5.0 * y)) ; Break up subtract
x + (0 - (5.0 * y)) -> x + (-5.0 * y) ; Replace 0-X with X*-1.
PR34078
llvm-svn: 311554
InstCombine folds instructions with irrelevant conditions to undef.
This, as Nuno confirmed, is a bug.
(see https://bugs.llvm.org/show_bug.cgi?id=33409#c1 )
Given that the original motivation for the change was removing a
use, we now fold to false instead (which reaches the same goal
without undesired side effects).
Fixes PR33409.
Differential Revision: https://reviews.llvm.org/D36975
llvm-svn: 311540
Looks like for 'and' and 'or' we end up performing at least some of the transformations this is blocking in a roundabout way anyway.
For 'and sext(cmp1), sext(cmp2)' we end up later turning it into 'select cmp1, sext(cmp2), 0'. Then we optimize that back to 'sext (and cmp1, cmp2)'. This is the same result we would have gotten if shouldOptimizeCast hadn't blocked it. We do something analogous for 'or'.
With this patch we allow that transformation to happen directly in foldCastedBitwiseLogic. And we now support the same thing for 'xor'. This is definitely opening up many other cases, but since we already went around it for some cases hopefully it's ok.
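The direct transformation for 'and' (a sketch with hypothetical values; 'xor' follows the same shape):
%s1 = sext i1 %cmp1 to i32
%s2 = sext i1 %cmp2 to i32
%and = and i32 %s1, %s2
=>
%a = and i1 %cmp1, %cmp2
%and = sext i1 %a to i32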
Differential Revision: https://reviews.llvm.org/D36213
llvm-svn: 311508
We can't reuse the llvm.assume instruction's bitcast because it may not
dominate every user of the vtable pointer.
Differential Revision: https://reviews.llvm.org/D36994
llvm-svn: 311491
Summary:
Use the initialexec TLS type and eliminate calls to the TLS
wrapper. Fixes the sanitizer-x86_64-linux-fuzzer bot failure.
Reviewers: vitalybuka, kcc
Reviewed By: kcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37026
llvm-svn: 311490
Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.
This reapplies the original patch, r311057, that was reverted in r311381.
The previous version wasn't using the batch update API for updating dominators,
which in very rare cases caused assertion failures.
This also fixes PR34258.
Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki
Reviewed By: davide
Subscribers: grandinj, zhendongsu, llvm-commits, david2050
Differential Revision: https://reviews.llvm.org/D35869
llvm-svn: 311467
I don't think there's any reason to have them scattered about and on all 4 operands. We already have an early check that both compares must be the same type. And within a given compare the LHS and RHS must have the same type. Beyond that, I don't think there's any way this function returns anything valid for pointer types. So let's just return early and be done with it.
Differential Revision: https://reviews.llvm.org/D36561
llvm-svn: 311383
Currently, the inline cost model will bail once the inline cost exceeds the
inline threshold in order to avoid unnecessary compile-time. However, when
debugging it is useful to compute the full cost, so this command line option
is added to override the default behavior.
I took over this work from Chad Rosier (mcrosier@codeaurora.org).
Differential Revision: https://reviews.llvm.org/D35850
llvm-svn: 311371
The 1st try was reverted because it could inf-loop by creating a dead instruction.
Fixed that to not happen and added a test case to verify.
Original commit message:
Try to fold:
memcmp(X, C, ConstantLength) == 0 --> load X == *C
Without this change, we're unnecessarily checking the alignment of the constant data,
so we miss the transform in the first 2 tests in the patch.
I noted this shortcoming of LibCallSimplifier in one of the recent CGP memcmp expansion
patches. This doesn't help the example in:
https://bugs.llvm.org/show_bug.cgi?id=34032#c13
...directly, but it's worth short-circuiting more of these simple cases since we're
already trying to do that.
The benefit of transforming to load+cmp is that existing IR analysis/transforms may
further simplify that code. For example, if the load of the variable is common to
multiple memcmp calls, CSE can remove the duplicate instructions.
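A sketch of the fold, using a hypothetical global @key on a little-endian target (the 4 constant bytes become an i32 immediate):
@key = constant [4 x i8] c"abcd"
%c = call i32 @memcmp(i8* %x, i8* getelementptr inbounds ([4 x i8], [4 x i8]* @key, i64 0, i64 0), i64 4)
%res = icmp eq i32 %c, 0
=>
%p = bitcast i8* %x to i32*
%lhs = load i32, i32* %p, align 1
%res = icmp eq i32 %lhs, 1684234849   ; the bytes "abcd" read as a little-endian i32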
Differential Revision: https://reviews.llvm.org/D36922
llvm-svn: 311366
This is similar to what was already done in foldSelectICmpAndOr. Ultimately I'd like to see if we can call foldSelectICmpAnd from foldSelectIntoOp if we detect a power of 2 constant. This would allow us to remove foldSelectICmpAndOr entirely.
Differential Revision: https://reviews.llvm.org/D36498
llvm-svn: 311362
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33786
Reviewers: anemet, davidxl, chandlerc
Reviewed By: anemet
Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D36054
llvm-svn: 311349
Summary:
If the bitsToClear from the LHS of an 'and' comes back non-zero, but all of those bits are known zero on the RHS, we can reset bitsToClear.
Without this, the 'or' in the modified test case blocks the transform because its RHS has non-zero bits in those positions.
Reviewers: spatel, majnemer, davide
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36944
llvm-svn: 311343
Try to fold:
memcmp(X, C, ConstantLength) == 0 --> load X == *C
Without this change, we're unnecessarily checking the alignment of the constant data,
so we miss the transform in the first 2 tests in the patch.
I noted this shortcoming of LibCallSimplifier in one of the recent CGP memcmp expansion
patches. This doesn't help the example in:
https://bugs.llvm.org/show_bug.cgi?id=34032#c13
...directly, but it's worth short-circuiting more of these simple cases since we're
already trying to do that.
The benefit of transforming to load+cmp is that existing IR analysis/transforms may
further simplify that code. For example, if the load of the variable is common to
multiple memcmp calls, CSE can remove the duplicate instructions.
Differential Revision: https://reviews.llvm.org/D36922
llvm-svn: 311333
Added a separate metadata to indicate when the loop
has already been vectorized instead of setting width and count to 1.
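Presumably the marker looks roughly like this (a sketch with hypothetical labels; the metadata name llvm.loop.isvectorized is my assumption from the vectorizer's hint handling, not quoted from the patch):
br i1 %exitcond, label %exit, label %loop, !llvm.loop !0
!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.isvectorized", i32 1}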
Patch written by Divya Shanmughan and Aditya Kumar
Differential Revision: https://reviews.llvm.org/D36220
llvm-svn: 311281
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33786
Reviewers: anemet, davidxl, chandlerc
Reviewed By: anemet
Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D36054
llvm-svn: 311273
Summary:
Follow up to fix in r311023, which fixed the case where the combined
index is written to disk. The same samplePGO logic exists for the
in-memory index when computing imports, so we need to filter out
GlobalVariable summaries there too.
Reviewers: davidxl
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D36919
llvm-svn: 311254
a function into itself.
We tried to fix this before in r306495 but that got reverted as the
assert was actually hit.
This fixes the original bug (which we seem to have lost track of with
the revert) by blocking a second remapping when the function being
inlined is also the caller and the remapping could succeed but
erroneously.
The included test case would actually load from an inlined copy of the
alloca before this change, failing to load the stored value and
miscompiling.
Many thanks to Richard Smith for diagnosing a user miscompile to this
bug, to Kyle for the first attempt and initial analysis, and to David Li
for remembering the issue and how to fix it and suggesting the patch.
I'm just stitching it together and landing it. =]
llvm-svn: 311229
The Clamp function was too optimistic when choosing a signed or unsigned min/max function for calculations.
In fact, `!IsSignedPredicate` guarantees us that `Smallest` and `Greatest` can be compared safely using unsigned
predicates, but we did not check this for `S` which can in theory be negative.
This patch makes Clamp use signed min/max for cases when it fails to prove `S` being non-negative,
and it adds a test where such situation may lead to incorrect conditions calculation.
Differential Revision: https://reviews.llvm.org/D36873
llvm-svn: 311205
Summary:
Memcpy intrinsics have a size argument that can be of any integer type, like i32 or i64.
Fix the size type along with its value when cloning the intrinsic.
Reviewers: davidxl, xur
Reviewed By: davidxl
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D36844
llvm-svn: 311188
Summary:
Augment SanitizerCoverage to insert maximum stack depth tracing for
use by libFuzzer. The new instrumentation is enabled by the flag
-fsanitize-coverage=stack-depth and is compatible with the existing
trace-pc-guard coverage. The user must also declare the following
global variable in their code:
thread_local uintptr_t __sancov_lowest_stack
https://bugs.llvm.org/show_bug.cgi?id=33857
Reviewers: vitalybuka, kcc
Reviewed By: vitalybuka
Subscribers: kubamracek, hiraditya, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D36839
llvm-svn: 311186
Summary: This patch teaches LoopRotate to use the new incremental API to update the DominatorTree.
Reviewers: dberlin, davide, grosser, sanjoy
Reviewed By: dberlin, davide
Subscribers: hiraditya, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D35581
llvm-svn: 311125
Summary:
This patch makes LoopUnswitch use new incremental API for updating dominators.
It also updates SplitCriticalEdge, as it is called in LoopUnswitch.
There doesn't seem to be any noticeable performance difference when bootstrapping clang with this patch.
Reviewers: dberlin, davide, sanjoy, grosser, chandlerc
Reviewed By: davide, grosser
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35528
llvm-svn: 311093
In the case where dfsan provides a custom wrapper for a function,
shadow parameters are added for each parameter of the function.
These parameters are i16s. For targets which do not consider this
a legal type, the lack of sign extension information would cause
LLVM to generate anyexts around their usage with phi variables
and calling convention logic.
Address this by introducing zero exts for each shadow parameter.
Reviewers: pcc, slthakur
Differential Revision: https://reviews.llvm.org/D33349
llvm-svn: 311087
VPlan is an ongoing effort to refactor and extend the Loop Vectorizer. This
patch introduces the VPlan model into LV and uses it to represent the vectorized
code and drive the generation of vectorized IR.
In this patch VPlan models the vectorized loop body: the vectorized control-flow
is represented using VPlan's Hierarchical CFG, with predication refactored from
being a post-vectorization-step into a vectorization planning step modeling
if-then VPRegionBlocks, and generating code inline with non-predicated code. The
vectorized code within each VPBasicBlock is represented as a sequence of
Recipes, each responsible for modelling and generating a sequence of IR
instructions. To keep the size of this commit manageable the Recipes in this
patch are coarse-grained and capture large chunks of LV's code-generation logic.
The constructed VPlans are dumped in dot format under -debug.
This commit retains current vectorizer output, except for minor instruction
reorderings; see associated modifications to lit tests.
For further details on the VPlan model see docs/Proposals/VectorizationPlan.rst
and its references.
Authors: Gil Rapaport and Ayal Zaks
Differential Revision: https://reviews.llvm.org/D32871
llvm-svn: 311077
Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.
I didn't notice any performance impact when bootstrapping clang with this patch.
The patch was originally committed in r311039 and reverted in r311049.
This revision fixes the problem with not adding a dependency on the
DominatorTreeWrapperPass for the LegacyPassManager.
Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki
Reviewed By: davide
Subscribers: grandinj, zhendongsu, llvm-commits, david2050
Differential Revision: https://reviews.llvm.org/D35869
llvm-svn: 311057
Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.
I didn't notice any performance impact when bootstrapping clang with this patch.
Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki
Reviewed By: davide
Subscribers: grandinj, zhendongsu, llvm-commits, david2050
Differential Revision: https://reviews.llvm.org/D35869
llvm-svn: 311039
Summary:
Mark LoopDataPrefetch and AArch64FalkorHWPFFix passes as preserving
ScalarEvolution since they do not alter loop structure and should not
alter any SCEV values (though LoopDataPrefetch may introduce new
instructions that won't have cached SCEV values yet).
This can result in slight code differences, mainly w.r.t. nsw/nuw flags
on SCEVs, since these are computed somewhat lazily when a zext/sext
instruction is encountered. As a result, passes after the modified
passes may see SCEVs with more nsw/nuw flags present.
Reviewers: sanjoy, anemet
Subscribers: aemerson, rengolin, mzolotukhin, javed.absar, kristof.beyls, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D36716
llvm-svn: 311032
To clear assumptions that are potentially invalid after trivialization, we need
to walk the use/def chain. Normally, the only way to reach an instruction with
an unsized type is via an instruction that has side effects (or otherwise will
demand its input bits). That would stop the walk. However, if we have a
readnone function that returns an unsized type (e.g., void), we must avoid
asking for the demanded bits of the function call's return value. A
void-returning readnone function is always dead (and so we can stop walking the
use/def chain here), but the check is necessary to avoid asserting.
Fixes PR34211.
llvm-svn: 311014
Summary: When we hoist then/else code into the if block, we need to merge its debug info; otherwise the hoisted instruction may have inaccurate debug info attached.
Reviewers: aprantl, probinson, dblaikie, echristo, loladiro
Reviewed By: aprantl
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D36778
llvm-svn: 310985
We were only allowing ConstantInt before. This patch allows splat of ConstantInt too.
Differential Revision: https://reviews.llvm.org/D36763
llvm-svn: 310970
Narrow ops are better for bit-tracking, and in the case of vectors,
may enable better codegen.
As the trunc test shows, this can allow follow-on simplifications.
There's a block of code in visitTrunc that deals with shifted ops
with FIXME comments. It may be possible to remove some of that now,
but I want to make sure there are no problems with this step first.
http://rise4fun.com/Alive/Y3a
Name: hoist_ashr_ahead_of_sext_1
%s = sext i8 %x to i32
%r = ashr i32 %s, 3 ; shift value is less than the source bit width
=>
%a = ashr i8 %x, 3
%r = sext i8 %a to i32
Name: hoist_ashr_ahead_of_sext_2
%s = sext i8 %x to i32
%r = ashr i32 %s, 8 ; shift value is >= the source bit width
=>
%a = ashr i8 %x, 7 ; so clamp this shift value
%r = sext i8 %a to i32
Name: junc_the_trunc
%a = sext i16 %v to i32
%s = ashr i32 %a, 18
%t = trunc i32 %s to i16
=>
%t = ashr i16 %v, 15
llvm-svn: 310942
Summary:
This patch teaches PostDominatorTree about infinite loops. It is built on top of D29705 by @dberlin which includes a very detailed motivation for this change.
What's new is that the patch also teaches the incremental updater how to deal with reverse-unreachable regions and how to properly maintain and verify tree roots. Before that, the incremental algorithm sometimes ended up preserving reverse-unreachable regions after updates that wouldn't appear in the tree if it was constructed from scratch on the same CFG.
This patch makes the following assumptions:
- A sequence of updates should produce the same tree as recalculating it.
- Any sequence of the same updates should lead to the same tree.
- Siblings and roots are unordered.
The last two properties are essential to efficiently perform batch updates in the future.
When it comes to the first one, we can decide later that the consistency between a freshly built tree and an updated one doesn't matter that much, as there are many correct ways to pick roots in infinite loops, and relax this assumption. That should enable us to recalculate postdominators less frequently.
This patch is pretty conservative when it comes to incremental updates on reverse-unreachable regions and ends up recalculating the whole tree in many cases. It should be possible to improve the performance in many cases, if we decide that it's important enough.
That being said, my experiments showed that reverse-unreachable regions are very rare in the IR emitted by clang when bootstrapping clang. Here are the statistics I collected by analyzing IR between passes and after each removePredecessor call:
```
# functions: 52283
# samples: 337609
# reverse unreachable BBs: 216022
# BBs: 247840796
Percent reverse-unreachable: 0.08716159869015269 %
Max(PercRevUnreachable) in a function: 87.58620689655172 %
# > 25 % samples: 471 ( 0.1395104988314885 % samples )
... in 145 ( 0.27733680163724345 % functions )
```
Most of the reverse-unreachable regions come from invalid IR where it wouldn't be possible to construct a PostDomTree anyway.
I would like to commit this patch in the next week in order to be able to complete the work that depends on it before the end of my internship, so please don't wait long to voice your concerns :).
Reviewers: dberlin, sanjoy, grosser, brzycki, davide, chandlerc, hfinkel
Reviewed By: dberlin
Subscribers: nhaehnle, javed.absar, kparzysz, uabelho, jlebar, hiraditya, llvm-commits, dberlin, david2050
Differential Revision: https://reviews.llvm.org/D35851
llvm-svn: 310940
Two minor savings: avoid copying the SinkAfter map and avoid moving a cast if it
is not needed.
Differential Revision: https://reviews.llvm.org/D36408
llvm-svn: 310910
This recommits r310869, with the moved files and no extra changes.
Original commit message:
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310889
Failed to add the two files that moved. And then added an extra change I didn't mean to while trying to fix that. Reverting everything.
llvm-svn: 310873
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310869
This change lets us schedule a bundle with different opcodes in it, for example: [ load, add, add, add ]
Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D36518
llvm-svn: 310847
The assert was added with r310779 and is usually correct,
but as the test shows, not always. The 'volatile' on the
load is needed to expose the faulty path because without
it, DemandedBits would return that the load is just dead
rather than not demanded, and so we wouldn't hit the
bogus assert.
Also, since the lambda is just a single-line now, get rid
of it and inline the DB.isAllOnesValue() calls.
This should fix (prevent execution of a faulty assert):
https://bugs.llvm.org/show_bug.cgi?id=34179
llvm-svn: 310842
On some targets, the penalty of executing runtime unrolling checks
and then not the unrolled loop can be significantly detrimental to
performance. This results in the need to be more conservative with
the unroll count, keeping a trip count of 2 reduces the overhead as
well as increasing the chance of the unrolled body being executed. But
being conservative leaves performance gains on the table.
This patch enables the unrolling of the remainder loop introduced by
runtime unrolling. This can help reduce the overhead of misunrolled
loops because the cost of non-taken branches is much less than the
cost of the backedge that would normally be executed in the remainder
loop. This allows larger unroll factors to be used without suffering
performance losses with smaller iteration counts.
Differential Revision: https://reviews.llvm.org/D36309
llvm-svn: 310824
Summary:
These functions were overly complicated. The body of this function was rechecking for an And operation to find the constant, but we already knew we were looking at two Ands ORed together and the pieces are in variables. We already had earlier nearby code that checked for ConstantInts. So just inline the remaining parts into the earlier code.
Next step is to use m_APInt instead of ConstantInt.
Reviewers: spatel, efriedma, davide, majnemer
Reviewed By: spatel
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D36439
llvm-svn: 310806
Updating the remark API to the newer OptimizationDiagnosticInfo API. This
allows remarks to show up in the diagnostic YAML file, and enables use
of the opt-viewer tool.
The remarks at L505 and L751 do not display hotness
information, most likely due to profile information not being
propagated yet. Unsure if this is the desired outcome.
Patch by Tarun Rajendran.
Differential Revision: https://reviews.llvm.org/D36127
llvm-svn: 310763
This also corrects the description to match what was actually implemented. The old comment said X^(C1|C2), but it implemented X^((C1|C2)&~(C1&C2)). I believe ((C1|C2)&~(C1&C2)) is equivalent to (C1^C2): a bit is set in that expression exactly when it is set in one of C1 and C2 but not both, which is XOR.
Differential Revision: https://reviews.llvm.org/D36505
llvm-svn: 310658
We used to try to truncate the constant vector to vXi1, but if it's already i1 this would fail. Instead we now use IRBuilder::getZExtOrTrunc which should check the type and only create a trunc if needed. I believe this should trigger constant folding in the IRBuilder and ultimately do the same thing just with the additional type check.
llvm-svn: 310639
Sometimes it would be nice to stop InstCombine midway through its combining to see the current IR. By using a debug counter we can place an upper limit on how many instructions to process.
This will also allow skipping the first X combines, but that has the potential to change later combines since earlier canonicalizations might have been skipped.
Differential Revision: https://reviews.llvm.org/D36553
llvm-svn: 310638
This makes it consistent with STATISTIC, which it will often appear near.
While there move one DEBUG_COUNTER instance out of an anonymous namespace. It's already declaring a static variable so the namespace is unnecessary.
llvm-svn: 310637
This implementation of SanitizerCoverage instrumentation inserts different
callbacks depending on constantness of operands:
1. If both operands are non-const, then a usual
__sanitizer_cov_trace_cmp[1248] call is inserted.
2. If exactly one operand is const, then a
__sanitizer_cov_trace_const_cmp[1248] call is inserted. The first
argument of the call is always the constant one.
3. If both operands are const, then no callback is inserted.
This separation is useful in fuzzing when tasks like "find one operand
of the comparison in input arguments and replace it with the other one"
have to be done. The new instrumentation allows us to not waste time on
searching the constant operands in the input.
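A sketch of case 2 with hypothetical operands (per the description above, the constant is passed as the first argument):
declare void @__sanitizer_cov_trace_const_cmp8(i64, i64)
%cmp = icmp eq i64 %len, 42
; instrumented with:
call void @__sanitizer_cov_trace_const_cmp8(i64 42, i64 %len)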
Patch by Victor Chibotaru.
llvm-svn: 310600
I couldn't find any smaller folds to help the cases in:
https://bugs.llvm.org/show_bug.cgi?id=34046
after:
rL310141
The truncated rotate-by-variable patterns elude all of the existing transforms because
of multiple uses and knowledge about demanded bits and knownbits that doesn't exist
without the whole pattern. So we need an unfortunately large pattern match. But by
simplifying this pattern in IR, the backend is already able to generate
rolb/rolw/rorb/rorw for x86 using its existing rotate matching logic (although
there is a likely extraneous 'and' of the rotate amount).
Note that rotate-by-constant doesn't have this problem - smaller folds should already
produce the narrow IR ops.
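One shape of the truncated rotate-by-variable pattern, as a sketch following the usual C integer-promotion form (names hypothetical; the zext keeps the lshr well-defined when the masked amount is 0):
define i8 @rolb(i8 %x, i32 %amt) {
  %zext = zext i8 %x to i32
  %shamt = and i32 %amt, 7
  %sub = sub i32 8, %shamt
  %shl = shl i32 %zext, %shamt
  %shr = lshr i32 %zext, %sub
  %or = or i32 %shl, %shr
  %r = trunc i32 %or to i8
  ret i8 %r
}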
Differential Revision: https://reviews.llvm.org/D36395
llvm-svn: 310509
Summary:
Instrumentation to copy byval arguments is now correctly inserted
after the dynamic shadow base is loaded.
Reviewers: vitalybuka, eugenis
Reviewed By: vitalybuka
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D36533
llvm-svn: 310503
isLegalAddressingMode() has recently gained the extra optional Instruction*
parameter, and therefore it can now do the job that previously only
isFoldableMemAccess() could do.
The SystemZ implementation of isLegalAddressingMode() has gained the
functionality of checking for offsets, which used to be done with
isFoldableMemAccess().
The isFoldableMemAccess() hook has been removed everywhere.
Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D35933
llvm-svn: 310463
In the recursive call to isAMCompletelyFolded(), the passed offset should be
the sum of F.BaseOffset and Fixup.Offset.
Review: Quentin Colombet.
llvm-svn: 310462