Summary:
SpeculativeExecution enables a series straight line optimizations (such
as SLSR and NaryReassociate) on conditional code. For example,
if (...)
... b * s ...
if (...)
... (b + 1) * s ...
speculative execution can hoist b * s and (b + 1) * s from then-blocks,
so that we have
... b * s ...
if (...)
...
... (b + 1) * s ...
if (...)
...
Then, SLSR can rewrite (b + 1) * s to (b * s + s) because after
speculative execution b * s dominates (b + 1) * s.
The performance impact of this change is significant. It speeds up the
benchmarks running EigenFloatContractionKernelInternal16x16
(ba68f42fa6/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h?at=default#cl-526)
by roughly 2%. Some internal benchmarks that have the above code pattern
are improved by up to 40%. No significant slowdowns are observed on
Eigen CUDA microbenchmarks.
Reviewers: jholewinski, broune, eliben
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11201
llvm-svn: 242437
Internalizing an individual comdat group member without also internalizing
the other members of the comdat can break comdat semantics. For example,
if a module contains a reference to an internalized comdat member, and the
linker chooses a comdat group from a different object file, this will break
the reference to the internalized member.
This change causes the internalizer to only internalize comdat members if all
other members of the comdat are not externally visible. Once a comdat group
has been fully internalized, there is no need to apply comdat rules to its
members; later optimization passes (e.g. globaldce) can legally drop individual
members of the comdat. So we drop the comdat attribute from all comdat members.
Differential Revision: http://reviews.llvm.org/D10679
llvm-svn: 242423
Self-referential constants containing references to a merged function
no longer cause the MergeFunctions pass to infinite loop. Also adds a
reproduction IR which would otherwise fail, which was isolated from a similar
issue in Chromium.
Author: jrkoenig
Reviewers: nlewycky, jfb
Subscribers: llvm-commits, nlewycky, jfb
Differential Revision: http://reviews.llvm.org/D11208
llvm-svn: 242337
During estimation of unrolling effect we should be able to propagate
constants through casts.
Differential Revision: http://reviews.llvm.org/D10207
llvm-svn: 242257
Sometimes an incidentally created instruction can duplicate a Value used
elsewhere. It then often doesn't end up in the leader table. If it's later
removed, we attempt to remove it from the leader table and segfault.
Instead we should just ignore the removal request, which won't cause any
problems. The reverse situation, where the original instruction is replaced by
the new one (which you might think could leave the leader table empty) cannot
occur, because the incidental instruction will never be found in the first
place.
llvm-svn: 242199
Volatile loads and stores are made visible in global state regardless of
what memory is involved. It is not correct to disregard the ordering
and synchronization scope because it is possible to synchronize with
memory operations performed by hardware.
This partially addresses PR23737.
llvm-svn: 242126
Previously we would refrain from attempting to increase the linkage of
available_externally globals because they were considered weak for the
linker. Now they are treated more like a declaration instead of a weak
definition.
This was causing SSE alignment faults in Chromuim, when some code
assumed it could increase the alignment of a dllimported global that it
didn't control. http://crbug.com/509256
llvm-svn: 242091
This test case was breaking the hexagon elf bot. The failing lines
were actually unnecessary as checking that the store still reads the
correct value demonstrates that everything is working fine now.
llvm-svn: 242073
When spotting that a loop can use ctpop, we were incorrectly replacing all uses of a value with a value derived from ctpop.
The bug here was exposed because we were replacing a use prior to the ctpop with the ctpop value and so we have a use before def, i.e., we changed
%tobool.5 = icmp ne i32 %num, 0
store i1 %tobool.5, i1* %ptr
br i1 %tobool.5, label %for.body.lr.ph, label %for.end
to
store i1 %1, i1* %ptr
%0 = call i32 @llvm.ctpop.i32(i32 %num)
%1 = icmp ne i32 %0, 0
br i1 %1, label %for.body.lr.ph, label %for.end
Even if we inserted the ctpop so that it dominates the store here, that would still be incorrect. The store doesn’t want the result of ctpop.
The fix is very simple, and involves replacing only the branch condition with the ctpop instead of all uses.
Reviewed by Hal Finkel.
llvm-svn: 242068
Enable runtime unrolling for loops with unroll count metadata ("#pragma unroll N")
and a runtime trip count. Also, do not unroll loops with unroll full metadata if the
loop has a runtime loop count. Previously, such loops would be unrolled with a
very large threshold (pragma-unroll-threshold) if runtime unrolled happened to be
enabled resulting in a very large (and likely unwise) unroll factor.
llvm-svn: 242047
Summary:
This at least saves compile time. I also encountered a case where
ephemeral values affect whether other variables are promoted, causing
performance issues. It may be a bug in LSR, but I didn't manage to
reduce it yet. Anyhow, I believe it's in general not worth considering
ephemeral values in LSR.
Reviewers: atrick, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11115
llvm-svn: 242011
There is no suitable basic block to sink instructions in loops without
exits. The only way an instruction in a loop without exits can be used
is as an incoming value to a PHI. In such cases, the incoming block for
the corresponding value is unreachable.
This fixes PR24013.
Differential Revision: http://reviews.llvm.org/D10903
llvm-svn: 241987
After changes in rL231820 loop re-rotation is performed even in -Oz mode. Since loop rotation is disabled for -Oz, it seems loop re-rotation should be disabled too.
Differential Revision: http://reviews.llvm.org/D10961
llvm-svn: 241897
Not doing this can lead to misoptimizations down the line, e.g. because
of range metadata on the replacing load excluding values that are valid
for the load that is being replaced.
llvm-svn: 241886
Summary:
In RewriteLoopExitValues, before expanding out an SCEV expression using
SCEVExpander, try to see if an existing LLVM IR expression already
computes the value we're interested in. If so use that existing
expression.
Apart from reducing IndVars' reliance on the rest of the compilation
pipeline, this also prevents IndVars from concluding some expressions as
"high cost" when they're not. For instance,
`InductiveRangeCheckElimination` often emits code of the following form:
```
len = umin(len_A, len_B)
loop:
...
if (i++ < len)
goto loop
outside_loop:
use(i)
```
`SCEVExpander` refuses to rewrite the use of `i` in `outside_loop`,
since it thinks the value of `i` on loop exit, `len`, is a high cost
expansion since it contains an `umax` in it. With this change,
`IndVars` can see that it can re-use `len` instead of creating a new
expression to compute `umin(len_A, len_B)`.
I considered putting this cleverness in `SCEVExpander`, but I was
worried that it may then have a deterimental effect on other passes
that use it. So I decided it was better to just do this in the one
place where it seems like an obviously good idea, with the intent of
generalizing later if needed.
Reviewers: atrick, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10782
llvm-svn: 241838
Summary:
Often filter-like loops will do memory accesses that are
separated by constant offsets. In these cases it is
common that we will exceed the threshold for the
allowable number of checks.
However, it should be possible to merge such checks,
sice a check of any interval againt two other intervals separated
by a constant offset (a,b), (a+c, b+c) will be equivalent with
a check againt (a, b+c), as long as (a,b) and (a+c, b+c) overlap.
Assuming the loop will be executed for a sufficient number of
iterations, this will be true. If not true, checking against
(a, b+c) is still safe (although not equivalent).
As long as there are no dependencies between two accesses,
we can merge their checks into a single one. We use this
technique to construct groups of accesses, and then check
the intervals associated with the groups instead of
checking the accesses directly.
Reviewers: anemet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10386
llvm-svn: 241673
Summary:
Initially, these intrinsics seemed like part of a family of "frame"
related intrinsics, but now I think that's more confusing than helpful.
Initially, the LangRef specified that this would create a new kind of
allocation that would be allocated at a fixed offset from the frame
pointer (EBP/RBP). We ended up dropping that design, and leaving the
stack frame layout alone.
These intrinsics are really about sharing local stack allocations, not
frame pointers. I intend to go further and add an `llvm.localaddress()`
intrinsic that returns whatever register (EBP, ESI, ESP, RBX) is being
used to address locals, which should not be confused with the frame
pointer.
Naming suggestions at this point are welcome, I'm happy to re-run sed.
Reviewers: majnemer, nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11011
llvm-svn: 241633
This checks subtarget feature compatibility for inlining by verifying
that the callee is a strict subset of the caller's features. This includes
the cpu as part of the subtarget we can get via the incoming functions as
the backend takes CPUs as feature sets.
This allows us to inline things like:
int foo() { return baz(); }
int __attribute__((target("sse4.2"))) bar() {
return foo();
}
so that generic code can be inlined into specialized functions.
llvm-svn: 241221
TwoAddressInstructionPass stops after a successful commuting but 3 Addr
conversion might be good for some cases.
Consider:
int foo(int a, int b) {
return a + b;
}
Before this commit, we emit:
addl %esi, %edi
movl %edi, %eax
ret
After this commit, we try 3 Addr conversion:
leal (%rsi,%rdi), %eax
ret
Patch by Volkan Keles <vkeles@apple.com>!
Differential Revision: http://reviews.llvm.org/D10851
llvm-svn: 241206
This is mostly an NFC, which increases code readability (instead of
saving old terminator, generating new one in front of old, and deleting
old, we just call a function). However, it would additionaly copy
the debug location from old instruction to replacement, which
would help PR23837.
llvm-svn: 241197
We would create a phi node with a zero initialized operand instead of
undef in the case where no value was originally available. This was
problematic for x86_mmx which has no null value.
llvm-svn: 241143
Surprisingly, this is a correctness issue: the mmx type exists for
calling convention purposes, LLVM doesn't have a zero representation for
them.
This partially fixes PR23999.
llvm-svn: 241142
Summary:
nsw are flaky and can often be removed by optimizations. This patch enhances
nsw by leveraging @llvm.assume in the IR. Specifically, NaryReassociate now
understands that
assume(a + b >= 0) && assume(a >= 0) ==> a +nsw b
As a result, it can split more sext(a + b) into sext(a) + sext(b) for CSE.
Test Plan: nary-gep.ll
Reviewers: broune, meheff
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10822
llvm-svn: 241139
Set debug location for terminator instruction in loop backedge block
(which is an unconditional jump to loop header). We can't copy debug
location from original backedges, as there can be several of them,
with different debug info locations. So, we follow the approach of
SplitBlockPredecessors, and copy the debug info from first non-PHI
instruction in the header (i.e. destination block).
This is yet another change for PR23837.
llvm-svn: 240999
If we are dealing with a pointer induction variable, isInductionPHI
gives back a step value of Stride / size of pointer. However, we might
be indexing with a legal type wider than the pointer width.
Handle this by inserting casts where appropriate instead of crashing.
This fixes PR23954.
llvm-svn: 240877
The PruneEH pass tries to annotate functions as 'noreturn' if it doesn't
see a ReturnInst. However, a naked function containing inline assembly
can contain control flow leaving the function.
This fixes PR23971.
llvm-svn: 240876
It is possible for a global to be substituted with another global of a
different type or a different kind (i.e. an alias) at IR link time. One
example of this scenario is when a Microsoft ABI vtable is substituted with
an alias referring to a larger vtable containing an RTTI reference.
This will cause the global to be RAUW'd with a possibly bitcasted reference
to the other global. This will of course also affect any references to the
global in bitset metadata.
The right way to handle such metadata is simply to ignore it. This is sound
because the linked module should contain another copy of the bitset entries as
applied to the new global.
llvm-svn: 240866
This change extends the detection of base pointers for vector constructs to handle arbitrary phi and select nodes. The existing non-vector code already handles those, so this is basically just extending the vector special case to be less special cased. It still isn't generalized vector handling since we can't handle arbitrary vector instructions (e.g. shufflevectors), but it's a lot closer.
The general structure of the change is as follows:
* Extend the base defining value relation over a subset of vector instructions and vector typed phi & select instructions.
* Move scalarization from before base pointer rewriting to after base pointer rewriting. The extension of the BDV relation is sufficient to find vector base phis for vector inputs.
* Preserve the existing special case logic for when the base of a vector element is locally obvious. This general idea could be extended to the scalar case as well.
Differential Revision: http://reviews.llvm.org/D10461#inline-84275
llvm-svn: 240850
If we have a caller that knows a particular argument can never be null, we can exploit this fact while simplifying values in the inline cost analysis. This has the effect of reducing the cost for inlining when a null check is present in the callee, but the value is known non null in the caller. In particular, any dependent control flow can be discounted from the cost estimate.
Note that we use the parameter attributes at the call site to memoize the analysis within the caller's code. The setting of this attribute is done in InstCombine, the inline cost analysis just consumes it. This is intentional and important because we want the inline cost analysis results to be easily cachable themselves. We're not currently doing so, but initial results on LTO indicate this will quickly become important.
Differential Revision: http://reviews.llvm.org/D9129
llvm-svn: 240828
Summary:
Fixes PR23809. Without passing the context to SimplifyICmpInst, we would
use the assume to prove that the condition feeding the assume is
trivially true (see isValidAssumeForContext in ValueTracking.cpp),
causing the removal of the assume which may be useful for later
optimizations.
Test Plan: pr23800.ll
Reviewers: hfinkel, majnemer
Reviewed By: hfinkel
Subscribers: henryhu, llvm-commits, wengxt, broune, meheff, eliben
Differential Revision: http://reviews.llvm.org/D10695
llvm-svn: 240683
This previously caused miscompilations as a result of phi nodes receiving
undef incoming values from blocks dominated by such successors.
Differential Revision: http://reviews.llvm.org/D10726
llvm-svn: 240670
We performed a simple, but incomplete, intersection when it came time to
CSE instructions. It didn't handle, for example, the 'exact' flag.
This fixes PR23922.
llvm-svn: 240595