Fixed copy+paste vector alignment to correct for per-element scalar loads
Increased to 512-bit data sizes in preparation for AVX-512 tests
llvm-svn: 283748
Value names may be prefixed with a binary '\1' byte to indicate that the
backend should not modify the symbol according to any platform naming
convention.
This should not show up in the YAML opt record file because it breaks
the YAML parser.
llvm-svn: 283656
Summary:
If heap allocation of a coroutine is elided, we need to make sure that we will update an address stored in the coroutine frame from f.destroy to f.cleanup.
Before this change, CoroSplit synthesized these stores after coro.begin:
```
store void (%f.Frame*)* @f.resume, void (%f.Frame*)** %resume.addr
store void (%f.Frame*)* @f.destroy, void (%f.Frame*)** %destroy.addr
```
In those cases where we did heap elision but were not able to devirtualize all indirect calls, the destroy call will attempt to "free" the coroutine frame stored on the stack. Oops.
Now we use select to put the appropriate coroutine subfunction in the destroy slot, as below:
```
store void (%f.Frame*)* @f.resume, void (%f.Frame*)** %resume.addr
%0 = select i1 %need.alloc, void (%f.Frame*)* @f.destroy, void (%f.Frame*)* @f.cleanup
store void (%f.Frame*)* %0, void (%f.Frame*)** %destroy.addr
```
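To make the effect of the select concrete, here is a minimal C++ model of the frame header, assuming a frame that holds just two function pointers. `Frame`, `fResume`, `fDestroy`, `fCleanup` and `initFrame` are hypothetical stand-ins for `%f.Frame`, `@f.resume`, `@f.destroy`, `@f.cleanup` and the stores CoroSplit emits; "freeing" is modeled by a flag rather than a real deallocation.

```cpp
// Hypothetical C++ model of the coroutine frame header; the real frame is the
// LLVM struct type %f.Frame that CoroSplit builds.
struct Frame;
using FrameFn = void (*)(Frame*);

struct Frame {
  FrameFn resume = nullptr;
  FrameFn destroy = nullptr;
  bool freed = false;  // set by fDestroy to stand in for freeing heap memory
};

// Stand-ins for the split subfunctions f.resume, f.destroy and f.cleanup.
void fResume(Frame*) {}
void fDestroy(Frame* f) { f->freed = true; }  // cleanup + free the heap frame
void fCleanup(Frame*) {}                       // cleanup only, never frees

// Mirrors the stores emitted after coro.begin: the destroy slot receives
// f.destroy only when the frame was heap-allocated (need.alloc is true).
void initFrame(Frame* f, bool needAlloc) {
  f->resume = fResume;
  f->destroy = needAlloc ? fDestroy : fCleanup;
}
```

With heap elision (`needAlloc == false`), an indirect call through the destroy slot now runs cleanup without trying to free a stack-allocated frame.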
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25377
llvm-svn: 283625
Summary: Add tests for cases where we have zero coverage in RS4GC.
Reviewers: sanjoy, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25341
llvm-svn: 283591
If we're going to canonicalize IR towards select of constants, try harder to create those.
Also, don't lose the metadata.
This is actually 4 related transforms in one patch:
// select X, (sext X), C --> select X, -1, C
// select X, (zext X), C --> select X, 1, C
// select X, C, (sext X) --> select X, C, 0
// select X, C, (zext X) --> select X, C, 0
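The four folds rely on the fact that X is an i1, so inside the true arm X is known to be 1 and inside the false arm it is known to be 0. A small C++ check of these identities, modeling i1 as `bool` and the constant arm as an `int32_t` (the helper names are illustrative only):

```cpp
#include <cstdint>

// i1 extended to i32: sext gives 0 or -1, zext gives 0 or 1.
int32_t sextBool(bool x) { return x ? -1 : 0; }
int32_t zextBool(bool x) { return x ? 1 : 0; }

// Returns true if each of the four folds preserves the value of the select
// for this particular condition x and constant c.
bool selectFoldsAgree(bool x, int32_t c) {
  // select X, (sext X), C --> select X, -1, C   (true arm: X is known 1)
  bool f1 = (x ? sextBool(x) : c) == (x ? -1 : c);
  // select X, (zext X), C --> select X, 1, C
  bool f2 = (x ? zextBool(x) : c) == (x ? 1 : c);
  // select X, C, (sext X) --> select X, C, 0    (false arm: X is known 0)
  bool f3 = (x ? c : sextBool(x)) == (x ? c : 0);
  // select X, C, (zext X) --> select X, C, 0
  bool f4 = (x ? c : zextBool(x)) == (x ? c : 0);
  return f1 && f2 && f3 && f4;
}
```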
Differential Revision: https://reviews.llvm.org/D25126
llvm-svn: 283575
Previously, we marked the branch conditions of latch blocks uniform after
vectorization if they were instructions contained in the loop. However, if a
condition instruction has users other than the branch, it may not remain
uniform. This patch ensures the conditions we mark uniform are only used by the
branch. This should fix PR30627.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30627
llvm-svn: 283563
Summary:
While walking defs of pointer operands we were assuming that the pointer
size would remain constant. This is not true, because addrspacecast
instructions may cast the pointer to an address space with a different
pointer width.
This partially reverts r282612, which was a more conservative solution
to this problem.
Reviewers: reames, sanjoy, apilipenko
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24772
llvm-svn: 283557
unrolling.
The following code is not vectorized by the SLPVectorizer:
```
int test(unsigned int *p) {
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += p[i];
  return sum;
}
```
During optimization this loop is fully unrolled, and the SLPVectorizer is
unable to vectorize it. This patch tries to fix this problem.
Differential Revision: https://reviews.llvm.org/D24796
llvm-svn: 283535
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.
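As an illustration of the kind of table that remains safe, the sketch below shows a switch returning plain integers next to an equivalent range check plus constant-array load, which is roughly the shape a switch-to-lookup-table conversion produces. The function names and case values are hypothetical. Because the table holds only integers it needs no relocations; a table of pointers to globals or functions would, and that is the case this patch stops converting under ROPI/RWPI.

```cpp
#include <array>
#include <cstdint>

// A switch over a dense case range that returns plain integer constants.
int32_t viaSwitch(uint32_t x) {
  switch (x) {
  case 0: return 10;
  case 1: return 42;
  case 2: return -7;
  case 3: return 99;
  default: return 0;
  }
}

// Equivalent lookup-table form: a bounds check plus a load from a constant
// array. Only integers are stored, so the table needs no relocations.
int32_t viaTable(uint32_t x) {
  static constexpr std::array<int32_t, 4> kTable = {10, 42, -7, 99};
  return x < kTable.size() ? kTable[x] : 0;
}
```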
Differential Revision: https://reviews.llvm.org/D24462
llvm-svn: 283530
GetCaseResults assumed that a terminator with one successor was an
unconditional branch. This is not necessarily the case; it could be a
cleanupret.
Strengthen the check by querying whether or not the terminator is
exceptional.
llvm-svn: 283517
Vectorizer tests in the target-independent directory should not have a target
triple. If a test really needs to query a specific backend, it belongs in the
right target subdirectory (which "REQUIRES" the right backend). Otherwise, it
should not specify a triple.
llvm-svn: 283512
Add a weak alias to the renamed Comdat function in IR-level instrumentation,
using its original name. This ensures the same behavior with and without IR
instrumentation, even for non-standard-conforming code.
Differential Revision: http://reviews.llvm.org/D25339
llvm-svn: 283490
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.
The primary motivation for this feature was the bitcode workflow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches. For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.
The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.
llvm-svn: 283473
The purpose of the YAML diagnostic output file is to collect information on
optimizations performed, or not performed, for later processing by tools that
help users (and compiler developers) understand how code was optimized. As
such, the diagnostics that appear in the file should not be coupled to what a
user might want to see summarized as the compiler runs. In fact, because the
user likely does not know which optimization diagnostics their tools might
want to use, the user cannot provide a useful filter anyway. Therefore, we
shouldn't filter the diagnostics going to the output file.
Differential Revision: https://reviews.llvm.org/D25224
llvm-svn: 283236
Splitting the edge is nontrivial because of the landing pad, and we would
currently assert trying to do it.
Differential Revision: https://reviews.llvm.org/D24680
llvm-svn: 283129
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433
There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?
Differential Revision: https://reviews.llvm.org/D25165
llvm-svn: 283119
Summary:
In the case below, %Result.i19 is defined between coro.save and coro.suspend and used after coro.suspend. We need to correctly place such a value into the coroutine frame.
```
%save = call token @llvm.coro.save(i8* null)
%Result.i19 = getelementptr inbounds %"struct.lean_future<int>::Awaiter", %"struct.lean_future<int>::Awaiter"* %ref.tmp7, i64 0, i32 0
%suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
switch i8 %suspend, label %exit [
i8 0, label %await.ready
i8 1, label %exit
]
await.ready:
%val = load i32, i32* %Result.i19
```
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24418
llvm-svn: 282902
Summary:
Without the fix, if a function with debug information was inlined into the coroutine, CloneFunctionInto(NewF, &F, VMap, /*ModuleLevelChanges=*/true, Returns); would duplicate all of the debug information, including the DICompileUnit.
We now use VMap to indicate that debug metadata for a File, Unit and FunctionType should not be duplicated when creating the clones that will become f.resume, f.destroy and f.cleanup.
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24417
llvm-svn: 282899
Summary: Not all coro.subfn.addr intrinsics can be eliminated in CoroElide through devirtualization. Those that remain need to be lowered in CoroCleanup.
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24412
llvm-svn: 282897
Summary: Debug info should *not* affect optimization decisions. This patch updates the loop unroller's cost model so that it is not affected by debug info.
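The principle can be shown with a toy size estimate that skips debug intrinsics, so adding `-g` cannot change the total. `Inst` and `loopSizeForUnrolling` are hypothetical simplifications, not LLVM's `Instruction` class or the unroller's actual cost code.

```cpp
#include <vector>

// Hypothetical, simplified instruction record; not LLVM's Instruction class.
struct Inst {
  bool isDebugIntrinsic;  // e.g. a call to llvm.dbg.value
  unsigned cost;
};

// Sums instruction costs for an unrolling decision, skipping debug
// intrinsics so that compiling with debug info cannot change the result.
unsigned loopSizeForUnrolling(const std::vector<Inst>& body) {
  unsigned size = 0;
  for (const Inst& i : body)
    if (!i.isDebugIntrinsic)
      size += i.cost;
  return size;
}
```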
Reviewers: davidxl, mzolotukhin
Subscribers: haicheng, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D25098
llvm-svn: 282894
When building the steps for scalar induction variables, we previously attempted
to determine if all the scalar users of the induction variable were uniform. If
they were, we would only emit the step corresponding to vector lane zero. This
optimization was too aggressive. We generally don't know the entire set of
induction variable users that will be scalar. We have
isScalarAfterVectorization, but this is only a conservative estimate of the
instructions that will be scalarized. Thus, an induction variable may have
scalar users that aren't already known to be scalar. To avoid emitting unused
steps, we can only check that the induction variable is uniform. This should
fix PR30542.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30542
llvm-svn: 282863
Summary:
We don't want to decay the import threshold for hot callsites, so that
chains of hot callsites can be imported. The same mechanism is used in LIPO.
Reviewers: tejohnson, eraman, mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24976
llvm-svn: 282833
Summary:
This heuristic is not tuned yet, but even this small heuristic gives about
a +0.10% improvement on SPEC 2006.
Reviewers: tejohnson, mehdi_amini, eraman
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24940
llvm-svn: 282733
Summary:
The patch fixes regression caused by two earlier patches D18777 and D18867.
Reviewers: reames, sanjoy
Differential Revision: http://reviews.llvm.org/D24280
From: Li Huang
llvm-svn: 282650
Also, remove unnecessary function attributes, parameters, and comments.
It looks like at least some of these tests are not minimal though...
llvm-svn: 282620
Pointers in different addrspaces can have different sizes, so it's not valid to look through addrspace cast calculating base and offset for a value.
This is similar to D13008.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D24729
llvm-svn: 282612
There is really no reason for these to be separate.
The vectorizer started the rather bad tradition of making the text of
missed remarks fairly meaningless, e.g. "vectorization failed"; you then
have to query the analysis remarks to get the full picture.
I think we should just explain the reason for missing the optimization
in the missed remark when possible. Analysis remarks should provide
information that the pass gathers regardless of whether the optimization is
passing or not.
llvm-svn: 282542
(Re-committed after moving the template specialization under the yaml
namespace. GCC was complaining about this.)
This allows various presentations of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
```
1 int foo();
2 int bar();
3
4 int baz() {
5   return foo() + bar();
6 }
```
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
```
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
```
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
```
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
  - Callee: foo
  - String: will not be inlined into
  - Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
  - Callee: bar
  - String: will not be inlined into
  - Caller: baz
...
```
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
  E.g. for the inliner remark above:
  ```
  ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
               DEBUG_TYPE, "NotInlined", &I)
           << NV("Callee", Callee) << " will not be inlined into "
           << NV("Caller", CS.getCaller()) << setIsVerbose());
  ```
  NV stands for named value and allows the YAML client to process a remark
  using its name (NotInlined) and the named arguments (Callee and Caller)
  without parsing the text of the message.
  Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
  to specify reading and writing at once, but reading is highly non-trivial
  for some of the more complex LLVM types. Since it's not clear that we
  (ever) want to use LLVM to parse this YAML file, the code supports only
  writing and asserts that we're writing.
  On the other hand, I did verify experimentally that the class hierarchy
  starting at DiagnosticInfoOptimizationBase can be mapped back from the
  YAML generated here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
  i.e. print "Function" rather than "Value".
* As before, hotness is computed in the analysis pass instead of
  DiagnosticInfo. This avoids the layering problem since BFI is in
  Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
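For readers who want to see how the named arguments map onto the YAML records above, here is a minimal sketch of serializing one remark in that layout. It is not the patch's code: `Remark` and `toYAML` are hypothetical names, it uses plain string formatting instead of the YAML I/O library the patch uses, and it performs no YAML escaping.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified remark record; not LLVM's DiagnosticInfo classes.
struct Remark {
  std::string kind;      // e.g. "Missed"
  std::string pass;      // e.g. "inline"
  std::string name;      // e.g. "NotInlined"
  std::string file;
  unsigned line, column;
  std::string function;
  unsigned hotness;
  // Named values as (key, value) pairs; free text uses the key "String".
  std::vector<std::pair<std::string, std::string>> args;
};

// Writes one YAML document in the layout shown in the example above.
std::string toYAML(const Remark& r) {
  std::ostringstream os;
  os << "--- !" << r.kind << "\n"
     << "Pass: " << r.pass << "\n"
     << "Name: " << r.name << "\n"
     << "DebugLoc: { File: " << r.file << ", Line: " << r.line
     << ", Column: " << r.column << " }\n"
     << "Function: " << r.function << "\n"
     << "Hotness: " << r.hotness << "\n"
     << "Args:\n";
  for (const auto& [key, value] : r.args)
    os << "  - " << key << ": " << value << "\n";
  os << "...\n";
  return os.str();
}
```

A consumer can then look up `Callee` and `Caller` by key instead of parsing the message text, which is the point of the NV streaming interface.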
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282539
This allows various presentations of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
```
1 int foo();
2 int bar();
3
4 int baz() {
5   return foo() + bar();
6 }
```
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
```
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
```
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
```
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
  - Callee: foo
  - String: will not be inlined into
  - Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
  - Callee: bar
  - String: will not be inlined into
  - Caller: baz
...
```
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
  E.g. for the inliner remark above:
  ```
  ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
               DEBUG_TYPE, "NotInlined", &I)
           << NV("Callee", Callee) << " will not be inlined into "
           << NV("Caller", CS.getCaller()) << setIsVerbose());
  ```
  NV stands for named value and allows the YAML client to process a remark
  using its name (NotInlined) and the named arguments (Callee and Caller)
  without parsing the text of the message.
  Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
  to specify reading and writing at once, but reading is highly non-trivial
  for some of the more complex LLVM types. Since it's not clear that we
  (ever) want to use LLVM to parse this YAML file, the code supports only
  writing and asserts that we're writing.
  On the other hand, I did verify experimentally that the class hierarchy
  starting at DiagnosticInfoOptimizationBase can be mapped back from the
  YAML generated here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
  i.e. print "Function" rather than "Value".
* As before, hotness is computed in the analysis pass instead of
  DiagnosticInfo. This avoids the layering problem since BFI is in
  Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282499
Summary:
We don't currently need this facility for CFI. Disabling individual hot methods proved
to be a better strategy in Chrome.
Also, the design of the feature is suboptimal, as pointed out by Peter Collingbourne.
Reviewers: pcc
Subscribers: kcc
Differential Revision: https://reviews.llvm.org/D24948
llvm-svn: 282461
Summary:
This patch improves the ThinLTO importer
by importing 3x larger functions that are called from hot blocks.
I compared performance with trunk on SPEC, and there
were improvements of about 2% on povray and 3.33% on milc. These results seem
to be consistent and match the results Teresa got with her simple
heuristic. Some benchmarks got slower, but I think they are just
noisy (mcf, xalancbmk, omnetpp); I am running the benchmarks again with
more iterations to confirm. The geomean of all benchmarks, including the noisy
ones, was about +0.02%.
I see much better improvement on the Google branch with Easwaran's patch
for PGO callsite inlining (the inliner actually inlines those big functions).
Overall I see a +0.5% improvement, and I get +8.65% on povray.
So I guess we will see a much bigger change when Easwaran's patch lands
(it depends on the new pass manager), but it is still worth putting this in
trunk before then.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness.
- hot-import-multiplier is set to 3.0 for now;
  I didn't have time to tune it, but I see that we get most of the interesting
  functions with 3, so there is not much performance difference with higher
  values, and binary size doesn't grow as much as with 10.0.
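The effect of hot-import-multiplier can be summarized as scaling the import size threshold for calls from hot blocks. The sketch below is a simplified model of that rule only; `Hotness` and `importThresholdFor` are hypothetical names, and the real importer also applies other adjustments not shown here.

```cpp
// Hypothetical hotness categories for a callsite; a simplified model only.
enum class Hotness { Unknown, Cold, None, Hot };

// Callees reached from hot callsites may be up to hotMultiplier times larger
// (3.0 by default, matching the value chosen for hot-import-multiplier).
unsigned importThresholdFor(unsigned baseThreshold, Hotness h,
                            float hotMultiplier = 3.0f) {
  if (h == Hotness::Hot)
    return static_cast<unsigned>(baseThreshold * hotMultiplier);
  return baseThreshold;
}
```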
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
This patch ensures that we actually scalarize instructions marked scalar after
vectorization. Previously, such instructions may have been vectorized instead.
Differential Revision: https://reviews.llvm.org/D23889
llvm-svn: 282418
Summary:
If a coroutine has no suspend points, remove the heap allocation and turn the coroutine into a normal function.
Also, if a pattern is detected that coroutine resumes or destroys itself prior to coro.suspend call, turn the suspend point into a simple jump to resume or cleanup label. This pattern occurs when coroutines are used to propagate errors in functions that return expected<T>.
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24408
llvm-svn: 282414
The index of the new insertelement instruction was computed incorrectly:
it was treated as the index of the inserted value instead of the index of
the position where the value should be inserted.
llvm-svn: 282401
This patch fixes PR30366.
Function foldUDivShl() worked under the assumption that one of the values
in input to the function was always an instance of llvm::Instruction.
However, function visitUDivOperand() (the only user of foldUDivShl) was
clearly violating that precondition; internally, visitUDivOperand() uses pattern
matches to check the operands of a udiv. Pattern matchers for binary operators
know how to handle both Instruction and ConstantExpr values.
This patch fixes the problem in foldUDivShl(). Now we use pattern matchers
instead of explicit casts to Instruction. The reduced test case from PR30366
has been added to test file InstCombine/udiv-simplify.ll.
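For context, the fold in question rewrites a udiv by a shifted power of two, `X udiv (C1 << N)` with `C1 == 1 << C2`, into `X lshr (N + C2)`. A small C++ check of that identity on `uint32_t`, assuming `N + C2 < 32` so no set bit of C1 is shifted out; the function names are illustrative, and this models only the arithmetic, not the pattern-matching change itself.

```cpp
#include <cstdint>

// X / (C1 << N) where C1 == 1 << C2.
uint32_t udivByShiftedPow2(uint32_t x, unsigned c2, unsigned n) {
  uint32_t c1 = 1u << c2;
  return x / (c1 << n);
}

// The folded form: a single logical shift right by N + C2.
// Valid when N + C2 < 32, so the divisor is an in-range power of two.
uint32_t foldedToLshr(uint32_t x, unsigned c2, unsigned n) {
  return x >> (n + c2);
}
```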
Differential Revision: https://reviews.llvm.org/D24565
llvm-svn: 282398
If inserting more than one constant into a vector:
```
define <4 x float> @foo(<4 x float> %x) {
  %ins1 = insertelement <4 x float> %x, float 1.0, i32 1
  %ins2 = insertelement <4 x float> %ins1, float 2.0, i32 2
  ret <4 x float> %ins2
}
```
InstCombine can reduce that to a shufflevector:
```
define <4 x float> @goo(<4 x float> %x) {
  %shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
  ret <4 x float> %shuf
}
```
Also, InstCombine tries to convert a shuffle instruction to a single insertelement if one of the vectors is a constant vector and only a single element from this constant is used in the shuffle, i.e.:
```
shufflevector <4 x float> %v, <4 x float> <float undef, float 1.0, float undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>
  ->
insertelement <4 x float> %v, float 1.0, i32 1
```
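The shuffle mask in the @goo example follows a simple rule: lane i keeps index i (from %x) unless a constant was inserted there, in which case it takes index NumElts + i (the same lane of the constant vector). A sketch of just that mask computation, with a hypothetical helper name:

```cpp
#include <set>
#include <vector>

// Builds the shufflevector mask for folding a chain of constant
// insertelements into one shuffle of (%x, ConstVec). Lane i comes from %x
// (mask index i) unless a constant was inserted there, in which case it comes
// from the same lane of the constant vector (mask index numElts + i).
std::vector<int> maskForConstantInserts(int numElts,
                                        const std::set<int>& insertedLanes) {
  std::vector<int> mask(numElts);
  for (int i = 0; i < numElts; ++i)
    mask[i] = insertedLanes.count(i) ? numElts + i : i;
  return mask;
}
```

For inserts at lanes 1 and 2 of a 4-element vector this yields <0, 5, 6, 3>, matching the mask in @goo.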
Differential Revision: https://reviews.llvm.org/D24182
llvm-svn: 282237
We already have the udiv variant of this transform, so I think this is ok for
InstCombine too even though there is an increase in IR instructions. As the
tests and TODO comments show, the transform can lead to follow-on combines.
This should fix: https://llvm.org/bugs/show_bug.cgi?id=28672
Differential Revision: https://reviews.llvm.org/D24527
llvm-svn: 282209
and also the dependent r282175 "GVN-hoist: do not dereference null pointers"
It's causing compiler crashes building Harfbuzz (PR30499).
llvm-svn: 282199
To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.
The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask whether the load and
store clobber each other: I'm not sure which function to call, so I
used a call to AA->isNoAlias().
Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.
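The new check amounts to "every load between the store and the hoisting point must be provably no-alias with the store". The sketch below models that loop under a strong simplifying assumption: memory accesses are byte ranges within a single object, and `isNoAlias` is a plain range-disjointness test standing in for the real `AA->isNoAlias()` query. All names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of a memory access: the byte range
// [offset, offset + size) within one underlying object.
struct MemRange {
  int64_t offset;
  int64_t size;
};

// Stand-in for an alias-analysis query: two ranges do not alias if they
// are disjoint. Real alias analysis is far more involved.
bool isNoAlias(const MemRange& a, const MemRange& b) {
  return a.offset + a.size <= b.offset || b.offset + b.size <= a.offset;
}

// The store may be hoisted only if every load on the path to the hoisting
// point is known not to alias it; each load is checked directly.
bool canHoistStorePastLoads(const MemRange& store,
                            const std::vector<MemRange>& loadsOnPath) {
  for (const MemRange& load : loadsOnPath)
    if (!isNoAlias(store, load))
      return false;
  return true;
}
```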
Differential Revision: https://reviews.llvm.org/D24517
llvm-svn: 282168
Without this patch, GVN-hoist would think that a branch instruction is a scalar instruction
and would try to value number it. The patch filters out all such irrelevant instructions.
It is a bit frustrating that there is no easy way to discard all those very infrequent
instructions, something like isa<TerminatorInst> that stands for a large family of
instructions. I think that checking for those very infrequent other instructions would
cost us more compilation time than just letting those instructions get numbered, so I
still think that a simpler check:
```
if (isa<TerminatorInst>(I))
  return false;
```
is better than listing all the other less frequent instructions.
Differential Revision: https://reviews.llvm.org/D23929
llvm-svn: 282160
Currently, we give up on loop interchange if we encounter a flow dependency
anywhere in the loop list. Worse yet, we don't even track output dependencies.
This patch updates the dependency matrix computation to track flow and output
dependencies in the same way we track anti dependencies.
This improves an internal workload by 2.2x.
Note the loop interchange pass is off by default and it can be enabled with
'-mllvm -enable-loopinterchange'
Differential Revision: https://reviews.llvm.org/D24564
llvm-svn: 282101
If we identify an instruction as uniform after vectorization, we know that we
should only use the value corresponding to the first vector lane of each unroll
iteration. However, when scalarizing such instructions, we still produce values
for the other vector lanes. This patch prevents us from generating the unused
scalars.
Differential Revision: https://reviews.llvm.org/D24275
llvm-svn: 282087
Summary: Now that we have more precise debug info, we should switch back to using the maximum to compute basic block weight.
Reviewers: dnovillo
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D24788
llvm-svn: 282084
SROA doesn't preserve the llvm.mem.parallel_loop_access metadata when it
transforms loads/stores. This patch fixes a couple of occurrences of this
issue.
(Partially addresses PR28981).
Differential Revision: https://reviews.llvm.org/D23549
llvm-svn: 281960
Summary: Callsites in the same basic block should share the same hotness. This patch finds the hottest callsite in each basic block and uses its hotness for all callsites in that block when making early inline decisions. It also fixes the test to add "-S" so that the "CHECK-NOT" actually checks the content.
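The per-block rule can be sketched as a two-pass computation: first find the maximum count per basic block, then give every callsite its block's maximum. `CallSiteInfo` and `sharedBlockHotness` are hypothetical names used only to illustrate the rule, not the sample profile loader's code.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical callsite record: its basic block id and its own sample count.
struct CallSiteInfo {
  int block;
  uint64_t count;
};

// Returns, for each callsite, the hotness used for the early-inline
// decision: the count of the hottest callsite in the same basic block.
std::vector<uint64_t> sharedBlockHotness(const std::vector<CallSiteInfo>& sites) {
  std::map<int, uint64_t> hottest;  // block id -> max count seen so far
  for (const CallSiteInfo& cs : sites)
    hottest[cs.block] = std::max(hottest[cs.block], cs.count);
  std::vector<uint64_t> result;
  result.reserve(sites.size());
  for (const CallSiteInfo& cs : sites)
    result.push_back(hottest[cs.block]);
  return result;
}
```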
Reviewers: dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24734
llvm-svn: 281927
Summary: It does not make sense to set equal weights for all unknown branches, as we have static branch prediction available.
Reviewers: dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24732
llvm-svn: 281912
Summary: The call target count profile is directly derived from LBR branch->target data. This is more reliable than instruction frequency profiles, which could be moved across basic block boundaries. This patch uses the call target count profile to annotate call instructions.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24410
llvm-svn: 281911
When phi nodes are created in the -mem2reg phase, the @llvm.dbg.declare
entries are converted to @llvm.dbg.value entries at the place where the
store instructions existed. However no entry is created to describe
the resulting value of the phi node.
The effect of this is especially noticeable in for loops which have a
constant for the initial value; the loop control variable's location
would be described as the initial constant value in the loop body once
the -mem2reg optimization phase was run.
This change adds the creation of the @llvm.dbg.value entries to describe
variables whose location is the result of a phi node created in -mem2reg.
Also when the phi node is finally lowered to a machine instruction it
is important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.
Differential Revision: https://reviews.llvm.org/D23715
llvm-svn: 281895
We were updating metadata but not IR flags. Because we pick an arbitrary instruction to be the CSE candidate, it comes down to luck (a 50% chance or less) whether this results in broken codegen, which is why PR30373 was bisected down to a commit that is not actually at fault.
Fixes PR30373.
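The usual conservative rule for this situation is to intersect the poison-generating flags of the two instructions, since the survivor now stands in for both. A minimal sketch of that rule, assuming flags are modeled as a bitmask; `IRFlag` and `intersectFlags` are hypothetical names, not this patch's code.

```cpp
#include <cstdint>

// Hypothetical bitmask of poison-generating flags (nuw, nsw, exact).
enum IRFlag : uint8_t {
  NoUnsignedWrap = 1u << 0,
  NoSignedWrap = 1u << 1,
  Exact = 1u << 2
};

// When CSE keeps one of two equivalent instructions, the survivor replaces
// both, so it may only keep the flags that both carried. Keeping a flag that
// only one of them had (e.g. nsw) could introduce poison the other lacked.
uint8_t intersectFlags(uint8_t kept, uint8_t removed) {
  return static_cast<uint8_t>(kept & removed);
}
```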
llvm-svn: 281889
Summary: Previously we relied on instcombine to remove inlinable invoke instructions. This causes trouble because a few extra optimizations are scheduled early that could introduce too much CFG change (e.g. simplifycfg removes too much control flow). This patch handles invoke instructions in place during sample profile annotation, so that we do not rely on instcombine to remove those invoke instructions.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24409
llvm-svn: 281870
Summary:
This fixes an issue when files are compiled with -flto=thin
at default -O0. We need to rename anonymous globals before attempting
to write the module summary because all values need names for
the summary. This was happening at -O1 and above, but not before
the early exit when constructing the pipeline for -O0.
Also add an internal -prepare-for-thinlto option to enable this
to be tested via opt.
Fixes PR30419.
Reviewers: mehdi_amini
Subscribers: probinson, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24701
llvm-svn: 281840
This is a fix for PR30318.
Clang may generate IR where an alloca is already live when entering a
BB with lifetime.start. In this case, conservatively extend the
alloca lifetime all the way back to the block entry.
llvm-svn: 281784
computeKnownBits() already works for integer vectors, so allow vector types when calling that from InstCombine.
I don't think the change to use m_APInt in computeKnownBits is strictly necessary because we do check for
ConstantVector later, but it's more efficient to handle the splat case without needing to loop on vector elements.
This should work with InstSimplify, but doesn't yet, so I made that a FIXME comment on the test for PR24942:
https://llvm.org/bugs/show_bug.cgi?id=24942
Differential Revision: https://reviews.llvm.org/D24677
llvm-svn: 281777
A follow-up patch will rename this pass and the source file accordingly,
but I figured the non-NFC change will be easier to spot in isolation.
Differential Revision: https://reviews.llvm.org/D24641
llvm-svn: 281744
These 2 helper functions were already using APInt internally, so just
change the API and caller to allow folds for splats. The scalar
regression tests look quite thorough, so I just added a couple of
tests to prove that vectors are handled too.
These folds should be grouped with the other cmp+shift folds though.
That can be an NFC follow-up.
llvm-svn: 281663
GlobalOpt is already dead-code-eliminating global definitions. With
this change it also takes care of declarations.
Hopefully this should make it now a strict superset of GlobalDCE.
This is important for LTO/ThinLTO as we don't want the linker to see
"undefined reference" when it processes the input files: it could
prevent proper internalization (or even load an extra file from a
static archive, changing the behavior of the program!).
llvm-svn: 281653
This patch partially fixes PR10584. Correlated Value Propagation queries LVI
to check non-null for pointer params of each callsite. If we know the def of
param is an alloca instruction, we know it is non-null and can return early from
LVI. Similarly, CVP queries LVI to check whether pointer for each mem access is
constant. If the def of the pointer is an alloca instruction, we know it is not
a constant pointer. These shortcuts can reduce the cost of CVP significantly.
Differential Revision: https://reviews.llvm.org/D18066
llvm-svn: 281586
This patch moves the processing of pointer induction variables in
collectLoopUniforms from the consecutive pointer phase of the analysis to the
phi node phase. Previously, if a pointer induction variable was used by both a
scalarized non-memory instruction as well as a vectorized memory instruction,
we would incorrectly identify the pointer as uniform. Pointer induction
variables should be treated the same as other phi nodes. That is, they are
uniform if all users of the induction variable and induction variable update
are uniform.
Differential Revision: https://reviews.llvm.org/D24511
llvm-svn: 281485
ObjC library call with call return.
ARC contraction tries to replace uses of an argument passed to an
Objective-C library call with the call return value. For example, in the
following IR, it replaces uses of argument %9 and uses of the values
discovered traversing the chain upwards (%7 and %8) with the call return
%10, if they are dominated by the call to @objc_autoreleaseReturnValue.
This transformation enables code-gen to tail-call the call to
@objc_autoreleaseReturnValue, which is necessary to enable the autorelease
return value optimization.
```
  %7 = tail call i8* @objc_loadWeakRetained(i8** %6)
  %8 = bitcast i8* %7 to %0*
  %9 = bitcast %0* %8 to i8*
  %10 = tail call i8* @objc_autoreleaseReturnValue(i8* %9)
  ret %0* %8
```
Since r276727, llvm started removing redundant bitcasts and as a result
started feeding the following IR to ARC contraction:
```
  %7 = tail call i8* @objc_loadWeakRetained(i8** %6)
  %8 = bitcast i8* %7 to %0*
  %9 = tail call i8* @objc_autoreleaseReturnValue(i8* %7)
  ret %0* %8
```
ARC contraction no longer does the optimization described above since it
only traverses the chain upwards and fails to recognize that the
function return can be replaced by the call return. This commit changes
ARC contraction to traverse the chain downwards too and replace uses of
bitcasts with the call return.
rdar://problem/28011339
Differential Revision: https://reviews.llvm.org/D24523
llvm-svn: 281419
The constant folder did not always know how to fold bitcasts of constant integer
vectors. In particular, it was unable to handle the case where a constant vector
had some undef elements, and the resulting (i.e. bitcasted) vector type had more
elements than the original vector type.
Example:
%cast = bitcast <2 x i64> <i64 undef, i64 2> to <4 x i32>
On a little-endian target, %cast could have been folded to:
<4 x i32> <i32 undef, i32 undef, i32 2, i32 0>
This patch improves the folding logic by teaching it how to correctly propagate
undef elements in the folded vector.
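The propagation rule can be illustrated with a small model. This C++ code is a minimal sketch, not the ConstantFolding implementation. Each lane is represented as `std::optional<uint64_t>`, with `std::nullopt` standing in for undef. Wide lanes are split into narrower ones on an assumed little-endian target. The function name `bitcastSplitLanes` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// A lane is either a known bit pattern or undef (std::nullopt).
using Lane = std::optional<uint64_t>;

// Bitcasts a vector of srcBits-wide integer lanes to dstBits-wide lanes,
// where dstBits evenly divides srcBits (e.g. <2 x i64> to <4 x i32>).
// Assumes a little-endian target: the low bits of each source lane land in
// the lower-indexed destination lane.
std::vector<Lane> bitcastSplitLanes(const std::vector<Lane>& src,
                                    unsigned srcBits, unsigned dstBits) {
  assert(srcBits <= 64 && dstBits > 0 && srcBits % dstBits == 0);
  const unsigned ratio = srcBits / dstBits;
  const uint64_t mask =
      dstBits == 64 ? ~uint64_t{0} : (uint64_t{1} << dstBits) - 1;
  std::vector<Lane> dst;
  dst.reserve(src.size() * ratio);
  for (const Lane& lane : src) {
    for (unsigned i = 0; i < ratio; ++i) {
      if (!lane)
        dst.push_back(std::nullopt);  // every piece of an undef lane is undef
      else
        dst.push_back((*lane >> (i * dstBits)) & mask);
    }
  }
  return dst;
}
```

On the example above, the model yields undef, undef, 2, 0, which matches the folded constant.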
Differential Revision: https://reviews.llvm.org/D24301
llvm-svn: 281343
InstSimplify doesn't always know how to fold a bitcast of a constant vector.
In particular, the logic in InstSimplify doesn't know how to handle the case
where the input constant vector contains some undef elements, and the
number of elements is smaller than the number of elements of the bitcast
vector type.
llvm-svn: 281332
Teach SimplifyLibcalls that it can treat functions annotated with
apcs, aapcs or aapcs_vfp like normal C functions if they only take
and return integer or pointer values, and the target is not iOS.
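The condition can be read as a simple predicate. This is a hypothetical C++ model, not LLVM's API. The enums and `isCCompatibleLibcall` are invented for illustration, and the model follows the description above literally.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for a call's calling convention and value types.
enum class CallConv { C, APCS, AAPCS, AAPCS_VFP, Other };
enum class TypeKind { Integer, Pointer, Float, Vector };

// Sketch of the rule above: an ARM apcs/aapcs/aapcs_vfp libcall may be
// treated like a plain C call if it only takes and returns integers or
// pointers and the target is not iOS.
bool isCCompatibleLibcall(CallConv cc, TypeKind ret,
                          const std::vector<TypeKind>& params,
                          bool targetIsIOS) {
  if (cc == CallConv::C)
    return true;
  const bool armCC = cc == CallConv::APCS || cc == CallConv::AAPCS ||
                     cc == CallConv::AAPCS_VFP;
  if (!armCC || targetIsIOS)
    return false;
  auto isIntOrPtr = [](TypeKind t) {
    return t == TypeKind::Integer || t == TypeKind::Pointer;
  };
  if (!isIntOrPtr(ret))
    return false;
  for (TypeKind p : params)
    if (!isIntOrPtr(p))
      return false;
  return true;
}
```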
Differential Revision: https://reviews.llvm.org/D24453
llvm-svn: 281322
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.
Fixes PR30362. A program for upgrading test cases is attached to that
bug.
Differential Revision: http://reviews.llvm.org/D20147
llvm-svn: 281284
Trying to infer the 'returned' attribute when an argument is already
'returned' can lead to verification failure: inference might determine
that a different argument is passed through, which would result in two
different arguments marked as 'returned'.
This fixes PR30350.
llvm-svn: 281221
This should *actually* fix PR30244. This cranks up the workaround for PR30188 so that we never sink loads or stores of allocas.
The idea is that these should be removed by SROA/Mem2Reg, and any movement of them may well confuse SROA or just cause unwanted code churn. It's not ideal that the midend should be crippled like this, but that unwanted churn can really cause significant regressions in important workloads (tsan).
llvm-svn: 281162
As exposed by PR30244, we currently split a block if we think we can sink at least one instruction. However, this isn't right: the reason we split predecessors is so that we can sink instructions that otherwise couldn't be sunk because it isn't safe to do so (stores, for example).
So, change the heuristic to only split if it thinks it can sink at least one non-speculatable instruction.
Should fix PR30244.
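The change in heuristic can be sketched as two predicates. This is a hypothetical C++ model, not SimplifyCFG code. Each sinking candidate is reduced to a single "speculatable" flag, and the struct and function names are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical summary of an instruction that could be sunk into the
// common successor.
struct SinkCandidate {
  bool speculatable;  // safe to execute even on an arc that lacks it
};

// Old heuristic: split the predecessors if anything at all could be sunk.
bool shouldSplitOld(const std::vector<SinkCandidate>& candidates) {
  return !candidates.empty();
}

// New heuristic: the point of splitting is to sink instructions that would be
// unsafe to execute on the extra incoming arc, so only split if at least one
// candidate is not speculatable (a store, for example).
bool shouldSplitNew(const std::vector<SinkCandidate>& candidates) {
  return std::any_of(candidates.begin(), candidates.end(),
                     [](const SinkCandidate& c) { return !c.speculatable; });
}
```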
llvm-svn: 281160
Summary:
This will let e.g. the load/store vectorizer propagate this metadata
appropriately.
Reviewers: arsenm
Subscribers: tra, jholewinski, hfinkel, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23479
llvm-svn: 281153
This would create a bitcast use which fails the verifier: swifterror values may
only be used by loads, stores, and as function arguments.
rdar://28233244
llvm-svn: 281114
I was looking to fix a bug in getComplexity(), and these cases showed up as
obvious failures. I'm not sure how to find these in general though.
llvm-svn: 281055
Summary:
If one of the uses of the value is a single edge PHINode, handle it.
Original:
%val = something
<suspend>
%p = PHINode [%val]
After Spill + Part13:
%val = something
%slot = gep val.spill.slot
store %val, %slot
<suspend>
%p = load %slot
Plus tiny fixes/changes:
* use the correct index for coro.free in CoroCleanup
* fix up the id parameter in coro.free to allow authoring coroutines in plain C with __builtins
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24242
llvm-svn: 281020
Summary: The hoisted instruction is executed speculatively. It could affect the debugging experience, as the user would see gdb step into code that may not be expected to execute. It will also affect sample profile accuracy by assigning an incorrect frequency to source within the then/else branch.
Reviewers: davidxl, dblaikie, chandlerc, kcc, echristo
Subscribers: mehdi_amini, probinson, eric_niebler, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D24164
llvm-svn: 280995
The test case included in r280979 wasn't checking what it was supposed to be
checking for the predicated store case. Fixing the test revealed that the
multi-use case (when a pointer is used by both vectorized and scalarized memory
accesses) wasn't being handled properly. We can't skip over
non-consecutive-like pointers since they may have looked consecutive-like with
a different memory access.
llvm-svn: 280992
Previously, all consecutive pointers were marked uniform after vectorization.
However, if a consecutive pointer is used by a memory access that is eventually
scalarized, the pointer won't remain uniform after all. An example is
predicated stores. Even though a predicated store may be consecutive, it will
still be scalarized, making its pointer operand non-uniform.
This patch updates the logic in collectLoopUniforms to consider the cases where
a memory access may be scalarized. If a memory access may be scalarized, its
pointer operand is not marked uniform. The determination of whether a given
memory instruction will be scalarized or not has been moved into a common
function that is used by the vectorizer, cost model, and legality analysis.
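The resulting rule for pointer operands can be sketched as follows. This is a hypothetical C++ model, not collectLoopUniforms itself. It only looks at memory-access users of the pointer, and `MemAccess` and `isPointerUniform` are invented names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of one memory access that uses a given pointer.
struct MemAccess {
  bool consecutive;       // accesses adjacent elements in adjacent iterations
  bool willBeScalarized;  // e.g. a predicated store, even if consecutive
};

// A pointer can stay uniform only if every memory access using it is
// consecutive and will really be emitted as a wide (vector) access. A single
// scalarized user needs a distinct address per lane, so the pointer is then
// not uniform.
bool isPointerUniform(const std::vector<MemAccess>& users) {
  for (const MemAccess& m : users)
    if (!m.consecutive || m.willBeScalarized)
      return false;
  return true;
}
```

Checking every user also covers the multi-use case described earlier: a pointer shared by a vectorized and a scalarized access is not uniform.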
Differential Revision: https://reviews.llvm.org/D24271
llvm-svn: 280979
Summary:
When cloning blocks for prologue/epilogue we need to replicate the loop
structure from the original loop. It wasn't a problem for the innermost
loops, but it led to incorrect LoopInfo when we unrolled a loop with
a child loop: in this case, the created prologue loop had a child loop, but
LoopInfo didn't reflect that.
This fixes PR28888.
Reviewers: chandlerc, sanjoy, hfinkel
Subscribers: llvm-commits, silvas
Differential Revision: https://reviews.llvm.org/D24203
llvm-svn: 280901
We can't create metadata-valued PHIs; don't try to do so when sinking.
I created a test case for this using the @llvm.type.test intrinsic, because it
takes a metadata parameter and does not have severe side effects (thus
SimplifyCFG is willing to otherwise sink it).
Previously, running the test case would crash with:
Invalid use of metadata!
%.sink = select i1 %flag, metadata <...>, metadata <0x4e45dc0>
LLVM ERROR: Broken function found, compilation aborted!
llvm-svn: 280866
This is a revert of r280676, which was a revert of r280637;
i.e., this is r280637 again. It was speculatively reverted to
help debug buildbot failures.
llvm-svn: 280861
Summary:
LSV replaces multiple adjacent loads with one vectorized load and a
bunch of extractelement instructions. This patch makes the
extractelement instructions' names match those of the original loads,
for (hopefully) improved readability.
Reviewers: asbirlea, tstellarAMD
Subscribers: arsenm, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23748
llvm-svn: 280818
This fixes a similar issue to the one already fixed by r280804
(reviewed in D24256). Revision 280804 fixed the problem with unsafe dyn_casts
in the extrq/extrqi combining logic. However, it turns out that even the
insertq/insertqi logic was affected by the same problem.
llvm-svn: 280807
This patch fixes an assertion failure caused by unsafe dynamic casts on the
constant operands of sse4a intrinsic calls to extrq/extrqi.
The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently
checks if the input operands are constants. Internally, that logic relies on
dyn_casts of values returned by calls to method Constant::getAggregateElement.
However, method getAggregateElement may return nullptr if the constant element
cannot be retrieved, so each of these dyn_casts may be applied to a null pointer.
This happens, for example, if a constexpr value is passed as input to an
extrq/extrqi intrinsic call.
This patch fixes the problem by using a dyn_cast_or_null (instead of a simple
dyn_cast) on the result of each call to Constant::getAggregateElement.
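The difference between the two casts can be modeled in a few lines. This C++ code is a self-contained sketch with hypothetical `ToyConstant`/`ToyConstantInt` classes, not LLVM's Casting.h. It mimics the fact that `dyn_cast` asserts on a null argument, while `dyn_cast_or_null` passes null through.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for a constant class hierarchy.
struct ToyConstant {
  enum Kind { IntKind, ExprKind };
  Kind kind;
  explicit ToyConstant(Kind k) : kind(k) {}
  virtual ~ToyConstant() = default;
};

struct ToyConstantInt : ToyConstant {
  uint64_t value;
  explicit ToyConstantInt(uint64_t v) : ToyConstant(IntKind), value(v) {}
};

// Like dyn_cast: the argument must be non-null.
ToyConstantInt* dynCastInt(ToyConstant* c) {
  assert(c && "dyn_cast on a null pointer");
  return c->kind == ToyConstant::IntKind ? static_cast<ToyConstantInt*>(c)
                                         : nullptr;
}

// Like dyn_cast_or_null: a null argument (e.g. an element that could not be
// retrieved) simply yields null instead of asserting.
ToyConstantInt* dynCastIntOrNull(ToyConstant* c) {
  return c ? dynCastInt(c) : nullptr;
}
```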
Added reproducible test cases to x86-sse4a.ll.
Differential Revision: https://reviews.llvm.org/D24256
llvm-svn: 280804
I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores.
Fixes PR30188 (again).
llvm-svn: 280792
PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG.
Fixes PR30292.
llvm-svn: 280790
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set). However, we could still have
regions of the CFG that do not have branch weights collected (e.g. a
cold region). In this case we'd use static estimates. Since static
estimates for branches are determined independently, they are
inconsistent. Updating them can "randomly" inflate block frequencies.
I've run into this in a completely cold loop of h264ref from
SPEC. -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).
The new testcase demonstrates the problem. We check array elements
against 1, 2 and 3 in a loop. The check against 3 is the loop-exiting
check. The block names should be self-explanatory.
In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).
There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated. These are the resulting branch and block
frequencies for the loop body:
         check_1 (16)
     (8) /  |
     eq_1   | (8)
         \  |
         check_2 (16)
     (8) /  |
     eq_2   | (8)
         \  |
         check_3 (16)
     (1) /  |
(loop exit) | (15)
            |
       (back edge)
First we thread eq_1 -> check_2 to check_3. Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2. Changed frequencies are highlighted with * *:
         check_1 (16)
     (8) /  |
     eq_1~  | (8)
     /      |
    /    check_2 (*8*)
   / (8) /  |
   \ eq_2   | (*0*)
    \    \  |
     ` --- check_3 (16)
       (1) /  |
  (loop exit) | (15)
              |
         (back edge)
Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges. Frequencies are updated to remove the frequencies of eq_1 and
eq_2 from check_3 and then from the false edge leaving check_3 (changed
frequencies are highlighted with * *):
         check_1 (16)
     (8) /  |
     eq_1~  | (8)
     /      |
    /    check_2 (*8*)
   / (8) /  |
  /-- eq_2~ | (*0*)
(back edge) |
         check_3 (*0*)
   (*0*) /  |
(loop exit) | (*0*)
            |
       (back edge)
As a result, the loop exit edge ends up with 0 frequency, which in turn gives
the loop header the maximum frequency.
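The arithmetic behind this is a geometric series. If a single-exit loop is left with probability p on each pass through the header, the header runs 1 + (1 - p) + (1 - p)^2 + ... = 1/p times per entry. In the first diagram, p is 1/16 (the (1) versus (15) split at check_3), which gives the header frequency of 16. Once p becomes 0, the count is unbounded, and the best BFI can do is saturate. A minimal C++ sketch of that calculation follows; the function name is hypothetical.

```cpp
#include <cassert>
#include <limits>

// Expected header executions for a single-exit loop entered entryFreq times,
// when each pass through the header leaves the loop with probability exitProb.
double expectedHeaderFrequency(double entryFreq, double exitProb) {
  assert(exitProb >= 0.0 && exitProb <= 1.0);
  if (exitProb == 0.0)
    return std::numeric_limits<double>::infinity();  // the loop never exits
  return entryFreq / exitProb;  // sum of the geometric series
}
```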
There are a few potential problems here:
1. The profile data seems odd. There is a single profile sample of the
loop being entered. On the other hand, there are no weights inside the
loop.
2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.
3. We shouldn't create profile metadata that is calculated from static
estimation. I am not sure what the policy is, but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling. Estimated probabilities should only be reflected in BPI/BFI.
Any one of these would probably fix the immediate problem. I went for 3
because I think it's a good policy to have and added a FIXME about 2.
Differential Revision: https://reviews.llvm.org/D24118
llvm-svn: 280713
Summary:
Move early uses of spilled variables after CoroBegin.
For example, if a parameter has its address taken, we may end up with code
like:
define @f(i32 %n) {
%n.addr = alloca i32
store %n, %n.addr
...
call @coro.begin
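Here the store to %n.addr executes before @coro.begin, so if %n.addr is spilled into the coroutine frame, the store happens before that frame is available. This patch fixes the problem by moving uses of spilled variables after CoroBegin.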
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24234
llvm-svn: 280678
This test code previously caused a failure in the module verifier,
because SimplifyCFG created this invalid instruction, which tries to
take the address of inline asm:
%.sink = select i1 %1, i64 ()* asm "mov $0, #1", "=r", i64 ()* asm "mov $0, #2", "=r"
This has been fixed recently, presumably by James Molloy's patches that
re-wrote and changed parts of SimplifyCFG, so this patch just adds a
regression test for it.
Differential Revision: https://reviews.llvm.org/D24231
llvm-svn: 280660
Summary:
A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties:
* it is possible to check whether a suspended coroutine is at the final suspend point via coro.done intrinsic;
* a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via coro.destroy intrinsic.
This patch adds final suspend handling logic to CoroEarly and CoroSplit passes.
Now, the final suspend point example from docs/Coroutines.rst compiles and produces the expected result (see test/Transforms/Coroutines/ex5.ll).
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24068
llvm-svn: 280646
memcpy with ld/st.
When InstCombine replaces a memcpy with loads+stores it does not copy over the
llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes
that.
Differential Revision: https://reviews.llvm.org/D23499
llvm-svn: 280617
Summary:
The inliner may need to determine where a given funclet unwinds to,
and this determination may depend on other funclets throughout the
funclet tree. The code that performs this walk in getUnwindDestToken
memoizes results to avoid redundant computations. In the case that
a funclet's unwind destination is derived from its ancestor, there's
code to walk back down the tree from the ancestor updating the memo
map of its descendants to record the unwind destination. This change
fixes that code to account for the case that some descendant has a
different unwind destination, which can happen if that unwind dest
is a descendant of the EHPad being queried and thus didn't determine
its unwind destination.
Also update test inline-funclets.ll, which is supposed to cover such
scenarios, to include a case that fails an assertion without this fix
but passes with it.
Fixes PR29151.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24117
llvm-svn: 280610
For the store of a wide value merged from a pair of values, especially an int-fp pair,
it is sometimes more efficient to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.
For now the feature is only enabled on the x86 target, and only stores of int-fp pairs
are split. The scope may be extended in the future if performance evidence
supports it.
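The two store shapes can be compared at the source level. This C++ code is a sketch of the pattern only, not the transform itself, and the function names are hypothetical. The merged form packs an int and a float into one 64-bit value with a shift and an or, then does one wide store. The split form does two narrow stores with no bitwise work. On a little-endian target such as x86, both write the same bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Merged form: pack the int (low half) and the float's bits (high half)
// into one 64-bit value, then do a single wide store.
void storeMerged(unsigned char* dst, uint32_t i, float f) {
  uint32_t fbits;
  std::memcpy(&fbits, &f, sizeof fbits);  // reinterpret the float's bits
  const uint64_t wide = (uint64_t{fbits} << 32) | i;
  std::memcpy(dst, &wide, sizeof wide);
}

// Split form: two narrow stores and no shift/or. On a little-endian target
// this produces the same bytes as storeMerged.
void storeSplit(unsigned char* dst, uint32_t i, float f) {
  std::memcpy(dst, &i, sizeof i);
  std::memcpy(dst + 4, &f, sizeof f);
}
```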
Differential Revision: https://reviews.llvm.org/D22840
llvm-svn: 280505
The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards
shrinking that to a single shufflevector.
Note that the transform is intentionally limited to shuffles that are equivalent to vector
selects to avoid creating arbitrary shuffle masks that may not lower well.
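A shuffle is equivalent to a vector select when every result lane i takes either lane i of the first operand (mask index i) or lane i of the second operand (mask index i + N). A minimal C++ check of that property is shown below. It assumes both operands and the result have N lanes, does not model undef (-1) mask entries, and uses the hypothetical name `isSelectLikeMask`.

```cpp
#include <cassert>
#include <vector>

// True if the two-input shuffle mask only ever picks lane i of operand 0
// (index i) or lane i of operand 1 (index i + N), i.e. it acts as a select.
bool isSelectLikeMask(const std::vector<int>& mask) {
  const int n = static_cast<int>(mask.size());
  for (int i = 0; i < n; ++i)
    if (mask[i] != i && mask[i] != i + n)
      return false;
  return true;
}
```

For example, the mask <0, 5, 2, 7> is select-like, while <1, 0, 2, 3> moves lanes across positions and is not.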
This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126
Differential Revision: https://reviews.llvm.org/D23886
llvm-svn: 280504
For uniform instructions, we're only required to generate a scalar value for
the first vector lane of each unroll iteration. Thus, if we have a reverse
interleaved group, computing the member index off the scalar GEP corresponding
to the last vector lane of its pointer operand technically makes the GEP
non-uniform. We should compute the member index off the first scalar GEP
instead.
I've added the updated member index computation to the existing reverse
interleaved group test.
llvm-svn: 280497
This patch fixes a crash caused by an incorrect folding of an ordered comparison
between a packed floating point vector and a splat vector of NaN.
An ordered comparison between a vector and a constant vector of NaN should
always be folded into a constant vector where each element is i1 false.
Since revision 266175, SimplifyFCmpInst folds the ordered fcmp into a scalar
'false'. Later on, this would cause an assertion failure, since the value type
of the folded value doesn't match the expected value type of the uses of the
original instruction: "Assertion failed: New->getType() == getType() &&
"replaceAllUses of value with new value of different type!".
This patch fixes the issue and adds a test case to the already existing test
InstSimplify/floating-point-compares.ll.
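The expected fold has two properties: every lane is false, and the result has one lane per input lane rather than being a scalar. This lane-wise C++ model of an ordered "less than" is a sketch with a hypothetical name, not InstSimplify code. It shows both properties.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Lane-wise ordered less-than: a lane is true only if neither operand is NaN
// and a < b. The result always has the same number of lanes as the inputs.
std::vector<bool> orderedLessThan(const std::vector<float>& a,
                                  const std::vector<float>& b) {
  assert(a.size() == b.size());
  std::vector<bool> result(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    result[i] = !std::isnan(a[i]) && !std::isnan(b[i]) && a[i] < b[i];
  return result;
}
```

Comparing any vector against a NaN splat therefore yields a vector of false lanes with the original width.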
Differential Revision: https://reviews.llvm.org/D24143
llvm-svn: 280488
We're sinking stores, which is a good thing, but in the process we create selects for the store address operand. SROA/Mem2Reg can't look through these selects, which caused serious regressions.
The real fix is in SROA, which I'll be looking into.
llvm-svn: 280470
While removing a scalar shackle from an icmp fold, I noticed that I couldn't find any tests to trigger
this code path.
The 'and' shrinking transform should be handled by InstCombiner::foldCastedBitwiseLogic()
or eliminated with InstSimplify. The icmp narrowing is part of InstCombiner::foldICmpWithCastAndCast().
Differential Revision: https://reviews.llvm.org/D24031
llvm-svn: 280370
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.
As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:
if (a)
x(1);
else if (b)
x(2);
This produces the following CFG:
      [if]
     /    \
 [x(1)]   [if]
   |      |  \
   |      |   \
   |   [x(2)]  |
    \     |   /
      [ end ]
[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.
We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has side effects for the sake of argument; if something is safe to speculate we could indeed sink it nevertheless, but this cannot happen in the general case and causes many extra selects).
We are now able to detect this case and split off the unconditional arcs to a common successor:
      [if]
     /    \
 [x(1)]   [if]
   |      |  \
   |      |   \
   |   [x(2)]  |
    \    /     |
  [sink.split] |
        \     /
        [ end ]
Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.
llvm-svn: 280364
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.
This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.
This had the problem that sinking instructions whose operands were non-identical but trivially the same needed extra logic for us to sink them aggressively. For example, with one predecessor block on the left and the other on the right:
%a = load i32* %b              %d = load i32* %b
%c = gep i32* %a, i32 0        %e = gep i32* %d, i32 1
Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).
This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.
Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.
In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.
In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.
This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) into two linear scans.
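The two-stage idea can be illustrated with a simplified model. In this hypothetical C++ code, two predecessor blocks are compared row by row from the end. Each `Row` pairs up corresponding instructions, and values are compared by name as a stand-in for pointer equality. The scan stage records which results are sinkable. The sink stage then stops at the first row that would need more than the allowed number of PHIs. Operands produced by sinkable rows are not counted, because sinking those rows merges them too. The model treats every row as sinkable and ignores the case where sinking stops before a discounted definition, so it is a sketch, not SimplifyCFG's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// The value seen in predecessor 1 and in predecessor 2.
using ValuePair = std::pair<std::string, std::string>;

struct Row {
  ValuePair result;                 // e.g. {"%c", "%e"}
  std::vector<ValuePair> operands;  // e.g. {{"%a", "%d"}, {"0", "1"}}
};

// Returns how many rows (counted from the end of the blocks) can be sunk.
// With discountSunk == false, every differing operand needs a PHI (roughly the
// old greedy rule); with true, operands produced by sinkable rows are free.
std::size_t countSinkable(const std::vector<Row>& rowsFromEnd,
                          std::size_t maxPHIsPerRow, bool discountSunk) {
  // Scan stage: remember the results of all sinkable rows.
  std::set<ValuePair> sinkableResults;
  for (const Row& r : rowsFromEnd)
    sinkableResults.insert(r.result);

  // Sink stage: walk from the end, stopping when a row needs too many PHIs.
  std::size_t sunk = 0;
  for (const Row& r : rowsFromEnd) {
    std::size_t phis = 0;
    for (const ValuePair& op : r.operands) {
      if (op.first == op.second)
        continue;  // identical operand: no PHI needed
      if (discountSunk && sinkableResults.count(op))
        continue;  // merged because its definition is sunk as well
      ++phis;
    }
    if (phis > maxPHIsPerRow)
      break;
    ++sunk;
  }
  return sunk;
}
```

On the load/gep example above, counting every differing operand stops immediately, because the geps need PHIs for both %a/%d and 0/1. Discounting the sunk loads leaves a single PHI for the constant, so both rows can be sunk.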
llvm-svn: 280351
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.
Because we don't want the presence of these functions to affect optimization,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.
Differential Revision: https://reviews.llvm.org/D22825
llvm-svn: 280347
-fprofile-dir=path allows the user to specify where .gcda files should be
emitted when the program is run. In particular, this is the first flag that
causes the .gcno and .o files to have different paths; LLVM is extended to
support this. -fprofile-dir= does not change the file name in the .gcno (and
thus where lcov looks for the source) but it does change the name in the .gcda
(and thus where the runtime library writes the .gcda file). It's different from
GCOV_PREFIX because a user can observe that GCOV_PREFIX_STRIP will strip
paths off of -fprofile-dir= but not off of a supplied GCOV_PREFIX.
To implement this we split -coverage-file into -coverage-data-file and
-coverage-notes-file to specify the two different names. The !llvm.gcov
metadata node grows from a 2-element form {string coverage-file, node dbg.cu}
to 3-elements, {string coverage-notes-file, string coverage-data-file, node
dbg.cu}. In the 3-element form, the file name is already "mangled" with
.gcno/.gcda suffixes, while the 2-element form left that to the middle end
pass.
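The difference between the two metadata shapes can be sketched in C++. This is a hypothetical model, not the GCOVProfiling code, and the function name is invented. It keeps only the file-name strings and drops the dbg.cu operand. For the old form, it assumes the pass derives both names by replacing the extension of the single coverage-file string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the file-name strings in an !llvm.gcov entry.
//   old form: one string, e.g. {"out/foo.o"}          -> names derived here
//   new form: {"out/foo.gcno", "prof/foo.gcda"}        -> names used as given
// Returns {notes file, data file}.
std::pair<std::string, std::string>
gcovFileNames(const std::vector<std::string>& names) {
  if (names.size() == 2)
    return {names[0], names[1]};  // already mangled with .gcno/.gcda
  assert(names.size() == 1);
  // Assumption for illustration: swap the extension of the single name.
  std::string stem = names[0];
  const std::size_t slash = stem.find_last_of('/');
  const std::size_t dot = stem.find_last_of('.');
  if (dot != std::string::npos && (slash == std::string::npos || dot > slash))
    stem.erase(dot);
  return {stem + ".gcno", stem + ".gcda"};
}
```

Only the new form can place the notes and data files in different directories, which is what -fprofile-dir= needs.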
llvm-svn: 280306
Summary:
Use MemorySSA, if requested, to do less conservative memory dependency
checking.
This change doesn't enable the MemorySSA enhanced EarlyCSE in the
default pipelines, so should be NFC.
Reviewers: dberlin, sanjoy, reames, majnemer
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19821
llvm-svn: 280279
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics, so I've chosen to start there. For those curious, my use case is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard-code that fact in the patch and instead make it configurable via function attributes.
The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.
Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)
My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.
Differential Revision: https://reviews.llvm.org/D24000
llvm-svn: 280250
We check that a sinking candidate is used by only one PHI node during our legality checks. However for instructions that are used by other sinking candidates our heuristic is less conservative. This can result in a candidate actually being illegal when we come to sink it because of how we sunk a predecessor. Do the used-by-only-one-PHI checks again during sinking to ensure we don't crash.
llvm-svn: 280228
We're sinking stores, which is a good thing, but in the process we create selects for the store address operand. SROA/Mem2Reg can't look through these selects, which caused serious regressions.
The real fix is in SROA, which I'll be looking into.
llvm-svn: 280219
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.
As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:
if (a)
x(1);
else if (b)
x(2);
This produces the following CFG:
      [if]
     /    \
 [x(1)]   [if]
   |      |  \
   |      |   \
   |   [x(2)]  |
    \     |   /
      [ end ]
[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.
We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has side effects for the sake of argument; if something is safe to speculate we could indeed sink it nevertheless, but this cannot happen in the general case and causes many extra selects).
We are now able to detect this case and split off the unconditional arcs to a common successor:
      [if]
     /    \
 [x(1)]   [if]
   |      |  \
   |      |   \
   |   [x(2)]  |
    \    /     |
  [sink.split] |
        \     /
        [ end ]
Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.
llvm-svn: 280217
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.
This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.
This had the problem that sinking instructions whose operands were non-identical but trivially the same needed extra logic for us to sink them aggressively. For example, with one predecessor block on the left and the other on the right:
%a = load i32* %b              %d = load i32* %b
%c = gep i32* %a, i32 0        %e = gep i32* %d, i32 1
Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).
This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.
Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.
In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.
In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.
This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) into two linear scans.
llvm-svn: 280216
This was deliberately disabled during my rewrite of SinkIfThenToEnd to keep behaviour
at least vaguely consistent with the previous version and keep it as close to NFC as
I could.
There's no real reason not to merge sideeffect calls though, so let's do it! Small fixup
along the way to ensure we don't create indirect calls.
Should fix PR28964.
llvm-svn: 280215
Summary:
1) CoroEarly now lowers llvm.coro.promise intrinsic that allows to obtain
a coroutine promise pointer from a coroutine frame and vice versa.
2) CoroFrame now interprets Promise argument of llvm.coro.begin to
place the coroutine promise alloca at a deterministic offset from the coroutine frame.
Now, the coroutine promise example from docs/Coroutines.rst compiles and produces the expected result (see test/Transforms/Coroutines/ex4.ll).
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23993
llvm-svn: 280184
Summary:
LSV was using two vector sets (heads and tails) to track pairs of adjacent positions to vectorize.
A recent optimization tries to obtain the longest chain to vectorize and assumes the positions
in heads (H) and tails (T) match, which is not the case if there are multiple tails for the same head.
e.g.:
i1: store a[0]
i2: store a[1]
i3: store a[1]
Leads to:
H: i1
T: i2 i3
Instead of:
H: i1 i1
T: i2 i3
So the positions for instructions that follow i3 will have different indexes in H/T.
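The invariant at stake (H[k] and T[k] must describe the same adjacent pair) can be shown with a hypothetical C++ model of the two bookkeeping outcomes listed above. Instruction ids 1, 2 and 3 stand for i1, i2 and i3. The model reproduces the symptom only; it is not the LoadStoreVectorizer's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Inst = int;  // hypothetical instruction id

// Mismatched bookkeeping: a head is recorded only once, so after a head with
// two tails, H and T no longer line up position by position.
void recordHeadOnce(std::vector<Inst>& H, std::vector<Inst>& T, Inst head,
                    Inst tail) {
  if (std::find(H.begin(), H.end(), head) == H.end())
    H.push_back(head);
  T.push_back(tail);
}

// Paired bookkeeping: the head is recorded for every tail, so H[k] and T[k]
// always belong to the same pair.
void recordPair(std::vector<Inst>& H, std::vector<Inst>& T, Inst head,
                Inst tail) {
  H.push_back(head);
  T.push_back(tail);
}
```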
This patch resolves PR29148.
This issue also surfaced the fact that if the chain is too long, and TLI
returns a "not-fast" answer, the whole chain will be abandoned for
vectorization, even though a smaller one would be beneficial.
Added a testcase and FIXME for this.
Reviewers: tstellarAMD, arsenm, jlebar
Subscribers: mzolotukhin, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24057
llvm-svn: 280179
We don't need to limit predication to blocks that have a single incoming
edge, we just need to use the right mask.
This fixes PR30172.
Differential Revision: https://reviews.llvm.org/D24009
llvm-svn: 280148
Summary:
Fix a couple of issues limiting the application of indirect call promotion
in ThinLTO mode:
- Invoke indirect call promotion before globalopt, since it may
eliminate imported functions which appear unreferenced.
- Invoke indirect call promotion with InLTO=true so that the PGOFuncName
metadata is used to get the name for locals which would have been
renamed during promotion.
Reviewers: davidxl, mehdi_amini
Subscribers: Prazek, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24004
llvm-svn: 280024
After r279649 when getting a vector value from VectorLoopValueMap, we create an
insertelement sequence on-demand if the value has been scalarized instead of
vectorized. We previously inserted this insertelement sequence before the
value's first vector user. However, this insert location is problematic if that
user is the phi node of a first-order recurrence. With this patch, we move the
insertelement sequence after the last scalar instruction we created when
scalarizing the value. Thus, the value's vector definition in the new loop will
immediately follow its scalar definitions. This should fix PR30183.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30183
llvm-svn: 280001
Summary:
While walking the use chain for identifying rematerializable values in RS4GC,
add the case where the current value and the base value are the same PHI node.
This will aid rematerialization of geps and casts instead of relocating.
Reviewers: sanjoy, reames, igor
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23920
llvm-svn: 279975
Assuming the default FP env, we should not treat fdiv and frem any differently in terms of
trapping behavior than any other FP op. I.e., FP ops do not trap with the default FP env.
This matches how we treat the fdiv/frem in IR with isSafeToSpeculativelyExecute() and in
the backend after:
https://reviews.llvm.org/rL279970
llvm-svn: 279973
Summary:
[Coroutines] Part 9: Add cleanup subfunction.
This patch completes coroutine heap allocation elision. Now, the heap elision example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex3.ll)
Intrinsic Changes:
* coro.free gets a token parameter tying it to coro.id to allow reliably discovering all coro.frees associated with a particular coroutine.
* coro.id gets an extra parameter that points back to a coroutine function. This allows checking whether a coro.id describes the enclosing function or belongs to a different function that was later inlined.
CoroSplit now creates three subfunctions:
# f$resume - resume logic
# f$destroy - cleanup logic, followed by deallocation code
# f$cleanup - just the cleanup code
The CoroElide pass, during devirtualization, replaces coro.destroy with either f$destroy or f$cleanup depending on whether heap elision is performed or not.
Other fixes, improvements:
* Fixed buglet in Shape::buildFrame that was not creating coro.save properly if coroutine has more than one suspend point.
* Switched to using a variable-width suspend index field (no longer limited to a 32-bit index; the field can be as small as i1 or as large as i<whatever-size_t-is>)
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23844
llvm-svn: 279971
Fixed a bug in run-time checks for possible memory conflicts inside loop.
The bug is in Low <-> High boundaries calculation. The High boundary should be calculated as "last memory access pointer + element size".
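A minimal sketch of why the element size matters, assuming byte addresses, a non-negative stride and half-open intervals. `ByteRange`, `accessedRange` and `mayConflict` are hypothetical helpers, not the LoopAccessAnalysis code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Half-open byte interval [Low, High) touched by a run of accesses.
struct ByteRange {
  std::uintptr_t Low;
  std::uintptr_t High;  // one past the last byte accessed
};

// First/Last are the addresses of the first and last element accessed,
// assuming a non-negative stride so that First <= Last.
ByteRange accessedRange(std::uintptr_t First, std::uintptr_t Last,
                        std::size_t ElemSize) {
  assert(First <= Last && ElemSize > 0);
  // Using Last alone as High would drop the last element's bytes.
  return {First, Last + ElemSize};
}

// Two half-open intervals overlap iff each starts before the other ends.
bool mayConflict(ByteRange A, ByteRange B) {
  return A.Low < B.High && B.Low < A.High;
}
```

For example, with 4-byte elements at 100..112 and 112..124, the corrected ranges overlap on bytes 112..115. A High of 112 for the first range would miss that conflict.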
Differential revision: https://reviews.llvm.org/D23176
llvm-svn: 279930
Summary:
This is obviously an interesting case because it may motivate code
restructuring or LTO.
Reporting this requires instantiation of ORE in the loop where the call
sites are first gathered. I've checked compile-time
overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
mcf and quickly tailing off. As before without -Rpass-with-hotness
there is no overhead.
Because this could be a pretty noisy diagnostic, it is currently
qualified as 'verbose'. As of this patch, 'verbose' diagnostics are
only emitted with -Rpass-with-hotness, i.e. when the output is expected
to be filtered.
Reviewers: eraman, chandlerc, davidxl, hfinkel
Subscribers: tejohnson, Prazek, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D23415
llvm-svn: 279860
Summary:
This fixes pr29105. The reason is that lifetime markers create new
pointers that alias the original ones, but before this patch aliases
were not checked in performMemCpyToMemSetOptzn.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23846
llvm-svn: 279769
It is invalid to hoist stores or loads if they are not executed on all paths
from the hoisting point to the exit of the function. In the testcase, there are
paths in the loop that do not execute the stores or the loads, and so hoisting
them within the loop is unsafe.
The problem is that the current implementation of hoistingFromAllPaths is
incomplete: it walks all blocks dominated by the hoisting point, and does not
return false when the loop contains a path on which the hoisted ld/st is
not executed.
Differential Revision: https://reviews.llvm.org/D23843
llvm-svn: 279732
This patch unifies the data structures we use for mapping instructions from the
original loop to their corresponding instructions in the new loop. Previously,
we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap.
WidenMap maintained the vector values each instruction from the old loop was
represented with, and ScalarIVMap maintained the scalar values each scalarized
induction variable was represented with. With this patch, all values created
for the new loop are maintained in VectorLoopValueMap.
The change allows for several simplifications. Previously, when an instruction
was scalarized, we had to insert the scalar values into vectors in order to
maintain the mapping in WidenMap. Then, if a user of the scalarized value was
also scalar, we had to extract the scalar values from the temporary vector we
created. We now avoid these unnecessary scalar-to-vector-to-scalar conversions.
If a scalarized value is used by a scalar instruction, the scalar value is used
directly. However, if the scalarized value is needed by a vector instruction,
we generate the needed insertelement instructions on-demand.
A common idiom in several locations in the code (including the scalarization
code), is to first get the vector values an instruction from the original loop
maps to, and then extract a particular scalar value. This patch adds
getScalarValue for this purpose alongside getVectorValue as an interface into
VectorLoopValueMap. These functions work together to return the requested
values if they're available or to produce them if they're not.
The mapping has also been made less permissive. Entries can be added to
VectorLoopValueMap with the new initVector and initScalar functions.
getVectorValue has been modified to return a constant reference to the mapped
entries.
There's no real functional change with this patch; however, in some cases we
will generate slightly different code. For example, instead of an insertelement
sequence following the definition of an instruction, it will now precede the
first use of that instruction. This can be seen in the test case changes.
Differential Revision: https://reviews.llvm.org/D23169
llvm-svn: 279649
I'm not sure if the `!isa<CallInst>(Inst) &&
!isa<TerminatorInst>(Inst))` bit is correct either, but this fixes the
case we know is broken.
llvm-svn: 279647
div/rem instructions in basic blocks that require predication currently prevent
vectorization. This patch extends the existing mechanism for predicating stores
to handle other instructions and leverages it to predicate divs and rems.
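A conceptual sketch in plain C++ of what predicating a division means. It emulates one vector iteration and is not the vectorizer's output. Each lane divides only under its mask bit, so a zero divisor in an inactive lane never executes. `predicatedDiv` and `VF` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VF = 4;  // assumed vectorization factor

// Emulates one vector iteration of `if (Mask[i]) Out[i] = A[i] / B[i];`.
// Only lanes whose mask bit is set perform the division, so a zero divisor
// in an inactive lane cannot trap; inactive lanes keep their old value.
std::array<int, VF> predicatedDiv(const std::array<int, VF> &A,
                                  const std::array<int, VF> &B,
                                  const std::array<bool, VF> &Mask,
                                  std::array<int, VF> Out) {
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    if (Mask[Lane])
      Out[Lane] = A[Lane] / B[Lane];
  return Out;
}
```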
Differential Revision: https://reviews.llvm.org/D22918
llvm-svn: 279620
Summary:
This patch adds the coroutine frame building algorithm. Now, simple coroutines such as ex0.ll and ex1.ll (the first examples from docs\Coroutines.rst) can be compiled.
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1. Add documentation. (https://reviews.llvm.org/D22603)
2. Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
...
7. Split coroutine into subfunctions. (https://reviews.llvm.org/D23461)
8. Coroutine Frame Building algorithm <= we are here
9. Add f.cleanup subfunction.
10+. The rest of the logic
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23586
llvm-svn: 279609
The test case included with r279125 exposed an existing signed integer
overflow. Since getTreeCost can return INT_MAX, we can't sum this cost together
with other costs, such as getReductionCost.
This patch removes the possibility of assigning a cost of INT_MAX. Since we
were previously using INT_MAX as an indicator for "should not vectorize", we
now explicitly check this condition with "isTreeTinyAndNotFullyVectorizable"
before computing a cost.
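A hedged sketch of the idea, with hypothetical names. The "do not vectorize" decision is made before any arithmetic, so no sentinel such as INT_MAX is ever summed with other costs. Summing it would be signed overflow, which is undefined behavior. An empty `std::optional` (C++17) stands in for the early `isTreeTinyAndNotFullyVectorizable` check.

```cpp
#include <cassert>
#include <climits>
#include <optional>

// Hypothetical cost combination: an empty optional means "do not vectorize",
// decided before any arithmetic, instead of an INT_MAX sentinel that could
// later be summed with other costs (signed overflow is undefined behavior).
std::optional<int> totalCost(std::optional<int> TreeCost, int ReductionCost) {
  if (!TreeCost)
    return std::nullopt;
  // Widen before adding so the range check itself cannot overflow.
  long long Sum = static_cast<long long>(*TreeCost) + ReductionCost;
  assert(Sum >= INT_MIN && Sum <= INT_MAX && "cost overflow");
  return static_cast<int>(Sum);
}
```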
This patch adds a run-line to the test case used for r279125 that ensures we
don't vectorize. Previously, this line would vectorize the test case by chance
due to undefined behavior in the cost calculation.
Differential Revision: https://reviews.llvm.org/D23723
llvm-svn: 279562
...because like the corresponding code, this is just too big to keep adding to.
And the next step is to add a vector version of each of these tests to show
missed folds.
Also, auto-generate CHECK lines and add comments for the tests that correspond to
the source code.
llvm-svn: 279530
[Recommitting now an unrelated assertion in SROA is sorted out]
The new version has several advantages:
1) IMSHO it's more readable and neater
2) It handles loads and stores properly
3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.
With this change we can now finally sink load-modify-store idioms such as:
if (a)
return *b += 3;
else
return *b += 4;
=>
%z = load i32, i32* %y
%.sink = select i1 %a, i32 3, i32 4
%b = add i32 %z, %.sink
store i32 %b, i32* %y
ret i32 %b
When this works for switches it'll be even more powerful.
Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.
This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.
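To make the equivalence concrete, here is a hedged C++ rendering of the source snippet and of the sunk form. It uses the constants 3 and 4 from the C source. `beforeSink` and `afterSink` are hypothetical names; this illustrates the result of the transform, not the SimplifyCFG implementation.

```cpp
#include <cassert>

// Before sinking: each arm loads, modifies and stores through b.
int beforeSink(bool a, int *b) {
  if (a)
    return *b += 3;
  else
    return *b += 4;
}

// After sinking: one load, one add of a selected constant, one store.
int afterSink(bool a, int *b) {
  int Z = *b;              // %z = load
  int Sink = a ? 3 : 4;    // %.sink = select
  int Sum = Z + Sink;      // add
  *b = Sum;                // store
  return Sum;              // ret
}
```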
llvm-svn: 279460
Summary: We can allow sinking if the single user block has only one unique predecessor, regardless of the number of edges. Note that a switch statement with multiple cases can have the same destination.
Reviewers: mcrosier, majnemer, spatel, reames
Subscribers: reames, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D23722
llvm-svn: 279448
The new version has several advantages:
1) IMSHO it's more readable and neater
2) It handles loads and stores properly
3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.
With this change we can now finally sink load-modify-store idioms such as:
if (a)
return *b += 3;
else
return *b += 4;
=>
%z = load i32, i32* %y
%.sink = select i1 %a, i32 3, i32 4
%b = add i32 %z, %.sink
store i32 %b, i32* %y
ret i32 %b
When this works for switches it'll be even more powerful.
Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.
This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.
llvm-svn: 279443
This change causes a performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other benchmarks.
See https://reviews.llvm.org/D18777 for details.
llvm-svn: 279433
This change needs to be reverted in order to revert -r278267, which causes a performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other benchmarks.
See comments on https://reviews.llvm.org/D18777 for details.
llvm-svn: 279432
The test case included in r279125 exposed existing undefined behavior in the
SLP vectorizer that it did not introduce. This patch reapplies the original
patch, but modifies the test case to avoid hitting the undefined behavior. This
allows us to close PR28330 while keeping the UBSan bot happy. The undefined
behavior the original test uncovered will be addressed in a follow-on patch.
Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
llvm-svn: 279370
This is a partial enablement (move the ConstantInt guard down) because there are many
different folds here and one of the later ones will require reworking 'isSignBitCheck'.
llvm-svn: 279339
The intended transform is:
// Simplify icmp eq (or (ptrtoint P), (ptrtoint Q)), 0
// -> and (icmp eq P, null), (icmp eq Q, null).
P and Q are both pointers, but they may have different types, so we need
two calls to getNullValue() to make the icmps.
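A hedged C++ illustration of the equivalence. It assumes the null pointer's integer representation is 0. That holds for LLVM's default address space on common targets; C++ itself leaves pointer-to-integer conversion implementation-defined. The two pointers deliberately have different pointee types, which is why each comparison needs its own typed null. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Original form: icmp eq (or (ptrtoint P), (ptrtoint Q)), 0
template <typename P, typename Q>
bool orOfPtrsIsZero(const P *p, const Q *q) {
  auto IP = reinterpret_cast<std::uintptr_t>(p);
  auto IQ = reinterpret_cast<std::uintptr_t>(q);
  return (IP | IQ) == 0;
}

// Folded form: and (icmp eq P, null), (icmp eq Q, null). Each comparison
// uses a null of its own pointer type, mirroring the two getNullValue calls.
template <typename P, typename Q>
bool bothNull(const P *p, const Q *q) {
  return p == static_cast<const P *>(nullptr) &&
         q == static_cast<const Q *>(nullptr);
}
```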
llvm-svn: 279271
CGSCC uses a WeakVH to track call sites. RAUW'ing a call within a function
can result in that WeakVH getting confused about whether or not the call
site is still around.
llvm-svn: 279268
Of course, we really need to refactor and fix all of the cmp predicates,
but this one is interesting because without it, we later perform an
information-losing transform of icmp (shl 1, Y), C, and we can't recover
the better fold.
llvm-svn: 279263
The new version has several advantages:
1) IMSHO it's more readable and neater
2) It handles loads and stores properly
3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.
With this change we can now finally sink load-modify-store idioms such as:
if (a)
return *b += 3;
else
return *b += 4;
=>
%z = load i32, i32* %y
%.sink = select i1 %a, i32 3, i32 4
%b = add i32 %z, %.sink
store i32 %b, i32* %y
ret i32 %b
When this works for switches it'll be even more powerful.
llvm-svn: 279229
We abort building vectorizable trees in some cases (e.g., if the maximum
recursion depth is reached, if the region size is too large, etc.). If this
happens for a reduction, we can be left with a root entry that needs to be
gathered. For these cases, we need to make sure we actually set VectorizedValue to
the resulting vector.
This patch ensures we properly set VectorizedValue, and it also ensures the
insertelement sequence generated for the gathers is inserted at the correct
location.
Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
Differential Revision: https://reviews.llvm.org/D23410
llvm-svn: 279125
It causes a regression on our internal benchmark. Introduce cvp-dont-process flag and set it off by default while investigating the regression.
llvm-svn: 279082
Also, add a scalar test to demonstrate one of the intermediate folds that
is necessary to accomplish the existing, multi-step test. And simplify
the vector tests to only check the final piece of that multi-step transform.
llvm-svn: 278995
minimal and boring form than the old pass manager's version.
This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is also ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:
- Array alloca merging
- To support the above, bottom-up inlining with careful history
tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
Instead, it focuses on inlining functions with that attribute.
The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.
The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solvable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.
The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense to have this complexity at O0 or not.
One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.
Anyways, hopefully a reasonable starting point for this pass.
Differential Revision: https://reviews.llvm.org/D23299
llvm-svn: 278896
It is pretty easy to get it down to O(nlogn + mlogm). This
implementation has the added benefit of automatically deduplicating
entries between the two sets.
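One way to reach that bound, sketched with plain ints standing in for metadata nodes. `concatenateUnique` is a hypothetical name, and this is not necessarily how the patch does it. Each list is sorted and deduplicated in O(n log n) and O(m log m). `std::set_union` then merges them in linear time, emitting an element present in both inputs only once. The output comes back sorted rather than in input order.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Sort + dedupe each list, then merge; shared entries appear once.
std::vector<int> concatenateUnique(std::vector<int> A, std::vector<int> B) {
  std::sort(A.begin(), A.end());
  A.erase(std::unique(A.begin(), A.end()), A.end());
  std::sort(B.begin(), B.end());
  B.erase(std::unique(B.begin(), B.end()), B.end());
  std::vector<int> Out;
  Out.reserve(A.size() + B.size());
  std::set_union(A.begin(), A.end(), B.begin(), B.end(),
                 std::back_inserter(Out));
  return Out;
}
```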
llvm-svn: 278837
I have audited all the callers of concatenate and none require duplicate
entries to service concatenation.
These duplicates serve no purpose but to needlessly embiggen the IR.
N.B. Layering getMostGenericAliasScope on top of concatenate makes it
O(nlogn + mlogm) instead of O(n*m).
llvm-svn: 278836
Summary:
This patch adds simple coroutine splitting logic to CoroSplit pass.
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1. Add documentation. (https://reviews.llvm.org/D22603)
2. Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
...
7. Split coroutine into subfunctions <= we are here
8. Coroutine Frame Building algorithm
9. Handle coroutine with unwinds
10+. The rest of the logic
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23461
llvm-svn: 278830
This reverts commit r278660.
It causes a downstream assertion failure in InstCombine on shuffle
instructions. It comes up in __mm_swizzle_epi32.
llvm-svn: 278672
The new version has several advantages:
1) IMSHO it's more readable and neater
2) It handles loads and stores properly
3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.
With this change we can now finally sink load-modify-store idioms such as:
if (a)
return *b += 3;
else
return *b += 4;
=>
%z = load i32, i32* %y
%.sink = select i1 %a, i32 3, i32 4
%b = add i32 %z, %.sink
store i32 %b, i32* %y
ret i32 %b
When this works for switches it'll be even more powerful.
llvm-svn: 278660
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!
Motivating testcase:
void f(float *a, float *b, float *c, int n) {
  while (n-- > 0)
    *c++ = *a++ + *b++;
}
It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.
llvm-svn: 278658
IRCE has the ability to further version pre-loops and post-loops that it
created, but this isn't useful at all. This change teaches IRCE to
leave behind some metadata in the loops it creates (by cloning the main
loop) so that these new loops are not re-processed by IRCE.
Today this bug is hidden by another bug -- IRCE does not update LoopInfo
properly so the loop pass manager does not re-invoke IRCE on the loops
it split out. However, once the latter is fixed the bug addressed in
this change causes IRCE to infinite-loop in some cases (e.g. it splits
out a pre-loop, a pre-pre-loop from that, a pre-pre-pre-loop from that
and so on).
llvm-svn: 278617
The (negative) test case is supposed to check that IRCE does not muck
with range checks it cannot handle, not that it does the right thing in
the absence of profiling information.
llvm-svn: 278612
Loops containing `indirectbr` may not be in simplified form, even after
running LoopSimplify. Reject them gracefully, instead of tripping an
assert.
llvm-svn: 278611
Summary:
Refactor the existing support into a LoopDataPrefetch implementation
class and a LoopDataPrefetchLegacyPass class that invokes it.
Add a new LoopDataPrefetchPass for the new pass manager that utilizes
the LoopDataPrefetch implementation class.
Reviewers: mehdi_amini
Subscribers: sanjoy, mzolotukhin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23483
llvm-svn: 278591
`IVVisitor::visitCast` used to have the invariant that if the
instruction it was passed was a sext or zext instruction, the result of
the instruction would be wider than the induction variable. This is no
longer true after rL275037, so this change teaches `IndVarSimplify`'s
implementation of `IVVisitor::visitCast` to work with the relaxed
invariant.
A corresponding change to SimplifyIndVar to preserve the said invariant
after rL275037 would also work, but given how `IVVisitor::visitCast` is
spelled (no indication of said invariant), I figured the current fix is
cleaner.
Fixes PR28935.
llvm-svn: 278584
InnerLoopVectorizer shouldn't handle a loop with cycles inside the loop
body, even if that cycle isn't a natural loop.
Fixes PR28541.
Differential Revision: https://reviews.llvm.org/D22952
llvm-svn: 278573
They aren't static, and moving them to the entry block across something
else will only result in tears.
Root cause of http://crbug.com/636558.
llvm-svn: 278571
Summary: The refined propagation algorithm is more accurate and robust.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23224
llvm-svn: 278522
Rewrite the Visited[Cond] = getValueFromConditionImpl(..., Visited) statement, which can lead to memory corruption, since getValueFromConditionImpl modifies the Visited map and invalidates its iterators.
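The safe pattern is to finish the recursive call before touching the map slot. Here it is sketched on a hypothetical memoized recursion (Fibonacci over integer keys); `Memo` and `compute` are illustrative, not LVI code. `std::unordered_map` keeps element references valid across rehashing. So in this sketch the pattern matters for portability to maps like `llvm::DenseMap`, not for correctness here.

```cpp
#include <cassert>
#include <unordered_map>

using Memo = std::unordered_map<int, long long>;

// Memoized recursion that inserts into Visited while it runs.
long long compute(int Cond, Memo &Visited) {
  auto It = Visited.find(Cond);
  if (It != Visited.end())
    return It->second;
  long long Result = Cond < 2 ? Cond
                              : compute(Cond - 1, Visited) +
                                    compute(Cond - 2, Visited);
  // Write back only after the recursive calls are done. Before C++17 the
  // operands of `Visited[Cond] = compute(...)` were unsequenced, so the
  // reference from operator[] could be taken first; with an open-addressing
  // map such as llvm::DenseMap, the recursive insertions may then
  // reallocate the table and leave that reference dangling.
  Visited[Cond] = Result;
  return Result;
}
```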
llvm-svn: 278514
Take range metadata into account for conditions like this:
%length = load i32, i32* %length_ptr, !range !{i32 0, i32 2147483647}
%cmp = icmp ult i32 %a, %length
This is a common pattern for range checks where the length of the array is dynamically loaded.
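A small sketch of the reasoning, assuming a non-wrapping unsigned half-open range like the `!range` above. `URange` and `rangeFromUltLength` are hypothetical. If `%a ult %length` holds and `%length < Hi`, then `%a < %length <= Hi - 1`, so `%a` lies in `[0, Hi - 1)`. For the metadata above this gives `%a` in `[0, 2147483646)`, so `%a` is also non-negative when read as a signed i32.

```cpp
#include <cassert>
#include <cstdint>

// Half-open unsigned range [Lo, Hi), like !range metadata (non-wrapping).
struct URange {
  std::uint32_t Lo;
  std::uint32_t Hi;
};

// If `icmp ult A, Length` is true and Length is in LengthRange,
// then A < Length <= LengthRange.Hi - 1, so A is in [0, LengthRange.Hi - 1).
URange rangeFromUltLength(URange LengthRange) {
  assert(LengthRange.Lo < LengthRange.Hi && "non-empty, non-wrapping range");
  return {0, LengthRange.Hi - 1};
}
```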
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D23267
llvm-svn: 278496
Currently LVI can only gather value constraints from comparisons like:
* icmp <pred> Val, ...
* icmp ult (add Val, Offset), ...
In fact, we can handle any predicate in the latter form of comparison.
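Why the predicate does not matter, as a sketch on 8-bit values with wrapped half-open ranges. All names here are hypothetical, and LVI itself works with ConstantRange. The predicate only determines the region that `Val + Offset` must lie in. `Val` then lies in that region shifted by `-Offset`, modulo 2^8. `ult` is the case that was already handled; `uge` is shown as another predicate.

```cpp
#include <cassert>
#include <cstdint>

// Wrapped half-open 8-bit range [Lo, Hi) modulo 256; Lo == Hi is empty here.
struct Range8 {
  std::uint8_t Lo, Hi;
  bool contains(std::uint8_t V) const {
    // V is in the range iff its distance from Lo (mod 256) is below the size.
    return static_cast<std::uint8_t>(V - Lo) <
           static_cast<std::uint8_t>(Hi - Lo);
  }
};

// Values S for which `icmp ult S, C` holds: [0, C).
Range8 ultRegion(std::uint8_t C) { return {0, C}; }

// Values S for which `icmp uge S, C` holds: [C, 256), i.e. wrapped [C, 0).
// C == 0 would be the full range, which this encoding cannot express.
Range8 ugeRegion(std::uint8_t C) {
  assert(C != 0);
  return {C, 0};
}

// If Val + Offset (mod 256) lies in Region, Val lies in Region - Offset.
Range8 constraintOnVal(Range8 Region, std::uint8_t Offset) {
  return {static_cast<std::uint8_t>(Region.Lo - Offset),
          static_cast<std::uint8_t>(Region.Hi - Offset)};
}
```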
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D23357
llvm-svn: 278493
Summary:
1. Make coroutine representation more robust against optimizations that may duplicate instructions by introducing a coro.id intrinsic that returns a token that gets fed into coro.alloc and coro.begin. Since coro.id returns a token, it won't get duplicated and can be used as a reliable indicator of coroutine identity when a particular coroutine call gets inlined.
2. Move the last three arguments of coro.begin into coro.id, as they would be shared if coro.begin gets duplicated.
3. doc + test + code updated to support the new intrinsic.
Reviewers: mehdi_amini, majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23412
llvm-svn: 278481
Summary:
This patch adds an IsVariadicFunction bit to the summary in order
to not import variadic functions. The inliner doesn't inline
variadic functions because they are hard to reason about.
This one small fix improves the Importer by about 16%
(going from 86% to 100% of imported functions that are
inlined anywhere)
on some spec benchmarks like 'int' and others.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23339
llvm-svn: 278432
When legal, extending trip count in the loop control logic generates better code compared to truncating IV. This is because
(1) extending trip count is a loop invariant operation (see genLoopLimit where we prove trip count is loop invariant).
(2) Scalar Evolution seems to have problems understanding trunc when computing loop trip count. So removing them allows better analysis performed in Scalar Evolution. (In particular this fixes PR 28363 which is the motivation for this change).
I am not going to perform any performance test. Any degradation caused by this should be an indication of a bug elsewhere.
To prove legality, we rely on SCEV to prove zext(trunc(IV)) == IV (or similarly for sext). If this holds, we can prove equivalence of trunc(IV)==ExitCnt (1) and IV == zext(ExitCnt). Simply take zext of both sides of (1) and apply the proven equivalence.
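The legality argument can be checked on concrete 64/32-bit values. zext is injective, so trunc(IV) == ExitCnt holds iff zext(trunc(IV)) == zext(ExitCnt), which is IV == zext(ExitCnt) once zext(trunc(IV)) == IV is known. This is a hedged sketch with hypothetical helper names, not the IndVarSimplify code. Without the precondition the rewrite would be wrong, as the second test case shows.

```cpp
#include <cassert>
#include <cstdint>

// trunc(IV) == ExitCnt
bool truncCompare(std::uint64_t IV, std::uint32_t ExitCnt) {
  return static_cast<std::uint32_t>(IV) == ExitCnt;
}

// IV == zext(ExitCnt)
bool zextCompare(std::uint64_t IV, std::uint32_t ExitCnt) {
  return IV == static_cast<std::uint64_t>(ExitCnt);
}

// The precondition SCEV must prove: zext(trunc(IV)) == IV.
bool zextTruncIsIdentity(std::uint64_t IV) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(IV)) == IV;
}
```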
This commit contains changes in a newly added testcase which was not included in the previous commit (which was reverted later on).
https://reviews.llvm.org/D23075
llvm-svn: 278421
Summary:
This is an extension of the fix in r271424. That fix dealt with builder
insert points being moved by SCEV expansion, but only for the lifetime
of the expand call. This change modifies the interface so that LSR can
safely call expand multiple times at the same insert point and do the
right thing if one of the expansions decides to move the original insert
point.
This is a fix for PR28719.
Reviewers: sanjoy
Subscribers: llvm-commits, mcrosier, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23342
llvm-svn: 278413
Summary:
This fixes PR 28933 by making sure GVNHoist does not try to recreate memory
accesses when it has not actually moved them.
Reviewers: sebpop
Subscribers: llvm-commits, george.burgess.iv
Differential Revision: https://reviews.llvm.org/D23411
llvm-svn: 278401
Summary:
Keep track of all methods for which we have devirtualized at least
one call and then print them sorted alphabetically. That allows to
avoid duplicates and also makes the order deterministic.
Add optimization names into the remarks, so that it's easier to
understand how each method has been devirtualized.
Fix a bug where the wrong methods could have been reported for
tryVirtualConstProp.
Reviewers: kcc, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23297
llvm-svn: 278389
When legal, extending trip count in the loop control logic generates better code compared to truncating IV. This is because
(1) extending trip count is a loop invariant operation (see genLoopLimit where we prove trip count is loop invariant).
(2) Scalar Evolution seems to have problems understanding trunc when computing loop trip count. So removing them allows better analysis performed in Scalar Evolution. (In particular this fixes PR 28363 which is the motivation for this change).
I am not going to perform any performance test. Any degradation caused by this should be an indication of a bug elsewhere.
To prove legality, we rely on SCEV to prove zext(trunc(IV)) == IV (or similarly for sext). If this holds, we can prove equivalence of trunc(IV)==ExitCnt (1) and IV == zext(ExitCnt). Simply take zext of both sides of (1) and apply the proven equivalence.
https://reviews.llvm.org/D23075
llvm-svn: 278334
Change --no-pgo-warn-missing to -pgo-warn-missing-function
and negate the default. /NFC
Add more tests to make sure the warning is off by default
llvm-svn: 278314
Summary:
A particular coroutine usage pattern, where a coroutine is created, manipulated and
destroyed by the same calling function, is common for coroutines implementing
RAII idiom and is suitable for the allocation elision optimization, which avoids
dynamic allocation by storing the coroutine frame as a static `alloca` in its
caller.
coro.free and coro.alloc intrinsics are used to indicate which code needs to be suppressed
when dynamic allocation elision happens:
```
entry:
  %elide = call i8* @llvm.coro.alloc()
  %need.dyn.alloc = icmp ne i8* %elide, null
  br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
dyn.alloc:
  %alloc = call i8* @CustomAlloc(i32 4)
  br label %coro.begin
coro.begin:
  %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
  %hdl = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null,
                          i8* bitcast ([2 x void (%f.frame*)*]* @f.resumers to i8*))
```
and
```
  %mem = call i8* @llvm.coro.free(i8* %hdl)
  %need.dyn.free = icmp ne i8* %mem, null
  br i1 %need.dyn.free, label %dyn.free, label %if.end
dyn.free:
  call void @CustomFree(i8* %mem)
  br label %if.end
if.end:
  ...
```
If heap allocation elision is performed, we replace coro.alloc with a static alloca on the caller frame and coro.free with null constant.
Also, if there are any tail calls referencing the coroutine frame, we need to remove the tail call attribute, since the coroutine frame now lives on the stack.
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1. Add documentation. (https://reviews.llvm.org/D22603)
2. Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3. Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4. Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization (https://reviews.llvm.org/D23229)
5. Add CGSCC restart trigger + tests. (https://reviews.llvm.org/D23234)
6. Add coroutine heap elision + tests. <= we are here
7. Add the rest of the logic (split into more patches)
Reviewers: mehdi_amini, majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23245
llvm-svn: 278242
Teach LVI how to gather information from conditions in the form of (cond1 && cond2). Our out-of-tree front-end emits range checks in this form.
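Conceptually, both conditions hold on the edge where `cond1 && cond2` is true, so the constraints they imply can be intersected. Nothing follows for both on the false edge. The sketch below is simplified to non-wrapping signed intervals. LVI itself uses ConstantRange, and `Interval` and `intersect` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Half-open signed interval [Lo, Hi); empty when Lo >= Hi. (Using an
// exclusive upper bound means INT64_MAX itself is not representable, which
// is fine for a sketch.)
struct Interval {
  std::int64_t Lo, Hi;
  bool empty() const { return Lo >= Hi; }
};

// On the true edge of (cond1 && cond2), the value satisfies both
// constraints, i.e. it lies in their intersection.
Interval intersect(Interval A, Interval B) {
  return {std::max(A.Lo, B.Lo), std::min(A.Hi, B.Hi)};
}
```

For example, a range check `x >= 0 && x < 10` yields `[0, 10)` on its true edge.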
Reviewed By: sanjoy
Differential Revision: http://reviews.llvm.org/D23200
llvm-svn: 278231
This is a resubmission of previously reverted r277592. It was hitting overly strong assertion in getConstantRange which was relaxed in r278217.
Use LVI to prove that adds do not wrap. The change is motivated by https://llvm.org/bugs/show_bug.cgi?id=28620 bug and it's the first step to fix that problem.
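The core check can be sketched as follows, assuming LVI has already produced closed unsigned ranges for both operands. `URange32` and `addCannotWrapUnsigned` are hypothetical. Only the unsigned (nuw) case is shown; the signed case needs both the upper and lower bounds.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Closed unsigned range [Min, Max] proven for an operand.
struct URange32 {
  std::uint32_t Min, Max;
};

// The add cannot wrap (nuw) if even the largest operands fit:
// A.Max + B.Max <= UINT32_MAX, written without overflowing.
bool addCannotWrapUnsigned(URange32 A, URange32 B) {
  return A.Max <= std::numeric_limits<std::uint32_t>::max() - B.Max;
}
```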
Reviewed By: sanjoy
Differential Revision: http://reviews.llvm.org/D23059
llvm-svn: 278220
Hal pointed out that the semantic of our intrinsic and the libc
call are slightly different. Add a comment while I'm here to
explain why we can't emit an intrinsic. Thanks Hal!
llvm-svn: 278200
Summary:
The inliner not being a function pass requires the work-around of
generating the OptimizationRemarkEmitter and in turn BFI on demand.
This will go away after the new PM is ready.
BFI is only computed inside ORE if the user has requested hotness
information for optimization diagnostics (-pass-remark-with-hotness at
the 'opt' level). Thus there is no additional overhead without the
flag.
Reviewers: hfinkel, davidxl, eraman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22694
llvm-svn: 278185
Summary:
This hopefully fixes PR28825. The problem now was that a value from the
original loop was used in a subloop, which became a sibling after separation.
While a subloop doesn't need an lcssa phi node, a sibling does, and that's
where we broke LCSSA. The most natural way to fix this now is to simply call
formLCSSA on the original loop: it'll do what we've been doing before plus
it'll cover situations described above.
I think we don't need to run formLCSSARecursively here, and we have an assert
to verify this (I've tried testing it on LLVM testsuite + SPECs). I'd be happy
to be corrected here though.
I also changed a run line in the test from '-lcssa -loop-unroll' to
'-lcssa -loop-simplify -indvars', because it exercises LCSSA
preservation to the same extent, but also makes fewer unrelated
transformations to the CFG, which makes it easier to verify.
Reviewers: chandlerc, sanjoy, silvas
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23288
llvm-svn: 278173
Summary:
We teach alias analysis that invariant.start is readonly.
This helps with GVN and memcopy optimizations that currently treat
invariant.start as a clobber.
We need to treat this as readonly, so that DSE does not incorrectly
remove stores prior to the invariant.start
Reviewers: sanjoy, reames, majnemer, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23214
llvm-svn: 278138
The "no prof data for func" warning is turned off by default
due to its high verbosity and minimal usefulness.
Differential Revision: http://reviews.llvm.org/D23295
llvm-svn: 278127
Summary:
In the use optimizer, we need to keep track of whether the lower bound still
dominates us or else we may decide a lower bound is still valid when it
is not due to intervening pushes/pops. Fixes PR28880 (and probably a
bunch of other things).
Reviewers: george.burgess.iv
Subscribers: MatzeB, llvm-commits, sebpop
Differential Revision: https://reviews.llvm.org/D23237
llvm-svn: 277978
Summary:
The correctness fix here is that when we CSE a load with another load,
we need to combine the metadata on the two loads. This matches the
behavior of other passes, like instcombine and GVN.
There's also a minor optimization improvement here: for load PRE, the
aliasing metadata on the inserted load should be the same as the
metadata on the original load. Not sure why the old code was throwing
it away.
Issue found by inspection.
Differential Revision: http://reviews.llvm.org/D21460
llvm-svn: 277977
Summary:
The CoroSplit pass processes the coroutine twice. First, it lets it go through
the complete IPO optimization pipeline as a single function. It forces a restart
of the pipeline by inserting an indirect call to an empty function "coro.devirt.trigger",
which is devirtualized by the CoroElide pass, which in turn triggers a restart of the pipeline by CGPassManager.
(In later patches, when the CoroSplit pass sees the same coroutine the second time, it splits it up
and adds the coroutine subfunctions to the SCC to be processed by the IPO pipeline.)
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1. Add documentation. (https://reviews.llvm.org/D22603)
2. Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3. Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4. Add coroutine devirtualization + tests.
   ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
   c) Do devirtualization (https://reviews.llvm.org/D23229)
5. Add CGSCC restart trigger + tests. <= we are here
6. Add coroutine heap elision + tests.
7. Add the rest of the logic (split into more patches)
Reviewers: mehdi_amini, majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23234
llvm-svn: 277936
Summary:
This is the 4c patch of the coroutine series. The CoroElide pass now checks whether a PostSplit coro.begin
is referenced by coro.subfn.addr intrinsics. If so, it replaces the coro.subfn.addr calls with the appropriate
coroutine subfunction associated with that coro.begin.
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1. Add documentation. (https://reviews.llvm.org/D22603)
2. Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3. Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4. Add coroutine devirtualization + tests.
   ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
   c) Do devirtualization <= we are here
5. Add CGSCC restart trigger + tests.
6. Add coroutine heap elision + tests.
7. Add the rest of the logic (split into more patches)
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23229
llvm-svn: 277908
Fixes PR28764. Right now there is no way to test this, but (as
mentioned on the PR) with Michael Zolotukhin's yet to be checked in
LoopSimplify verifier, 8 of the llvm-lit tests for IRCE crash.
llvm-svn: 277891
This fixes PR28825. The problem was that we only checked if a value from
a created inner loop is used in the outer loop, and fixed LCSSA for
them. But we failed to fix up LCSSA for values used in exits of the outer
loop.
llvm-svn: 277877
Summary: Hot callsites should have a higher threshold than inline hints. This patch uses a separate threshold parameter for hot callsites.
Reviewers: davidxl, eraman
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D22368
llvm-svn: 277860
Summary:
Chrome on Linux uses WholeProgramDevirt for speed ups, and it's
important to detect regressions on both sides: the toolchain,
if fewer methods get devirtualized after an update, and Chrome,
if an innocent-looking change caused many hot methods to become
virtual again.
The need to track devirtualized methods is not Chrome-specific,
but it's probably the only user of the pass at this time.
Reviewers: kcc
Differential Revision: https://reviews.llvm.org/D23219
llvm-svn: 277856
Summary: We do not care about intrinsic calls when assigning discriminators.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23212
llvm-svn: 277843
Summary:
Having -O0 in opt allows testing that -O0 optimization
pipeline is built correctly.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23208
llvm-svn: 277829
This generated IR based on the order of evaluation, which is different
between GCC and Clang. With that in mind you get bootstrap miscompares
if you compare a Clang built with GCC-built Clang vs. Clang built with
Clang-built Clang. Diagnosing that made my head hurt.
This also reverts commit r277337, which "fixed" the test case.
llvm-svn: 277820
Summary:
Turn (select C, (sext A), B) into (sext (select C, A, B')) when A is i1 and
B is a compatible constant, also for zext instead of sext. This will then be
further folded into logical operations.
The transformation would be valid for non-i1 types as well, but other parts of
InstCombine prefer to have sext from non-i1 as an operand of select.
Motivated by the shader compiler frontend in Mesa for AMDGPU, which emits i32
for boolean operations. With this change, the boolean logic is fully
recovered.
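Since A and C are both i1, the fold can be checked exhaustively. The C++ sketch below is only an illustration under stated assumptions (it is not the InstCombine implementation): i1 is modelled as bool, i32 as int32_t, and all helper names are hypothetical. It compares both forms for every input, with B restricted to 0 or -1 for sext and 0 or 1 for zext, i.e. the "compatible" constants for which extending B' = trunc B gives back B.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helpers modelling sext/zext from i1 to i32.
int32_t sextI1(bool a) { return a ? -1 : 0; }
int32_t zextI1(bool a) { return a ? 1 : 0; }

// Original form: select C, (ext A), B.
int32_t selectOfExt(bool sext, bool c, bool a, int32_t b) {
  return c ? (sext ? sextI1(a) : zextI1(a)) : b;
}

// Folded form: ext (select C, A, B'), with B' = trunc B to i1.
// Only valid when ext(trunc B) == B, which the assert checks.
int32_t extOfSelect(bool sext, bool c, bool a, int32_t b) {
  bool bTrunc = (b & 1) != 0;
  assert((sext ? sextI1(bTrunc) : zextI1(bTrunc)) == b);
  bool sel = c ? a : bTrunc;
  return sext ? sextI1(sel) : zextI1(sel);
}

// Compare both forms over every i1 input and every compatible constant B.
bool foldHoldsExhaustively() {
  const int32_t sextConsts[] = {0, -1};
  const int32_t zextConsts[] = {0, 1};
  for (bool c : {false, true})
    for (bool a : {false, true}) {
      for (int32_t b : sextConsts)
        if (selectOfExt(true, c, a, b) != extOfSelect(true, c, a, b)) return false;
      for (int32_t b : zextConsts)
        if (selectOfExt(false, c, a, b) != extOfSelect(false, c, a, b)) return false;
    }
  return true;
}
```

Passing an incompatible constant such as 5 to extOfSelect trips the assert, which mirrors the restriction to compatible constants described above.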
Reviewers: majnemer, spatel, tstellarAMD
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22747
llvm-svn: 277801
The patch splits a complex && if condition into easier-to-read and easier-to-understand
logic. The wrong early-exit condition was letting some instructions without
all operands available pass through when HoistingGeps was true.
Differential Revision: https://reviews.llvm.org/D23174
llvm-svn: 277785
Shifts with a uniform but non-constant count were considered very expensive to
vectorize, because the splat of the uniform count and the shift would tend to
appear in different blocks. That made the splat invisible to ISel, and we'd
scalarize the shift at codegen time.
Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we
are able to select the appropriate vector shifts. This updates the cost model
to take this into account by making shifts by a uniform amount cheap again.
Differential Revision: https://reviews.llvm.org/D23049
llvm-svn: 277782
PR28848 had a very nice reduction of the underlying cause of the bug.
Our ValueMap had, in an entry for an Instruction, a ConstantInt.
This is not at all unexpected but should be handled properly.
llvm-svn: 277773
This is the fourth patch in the coroutine series. The CoroEarly pass now lowers coro.resume
and coro.destroy intrinsics by replacing them with an indirect call to an address
returned by coro.subfn.addr intrinsic. This is done so that CGPassManager recognizes
devirtualization when CoroElide replaces a call to coro.subfn.addr with an appropriate
function address.
Patch by Gor Nishanov!
Differential Revision: https://reviews.llvm.org/D22998
llvm-svn: 277765
I'm removing a misplaced pair of more specific folds from InstCombine in this patch as well,
so we know where those folds are happening in InstSimplify.
llvm-svn: 277738
Summary:
TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses.
A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted.
Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures:
- x86 failing tests either did not have the right target, or the right alignment.
- NVPTX failing tests did not have the right alignment.
- AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info
considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks
for a maximum size allowed and relies on the next condition checking for %4 for correctness.
This should be revisited to include <3xi32> as an MVT type (on arsenm's non-immediate todo list).
Note that checking the sizeInBits for an MVT is undefined (leads to an assertion failure),
so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context.
Reviewers: arsenm, jlebar, tstellarAMD
Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23068
llvm-svn: 277735
With this patch we compute the MemorySSA once and update it in the code generator.
Differential Revision: https://reviews.llvm.org/D22966
llvm-svn: 277649
Some of these tests need to be cleaned up further to make it obvious
what they're testing, but as a first step remove all instances of
"grep".
llvm-svn: 277648
This reverts commit r277611 and the followup r277614.
Bootstrap builds and chromium builds are crashing during inlining after
this change.
llvm-svn: 277642
This is a follow-up to r277637. It teaches MemorySSA that invariant
loads (and loads of provably constant memory) are always liveOnEntry.
llvm-svn: 277640
This patch makes MemorySSA recognize atomic/volatile loads, and makes
MSSA treat said loads specially. This allows us to be a bit more
aggressive in some cases.
Administrative note: Revision was LGTM'ed by reames in person.
Additionally, this doesn't include the `invariant.load` recognition in
the differential revision, because I feel it's better to commit that
separately. Will commit soon.
Differential Revision: https://reviews.llvm.org/D16875
llvm-svn: 277637
Summary:
InstCombine unfolds expressions of the form `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` such that in a later iteration of InstCombine the exposed `zext(icmp)` instructions can be optimized. We now combine this unfolding and the subsequent `zext(icmp)` optimization to be performed together. Since the unfolding doesn't happen separately anymore, we also again enable the folding of `logic(cast(icmp), cast(icmp))` expressions to `cast(logic(icmp, icmp))` which had been disabled due to its interference with the unfolding transformation.
Tested via `make check` and `lnt`.
Background
==========
To explain how we arrived at this change, we first summarize its history. In commit r275989 we already tried to enable the folding of `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))`, which had to be reverted in r276106 because it could lead to an endless loop in InstCombine (also see http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html). The root of this problem is that `visitZExt()` in InstCombineCasts.cpp also contains the reverse of the above folding transformation, which unfolds `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` in order to expose `zext(icmp)` operations that could then be eliminated by subsequent iterations of InstCombine. However, before these `zext(icmp)` were eliminated, the folding from r275989 could kick in and cause InstCombine to switch endlessly back and forth between the folding and the unfolding transformation. This is why we now perform the `zext`-unfolding and the elimination of the exposed `zext(icmp)` in one go: it lets us keep the cast-folding of `logic(cast(icmp), cast(icmp))` without entering an endless loop again.
Details on the submitted changes
================================
- In `visitZExt()` we combine the unfolding and optimization of `zext` instructions.
- In `transformZExtICmp()` we have to use `Builder->CreateIntCast()` instead of `CastInst::CreateIntegerCast()` to make sure that the new `CastInst` is inserted into a `BasicBlock`. The new calls to `transformZExtICmp()` that we introduce in `visitZExt()` would otherwise trigger the corresponding assertions (in our case this happened, for example, with lnt for the MultiSource/Applications/sqlite3 and SingleSource/Regression/C++/EH/recursive-throw tests). The subsequent use of `replaceInstUsesWith()` is necessary to ensure that the new `CastInst` replaces the `ZExtInst`.
- In InstCombineAndOrXor.cpp we again allow the folding of casts on `icmp` instructions.
- The instruction order in the optimized IR for the zext-or-icmp.ll test case is different with the introduced changes.
- The test cases in zext.ll have been adopted from the reverted commits r275989 and r276105.
Reviewers: grosser, majnemer, spatel
Subscribers: eli.friedman, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22864
Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
llvm-svn: 277635
This removes the restriction for the icmp constant, but as noted by the FIXME comments,
we still need to change individual checks for binop operand constants.
llvm-svn: 277629
This is a fix for PR28697.
An MDNode can indirectly refer to a GlobalValue, through a
ConstantAsMetadata. When the GlobalValue is deleted, the MDNode operand
is reset to `nullptr`. If the node is uniqued, this can lead to a
hard-to-detect cache invalidation in a Metadata map that's shared across
an LLVMContext.
Consider:
1. A map from Metadata* to `T` called RemappedMDs.
2. A node that references a global variable, `!{i1* @GV}`.
3. Insert `!{i1* @GV} -> SomeT` in the map.
4. Delete `@GV`, leaving behind `!{null} -> SomeT`.
Looking up the generic and uninteresting `!{null}` gives you `SomeT`,
which is likely related to `@GV`. Worse, `SomeT`'s lifetime may be tied
to the deleted `@GV`.
This occurs in practice in the shared ValueMap used since r266579 in the
IRMover. Other code that handles more than one Module (with different
lifetimes) in the same LLVMContext could hit it too.
The fix here is a partial revert of r225223: in the rare case that an
MDNode operand is a ConstantAsMetadata (i.e., wrapping a node from the
Value hierarchy), drop uniquing if it gets replaced with `nullptr`.
This changes step #4 above to leave behind `distinct !{null} -> SomeT`,
which can't be confused with the generic `!{null}`.
In theory, this can cause some churn in the LLVMContext's MDNode
uniquing map when Values are being deleted. However:
- The number of GlobalValues referenced from uniqued MDNodes is
expected to be quite small. E.g., the debug info metadata schema
only references GlobalValues from distinct nodes.
- Other Constants have the lifetime of the LLVMContext, whose teardown
is careful to drop references before deleting the constants.
As a result, I don't expect a compile time regression from this change.
llvm-svn: 277625
We were able to figure out that the result of a call is some constant.
While propagating that fact, we added the constant to the value map.
This is problematic because it results in us losing the call site when
processing the value map.
This fixes PR28802.
llvm-svn: 277611
reason about and less error prone.
The core idea is to fully parse the text without trying to identify
passes or structure. This is done with a single state machine. There
were various bugs in the logic around this previously that were repeated
and scattered across the code. Having a single routine makes it much
easier to fix and get correct. For example, this routine doesn't suffer
from PR28577.
Then the actual pass construction is handled using *much* easier to read
code and simple loops, with particular pass manager construction sunk to
live with other pass construction. This is especially nice as the pass
managers *are* in fact passes.
Finally, the "implicit" pass manager synthesis is done much more simply
by forming "pre-parsed" structures rather than having to duplicate tons
of logic.
One of the bugs fixed by this was evident in the tests where we accepted
a pipeline that wasn't really well formed. Another bug is PR28577 for
which I have added a test case.
The code is less efficient than the previous code but I'm really hoping
that's not a priority. ;]
Thanks to Sean for the review!
Differential Revision: https://reviews.llvm.org/D22724
llvm-svn: 277561
Summary:
Sometimes, bitsets could get really large (>300k entries) and
we might want to drop a check, as it would be too costly.
This adds a flag to control how much penalty we are willing to pay
for bitsets.
Reviewers: kcc
Differential Revision: https://reviews.llvm.org/D23088
llvm-svn: 277556
As agreed in post-commit review of r265388, I'm switching the flag to
its original value until the 90% runtime performance regression on
SingleSource/Benchmarks/Stanford/Bubblesort is addressed.
llvm-svn: 277524
Update comment for isOutOfScope and add a testcase for uniform value being used
out of scope.
Differential Revision: https://reviews.llvm.org/D23073
llvm-svn: 277515
This patch enables the vectorizer to generate both scalar and vector versions
of an integer induction variable for a given loop. Previously, we only
generated a scalar induction variable if we knew all its users were going to be
scalar. Otherwise, we generated a vector induction variable. In the case of a
loop with both scalar and vector users of the induction variable, we would
generate the vector induction variable and extract scalar values from it for
the scalar users. With this patch, we now generate both versions of the
induction variable when there are both scalar and vector users and select which
version to use based on whether the user is scalar or vector.
Differential Revision: https://reviews.llvm.org/D22869
llvm-svn: 277474
This patch refactors the logic in collectLoopUniforms and
collectValuesToIgnore, untangling the concepts of "uniform" and "scalar". It
adds isScalarAfterVectorization alongside isUniformAfterVectorization to
distinguish the two. Known scalar values include those that are uniform,
getelementptr instructions that won't be vectorized, and induction variables
and induction variable update instructions whose users are all known to be
scalar.
This patch includes the following functional changes:
- In collectLoopUniforms, we mark uniform the pointer operands of interleaved
accesses. Although non-consecutive, these pointers are treated like
consecutive pointers during vectorization.
- In collectValuesToIgnore, we insert a value into VecValuesToIgnore if it
isScalarAfterVectorization rather than isUniformAfterVectorization. This
differs from the previous functionality in that we now add getelementptr
instructions that will not be vectorized into VecValuesToIgnore.
This patch also removes the ValuesNotWidened set used for induction variable
scalarization since, after the above changes, it is now equivalent to
isScalarAfterVectorization.
Differential Revision: https://reviews.llvm.org/D22867
llvm-svn: 277460
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22744
llvm-svn: 277411
Summary: This patch implements CFI for WebAssembly. It modifies the
LowerTypeTest pass to pre-assign table indexes to functions that are
called indirectly, and lowers type checks to test against the
appropriate table indexes. It also modifies the WebAssembly backend to
support a special ".indidx" assembly directive that propagates the table
index assignments out to the linker.
Patch by Dominic Chen
Differential Revision: https://reviews.llvm.org/D21768
llvm-svn: 277398
Using RAUW was wrong here; if we have a switch transform such as:
18 -> 6 then
6 -> 0
If we use RAUW, while performing the second transform the *transformed* 6
from the first will be also replaced, so we end up with:
18 -> 0
6 -> 0
Found by clang stage2 bootstrap; testcase added.
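The difference can be modelled with plain integers. In this hypothetical C++ sketch (not the actual transform code), a switch is reduced to its list of case values: the first function applies each rewrite to every matching value in turn, as RAUW does, while the second looks each original value up exactly once.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// RAUW-style: each rewrite replaces every current occurrence of its source
// value, including values produced by earlier rewrites.
std::vector<int> rewriteSequentially(std::vector<int> cases,
                                     const std::vector<std::pair<int, int>>& rewrites) {
  for (const auto& r : rewrites)
    for (int& v : cases)
      if (v == r.first) v = r.second;
  return cases;
}

// Per-case rewrite: each original case value is mapped once, so a value
// produced by one rewrite is never rewritten again.
std::vector<int> rewritePerCase(std::vector<int> cases,
                                const std::vector<std::pair<int, int>>& rewrites) {
  std::map<int, int> mapping(rewrites.begin(), rewrites.end());
  for (int& v : cases) {
    auto it = mapping.find(v);
    if (it != mapping.end()) v = it->second;
  }
  return cases;
}
```

With cases {18, 6} and the rewrites 18 -> 6, 6 -> 0, the sequential version yields {0, 0} (the bug described above), while the per-case version yields the intended {6, 0}.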
llvm-svn: 277332
It looks like the two independent parts of the rotate operation (a lshr and shl) are being reordered on some bots. Add CHECK-DAGs to account for this.
llvm-svn: 277329
If a switch is sparse and all the cases (once sorted) are in arithmetic progression, we can extract the common factor out of the switch and create a dense switch. For example:
switch (i) {
case 5: ...
case 9: ...
case 13: ...
case 17: ...
}
can become:
if ( (i - 5) % 4 ) goto default;
switch ((i - 5) / 4) {
case 0: ...
case 1: ...
case 2: ...
case 3: ...
}
or even better:
switch (ROTR(i - 5, 2)) {
case 0: ...
case 1: ...
case 2: ...
case 3: ...
}
The division and remainder operations could be costly so we only do this if the factor is a power of two, and emit a right-rotate instead of a divide/remainder sequence. Dense switches can be lowered significantly better than sparse switches and can even be transformed into lookup tables.
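The rotate trick can be sketched in C++ as below, assuming 32-bit unsigned arithmetic and using hypothetical helper names (this is not the SimplifyCFG code). Subtracting the base and rotating right by log2 of the factor turns multiples of 4 into a dense index, while any value with non-zero low bits gets those bits rotated into the high end, so one unsigned range check also replaces the remainder test. Values below the base wrap around to large unsigned numbers and fail the same check.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value right by s bits (0 < s < 32).
uint32_t rotr32(uint32_t x, unsigned s) {
  assert(s > 0 && s < 32);
  return (x >> s) | (x << (32 - s));
}

// For cases base, base + 2^shift, base + 2 * 2^shift, ..., compute the
// candidate dense index. Non-members rotate to a large number.
uint32_t denseIndex(uint32_t i, uint32_t base, unsigned shift) {
  return rotr32(i - base, shift);
}

// The example switch with cases 5, 9, 13, 17: returns the dense case
// index 0..3, or -1 for the default destination.
int denseCase(uint32_t i) {
  uint32_t idx = denseIndex(i, 5, 2);
  return idx < 4 ? static_cast<int>(idx) : -1;
}
```

For example, 9 maps to 1 and 17 to 3, while 6 becomes 0x40000000 after the rotate and goes to the default destination.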
llvm-svn: 277325
When extracting a set of blocks make sure to inherit all of the target
dependent attributes to make sure that the function will be valid for
lowering. One example is the "target-features" attribute for x86, if the
extracted region has functionality that relies on a specific feature it
will fail to be lowered.
This also allows for extracted functions to be valid for inlining, at
least back into the parent function, as the target attributes are tested
when inlining for compatibility.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22713
llvm-svn: 277315
LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter
is added to the common function analysis passes that loop passes
depend on.
The BFI and, indirectly, BPI used in this pass are computed lazily, so no
overhead should be observed unless -pass-remarks-with-hotness is used.
This is how the patch affects the O3 pipeline:
Dominator Tree Construction
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Basic Alias Analysis (stateless AA impl)
Function Alias Analysis Results
Scalar Evolution Analysis
+ Lazy Branch Probability Analysis
+ Lazy Block Frequency Analysis
+ Optimization Remark Emitter
Loop Pass Manager
Rotate Loops
Loop Invariant Code Motion
Unswitch loops
Simplify the CFG
Dominator Tree Construction
Basic Alias Analysis (stateless AA impl)
Function Alias Analysis Results
Combine redundant instructions
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Scalar Evolution Analysis
+ Lazy Branch Probability Analysis
+ Lazy Block Frequency Analysis
+ Optimization Remark Emitter
Loop Pass Manager
Induction Variable Simplification
Recognize loop idioms
Delete dead loops
Unroll loops
...
llvm-svn: 277203
Patch by Sunita Marathe
Third try, now following fixes to MSan to handle memcpy in such a way that this commit won't break the MSan buildbots. (Thanks, Evgenii!)
llvm-svn: 277189
Some instructions may have their uses replaced with a symbolic constant.
However, the instruction may still have side effects, which precludes it
from being removed from the function. EarlyCSE treated such an
instruction as if it were removed, resulting in PR28763.
llvm-svn: 277114
An undef vector element can be treated as if it had any value. Folding
such a vector element to 0 in a bitcast can open up further folding
opportunities.
llvm-svn: 277104
ConstantExpr::getWithOperands does much of the hard work that
ConstantFoldInstOperandsImpl tries to do but more completely.
This lets us fold ExtractValue/InsertValue expressions.
llvm-svn: 277100
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.
Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.
llvm-svn: 277099
Summary:
Copy of the documentation of the ImportedFunctionsInliningStatistics class:
\brief Calculate and dump ThinLTO specific inliner stats.
The main statistics are:
(1) Number of inlined imported functions,
(2) Number of imported functions inlined into importing module (indirect),
(3) Number of non imported functions inlined into importing module
(indirect).
The difference between the first and the second is that the first stat counts
all performed inlines of imported functions, but the second counts only the
functions that have eventually been inlined into a function in the importing
module (by a chain of inlines). Because LLVM uses a bottom-up inliner, it is
possible to, e.g., import functions `A` and `B` and then inline `B` into `A`,
and after this `A` might be too big to be inlined into some other function
that calls it. It calculates this statistic by building a graph, where
the nodes are functions and the edges are performed inlines, and then by marking
the edges starting from non-imported functions.
If `Verbose` is set to true, then it also dumps statistics
for each inlined function, sorted by the greatest inline count, such as:
- number of performed inlines
- number of performed inlines to importing module
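A minimal model of the graph described above, written as a hypothetical C++ sketch (the struct and method names are invented for illustration and are not the ImportedFunctionsInliningStatistics API). It assumes stat (1) counts distinct imported functions that were inlined anywhere, and computes stat (2) by walking inline edges starting from non-imported functions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Nodes are functions; an edge Caller -> Callee records that Callee was
// inlined into Caller.
struct InlineGraph {
  std::map<std::string, bool> imported;                         // function -> imported?
  std::map<std::string, std::vector<std::string>> inlinedInto;  // caller -> callees

  void addFunction(const std::string& f, bool isImported) { imported[f] = isImported; }
  void recordInline(const std::string& caller, const std::string& callee) {
    inlinedInto[caller].push_back(callee);
  }

  // Stat (1): imported functions that were inlined anywhere.
  std::size_t inlinedImported() const {
    std::set<std::string> seen;
    for (const auto& entry : inlinedInto)
      for (const auto& callee : entry.second)
        if (imported.at(callee)) seen.insert(callee);
    return seen.size();
  }

  // Stat (2): imported functions reachable through a chain of inlines from a
  // non-imported function, i.e. code that really ended up in the importing module.
  std::size_t importedInlinedIntoImportingModule() const {
    std::set<std::string> reached;
    std::vector<std::string> work;
    for (const auto& entry : imported)
      if (!entry.second) work.push_back(entry.first);
    while (!work.empty()) {
      std::string f = work.back();
      work.pop_back();
      auto it = inlinedInto.find(f);
      if (it == inlinedInto.end()) continue;
      for (const auto& callee : it->second)
        if (reached.insert(callee).second) work.push_back(callee);
    }
    std::size_t count = 0;
    for (const auto& f : reached)
      if (imported.at(f)) ++count;
    return count;
  }
};
```

In the `A`/`B` scenario, inlining `B` into `A` alone gives 1 for stat (1) but 0 for stat (2); once a non-imported caller also inlines `A`, both become 2.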
Reviewers: eraman, tejohnson, mehdi_amini
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D22491
llvm-svn: 277089
Summary:
The motivation is the same as in D22141: In order to add the hotness
attribute to optimization remarks we need BFI to be available in all
passes that emit optimization remarks. BFI depends on BPI so unless we
make this lazy as well we would still compute BPI unconditionally.
The solution is to use the new LazyBPI pass in LazyBFI and only compute
BPI when computation of BFI is requested by the client.
I extended the laziness test using a LoopDistribute test to also cover
BPI.
Reviewers: hfinkel, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22835
llvm-svn: 277083
Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.
This is possible for code like this:
for (int i = 0; i < 3; i++) {
int x;
p = &x;
}
"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.
PR27453
Reviewers: eugenis
Differential Revision: https://reviews.llvm.org/D22842
llvm-svn: 277072
Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.
This is possible for code like this:
for (int i = 0; i < 3; i++) {
int x;
p = &x;
}
"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.
PR27453
Reviewers: eugenis
Differential Revision: https://reviews.llvm.org/D22842
llvm-svn: 277068
This adds boilerplate code for all coroutine passes,
the passes are no-ops for now.
Also, a small test has been added to verify that passes execute in
the expected order or not at all if coroutine support is disabled.
Patch by Gor Nishanov!
Differential Revision: https://reviews.llvm.org/D22847
llvm-svn: 277033
When folding an expression, we run ConstantFoldConstantExpression on
each operand of that expression.
However, ConstantFoldConstantExpression can fail and return nullptr.
Previously, we would bail on further refining the expression.
Instead, use the original operand and see if we can refine a later
operand.
llvm-svn: 276959
Summary:
When we ask the builder to create a bitcast on a constant, we get back a
constant, not an instruction.
Reviewers: asbirlea
Subscribers: jholewinski, mzolotukhin, llvm-commits, arsenm
Differential Revision: https://reviews.llvm.org/D22878
llvm-svn: 276922
When loading or storing in a field of a struct like "a.b.c", GVN is able to
detect the equivalent expressions, and GVN-hoist would fail in the code
generation. This is because the GEPs are not hoisted as scalar operations to
avoid moving the GEPs too far from their ld/st instruction when the ld/st is not
movable. So we end up having to generate code for the GEP of a ld/st when we
move the ld/st. In the case of a GEP referring to another GEP as in "a.b.c" we
need to code generate all the GEPs necessary to make all the operands available
at the new location for the ld/st. With this patch we recursively walk through
the GEP operands checking whether all operands are available, and in the case of
a GEP operand, it recursively makes all its operands available. Code generation
happens from the inner GEPs out until reaching the GEP that appears as an
operand of the ld/st.
Differential Revision: https://reviews.llvm.org/D22599
llvm-svn: 276841
Instead of DFS-numbering basic blocks, we now DFS-number instructions, which avoids
the costly operation of determining which instruction comes first in a basic block.
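The idea can be illustrated with a toy CFG. This is a hypothetical C++ sketch (the `Block` type and `dfsNumberInstructions` are invented for illustration, not GVNHoist code): blocks are visited in DFS preorder and every instruction gets the next number, so within one block "does I come before J" reduces to comparing two integers instead of scanning the block.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy CFG: each block holds instruction names and successor block indices.
struct Block {
  std::vector<std::string> insts;
  std::vector<int> succs;
};

// Number instructions in DFS preorder of their blocks; numbers increase in
// program order within each block.
std::map<std::string, unsigned> dfsNumberInstructions(const std::vector<Block>& cfg,
                                                      int entry) {
  std::map<std::string, unsigned> num;
  std::vector<bool> visited(cfg.size(), false);
  std::vector<int> stack{entry};
  unsigned next = 0;
  while (!stack.empty()) {
    int b = stack.back();
    stack.pop_back();
    if (visited[b]) continue;
    visited[b] = true;
    for (const auto& inst : cfg[b].insts) num[inst] = next++;
    // Push successors in reverse so the first successor is visited first.
    for (auto it = cfg[b].succs.rbegin(); it != cfg[b].succs.rend(); ++it)
      if (!visited[*it]) stack.push_back(*it);
  }
  return num;
}
```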
Patch mostly written by Daniel Berlin.
Differential Revision: https://reviews.llvm.org/D22777
llvm-svn: 276714
Pre-instrumentation inline (pre-inliner) greatly improves the IR
instrumentation code performance, among other benefits. One issue of the
pre-inliner is it can introduce CFG-mismatch for COMDAT functions. This
is due to the fact that the same COMDAT function may have different early
inline decisions across different modules -- that means different copies
of COMDAT functions will have different CFG checksum.
In this patch, we propose partially renaming the COMDAT group and its
member function/variable so that we have a different profile counter for each
version. We will postfix the COMDAT function and the group name with its
FunctionHash.
Differential Revision: http://reviews.llvm.org/D22600
llvm-svn: 276673
The public InlineFunction utility assumes that the passed in
InlineFunctionInfo has a valid AssumptionCacheTracker.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22706
llvm-svn: 276609
If we have two loads with different alignments, we must use the minimum of
the two alignments when hoisting. Same deal for stores.
For allocas, use the maximum of the two alignments.
llvm-svn: 276601
Allowed loop vectorization with secondary FP IVs. Like this:
float *A;
float x = init;
for (int i=0; i < N; ++i) {
A[i] = x;
x -= fp_inc;
}
The auto-vectorization is possible when the induction binary operator is "fast" or the function has "unsafe" attribute.
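A hypothetical C++ sketch of what a vectorization factor of 4 does to the loop above (an illustration, not the LoopVectorize output): the vector induction starts as {x, x - inc, x - 2*inc, x - 3*inc} and every lane steps by 4*inc. In general this reassociates the floating-point subtractions and can change rounding, which is why "fast" or the "unsafe" attribute is required. The example values below are chosen so that both versions are exact.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar loop from the message: A[i] = x; x -= fp_inc;
std::vector<float> scalarLoop(std::size_t n, float init, float inc) {
  std::vector<float> a(n);
  float x = init;
  for (std::size_t i = 0; i < n; ++i) {
    a[i] = x;
    x -= inc;
  }
  return a;
}

// VF = 4 sketch: a per-lane induction stepped by 4 * inc.
// Assumes n is a multiple of 4 (no scalar epilogue).
std::vector<float> vectorizedLoop(std::size_t n, float init, float inc) {
  assert(n % 4 == 0);
  std::vector<float> a(n);
  std::array<float, 4> vx;
  for (std::size_t lane = 0; lane < 4; ++lane) vx[lane] = init - lane * inc;
  const float step = 4 * inc;
  for (std::size_t i = 0; i < n; i += 4) {
    for (std::size_t lane = 0; lane < 4; ++lane) {
      a[i + lane] = vx[lane];
      vx[lane] -= step;
    }
  }
  return a;
}
```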
Differential Revision: https://reviews.llvm.org/D21330
llvm-svn: 276554
When vectorizing a tree rooted at a store bundle, we currently try to sort the
stores before building the tree, so that the stores can be vectorized. For other
trees, the order of the root bundle - which determines the order of all other
bundles - is arbitrary. That is bad, since if a leaf bundle of consecutive loads
happens to appear in the wrong order, we will not vectorize it.
This is partially mitigated when the root is a binary operator, by trying to
build a "reversed" tree when that's considered profitable. This patch extends the
workaround we have for binops to trees rooted in a horizontal reduction.
This fixes PR28474.
Differential Revision: https://reviews.llvm.org/D22554
llvm-svn: 276477
Recommiting r275571 after fixing crash reported in PR28270.
Now we erase elements of IOL in deleteDeadInstruction().
Original Summary:
This change uses the overlap interval map built from partial overwrite tracking to perform shortening of MemIntrinsics.
Add test cases for opportunities that were missed before.
llvm-svn: 276452
Summary:
The llvm.invariant.start and llvm.invariant.end intrinsics currently
support specifying invariant memory objects only in the default address
space.
With this change, these intrinsics are overloaded for any address space
for memory objects, and we can use these LLVM invariant intrinsics in
non-default address spaces.
Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)
This overloaded intrinsic is needed for representing final or invariant
memory in managed languages.
Reviewers: apilipenko, reames
Subscribers: llvm-commits
llvm-svn: 276447
Just because we can constant fold the result of an instruction does not
imply that we can delete the instruction. It may have side effects.
This fixes PR28655.
llvm-svn: 276389
If `-irce-skip-profitability-checks` is passed in, IRCE will kick in in
all cases where it is legal for it to kick in. This flag is intended to
help diagnose and analyse performance issues.
llvm-svn: 276372
Do not clone stored values unless they are GEPs that are special cased to avoid
hoisting them without hoisting their associated ld/st.
Differential revision: https://reviews.llvm.org/D22652
llvm-svn: 276358
rL245171 exposed a hole in InstSimplify that manifested in a strange way in PR28466:
https://llvm.org/bugs/show_bug.cgi?id=28466
It's possible to use trunc + icmp sgt/slt in place of an and + icmp eq/ne, so we need to
recognize that pattern to eliminate selects that are choosing between some value and some
bitmasked version of that value.
Note that there is significant room for improvement (refactoring) and enhancement (more
patterns, possibly in InstCombine rather than here).
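The equivalence behind this pattern can be checked exhaustively for a small range. In this hypothetical C++ sketch (helper names invented for illustration), `icmp slt (trunc i32 %x to i8), 0` is compared against `icmp ne (and i32 %x, 0x80), 0`; both test bit 7 of %x. It assumes two's-complement conversion to int8_t, which is guaranteed since C++20 and universal in practice before that.

```cpp
#include <cassert>
#include <cstdint>

// icmp slt (trunc %x to i8), 0: the truncated value is negative.
bool truncIsNegative(uint32_t x) {
  return static_cast<int8_t>(static_cast<uint8_t>(x)) < 0;
}

// icmp ne (and %x, 0x80), 0: bit 7 of %x is set.
bool signBitMaskSet(uint32_t x) { return (x & 0x80u) != 0; }

// Compare both predicates over all 16-bit inputs.
bool equivalentForAll16BitInputs() {
  for (uint32_t x = 0; x <= 0xFFFF; ++x)
    if (truncIsNegative(x) != signBitMaskSet(x)) return false;
  return true;
}
```

The sgt -1 form is simply the negation, so it corresponds to `icmp eq (and %x, 0x80), 0`.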
Differential Revision: https://reviews.llvm.org/D22537
llvm-svn: 276341
This patch moves the update instruction for vectorized integer induction phi
nodes to the end of the latch block. This ensures consistent placement of all
induction updates across all the kinds of int inductions we create (scalar,
splat vector, or vector phi).
Differential Revision: https://reviews.llvm.org/D22416
llvm-svn: 276339
Summary:
The llvm.invariant.start and llvm.invariant.end intrinsics currently
support specifying invariant memory objects only in the default address space.
With this change, these intrinsics are overloaded for memory objects in any address space,
so we can use these llvm invariant intrinsics in non-default address spaces.
Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)
This overloaded intrinsic is needed for representing final or invariant memory in managed languages.
Reviewers: tstellarAMD, reames, apilipenko
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22519
llvm-svn: 276316
The earlier change added hotness attribute to missed-optimization
remarks. This follows up with the analysis remarks (the ones explaining
the reason for the missed optimization).
llvm-svn: 276192
We can replace the return values with undef if we replaced all
the call uses with a constant/undef.
Differential Revision: https://reviews.llvm.org/D22336
llvm-svn: 276174
Summary:
Previously we wouldn't move loads/stores across instructions that had
side-effects, where that was defined as may-write or may-throw. But
this is not sufficiently restrictive: Stores can't safely be moved
across instructions that may load.
This patch also adds a DEBUG check that all instructions in our chain
are either loads or stores.
Reviewers: asbirlea
Subscribers: llvm-commits, jholewinski, arsenm, mzolotukhin
Differential Revision: https://reviews.llvm.org/D22547
llvm-svn: 276171
Summary:
Previously if we had a chain that contained a side-effecting
instruction, we wouldn't vectorize it at all. Now we'll vectorize
everything that comes before the side-effecting instruction.
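The ordering rules described above (loads must not move across possible writes or throwing instructions; stores must not move across possible reads either), combined with vectorizing only the prefix before the first such barrier, can be modelled on a flat instruction list. `Inst` and `vectorizablePrefixLen` are hypothetical simplifications: the real pass also reasons about aliasing and address order, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One instruction of a basic block, in program order (hypothetical model).
struct Inst {
  bool InChain;  // part of the load or store chain we want to vectorize
  bool MayRead;
  bool MayWrite;
  bool MayThrow;
};

// Number of chain members, in program order, that come before the first
// instruction the chain must not be moved across.
std::size_t vectorizablePrefixLen(const std::vector<Inst> &BB,
                                  bool IsStoreChain) {
  std::size_t Count = 0;
  for (const Inst &I : BB) {
    if (I.InChain) {
      ++Count;
      continue;
    }
    // Loads stop at writes/throws; stores additionally stop at reads.
    bool Barrier = I.MayWrite || I.MayThrow || (IsStoreChain && I.MayRead);
    if (Barrier)
      break;
  }
  return Count;
}
```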
Reviewers: asbirlea
Subscribers: arsenm, jholewinski, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D22536
llvm-svn: 276170
Summary: In r275989 we enabled the folding of `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))`. Here we add more test cases to assure this folding works for all logical operations `and`/`or`/`xor`.
Reviewers: grosser
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22561
Contributed-by: Matthias Reisinger
llvm-svn: 276105
This patch adds costs for the vectorized implementations of CTPOP. The default values seriously underestimated their cost and were encouraging vectorization on targets where serialized use of POPCNT would be much better.
Differential Revision: https://reviews.llvm.org/D22456
llvm-svn: 276104
Summary:
getVectorizablePrefix previously didn't work properly in the face of
aliasing loads/stores. It unwittingly assumed that the loads/stores
appeared in the BB in address order. If they didn't, it would do the
wrong thing.
Reviewers: asbirlea, tstellarAMD
Subscribers: arsenm, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D22535
llvm-svn: 276072
Revert "[LoopSimplify] Update LCSSA after separating nested loops."
This reverts commit r275891.
Revert "[LCSSA] Post-process PHI-nodes created by SSAUpdate when constructing LCSSA form."
This reverts commit r275883.
llvm-svn: 276064
We simply set PreserveLCSSA to true since we don't have an
analogous method `mustPreserveAnalysisID(LCSSA)`.
Also port LoopInfo verifier pass to test LoopUnrollPass.
llvm-svn: 276063
Summary:
Previously, the insertion point for stores was the last instruction in
Chain *before calling getVectorizablePrefixEndIdx*. Thus if
getVectorizablePrefixEndIdx didn't return Chain.size(), we still would
insert at the last instruction in Chain.
This patch changes our internal API a bit in an attempt to make it less
prone to this sort of error. As a result, we end up recalculating the
Chain's boundary instructions, but I think worrying about the speed hit
of this is a premature optimization right now.
Reviewers: asbirlea, tstellarAMD
Subscribers: mzolotukhin, arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D22534
llvm-svn: 276056
Summary:
This helps keep us honest -- there were a number of ways we could screw
up and still have passed this test.
Reviewers: asbirlea
Subscribers: llvm-commits, arsenm
Differential Revision: https://reviews.llvm.org/D22531
llvm-svn: 276053
The pattern may look more obviously like a sext if written as:
```
define i32 @g(i16 %x) {
  %zext = zext i16 %x to i32
  %xor = xor i32 %zext, 32768
  %add = add i32 %xor, -32768
  ret i32 %add
}
```
We already have that fold in visitAdd().
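That this IR really computes a sign extension can be confirmed exhaustively. The functions below are hypothetical and simply evaluate the IR from the message with 32-bit unsigned arithmetic, which wraps exactly like i32 `xor` and `add`.

```cpp
#include <cassert>
#include <cstdint>

// zext i16 -> i32, xor with 32768, add -32768 (all modulo 2^32).
uint32_t xorAddIdiom(uint16_t X) {
  uint32_t Zext = X;
  uint32_t Xor = Zext ^ 0x8000u;
  return Xor + 0xFFFF8000u; // -32768 as an i32 bit pattern
}

// Reference: sext i16 -> i32, viewed as the same 32-bit pattern.
// (int16_t narrowing wraps; guaranteed since C++20.)
uint32_t sextRef(uint16_t X) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int16_t>(X)));
}
```

For `X < 0x8000` the xor adds `0x8000` and the add removes it again; otherwise the xor subtracts `0x8000` and the add subtracts another `0x8000`, giving `X - 0x10000`, the sign-extended value.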
Differential Revision: https://reviews.llvm.org/D22477
llvm-svn: 276035
We skipped over ReturnInsts that didn't return an argument, which led
us to incorrectly conclude that an argument returned by another
ReturnInst was 'returned'.
This reverts commit r275756.
This fixes PR28610.
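The corrected rule is that an argument is 'returned' only if every ReturnInst returns that same argument. A minimal sketch over a hypothetical summary of a function's returns (not the FunctionAttrs data structures); it needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Each entry describes one ReturnInst: the index of the argument it returns,
// or std::nullopt if it returns a value that is not an argument.
std::optional<unsigned>
inferReturnedArg(const std::vector<std::optional<unsigned>> &Returns) {
  std::optional<unsigned> Result;
  for (const auto &R : Returns) {
    // Returning a non-argument disqualifies every argument; skipping this
    // case is the bug described above.
    if (!R)
      return std::nullopt;
    if (Result && *Result != *R)
      return std::nullopt; // two returns disagree on the argument
    Result = R;
  }
  return Result;
}
```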
llvm-svn: 276008
Summary:
Currently, InstCombine is already able to fold expressions of the form `logic(cast(A), cast(B))` to the simpler form `cast(logic(A, B))`, where logic designates one of `and`/`or`/`xor`. This transformation is implemented in `foldCastedBitwiseLogic()` in InstCombineAndOrXor.cpp. However, this optimization will not be performed if both `A` and `B` are `icmp` instructions. The decision to preclude casts of `icmp` instructions originates in r48715 in combination with r261707, and is best understood from the title of the former:
> Transform (zext (or (icmp), (icmp))) to (or (zext (cimp), (zext icmp))) if at least one of the (zext icmp) can be transformed to eliminate an icmp.
Apparently, it introduced a transformation that is the reverse of the one done in `foldCastedBitwiseLogic()`. Its purpose is to expose pairs of `zext icmp` that would subsequently be optimized by `transformZExtICmp()` in InstCombineCasts.cpp. Therefore, in order to avoid an endless loop of switching back and forth between these two transformations, the one in `foldCastedBitwiseLogic()` has been restricted to exclude `icmp` instructions, which is reflected in the responsible check:
`if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) && ...`
This check seems to sort out more cases than necessary because:
- the reverse transformation is obviously done for `or` instructions only
- and also not every `zext icmp` pair is necessarily the result of this reverse transformation
Therefore we now remove this check and replace it with a more fine-grained one in `shouldOptimizeCast()` that rejects only those `logic(zext(icmp), zext(icmp))` that could be optimized by `transformZExtICmp()`, which also avoids the mentioned endless loop. This means we can now also simplify expressions of the form `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` (`cast` being an arbitrary `CastInst`).
As an example, consider the following IR snippet
```
%1 = icmp sgt i64 %a, %b
%2 = zext i1 %1 to i8
%3 = icmp slt i64 %a, %c
%4 = zext i1 %3 to i8
%5 = and i8 %2, %4
```
which would now be transformed to
```
%1 = icmp sgt i64 %a, %b
%2 = icmp slt i64 %a, %c
%3 = and i1 %1, %2
%4 = zext i1 %3 to i8
```
This issue became apparent when experimenting with the programming language Julia, which makes use of LLVM. Currently, Julia lowers its `Bool` datatype to LLVM's `i8` (also see https://github.com/JuliaLang/julia/pull/17225). In fact, the above IR example is the lowered form of the Julia snippet `(a > b) & (a < c)`. As shown above, this may introduce `zext` operations, casting between `i1` and `i8`, which could, for example, hinder ScalarEvolution and Polly on certain code.
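The before/after IR in this message can be mirrored in C++ to check that the two forms agree on small inputs. `beforeFold` and `afterFold` are hypothetical names; each statement is annotated with the IR it models.

```cpp
#include <cassert>
#include <cstdint>

// Before: each icmp result is widened to i8, then combined with an i8 'and'.
uint8_t beforeFold(int64_t A, int64_t B, int64_t C) {
  uint8_t X = (A > B) ? 1 : 0; // icmp sgt, then zext i1 to i8
  uint8_t Y = (A < C) ? 1 : 0; // icmp slt, then zext i1 to i8
  return static_cast<uint8_t>(X & Y); // and i8
}

// After: combine the i1 results first, then widen once.
uint8_t afterFold(int64_t A, int64_t B, int64_t C) {
  bool R = (A > B) && (A < C); // and i1
  return R ? 1 : 0;            // zext i1 to i8
}
```

The "after" form keeps a single `zext`, which is what makes the result easier for later analyses to reason about.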
Reviewers: grosser, vtjnash, majnemer
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22511
Contributed-by: Matthias Reisinger
llvm-svn: 275989
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
It turns out that the behaviour of these intrinsics differs enough from generic IR to cause problems: INF/NAN/out-of-range values are guaranteed to produce 0x80000000, which plays havoc with constant folding, since it folds such conversions to either zero or UNDEF. The scalar implementations (which were already generic IR, and which I was trying to match) have the same issue.
This patch changes both scalar and packed versions back to using x86-specific builtins.
It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
A companion clang patch is at D22105
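A sketch of the x86 behaviour that generic IR does not capture, assuming the documented "integer indefinite" result of the 32-bit scalar truncating conversion; `emulateCvttss2si` is a hypothetical C++ emulation, not a compiler intrinsic. By contrast, a plain `static_cast<int32_t>` of an out-of-range float is undefined behaviour in C++, much as an out-of-range `fptosi` has no defined result in LLVM IR, which is what lets a constant folder pick 0 or undef.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Truncate toward zero; NaN, infinities and out-of-range inputs produce
// 0x80000000 (INT32_MIN), as the hardware instruction does.
int32_t emulateCvttss2si(float F) {
  // The valid range after truncation is [-2^31, 2^31). 2^31 is exactly
  // representable as a float, so these comparisons are exact.
  if (std::isnan(F) || F >= 2147483648.0f || F < -2147483648.0f)
    return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(F); // in range: well-defined truncation
}
```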
Differential Revision: https://reviews.llvm.org/D22106
llvm-svn: 275981
This patch updates MemorySSA's use-optimizing walker to be more
accurate and, in some cases, faster.
Essentially, this changed our core walking algorithm from a
cache-as-you-go DFS to an iteratively expanded DFS, with all of the
caching happening at the end. Said expansion happens when we hit a Phi,
P; we'll try to do the smallest amount of work possible to see if
optimizing above that Phi is legal in the first place. If so, we'll
expand the search to see if we can optimize to the next phi, etc.
An iteratively expanded DFS lets us potentially quit earlier (because we
don't assume that we can optimize above all phis) than our old walker.
Additionally, because we don't cache as we go, we can now optimize above
loops.
As an added bonus, this patch adds a ton of verification (if
EXPENSIVE_CHECKS are enabled), so finding bugs is easier.
Differential Revision: https://reviews.llvm.org/D21777
llvm-svn: 275940
Instructions in the uniform set will not have vector versions, so
add them to VecValuesToIgnore.
Induction variables used only in uniform instructions or consecutive
pointer instructions have already been added to VecValuesToIgnore above. For
induction variables used only in uniform instructions or in pointer
instructions that are neither consecutive nor gather/scatter, the related phi
and update will also be added to the VecValuesToIgnore set.
The change will make the vector RegUsages estimation less conservative.
Differential Revision: https://reviews.llvm.org/D20474
The recommit fixed the testcase global_alias.ll.
llvm-svn: 275936
Instructions in the uniform set will not have vector versions, so
add them to VecValuesToIgnore.
Induction variables used only in uniform instructions or consecutive
pointer instructions have already been added to VecValuesToIgnore above. For
induction variables used only in uniform instructions or in pointer
instructions that are neither consecutive nor gather/scatter, the related phi
and update will also be added to the VecValuesToIgnore set.
The change will make the vector RegUsages estimation less conservative.
Differential Revision: https://reviews.llvm.org/D20474
llvm-svn: 275912
Summary:
Usually LCSSA survives this transformation, but in some cases (see
attached test) it doesn't: values from the original loop after
separating might be used from the outer loop. Before the transformation
it was the same loop, so LCSSA phis were not required.
This fixes PR28272.
Reviewers: sanjoy, hfinkel, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21665
llvm-svn: 275891
Summary:
The direct motivation for the port is to ensure that the OptRemarkEmitter
tests work with the new PM.
This remains a function pass because we not only create multiple loops
but could also version the original loop.
In the test I need to invoke opt
with -passes='require<aa>,loop-distribute'. LoopDistribute does not
directly depend on AA; however, LAA does. LAA uses getCachedResult, so
I *think* we need to manually pull in 'aa'.
Reviewers: davidxl, silvas
Subscribers: sanjoy, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D22437
llvm-svn: 275811
While debugging GVNHoist, I found it confusing that the entries in a
VNtoInsns were not always value numbers. They _usually_ were except for
StoreInst in which case they were a hash of two different value numbers.
This leads to three observations:
- It is more difficult to debug things when the semantic contents of
VNtoInsns change over time.
- Using a single value number is not much cheaper, since the value type of
VNtoInsns is a SmallVector.
- It is not immediately clear what the algorithm would do if there were
hash collisions in the StoreInst case.
Using a DenseMap of std::pair sidesteps all of this.
N.B. The changes in the test were due to their sensitivity to the
iteration order of VNtoInsns, which has changed.
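Why a pair key is safer than a combined hash can be shown with a deliberately weak combiner. `naiveCombine` is a hypothetical stand-in (the hash GVNHoist used is not shown in this message), and `std::map` replaces `DenseMap` only to keep the sketch free of LLVM headers.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using VN = unsigned; // a value number

// Hypothetical weak combiner: different (pointer, value) pairs can collide.
VN naiveCombine(VN Ptr, VN Val) { return Ptr ^ Val; }

// Keying on the pair itself keeps every (pointer VN, stored-value VN)
// combination distinct, so there are no collisions to reason about.
using StoreKey = std::pair<VN, VN>;
using StoreMap = std::map<StoreKey, std::vector<int>>; // int stands in for a StoreInst*
```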
llvm-svn: 275761
This also reverts r275029, "Update Clang tests after adding inference for the returned argument attribute".
It broke the LTO build; it appears to be a miscompilation.
llvm-svn: 275756
Summary:
To enable profile-guided indirect call promotion in ThinLTO mode, we
simply add call graph edges for each profitable target from the profile
to the summaries, then the summary-guided importing will consider the
callee for importing as usual.
Also we need to enable the indirect call promotion pass creation in the
PassManagerBuilder when PerformThinLTO=true (we are in the ThinLTO
backend), so that the newly imported functions are considered for
promotion in the backends.
The IC promotion profiles refer to callees by GUID, which required
adding GUIDs to the per-module VST in bitcode (and assigning them
valueIds similar to how they are assigned valueIds in the combined
index).
Reviewers: mehdi_amini, xur
Subscribers: mehdi_amini, davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21932
llvm-svn: 275707
This is a partial implementation of a general fold for associative+commutative operators:
(op (cast (op X, C2)), C1) --> (cast (op X, op (C1, C2)))
(op (cast (op X, C2)), C1) --> (op (cast X), op (C1, C2))
There are 7 associative operators and 13 cast types, so this could potentially go a lot further.
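The second form can be checked exhaustively for one cast/op combination. This sketch assumes `zext i8 -> i32` with `and` and `or`; zext distributes over these bitwise ops, which is what makes the fold legal here, whereas arithmetic ops combined with extensions generally need no-wrap facts. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// (and (zext (and X, C2)), C1) --> (and (zext X), and(C1, zext C2))
uint32_t andOriginal(uint8_t X, uint8_t C2, uint32_t C1) {
  uint32_t Widened = static_cast<uint8_t>(X & C2); // zext (and i8 X, C2)
  return Widened & C1;
}
uint32_t andFolded(uint8_t X, uint8_t C2, uint32_t C1) {
  return static_cast<uint32_t>(X) & (C1 & static_cast<uint32_t>(C2));
}

// The same shape with op = or.
uint32_t orOriginal(uint8_t X, uint8_t C2, uint32_t C1) {
  uint32_t Widened = static_cast<uint8_t>(X | C2);
  return Widened | C1;
}
uint32_t orFolded(uint8_t X, uint8_t C2, uint32_t C1) {
  return static_cast<uint32_t>(X) | (C1 | static_cast<uint32_t>(C2));
}
```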
Differential Revision: https://reviews.llvm.org/D22421
llvm-svn: 275684
This reverts commit r275042; the initial commit triggered self-hosting failures
on ARM/AArch64. James Molloy identified the problematic backend code, which has
been disabled in r275677. Trying again...
Original commit message:
Let FuncAttrs infer the 'returned' argument attribute
A function can have one argument with the 'returned' attribute, indicating that
the associated argument is always the return value of the function. Add
FuncAttrs inference logic.
llvm-svn: 275678
With r275532 fixing a miscompilation in GVN, "inbounds" on certain GEPs in these
tests can no longer be preserved. Left a TODO in the tests for future
reference.
llvm-svn: 275596
This patch adds a selected set of cleanup passes including a pre-inline pass
before LLVM IR PGO instrumentation. The pre-inliner is only intended to handle
the obvious/trivial cases before instrumentation, so that much less instrumentation
is needed to get better profiling information. This will drastically improve
the instrumented code performance for large C++ applications. Another benefit
is the context sensitive counts that can potentially improve the PGO
optimization.
Differential Revision: http://reviews.llvm.org/D21405
llvm-svn: 275588
Summary:
This is the first set of changes implementing the RFC from
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/98334
This is a cross-sectional patch; rather than implementing the hotness
attribute for all optimization remarks and all passes in a patch set, it
implements it for the 'missed-optimization' remark for Loop
Distribution. My goal is to shake out the design issues before scaling
it up to other types and passes.
Hotness is computed as an integer, as the product of the block
frequency and the function entry count. It is currently printed only in
opt, since clang prints the diagnostic fields directly. E.g.:
remark: /tmp/t.c:3:3: loop not distributed: use -Rpass-analysis=loop-distribute for more info (hotness: 300)
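A sketch of the hotness arithmetic. Block frequencies in BFI are relative to the entry block, so this sketch normalizes by the entry block's frequency; that normalization, the helper name and the sample numbers (entry count 100, a block three times as frequent as the entry) are assumptions chosen to reproduce the "hotness: 300" in the remark above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: scales the function entry count by the block's
// frequency relative to the entry block. Overflow is ignored here.
uint64_t hotness(uint64_t EntryCount, uint64_t BlockFreq, uint64_t EntryFreq) {
  return EntryCount * BlockFreq / EntryFreq;
}
```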
The new API is similar to emitOptimizationRemarkMissed. The
difference is that it additionally takes a code region that the
diagnostic corresponds to. From this, hotness is computed using BFI.
The new API is exposed via an analysis pass so that it can be made
dependent on LazyBFI. (Thanks to Hal for the analysis pass idea.)
This feature can be enabled by setDiagnosticHotnessRequested in the
LLVM context. If this is off, LazyBFI is not calculated (D22141) so
there should be no overhead.
A new command-line option is added to turn this on in opt.
My plan is to switch all users of emitOptimizationRemark* to use this
module instead.
Reviewers: hfinkel
Subscribers: rcox2, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D21771
llvm-svn: 275583
Summary:
This change uses the overlap interval map built from partial overwrite tracking to shorten MemIntrinsics.
Add test cases for opportunities that were previously missed.
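A minimal sketch of the bookkeeping, assuming later stores are recorded as half-open byte ranges relative to the start of the earlier intrinsic and merged as they arrive. `OverwrittenRanges` is hypothetical and ignores constraints a real shortening transform must respect, such as alignment and whether the intrinsic's length is a constant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Byte ranges [Start, End) overwritten by later stores, keyed by End with
// Start as the value. Ranges are kept disjoint by merging on insertion.
class OverwrittenRanges {
  std::map<int64_t, int64_t> ByEnd;

public:
  void add(int64_t Start, int64_t End) {
    // First range whose end reaches Start: it overlaps or touches.
    auto It = ByEnd.lower_bound(Start);
    while (It != ByEnd.end() && It->second <= End) {
      Start = std::min(Start, It->second);
      End = std::max(End, It->first);
      It = ByEnd.erase(It);
    }
    ByEnd[End] = Start;
  }

  // New length of an earlier write [0, Len) after dropping an overwritten tail.
  int64_t shortenedLength(int64_t Len) const {
    auto It = ByEnd.lower_bound(Len); // first range ending at or after Len
    if (It != ByEnd.end() && It->second < Len)
      return std::max<int64_t>(It->second, 0);
    return Len;
  }

  // Number of leading bytes of [0, Len) that are fully overwritten.
  int64_t overwrittenPrefix(int64_t Len) const {
    auto It = ByEnd.lower_bound(1); // first range ending after offset 0
    if (It != ByEnd.end() && It->second <= 0)
      return std::min(It->first, Len);
    return 0;
  }
};
```

Two adjacent partial overwrites, such as [8, 12) and [12, 16) over a 16-byte memset, merge into one range that lets the memset shrink to 8 bytes, even though neither store covers the tail alone.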
Reviewers: hfinkel, eeckstein, mcrosier
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D21909
llvm-svn: 275571
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.
Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin in the form of reviews.
Differential Revision: http://reviews.llvm.org/D19338
llvm-svn: 275561
Fix for PR 28418.
opt never finishes compiling a test when the -gvn option is passed.
The problem is caused by the fact that GVN fails to fold a constant expression.
Differential Revision: https://reviews.llvm.org/D22185
llvm-svn: 275483
This patch allows the formation of interleaved access groups in loops
containing predicated blocks. However, the predicated accesses are prevented
from forming groups.
Differential Revision: https://reviews.llvm.org/D19694
llvm-svn: 275471
Summary: Port Dead Loop Deletion Pass to the new pass manager.
Reviewers: silvas, davide
Subscribers: llvm-commits, sanjoy, mcrosier
Differential Revision: https://reviews.llvm.org/D21483
llvm-svn: 275453
This patch prevents increases in the number of instructions, pre-instcombine,
due to induction variable scalarization. An increase in instructions can lead
to an increase in the compile-time required to simplify the induction
variables. We now maintain a new map for scalarized induction variables to
prevent us from converting between the scalar and vector forms.
This patch should resolve compile-time regressions seen after r274627.
llvm-svn: 275419
enables the code size optimisation to fold a rem and div into a single
aeabi_uidivmod call. This was not happening before because sdiv was converted
but srem was not, and instructions with different signedness are not combined.
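The message does not spell out the conversion, so this sketch assumes it is the usual rewrite of `sdiv`/`srem` to `udiv`/`urem` when both operands are known non-negative. The identity that makes such a rewrite legal can be checked directly; `signedMatchesUnsigned` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// True if signed and unsigned division and remainder agree for A and B
// (B must be non-zero).
bool signedMatchesUnsigned(int32_t A, int32_t B) {
  uint32_t UA = static_cast<uint32_t>(A), UB = static_cast<uint32_t>(B);
  return A / B == static_cast<int32_t>(UA / UB) &&
         A % B == static_cast<int32_t>(UA % UB);
}
```

Once both operations are unsigned, they have the same signedness and can share a single `__aeabi_uidivmod` call; with a negative dividend the identity fails, which is why the operands must be known non-negative.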
Differential Revision: http://reviews.llvm.org/D22214
llvm-svn: 275403
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.
Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin in the form of reviews.
Differential Revision: http://reviews.llvm.org/D19338
llvm-svn: 275401
constant hoisting. It not only takes into account the number of uses and the
cost of expressions in which constants appear, but now also the resulting
integer range of the offsets. Thus, the algorithm maximizes the number of uses
within an integer range that will enable more efficient code generation. On
ARM, for example, this will enable code size optimisations because fewer
negative offsets will be created. Thumb1 does not support negative
offsets/immediates, so they prevent the more compact instruction encoding.
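The message describes the heuristic only in prose, so a brute-force sketch of the idea follows. It assumes each candidate constant may serve as the base and counts how many constants land within an encodable offset range `[MinOffset, MaxOffset]`; `chooseBase` and the range values in the test are hypothetical, not the ConstantHoisting implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Picks the candidate base that lets the most constants be written as
// Base + Offset with Offset in [MinOffset, MaxOffset]. Consts must be
// non-empty; ties go to the first candidate found.
int64_t chooseBase(const std::vector<int64_t> &Consts, int64_t MinOffset,
                   int64_t MaxOffset) {
  int64_t Best = Consts.front();
  std::size_t BestCount = 0;
  for (int64_t Base : Consts) {
    std::size_t Count = 0;
    for (int64_t C : Consts)
      if (C - Base >= MinOffset && C - Base <= MaxOffset)
        ++Count;
    if (Count > BestCount) {
      BestCount = Count;
      Best = Base;
    }
  }
  return Best;
}
```

Restricting the range to non-negative offsets makes the smallest constant the preferred base, whereas allowing negative offsets can select a base that produces them, which is the Thumb1 concern described in the message.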
Differential Revision: http://reviews.llvm.org/D21183
llvm-svn: 275382
We were able to fold masked loads with an all-ones mask to a normal
load. However, we couldn't turn a masked load with a mask with mixed
ones and undefs into a normal load.
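The rule is small enough to state as code. `MaskBit` and `canBecomePlainLoad` are hypothetical: undef lanes may be treated as enabled, so a mask with no lane known to be 0 is as good as all-ones.

```cpp
#include <cassert>
#include <vector>

enum class MaskBit { Zero, One, Undef };

// A masked load can become a plain load if every mask element is 1 or undef.
bool canBecomePlainLoad(const std::vector<MaskBit> &Mask) {
  for (MaskBit M : Mask)
    if (M == MaskBit::Zero)
      return false; // this lane is known to be disabled
  return true;
}
```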
llvm-svn: 275380
This now should also work with the interprocedural variant of the pass.
Slightly easier now that the yak is shaved.
Differential Revision: http://reviews.llvm.org/D22329
llvm-svn: 275363
Summary:
In Scalarizer::gather we see if we already have a scattered form of Op,
and in that case use the new form.
In the particular case of PR28108, the found ValueVector SV has size 2,
where the first Value is nullptr, and the second is indeed a proper Value.
The nullptr then caused an assert to blow when we tried to do
cast<Instruction>(SV[I]).
With this patch we check SV[I] before doing the cast, and if it's nullptr
we just skip over it.
I don't know the Scalarizer well enough to know if this is the best fix
or if something should be done elsewhere to prevent the nullptr from
being in the ValueVector at all, but at least this avoids the crash
and looking at the test case output it looks reasonable.
Reviewers: hfinkel, frasercrmck, wala, mehdi_amini
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21518
llvm-svn: 275359
Treat loads which clip before the start of a global initializer the same
way we treat clipping beyond the end of the initializer: use zeros.
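A minimal sketch of the clipping rule, assuming the initializer is available as a flat byte array; `readInitializer` is a hypothetical helper. Bytes that fall before offset 0 or past the end both read as zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads Size bytes starting at Offset (which may be negative) from a
// global's initializer; out-of-range bytes read as zero.
std::vector<uint8_t> readInitializer(const std::vector<uint8_t> &Init,
                                     int64_t Offset, int64_t Size) {
  std::vector<uint8_t> Out(static_cast<std::size_t>(Size), 0);
  for (int64_t I = 0; I < Size; ++I) {
    int64_t Pos = Offset + I;
    if (Pos >= 0 && Pos < static_cast<int64_t>(Init.size()))
      Out[static_cast<std::size_t>(I)] = Init[static_cast<std::size_t>(Pos)];
  }
  return Out;
}
```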
llvm-svn: 275345
Summary:
LSV used to abort vectorizing a chain when interleaved load/store accesses alias.
This patch allows a valid prefix of the chain to be vectorized: it marks just the prefix and retries vectorizing the remainder of the chain.
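The retry loop can be sketched on a chain that is already sorted by address. `splitIntoPrefixes` and the `ConflictsBefore` flags are hypothetical: each flag marks an element that cannot join a group started earlier (for example, because an aliasing access intervenes), and `ConflictsBefore` must be the same length as `Chain`. The real vectorizer may also discard pieces that are too short to be worth vectorizing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Repeatedly takes the longest legal prefix, then retries on the rest.
std::vector<std::vector<int>>
splitIntoPrefixes(const std::vector<int> &Chain,
                  const std::vector<bool> &ConflictsBefore) {
  std::vector<std::vector<int>> Groups;
  std::size_t Begin = 0;
  while (Begin < Chain.size()) {
    // A prefix always holds at least its first element, so we make progress.
    std::size_t End = Begin + 1;
    while (End < Chain.size() && !ConflictsBefore[End])
      ++End;
    Groups.emplace_back(Chain.begin() + static_cast<std::ptrdiff_t>(Begin),
                        Chain.begin() + static_cast<std::ptrdiff_t>(End));
    Begin = End;
  }
  return Groups;
}
```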
Reviewers: llvm-commits, jlebar, arsenm
Subscribers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D22119
llvm-svn: 275317