give the files a legacy prefix in the right directory. Use forwarding
headers in the old locations to paper over the name change for most
clients during the transitional period.
No functionality changed here! This is just clearing some space to
reduce renaming churn later on with a new system.
Even when the new stuff starts to go in, it is going to be hidden behind
a flag and off by default, as it is still under development.
This patch is specifically designed so that very little out-of-tree code
has to change. I'm going to work as hard as I can to keep that the case.
Only direct forward declarations of the PassManager class are impacted
by this change.
llvm-svn: 194324
Patch by Michele Scandale!
Rewrite of the functions used to compute the backedge taken count of a
loop on LT and GT comparisons.
I decided to split the handling of the LT and GT cases because the trick
"a > b == -a < -b" in some cases prevents the trip count computation
due to the multiplication by -1 on the two operands of the
comparison. This issue comes from the conservative computation of the
value range of SCEVs: taking the negative SCEV of an expression that
has a small positive range (e.g. [0,31]), we would get a SCEV whose
value range is the full set.
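For illustration only, here is a hypothetical loop (not taken from the commit)
showing the kind of GT comparison affected:

void clear_tail(int *a, int n) {
  // For non-negative n, i stays within the small range [0,31], but the
  // conservatively computed range of -i can be the full set, which blocks
  // the maximum backedge taken count once "i > n" is rewritten as the
  // negated LT comparison.
  for (int i = 31; i > n; --i)
    a[i] = 0;
}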
In addition, in the rewritten functions I tried to better handle the
maximum backedge taken count computation when MAX/MIN expressions are
used to handle the cases where no entry guard is found.
Some tests have been modified in order to check the new values correctly
(I checked them manually and, reasoning about possible overflow, the new
values seem correct).
I finally added a new test case related to the multiplication by -1
issue on GT comparisons.
llvm-svn: 194116
This adds another heuristic to BPI, similar to the existing heuristic that
considers (x == 0) unlikely to be true. As suggested in the PACT'98 paper by
Deitrich, Cheng, and Hwu, -1 is often used to indicate an invalid index, and
equality comparisons with -1 are also unlikely to succeed. Local
experimentation supports this hypothesis: This yields a 1-2% speedup in the
test-suite sqlite benchmark on the PPC A2 core, with no significant
regressions.
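A hypothetical example of the source pattern this targets (mine, not from the
patch):

// lookup() follows the common convention of returning -1 for "not found".
int lookup(const int *table, int size, int key);

int get(const int *table, int size, int key) {
  int idx = lookup(table, size, key);
  if (idx == -1)       // now predicted unlikely, like the existing x == 0 rule
    return 0;          // cold "invalid index" path
  return table[idx];   // hot path
}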
llvm-svn: 193855
We can't do this in the general case, as claiming that a GEP has no
unsigned wrap isn't valid when the index is negative.
%gep = getelementptr inbounds i32* %p, i64 -1
But an inbounds GEP cannot run past the end of the address space. So we check
for the very common case of a positive index and make GEPs derived from that NUW.
Together with Andy's recent non-unit stride work this lets us analyze loops
like
void foo3(int *a, int *b) {
  for (; a < b; a++) {}
}
PR12375, PR12376.
Differential Revision: http://llvm-reviews.chandlerc.com/D2033
llvm-svn: 193514
Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu
(affecting trunk and 3.3)
When SCEV expands a recurrence outside of a loop it attempts to scale
by the stride of the recurrence. Chained recurrences don't work that
way. We could compute binomial coefficients, but would have to
guarantee that the chained AddRec's are in a perfectly reduced form.
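As a hedged sketch of what "chained" means here (my example, not the PR's test
case): s below is a second-order recurrence {0,+,0,+,1} whose step is itself
the recurrence i, so its exit value cannot be obtained by scaling a stride; it
needs binomial coefficients (for n >= 0 it is n*(n-1)/2).

int sum_prefix(int n) {
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += i;     // the step of s is the recurrence i, not a loop-invariant stride
  return s;     // exit value: n*(n-1)/2 for n >= 0
}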
llvm-svn: 193438
Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu
(affecting trunk and 3.3)
ScalarEvolutionNormalization was attempting to normalize by adding and
subtracting strides. Chained recurrences don't work that way.
llvm-svn: 193437
This fixes a memory leak found by valgrind.
Calling it from the base class destructor would not destroy the BasicCallGraph
bits.
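A generic C++ sketch of the pitfall (hypothetical types, not the actual
CallGraph classes): by the time the base destructor runs, the derived part is
already gone, so a virtual cleanup call resolves to the base version and the
derived-owned memory leaks.

struct Base {
  virtual void destroy() {}          // the base has nothing to free
  virtual ~Base() { destroy(); }     // dispatches to Base::destroy here
};

struct Derived : Base {
  int *Bits = new int[16];
  void destroy() override { delete[] Bits; Bits = nullptr; }
  // No explicit destroy() call in ~Derived, so Bits leaks.
};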
FIXME: BasicCallGraph is the only thing that inherits from CallGraph. Can
we merge the two?
llvm-svn: 193412
LLVM optimizers may widen accesses to packed structures that overflow the structure itself, but should be in bounds up to the alignment of the object
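A hypothetical illustration of the kind of access in question: loading both
fields of this 3-byte packed struct may be widened to a single 4-byte load
whose last byte is past the end of the struct, yet still within an allocation
padded out to the alignment of the object.

struct __attribute__((packed)) P {
  short a;
  char  c;
};

int sum(const P *p) { return p->a + p->c; }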
llvm-svn: 193317
Major steps include:
1). Introduce a not-addr-taken bit-field in GlobalVariable.
2). The GlobalOpt pass sets "not-address-taken" if it proves a global variable
doesn't have its address taken.
3). AA uses this info for disambiguation (see the sketch below).
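A hedged illustration of the disambiguation this enables (hypothetical code,
not from the patch):

// 'Counter' has internal linkage and never has its address taken, so once
// GlobalOpt marks it not-address-taken, alias analysis knows the store
// through 'p' cannot modify it.
static int Counter;

int bump(int *p) {
  Counter = 1;
  *p = 42;           // cannot alias Counter
  return Counter;    // foldable to 1
}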
llvm-svn: 193251
We can have a struct type with a single field where the field does not start
at offset 0. In that case, we should correctly update the offset.
llvm-svn: 193137
The test before wasn't successfully testing this
since it was missing the datalayout piece to change
the size of the second address space.
llvm-svn: 193102
SCEV currently fails to compute loop counts for nonunit stride
loops. This comes up frequently. It prevents loop optimization and
forces vectorization to insert extra loop checks.
For example:
void foo(int n, int *x) {
  for (int i = 0; i < n; i += 3) {
    x[i] = i;
    x[i+1] = i+1;
    x[i+2] = i+2;
  }
}
We need to properly handle the case in which limit > INT_MAX-stride. In
the above case: n > INT_MAX-3. In this case the loop counter will step
beyond the limit and overflow at the same time. However, knowing that
signed integer overflow is undefined, we can assume the loop test
behavior is arbitrary after overflow. This obeys both C undefined
behavior rules, and the more strict LLVM poison value rules.
I'm finally fixing this in response to Hal Finkel's persistence.
The most probable reason that we never optimized this before is that
we were being careful to handle the case where the developer expected a
side-effect free infinite loop relying on overflow:
for (int i = 0; i < n; i += s) {
  ++j;
}
return j;
If INT_MAX+1 is a multiple of s and n > INT_MAX-s, then we might
expect an infinite loop. However there are plenty of ways to achieve
this effect without relying on undefined behavior of signed overflow.
llvm-svn: 193015
The heuristic was added to avoid spending too much compile time. A specially
crafted test case (PR17461, PR16474) with many uses of a select or bitcast
instruction can still trigger the slow case. Add a check for that case.
This only affects compile time; I don't have a good way to test it.
llvm-svn: 191896
infrastructure.
This was essentially work toward PGO based on a design that had several
flaws, partially dating from a time when LLVM had a different
architecture, and with an effort to modernize it abandoned without being
completed. Since then, it has bitrotted for several years further. The
result is nearly unusable, and isn't helping any of the modern PGO
efforts. Instead, it is getting in the way, adding confusion about PGO
in LLVM and distracting everyone with maintenance on essentially dead
code. Removing it paves the way for modern efforts around PGO.
Among other effects, this removes the last of the runtime libraries from
LLVM. Those are being developed in the separate 'compiler-rt' project
now, with somewhat different licensing specifically more appropriate for
runtimes.
llvm-svn: 191835
Remove the command line argument "struct-path-tbaa" since we should not depend
on a command line argument to decide which format the IR file is using. Instead,
we check the first operand of the tbaa tag node: if it is an MDNode, we treat
it as the struct-path aware TBAA format; otherwise, we treat it as the scalar
TBAA format.
Once clang always emits the struct-path aware TBAA format regardless of
"struct-path-tbaa", and we can auto-upgrade existing bc files, the support
for the scalar TBAA format can be dropped.
Existing testing cases are updated to use the struct-path aware TBAA format.
llvm-svn: 191538
This code isn't ready to deal with allocation functions where the return value
is not the allocated pointer. The checks below will reject posix_memalign
anyway.
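For reference, a minimal sketch of why posix_memalign is the odd one out: it
returns a status code and hands the allocation back through an out-parameter,
unlike malloc or operator new, where the return value is the allocated pointer.

#include <stdlib.h>

void *alloc_aligned(size_t n) {
  void *p = nullptr;
  // The return value is an error code, not the allocated pointer.
  if (posix_memalign(&p, 64, n) != 0)
    return nullptr;
  return p;
}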
llvm-svn: 191319
This is safe per C++11 18.6.1.1p3: [operator new returns] a non-null pointer to
suitably aligned storage (3.7.4), or else throw a bad_alloc exception. This
requirement is binding on a replacement version of this function.
Brings us a tiny bit closer to eliminating more vector push_backs.
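A user-level sketch of the guarantee being exploited (illustrative, not from
the patch):

int *make() {
  int *p = new int[16];   // a throwing operator new never returns null
  if (p == nullptr)       // provably false; the optimizer may drop this check
    return nullptr;
  return p;
}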
llvm-svn: 191310
Overflow doesn't affect the correctness of equalities. Computing this is cheap,
we just reuse the computation for the inbounds case and try to peel off more
non-inbounds GEPs. This pattern is unlikely to ever appear in code generated by
Clang, but SCEV occasionally produces it.
llvm-svn: 191200
Upcoming SLP vectorization improvements will want to be able to estimate costs
of horizontal reductions. Add infrastructure to support this.
We model reductions as a series of (shufflevector,add) tuples ultimately
followed by an extractelement. For example, for an add-reduction of <4 x float>
we could generate the following sequence:
(v0, v1, v2, v3)
 \   \   /   /
  \   \ /   /
   +       +
(v0+v2, v1+v3, undef, undef)
      \     /
((v0+v2) + (v1+v3), undef, undef)
%rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
    <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
%bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
%rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
    <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
%bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
%r = extractelement <4 x float> %bin.rdx8, i32 0
This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)"
that will allow clients to ask for the cost of such a reduction (as backends
might generate more efficient code than the sum of the costs of the individual
instructions). This interface is exercised by the CostModel analysis pass, which
looks for reduction patterns like the one above - starting at extractelements -
and calls the cost model interface when it sees a matching sequence.
We will also support a second form of pairwise reduction that is well supported
on common architectures (haddps, vpadd, faddp).
(v0, v1, v2, v3)
  \  /    \  /
(v0+v1, v2+v3, undef, undef)
      \     /
((v0+v1)+(v2+v3), undef, undef, undef)
%rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
    <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
%rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
    <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
%bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
%rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
    <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
%rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
    <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
%bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
%r = extractelement <4 x float> %bin.rdx.1, i32 0
llvm-svn: 190876
Allow targets to customize the default behavior of the generic loop unrolling
transformation. This will be used by the PowerPC backend when targeting the A2
core (which is in-order with a deep pipeline), and using more aggressive
defaults is important.
llvm-svn: 190542
instead of having its own implementation.
The implementation of isTBAAVtableAccess is in TypeBasedAliasAnalysis.cpp
since it is related to the format of TBAA metadata.
The path for struct-path tbaa will be exercised by
test/Instrumentation/ThreadSanitizer/read_from_global.ll, vptr_read.ll, and
vptr_update.ll when struct-path tbaa is on by default.
llvm-svn: 190216
Revert unintentional commit (of an unreviewed change).
Original commit message:
Add getUnrollingPreferences to TTI
Allow targets to customize the default behavior of the generic loop unrolling
transformation. This will be used by the PowerPC backend when targeting the A2
core (which is in-order with a deep pipeline), and using more aggressive
defaults is important.
llvm-svn: 189566
Allow targets to customize the default behavior of the generic loop unrolling
transformation. This will be used by the PowerPC backend when targeting the A2
core (which is in-order with a deep pipeline), and using more aggressive
defaults is important.
llvm-svn: 189565
...so that it can be used for z too. Most of the code is the same.
The only real change is to use TargetTransformInfo to test when a sqrt
instruction is available.
The pass is opt-in because at the moment it only handles sqrt.
llvm-svn: 189097
This fixes SCEVExpander so that it does not create multiple distinct induction
variables for duplicate PHI entries. Specifically, given some code like this:
do.body6:                                    ; preds = %do.body6, %do.body6, %if.then5
  %end.0 = phi i8* [ undef, %if.then5 ], [ %incdec.ptr, %do.body6 ], [ %incdec.ptr, %do.body6 ]
  ...
Note that it is legal to have multiple entries for a basic block so long as the
associated value is the same. So the above input is okay, but expanding an
AddRec in this loop could produce code like this:
do.body6:                                    ; preds = %do.body6, %do.body6, %if.then5
  %indvar = phi i64 [ %indvar.next, %do.body6 ], [ %indvar.next1, %do.body6 ], [ 0, %if.then5 ]
  %end.0 = phi i8* [ undef, %if.then5 ], [ %incdec.ptr, %do.body6 ], [ %incdec.ptr, %do.body6 ]
  ...
  %indvar.next = add i64 %indvar, 1
  %indvar.next1 = add i64 %indvar, 1
And this is not legal because there are two PHI entries for %do.body6 each with
a distinct value.
Unfortunately, I don't have an in-tree test case.
llvm-svn: 188614
to find loops if the From and To instructions were in the same block.
Refactor the code a little now that we sometimes need to start the CFG-walking
algorithm with more than one starting basic block.
Special thanks to Andrew Trick for catching an error in my understanding of
natural loops in code review.
llvm-svn: 188236
All libm floating-point rounding functions, except for round(), had their own
ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm
adding ISD::FROUND so that round() can be custom lowered as well.
For the most part, this is straightforward. I've added an intrinsic
and a matching ISD node just like those for nearbyint() and friends. I've
named the SelectionDAG pattern frnd (because fround is already claimed by
ISD::FP_ROUND).
This will be used by the PowerPC backend in a follow-up commit.
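As a user-level illustration (mine, not from the commit), a call like the one
below can now be lowered to a single native rounding instruction on targets
that custom lower ISD::FROUND, instead of a libm call:

#include <cmath>

double rounded(double x) { return std::round(x); }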
llvm-svn: 187926
This fix is very lightweight. The same fix already existed for AddRec
but was missing for NAry expressions.
This is obviously an improvement and I'm unsure how to test compile
time problems.
Patch by Xiaoyi Guo!
llvm-svn: 187475
Call into ComputeMaskedBits to figure out which bits are set on both add
operands and determine if the value is a power-of-two-or-zero or not.
llvm-svn: 187445
Adds unit tests for it too.
Split BasicBlockUtils into an analysis-half and a transforms-half, and put the
analysis bits into a new Analysis/CFG.{h,cpp}. Promote isPotentiallyReachable
into llvm::isPotentiallyReachable and move it into Analysis/CFG.
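A hedged usage sketch of the promoted API (the exact signature may take
additional optional analysis parameters; only the two instruction arguments are
shown here):

#include "llvm/Analysis/CFG.h"
#include "llvm/IR/Instruction.h"

bool mayReach(const llvm::Instruction *From, const llvm::Instruction *To) {
  // Conservatively true if control may flow from From to To.
  return llvm::isPotentiallyReachable(From, To);
}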
llvm-svn: 187283
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce the number of branches. The transformation
is guarded by a target hook, and is currently enabled only for R600,
but the correctness has been tested on the X86 target using a variety of
CPU benchmarks.
Patch by: Mei Ye
llvm-svn: 187278
The great thing about the SCEVAddRec No-Wrap flag (unlike nsw/nuw) is
that it can be preserved while normalizing (reassociating and
factoring).
The bad thing is that it can't be transferred back to IR, which is one
of the reasons I don't like the concept of SCEVExpander.
Sorry, I can't think of a direct way to test this, which is why these
were FIXMEs for so long. I just think it's a good time to finally
clean it up.
llvm-svn: 186273
Address calculation for gather/scatter in vectorized code can incur a
significant cost, making vectorization unprofitable. Add infrastructure to
model this cost.
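A typical source pattern that becomes a gather when vectorized (illustrative,
not from the commit): each vector lane of a[idx[i]] needs its own address
computation, and that extra cost can make vectorization unprofitable on targets
without cheap gather support.

void gather_copy(float *out, const float *a, const int *idx, int n) {
  for (int i = 0; i < n; ++i)
    out[i] = a[idx[i]];
}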
Tests and cost model for targets will be in follow-up commits.
radar://14351991
llvm-svn: 186187
ScalarEvolution::getSignedRange uses ComputeNumSignBits from ValueTracking on
ashr instructions. ComputeNumSignBits can return zero, but this case was not
handled correctly by the code in getSignedRange which was calling:
APInt::getSignedMinValue(BitWidth).ashr(NS - 1)
with NS = 0, resulting in an assertion failure in APInt::ashr.
Now, we just return the conservative result (as with NS == 1).
Another bug found by llvm-stress.
llvm-svn: 185955
(add nsw x, (and x, y)) isn't a power of two if x is zero, it's zero
(add nsw x, (xor x, y)) isn't a power of two if y has bits set that aren't set in x
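Quick arithmetic checks of both facts, with values chosen by me for
illustration (x = 0 for the first, x = 4 and y = 3 for the second):

#include <cassert>

int main() {
  int x = 0, y = 5;
  assert(x + (x & y) == 0);    // zero, not a power of two
  x = 4; y = 3;                // y has bits set that x lacks
  assert(x + (x ^ y) == 11);   // 11 is not a power of two
  return 0;
}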
llvm-svn: 185954
The symptom is a seg-fault, and the root cause is that a SCEV contains a
SCEVUnknown which has a null pointer to an llvm::Value.
This is how the problem takes place:
===================================
1). In the pristine input IR, there are two relevant instructions, Op1 and Op2.
Op1's corresponding SCEV (denoted as SCEV(Op1)) is a SCEVUnknown, and
SCEV(Op2) contains SCEV(Op1). None of these instructions are dead.
Op1 : V1 = ...
...
Op2 : V2 = ... // directly or indirectly (data-flow) depends on Op1
2) The optimizer (LSR in my case) generates an instruction holding the equivalent
value of Op1, making Op1 dead.
Op1': V1' = ...
Op1 : V1 = ... (now dead)
Op2 : V2 = ... // now depends on Op1', but SCEV(Op2) still contains SCEV(Op1)
3) Op1 is deleted, and the call-back function is called to reset
SCEV(Op1) to indicate it is invalid. However, SCEV(Op2) is not
invalidated as well.
4) A following pass gets the cached, invalid SCEV(Op2), tries to manipulate it,
and causes a segfault.
The fix:
========
It seems there is no clean yet inexpensive fix. I wrote to the dev-list
soliciting a good solution; unfortunately, no ack. So, I decided to fix this
problem in a brute-force way:
When ScalarEvolution::getSCEV is called, check if the cached SCEV
contains an invalid SCEVUnknown; if yes, remove the cached SCEV and
re-evaluate the SCEV from scratch.
I compiled a bunch of big *.c and *.cpp files; fortunately, I don't see any
increase in compile time.
Misc:
=====
The reduced test-case has 2357 lines of code+other-stuff, too big to commit.
rdar://14283433
llvm-svn: 185843
The Builtin attribute is an attribute that can be placed on a function call site to signal that even though a function is declared as being a builtin,
rdar://problem/13727199
llvm-svn: 185049
This is a band-aid to fix the most severe regressions we're seeing from basing
spill decisions on block frequencies, until we have a better solution.
llvm-svn: 184835