As an example, the change to InstCombineCalls catches a common case where a call to a bitcast of a function is rewritten.
Chris, does this approach look reasonable?
llvm-svn: 131516
often expressed as "x >= y ? x : y", there is a good chance we can extract
the existing "x >= y" from it and use that as a replacement for "max(x,y)==x".
llvm-svn: 131049
but according to my super-optimizer there are only two missed simplifications
of -instsimplify kind when compiling bzip2, and this is one of them. It amuses
me to have bzip2 be perfectly optimized as far as instsimplify goes!
llvm-svn: 130840
max(a,b) >= a -> true. According to my super-optimizer, these are
by far the most common simplifications (of the -instsimplify kind)
that occur in the testsuite and aren't caught by -std-compile-opts.
llvm-svn: 130780
This obviously helps a lot if the division would be turned into a libcall
(think i64 udiv on i386), but div is also one of the few remaining instructions
on modern CPUs that become more expensive when the bitwidth gets bigger.
This also helps register pressure on i386 when dividing chars, divb needs
two 8-bit parts of a 16 bit register as input where divl uses two registers.
int foo(unsigned char a) { return a/10; }
int bar(unsigned char a, unsigned char b) { return a/b; }
compiles into (x86_64)
_foo:
imull $205, %edi, %eax
shrl $11, %eax
ret
_bar:
movzbl %dil, %eax
divb %sil, %al
movzbl %al, %eax
ret
llvm-svn: 130615
a nice and tidy:
%x1 = load i32* %0, align 4
%1 = icmp eq i32 %x1, 1179403647
br i1 %1, label %if.then, label %if.end
instead of doing lots of loads and branches. May the FreeBSD bootloader
long fit in its allocated space.
llvm-svn: 130416
wider load would allow elimination of subsequent loads, and when the wider
load is still a native integer type. This eliminates a ton of loads on
various benchmarks involving struct fields, though it is somewhat hobbled
by clang not being very aggressive about field alignment.
This is yet another step along the way towards resolving PR6627.
llvm-svn: 130390
Modified LinearFunctionTestReplace to push the condition on the dead
list instead of eagerly deleting it. This can cause unnecessary
IV rewrites, which should have no effect on codegen and will not be an
issue once we stop generating canonical IVs.
llvm-svn: 130340
1. Only run the early (in the module pass pipe) instcombine/simplifycfg
if the "unit at a time" passes they are cleaning up after runs.
2. Move the "clean up after the unroller" pass to the very end of the
function-level pass pipeline. Loop unroll uses instsimplify now,
so it doesn't create a ton of trash. Moving instcombine later allows
it to clean up after opportunities are exposed by GVN, DSE, etc.
3. Introduce some phase ordering tests for things that are specifically
intended to be simplified by the full optimizer as a whole.
This resolves PR2338, and is progress towards PR6627, which will be
generating code that looks similar to test2.
llvm-svn: 130241
when X has multiple uses. This is useful for exposing secondary optimizations,
but the X86 backend isn't ready for this when X has a single use. For example,
this can disable load folding.
This is inching towards resolving PR6627.
llvm-svn: 130238
return it as a clobber. This allows GVN to do smart things.
Enhance GVN to be smart about the case when a small load is clobbered
by a larger overlapping load. In this case, forward the value. This
allows us to compile stuff like this:
int test(void *P) {
int tmp = *(unsigned int*)P;
return tmp+*((unsigned char*)P+1);
}
into:
_test: ## @test
movl (%rdi), %ecx
movzbl %ch, %eax
addl %ecx, %eax
ret
which has one load. We already handled the case where the smaller
load was from a must-aliased base pointer.
llvm-svn: 130180
the same allocation size but different primitive sizes(e.g., <3xi32> and
<4xi32>). When ScalarRepl promotes them, it can't use a bit cast but
should use a shuffle vector instead.
llvm-svn: 129472
reassociation opportunities are exposed. This fixes a bug where
the nested reassociation expects to be the IR to be consistent,
but it isn't, because the outer reassociation has disconnected
some of the operands. rdar://9167457
llvm-svn: 129324
is equivalent to any other relevant value; it isn't true in general.
If it is equivalent, the LoopPromoter will tell the AST the equivalence.
Also, delete the PreheaderLoad if it is unused.
Chris, since you were the last one to make major changes here, can you check
that this is sane?
llvm-svn: 129049
space info. We crash with an assert in this case. This change checks that the
address space of the bitcasted pointer is the same as the gep ptr.
llvm-svn: 128884
after the given instruction; make sure to handle that case correctly.
(It's difficult to trigger; the included testcase involves a dead
block, but I don't think that's a requirement.)
While I'm here, get rid of the unnecessary warning about
SimplifyInstructionsInBlock, since it should work correctly as far as I know.
llvm-svn: 128782
that one of the numbers is signed while the other is unsigned. This could lead
to a wrong result when the signed was promoted to an unsigned int.
* Add the data layout line to the testcase so that it will test the appropriate
thing.
Patch by David Terei!
llvm-svn: 128577
removes one use of X which helps it pass the many hasOneUse() checks.
In my analysis, this turns up very often where X = A >>exact B and that can't be
simplified unless X has one use (except by increasing the lifetime of A which is
generally a performance loss).
llvm-svn: 128373
For example, on 32-bit architecture, don't promote all uses of the IV
to 64-bits just because one use is a 64-bit cast.
Alternate implementation of the patch by Arnaud de Grandmaison.
llvm-svn: 127884
chose is having a non-memcpy/memset use and being larger than any native integer
type. Originally I chose having an access of a size smaller than the total size
of the alloca, but this caused some minor issues on the spirit benchmark where
SRoA runs again after some inlining.
This fixes <rdar://problem/8613163>.
llvm-svn: 127718
Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.
This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.
llvm-svn: 127498
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.
This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.
llvm-svn: 127459
after it has finished all of its reassociations, because its
habit of unlinking operands and holding them in a datastructure
while working means that it's not easy to determine when an
instruction is really dead until after all its regular work is
done. rdar://9096268.
llvm-svn: 127424
This happens a lot in clang-compiled C++ code because it adds overflow checks to operator new[]:
unsigned *foo(unsigned n) { return new unsigned[n]; }
We can optimize away the overflow check on 64 bit targets because (uint64_t)n*4 cannot overflow.
llvm-svn: 127418
gave up when I realized I couldn't come up with a good name for what the
refactored function would be, to describe what it does.
This is PR9343 test12, which is test3 with arguments reordered. Whoops!
llvm-svn: 127318
a union of a float, <2 x float>, and <4 x float>. This mostly comes up with the
use of vector intrinsics, especially in NEON when programmers know the layout of
the register file. This enables codegen to eliminate a lot of the subregister
traffic it would otherwise generate.
This commit only enables this for a small number of floating-point cases, but a
lot more integer cases. I assume this is okay for all ports, but I did not do
extensive testing of the quality of code involving i512 vectors and the like. If
there is a use case where this generates worse code than before, let me know and
we can scale it back.
This fixes <rdar://problem/9036264>.
llvm-svn: 127317
reachable uses, but there still might be uses in dead blocks. Use the
standard solution of replacing all the uses with undef. This is
a rare case because it's very sensitive to phase ordering in SimplifyCFG.
llvm-svn: 127299
the value splatted into every element. Extend this to getTrue and getFalse which
by providing new overloads that take Types that are either i1 or <N x i1>. Use
it in InstCombine to add vector support to some code, fixing PR8469!
llvm-svn: 127116
possible. This goes into instcombine and instsimplify because instsimplify
doesn't need to check hasOneUse since it returns (almost exclusively) constants.
This fixes PR9343 #4#5 and #8!
llvm-svn: 127064
"icmp pred %X, CI" and a number of examples where "%X = binop %Y, CI2".
Some of these cases (div and rem) used to make it through opt -O2, but the
others are probably now making code elsewhere redundant (probably instcombine).
llvm-svn: 126988
intersection of the LHS and RHS ConstantRanges and return "false" when
the range is empty.
This simplifies some code and catches some extra cases.
llvm-svn: 126744
Yes, there are other types than i8* and GEPs on them can produce an add+multiply.
We don't consider that cheap enough to be speculatively executed.
llvm-svn: 126481
function prototype into a call to a varargs prototype. We do
allow the xform if we have a definition, but otherwise we don't
want to risk that we're changing the abi in a subtle way. On
X86-64, for example, varargs require passing stuff in %al.
llvm-svn: 126363
We usually catch this kind of optimization through InstSimplify's distributive
magic, but or doesn't distribute over xor in general.
"A | ~(A | B) -> A | ~B" hits 24 times on gcc.c.
llvm-svn: 126081
constant, including globals. This makes us generate much more "pretty" pattern
globals as well because it doesn't break it down to an array of bytes all the
time.
This enables us to handle stores of relocatable globals. This kicks in about
48 times in 254.gap, giving us stuff like this:
@.memset_pattern40 = internal constant [2 x %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)*] [%struct.TypHeader* (%struct.TypHeader*, %struct
.TypHeader*)* @IsFalse, %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)* @IsFalse], align 16
...
call void @memset_pattern16(i8* %scevgep5859, i8* bitcast ([2 x %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)*]* @.memset_pattern40 to i8*
), i64 %tmp75) nounwind
llvm-svn: 126044
unsplatable values into memset_pattern16 when it is available
(recent darwins). This transforms lots of strided loop stores
of ints for example, like 5 in vpr:
Formed memset: call void @memset_pattern16(i8* %4, i8* getelementptr inbounds ([16 x i8]* @.memset_pattern9, i32 0, i32 0), i64 %tmp25)
from store to: {%3,+,4}<%11> at: store i32 3, i32* %scevgep, align 4, !tbaa !4
llvm-svn: 126040
taken (and used!). This prevents merging the blocks (invalidating
the block addresses) in a case like this:
#define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; })
void foo() {
printf("%p\n", _THIS_IP_);
printf("%p\n", _THIS_IP_);
printf("%p\n", _THIS_IP_);
}
which fixes PR4151.
llvm-svn: 125829
plus some variations of this. According to my auto-simplifier this occurs a lot
but usually in combination with max/min idioms. Because max/min aren't handled
yet this unfortunately doesn't have much effect in the testsuite.
llvm-svn: 125462
It caused a crash in MultiSource/Benchmarks/Bullet.
Opt hit an assertion with "opt -std-compile-opts" because
Constant::getAllOnesValue doesn't know how to handle floats.
This patch added a test to reproduce the problem and a check that the
destination vector is of integer type.
Thank you Benjamin!
llvm-svn: 125459
gep to explicit addressing, we know that none of the intermediate
computation overflows.
This could use review: it seems that the shifts certainly wouldn't
overflow, but could the intermediate adds overflow if there is a
negative index?
Previously the testcase would instcombine to:
define i1 @test(i64 %i) {
%p1.idx.mask = and i64 %i, 4611686018427387903
%cmp = icmp eq i64 %p1.idx.mask, 1000
ret i1 %cmp
}
now we get:
define i1 @test(i64 %i) {
%cmp = icmp eq i64 %i, 1000
ret i1 %cmp
}
llvm-svn: 125271
exact/nsw/nuw shifts and have instcombine infer them when it can prove
that the relevant properties are true for a given shift without them.
Also, a variety of refactoring to use the new patternmatch logic thrown
in for good luck. I believe that this takes care of a bunch of related
code quality issues attached to PR8862.
llvm-svn: 125267
optimizations to be much more aggressive in the face of
exact/nsw/nuw div and shifts. For example, these (which
are the same except the first is 'exact' sdiv:
define i1 @sdiv_icmp4_exact(i64 %X) nounwind {
%A = sdiv exact i64 %X, -5 ; X/-5 == 0 --> x == 0
%B = icmp eq i64 %A, 0
ret i1 %B
}
define i1 @sdiv_icmp4(i64 %X) nounwind {
%A = sdiv i64 %X, -5 ; X/-5 == 0 --> x == 0
%B = icmp eq i64 %A, 0
ret i1 %B
}
compile down to:
define i1 @sdiv_icmp4_exact(i64 %X) nounwind {
%1 = icmp eq i64 %X, 0
ret i1 %1
}
define i1 @sdiv_icmp4(i64 %X) nounwind {
%X.off = add i64 %X, 4
%1 = icmp ult i64 %X.off, 9
ret i1 %1
}
This happens when you do something like:
(ptr1-ptr2) == 42
where the pointers are pointers to non-unit types.
llvm-svn: 125266
could end up removing a different function than we intended because it was
functionally equivalent, then end up with a comparison of a function against
itself in the next round of comparisons (the one in the function set and the
one on the deferred list). To fix this, I introduce a choice in the form of
comparison for ComparableFunctions, either normal or "pointer only" used to
find exact Function*'s in lookups.
Also add some debugging statements.
llvm-svn: 125180
auto-simplifier). This has a big impact on Ada code, but not much else.
Unfortunately the impact is mostly negative! This is due to PR9004 (aka
SCCP failing to resolve conditional branch conditions in the destination
blocks of the branch), in which simple correlated expressions are not
resolved but complicated ones are, so simplifying has a bad effect!
llvm-svn: 124788
overflow (nsw flag), which was disabled because it breaks 254.gap. I have
informed the GAP authors of the mistake in their code, and arranged for the
testsuite to use -fwrapv when compiling this benchmark.
llvm-svn: 124746
This makes the job of the later optzn passes easier, allowing the vast amount of
icmp transforms to chew on it.
We transform 840 switches in gcc.c, leading to a 16k byte shrink of the resulting
binary on i386-linux.
The testcase from README.txt now compiles into
decl %edi
cmpl $3, %edi
sbbl %eax, %eax
andl $1, %eax
ret
llvm-svn: 124724
to do this and more, but would only do it if X/Y had only one use. Spotted as the
most common missed simplification in SPEC by my auto-simplifier, now that it knows
about nuw/nsw/exact flags. This removes a bunch of multiplications from 447.dealII
and 483.xalancbmk. It also removes a lot from tramp3d-v4, which results in much
more inlining.
llvm-svn: 124560
benchmarks, and that it can be simplified to X/Y. (In general you can only
simplify (Z*Y)/Y to Z if the multiplication did not overflow; if Z has the
form "X/Y" then this is the case). This patch implements that transform and
moves some Div logic out of instcombine and into InstructionSimplify.
Unfortunately instcombine gets in the way somewhat, since it likes to change
(X/Y)*Y into X-(X rem Y), so I had to teach instcombine about this too.
Finally, thanks to the NSW/NUW flags, sometimes we know directly that "Z*Y"
does not overflow, because the flag says so, so I added that logic too. This
eliminates a bunch of divisions and subtractions in 447.dealII, and has good
effects on some other benchmarks too. It seems to have quite an effect on
tramp3d-v4 but it's hard to say if it's good or bad because inlining decisions
changed, resulting in massive changes all over.
llvm-svn: 124487
operand being factorized (and erased) could occur several times in Ops,
resulting in freed memory being used when the next occurrence in Ops was
analyzed.
llvm-svn: 124287
optimized code are:
(non-negative number)+(power-of-two) != 0 -> true
and
(x | 1) != 0 -> true
Instcombine knows about the second one of course, but only does it if X|1
has only one use. These fire thousands of times in the testsuite.
llvm-svn: 124183
occurs because instcombine sinks loads and inserts phis. This kicks in
on such apps as 175.vpr, eon, 403.gcc, xalancbmk and a bunch of times in
spec2006 in some app that uses std::deque.
This resolves the last of rdar://7339113.
llvm-svn: 124090
common cases. This triggers a surprising number of times in SPEC2K6
because min/max idioms end up doing this. For example, code from the
STL ends up looking like this to SRoA:
%202 = load i64* %__old_size, align 8, !tbaa !3
%203 = load i64* %__old_size, align 8, !tbaa !3
%204 = load i64* %__n, align 8, !tbaa !3
%205 = icmp ult i64 %203, %204
%storemerge.i = select i1 %205, i64* %__n, i64* %__old_size
%206 = load i64* %storemerge.i, align 8, !tbaa !3
We can now promote both the __n and the __old_size allocas.
This addresses another chunk of rdar://7339113, poor codegen on
stringswitch.
llvm-svn: 124088
that have PHI or select uses of their element pointers. This can often happen
when instcombine sinks two loads into a successor, inserting a phi or select.
With this patch, we can scalarize the alloca, but the pinned elements are not
yet promoted. This is still a win for large aggregates where only one element
is used. This fixes rdar://8904039 and part of rdar://7339113 (poor codegen
on stringswitch).
llvm-svn: 124070
a select. A vector select is pairwise on each element so we'd need a new
condition with the right number of elements to select on. Fixes PR8994.
llvm-svn: 123963
While here, I'd like to complain about how vector is not an aggregate type
according to llvm::Type::isAggregateType(), but they're listed under aggregate
types in the LangRef and zero vectors are stored as ConstantAggregateZero.
llvm-svn: 123956
auto-simplier the transform most missed by early-cse is (zext X) != 0 -> X != 0.
This patch adds this transform and some related logic to InstructionSimplify
and removes some of the logic from instcombine (unfortunately not all because
there are several situations in which instcombine can improve things by making
new instructions, whereas instsimplify is not allowed to do this). At -O2 this
often results in more than 15% more simplifications by early-cse, and results in
hundreds of lines of bitcode being eliminated from the testsuite. I did see some
small negative effects in the testsuite, for example a few additional instructions
in three programs. One program, 483.xalancbmk, got an additional 35 instructions,
which seems to be due to a function getting an additional instruction and then
being inlined all over the place.
llvm-svn: 123911
These were not recommended by my auto-simplifier since they don't fire often enough.
However they do fire from time to time, for example they remove one subtraction from
the final bitcode for 483.xalancbmk.
llvm-svn: 123755
simplification in fully optimized code. It occurs sporadically in the testsuite, and
many times in 403.gcc: the final bitcode has 131 fewer subtractions after this change.
The reason that the multiplies are not eliminated is the same reason that instcombine
did not catch this: they are used by other instructions (instcombine catches this with
a more general transform which in general is only profitable if the operands have only
one use).
llvm-svn: 123754
This fixes the original testcase in PR8927. It also causes a clang
binary built with a patched clang to increase in size by 0.21%.
We can probably get some of the size back by writing a pass that
detects that a global never has its pointer compared and adds
unnamed_addr to it (maybe extend global opt). It is also possible that
there are some other cases clang could add unnamed_addr to.
I will investigate extending globalopt next.
llvm-svn: 123584
then don't try to decimate it into its individual pieces. This will just make a mess of the
IR and is pointless if none of the elements are individually accessed. This was generating
really terrible code for std::bitset (PR8980) because it happens to be lowered by clang
as an {[8 x i8]} structure instead of {i64}.
The testcase now is optimized to:
define i64 @test2(i64 %X) {
br label %L2
L2: ; preds = %0
ret i64 %X
}
before we generated:
define i64 @test2(i64 %X) {
%sroa.store.elt = lshr i64 %X, 56
%1 = trunc i64 %sroa.store.elt to i8
%sroa.store.elt8 = lshr i64 %X, 48
%2 = trunc i64 %sroa.store.elt8 to i8
%sroa.store.elt9 = lshr i64 %X, 40
%3 = trunc i64 %sroa.store.elt9 to i8
%sroa.store.elt10 = lshr i64 %X, 32
%4 = trunc i64 %sroa.store.elt10 to i8
%sroa.store.elt11 = lshr i64 %X, 24
%5 = trunc i64 %sroa.store.elt11 to i8
%sroa.store.elt12 = lshr i64 %X, 16
%6 = trunc i64 %sroa.store.elt12 to i8
%sroa.store.elt13 = lshr i64 %X, 8
%7 = trunc i64 %sroa.store.elt13 to i8
%8 = trunc i64 %X to i8
br label %L2
L2: ; preds = %0
%9 = zext i8 %1 to i64
%10 = shl i64 %9, 56
%11 = zext i8 %2 to i64
%12 = shl i64 %11, 48
%13 = or i64 %12, %10
%14 = zext i8 %3 to i64
%15 = shl i64 %14, 40
%16 = or i64 %15, %13
%17 = zext i8 %4 to i64
%18 = shl i64 %17, 32
%19 = or i64 %18, %16
%20 = zext i8 %5 to i64
%21 = shl i64 %20, 24
%22 = or i64 %21, %19
%23 = zext i8 %6 to i64
%24 = shl i64 %23, 16
%25 = or i64 %24, %22
%26 = zext i8 %7 to i64
%27 = shl i64 %26, 8
%28 = or i64 %27, %25
%29 = zext i8 %8 to i64
%30 = or i64 %29, %28
ret i64 %30
}
In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough
PHIs are in play that instcombine backs off. It's better to not generate this stuff
in the first place.
llvm-svn: 123571
multiple uses. In some cases, all the uses are the same operation,
so instcombine can go ahead and promote the phi. In the testcase
this pushes an add out of the loop.
llvm-svn: 123568
The basic issue is that isel (very reasonably!) expects conditional branches
to be folded, so CGP leaving around a bunch dead computation feeding
conditional branches isn't such a good idea. Just fold branches on constants
into unconditional branches.
llvm-svn: 123526
have objectsize folding recursively simplify away their result when it
folds. It is important to catch this here, because otherwise we won't
eliminate the cross-block values at isel and other times.
llvm-svn: 123524
simplification present in fully optimized code (I think instcombine fails to
transform some of these when "X-Y" has more than one use). Fires here and
there all over the test-suite, for example it eliminates 8 subtractions in
the final IR for 445.gobmk, 2 subs in 447.dealII, 2 in paq8p etc.
llvm-svn: 123442
threading of shifts over selects and phis while there. This fires here and
there in the testsuite, to not much effect. For example when compiling spirit
it fires 5 times, during early-cse, resulting in 6 more cse simplifications,
and 3 more terminators being folded by jump threading, but the final bitcode
doesn't change in any interesting way: other optimizations would have caught
the opportunity anyway, only later.
llvm-svn: 123441
While there, I noticed that the transform "undef >>a X -> undef" was wrong.
For example if X is 2 then the top two bits must be equal, so the result can
not be anything. I fixed this in the constant folder as well. Also, I made
the transform for "X << undef" stronger: it now folds to undef always, even
though X might be zero. This is in accordance with the LangRef, but I must
admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef"
following the LangRef and the constant folder, likewise fairly aggressive.
llvm-svn: 123417
This is a minor extension of SROA to handle a special case that is
important for some ARM NEON operations. Some of the NEON intrinsics
return multiple values, which are handled as struct types containing
multiple elements of the same vector type. The corresponding return
types declared in the arm_neon.h header have equivalent arrays. We
need SROA to recognize that it can split up those arrays and structs
into separate vectors, even though they are not always accessed with
the same type. SROA already handles loads and stores of an entire
alloca by using insertvalue/extractvalue to access the individual
pieces, and that code works the same regardless of whether the type
is a struct or an array. So, all that needs to be done is to check
for compatible arrays and homogeneous structs.
llvm-svn: 123381
SROA only split up structs and arrays one level at a time, so padding can
only cause trouble if it is located in between the struct or array elements.
llvm-svn: 123380
is "X != 0 -> X" when X is a boolean. This occurs a lot because of the way
llvm-gcc converts gcc's conditional expressions. Add this, and a few other
similar transforms for completeness.
llvm-svn: 123372
point values to their integer representation through the SSE intrinsic
calls. This is the last part of a README.txt entry for which I have real
world examples.
llvm-svn: 123206
larger memsets. Among other things, this fixes rdar://8760394 and
allows us to handle "Example 2" from http://blog.regehr.org/archives/320,
compiling it into a single 4096-byte memset:
_mad_synth_mute: ## @mad_synth_mute
## BB#0: ## %entry
pushq %rax
movl $4096, %esi ## imm = 0x1000
callq ___bzero
popq %rax
ret
llvm-svn: 123089
1. Rip out LoopRotate's domfrontier updating code. It isn't
needed now that LICM doesn't use DF and it is super complex
and gross.
2. Make DomTree updating code a lot simpler and faster. The
old loop over all the blocks was just to find a block??
3. Change the code that inserts the new preheader to just use
SplitCriticalEdge instead of doing an overcomplex
reimplementation of it.
No behavior change, except for the name of the inserted preheader.
llvm-svn: 123072
them into the loop preheader, eliminating silly instructions like
"icmp i32 0, 100" in fixed tripcount loops. This also better exposes the
bigger problem with loop rotate that I'd like to fix: once this has been
folded, the duplicated conditional branch *often* turns into an uncond branch.
Not aggressively handling this is pessimizing later loop optimizations
somethin' fierce by making "dominates all exit blocks" checks fail.
llvm-svn: 123060
X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X
X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X
X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X
X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X
X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X
X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X
Instead of calculating this with mixed types promote all to the
larger type. This enables scalar evolution to analyze this
expression. PR8866
llvm-svn: 123034
ret i64 ptrtoint (i8* getelementptr ([1000 x i8]* @X, i64 1, i64 sub (i64 0, i64 ptrtoint ([1000 x i8]* @X to i64))) to i64)
to "ret i64 1000". This allows us to correctly compute the trip count
on a loop in PR8883, which occurs with std::fill on a char array. This
allows us to transform it into a memset with a constant size.
llvm-svn: 122950
when safe.
The testcase is basically this nested loop:
void foo(char *X) {
for (int i = 0; i != 100; ++i)
for (int j = 0; j != 100; ++j)
X[j+i*100] = 0;
}
which gets turned into a single memset now. clang -O3 doesn't optimize
this yet though due to a phase ordering issue I haven't analyzed yet.
llvm-svn: 122806
sure that the loop we're promoting into a memcpy doesn't mutate the input
of the memcpy. Before we were just checking that the dest of the memcpy
wasn't mod/ref'd by the loop.
llvm-svn: 122712
in the PR, the pass could break LCSSA form when inserting preheaders. It probably
would be easy enough to fix this, but since currently we always go into LCSSA form
after running this pass, doing so is not urgent.
llvm-svn: 122695
header for now for memset/memcpy opportunities. It turns out that loop-rotate
is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for
loops" into 2 basic block loops that loop-idiom was ignoring.
With this fix, we form many *many* more memcpy and memsets than before, including
on the "history" loops in the viterbi benchmark, which look like this:
for (j=0; j<MAX_history; ++j) {
history_new[i][j+1] = history[2*i][j];
}
Transforming these loops into memcpy's speeds up the viterbi benchmark from
11.98s to 3.55s on my machine. Woo.
llvm-svn: 122685
numbering, in which it considers (for example) "%a = add i32 %x, %y" and
"%b = add i32 %x, %y" to be equal because the operands are equal and the
result of the instructions only depends on the values of the operands.
This has almost no effect (it removes 4 instructions from gcc-as-one-file),
and perhaps slows down compilation: I measured a 0.4% slowdown on the large
gcc-as-one-file testcase, but it wasn't statistically significant.
llvm-svn: 122654
the original instruction, half the cases were missed (making it not
wrong but suboptimal). Also correct a typo (A <-> B) in the second
chunk.
llvm-svn: 122414
if both A op B and A op C simplify. This fires fairly often but doesn't
make that much difference. On gcc-as-one-file it removes two "and"s and
turns one branch into a select.
llvm-svn: 122399
I still think that LVI should be handling this, but that capability is some ways off in the future,
and this matters for some significant benchmarks.
llvm-svn: 122378
a couple of existing transforms. This fires surprisingly often, for
example when compiling gcc "(X+(-1))+1->X" fires quite a lot as well
as various "and" simplifications (usually with a phi node operand).
Most of the time this doesn't make a real difference since the same
thing would have been done elsewhere anyway, eg: by instcombine, but
there are a few places where this results in simplifications that we
were not doing before.
llvm-svn: 122326
(they had just been forgotten before). Adding Xor causes "main" in the
existing testcase 2010-11-01-lshr-mask.ll to be hugely more simplified.
llvm-svn: 122245
argument. The generated alloca has to have at least the alignment of the
byval, if not, the client may be making assumptions that the new alloca won't
satisfy.
llvm-svn: 122234
This resolves a README entry and technically resolves PR4916,
but we still get poor code for the testcase in that PR because
GVN isn't CSE'ing uadd with add, filed as PR8817.
Previously we got:
_test7: ## @test7
addq %rsi, %rdi
cmpq %rdi, %rsi
movl $42, %eax
cmovaq %rsi, %rax
ret
Now we get:
_test7: ## @test7
addq %rsi, %rdi
movl $42, %eax
cmovbq %rsi, %rax
ret
llvm-svn: 122182
sadd formed is half the size of the original type. We can
now compile this into a sadd.i8:
unsigned char X(char a, char b) {
int res = a+b;
if ((unsigned )(res+128) > 255U)
abort();
return res;
}
llvm-svn: 122178
checking to see if the high bits of the original add result were dead.
Inserting a smaller add and zexting back to that size is not good enough.
This is likely to be the fix for 8816.
llvm-svn: 122177
which have trapping constant exprs in them due to PHI nodes.
Eliminating them can cause the constant expr to be evalutated
on new paths if the input edges are critical.
llvm-svn: 122164
on the DragonEgg self-host bot. Unfortunately, the testcase is pretty messy and doesn't reduce well due to
interactions with other parts of InstCombine.
llvm-svn: 122072
a null endptr argument, because they may write to errno.
This fixes a seflhost miscompile observed on Linux targets when TBAA
was enabled.
llvm-svn: 122014
dragonegg self-host buildbot. Original commit message:
Add an InstCombine transform to recognize instances of manual overflow-safe addition
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics. This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.
llvm-svn: 121965
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics. This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.
Fixes <rdar://problem/8558713>.
llvm-svn: 121905
When it sees a promising select it now tries to figure out whether the condition of the select is known in any of the predecessors and if so it maps the operands appropriately.
llvm-svn: 121859
which is simpler than finding a place to insert in BB.
- Don't perform the 'if condition hoisting' xform on certain
i1 PHIs, as it interferes with switch formation.
This re-fixes "example 7", without breaking the world hopefully.
llvm-svn: 121764
first, it can kick in on blocks whose conditions have been
folded to a constant, even though one of the edges will be
trivially folded.
second, it doesn't clean up the "if diamond" that it just
eliminated away. This is a problem because other simplifycfg
xforms kick in depending on the order of block visitation,
causing pointless work.
llvm-svn: 121762
when simplifying, allowing them to be eagerly turned into switches. This
is the last step required to get "Example 7" from this blog post:
http://blog.regehr.org/archives/320
On X86, we now generate this machine code, which (to my eye) seems better
than the ICC generated code:
_crud: ## @crud
## BB#0: ## %entry
cmpb $33, %dil
jb LBB0_4
## BB#1: ## %switch.early.test
addb $-34, %dil
cmpb $58, %dil
ja LBB0_3
## BB#2: ## %switch.early.test
movzbl %dil, %eax
movabsq $288230376537592865, %rcx ## imm = 0x400000017001421
btq %rax, %rcx
jb LBB0_4
LBB0_3: ## %lor.rhs
xorl %eax, %eax
ret
LBB0_4: ## %lor.end
movl $1, %eax
ret
llvm-svn: 121690
(x & 2^n) ? 2^m+C : C
we can offset both arms by C to get the "(x & 2^n) ? 2^m : 0" form, optimize the
select to a shift and apply the offset afterwards.
llvm-svn: 121609
(if available) as we go so that we get simple constantexprs not insane ones.
This fixes the failure of clang/test/CodeGenCXX/virtual-base-ctor.cpp
that the previous iteration of this patch had.
llvm-svn: 121111
memcpy's like:
memcpy(A, B)
memcpy(A, C)
we cannot delete the first memcpy as dead if A and C might be aliases.
If so, we actually get:
memcpy(A, B)
memcpy(A, A)
which is not correct to transform into:
memcpy(A, A)
This patch was heavily influenced by Jakub Staszak's patch in PR8728, thanks
Jakub!
llvm-svn: 120974
about pairs of AA::Location's instead of looking for MemDep's
"Def" predicate. This is more powerful and general, handling
memset/memcpy/store all uniformly, and implementing PR8701 and
probably obsoleting parts of memcpyoptimizer.
This also fixes an obscure bug with init.trampoline and i8
stores, but I'm not surprised it hasn't been hit yet. Enhancing
init.trampoline to carry the size that it stores would allow
DSE to be much more aggressive about optimizing them.
llvm-svn: 120406
contains "ref".
Enhance DSE to use a modref query instead of a store-specific hack
to generalize the "ignore may-alias stores" optimization to handle
memset and memcpy.
llvm-svn: 120368
fairly systematic way in instcombine. Some of these cases were already dealt
with, in which case I removed the existing code. The case of Add has a bunch of
funky logic which covers some of this plus a few variants (considers shifts to be
a form of multiplication), which I didn't touch. The simplification performed is:
A*B+A*C -> A*(B+C). The improvement is to do this in cases that were not already
handled [such as A*B-A*C -> A*(B-C), which was reported on the mailing list], and
also to do it more often by not checking for "only one use" if "B+C" simplifies.
llvm-svn: 120024
folding improvements: if P points to a type of size zero, turn "gep P, N" into "P".
More generally, if a gep index type has size zero, instcombine could replace the
index with zero, but that is not done here.
llvm-svn: 119942
allowing the memcpy to be eliminated.
Unfortunately, the requirements on byval's without explicit
alignment are really weak and impossible to predict in the
mid-level optimizer, so this doesn't kick in much with current
frontends. The fix is to change clang to set alignment on all
byval arguments.
llvm-svn: 119916
preserves LCSSA form out of ScalarEvolution and into the LoopInfo
class. Use it to check that SimplifyInstruction simplifications
are not breaking LCSSA form. Fixes PR8622.
llvm-svn: 119727
this was a tree of hashtables, and a query recursed into the table for the immediate dominator ad infinitum
if the initial lookup failed. This led to really bad performance on tall, narrow CFGs.
We can instead replace it with what is conceptually a multimap of value numbers to leaders (actually
represented by a hashtable with a list of Value*'s as the value type), and then
determine which leader from that set to use very cheaply thanks to the DFS numberings maintained by
DominatorTree. Because there are typically few duplicates of a given value, this scan tends to be
quite fast. Additionally, we use a custom linked list and BumpPtr allocation to avoid any unnecessary
allocation in representing the value-side of the multimap.
This change brings with it a 15% (!) improvement in the total running time of GVN on 403.gcc, which I
think is pretty good considering that includes all the "real work" being done by MemDep as well.
The one downside to this approach is that we can no longer use GVN to perform simple conditional progation,
but that seems like an acceptable loss since we now have LVI and CorrelatedValuePropagation to pick up
the slack. If you see conditional propagation that's not happening, please file bugs against LVI or CVP.
llvm-svn: 119714
refusing to optimize two memcpy's like this:
copy A <- B
copy C <- A
if it couldn't prove that noalias(B,C). We can eliminate
the copy by producing a memmove instead of memcpy.
llvm-svn: 119694
if it is passed as a byval argument. The byval argument will just be a
read, so it is safe to read from the original global instead. This allows
us to promote away the %agg.tmp alloca in PR8582
llvm-svn: 119686
over a phi node by applying it to each operand may be wrong if the
operation and the phi node are mutually interdependent (the testcase
has a simple example of this). So only do this transform if it would
be correct to perform the operation in each predecessor of the block
containing the phi, i.e. if the other operands all dominate the phi.
This should fix the FFMPEG snow.c regression reported by İsmail Dönmez.
llvm-svn: 119347
offload the work to hasConstantValue rather than do something more
complicated (such handling mutually recursive phis) because (1) it is
not clear it is worth it; and (2) if it is worth it, maybe such logic
would be better placed in hasConstantValue. Adjust some GVN tests
which are now cleaned up much further (eg: all phi nodes are removed).
llvm-svn: 119043
SimplifyAssociativeOrCommutative) "(A op C1) op C2" -> "A op (C1 op C2)",
which previously was only done if C1 and C2 were constants, to occur whenever
"C1 op C2" simplifies (a la InstructionSimplify). Since the simplifying operand
combination can no longer be assumed to be the right-hand terms, consider all of
the possible permutations. When compiling "gcc as one big file", transform 2
(i.e. using right-hand operands) fires about 4000 times but it has to be said
that most of the time the simplifying operands are both constants. Transforms
3, 4 and 5 each fired once. Transform 6, which is an existing transform that
I didn't change, never fired. With this change, the testcase is now optimized
perfectly with one run of instcombine (previously it required instcombine +
reassociate + instcombine, and it may just have been luck that this worked).
llvm-svn: 119002
testing for dereferenceable pointers into a helper function,
isDereferenceablePointer. Teach it how to reason about GEPs
with simple non-zero indices.
Also eliminate ArgumentPromtion's IsAlwaysValidPointer,
which didn't check for weak externals or out of range gep
indices.
llvm-svn: 118840
references. For example, this allows gvn to eliminate the load in
this example:
void foo(int n, int* p, int *q) {
p[0] = 0;
p[1] = 1;
if (n) {
*q = p[0];
}
}
llvm-svn: 118714
nodes can be used in loops, this could result in infinite looping
if there is no recursion limit, so add such a limit. It is also
used for the SelectInst case because in theory there could be an
infinite loop there too if the basic block is unreachable.
llvm-svn: 118694
to optionally look for constant or local (alloca) memory.
Teach BasicAliasAnalysis::pointsToConstantMemory to look through Select
and Phi nodes, and to support looking for local memory.
Remove FunctionAttrs' PointsToLocalOrConstantMemory function, now that
AliasAnalysis knows all the tricks that it knew.
llvm-svn: 118412
of a select instruction, see if doing the compare with the
true and false values of the select gives the same result.
If so, that can be used as the value of the comparison.
llvm-svn: 118378
consider it to be readonly. In fact, don't even consider it to be
readonly if it does a volatile load from an AllocaInst either (it
is debatable as to whether readonly would be correct or not in this
case; play safe for the moment). This fixes PR8279.
llvm-svn: 117783
This code had previously used 2*N, where N is the mask length, to represent
undef. That is not safe because the shufflevector operands may have more
than N elements -- they don't have to match the result type.
llvm-svn: 117721
Allow splats even if they don't match either of the original shuffles,
possibly due to undef entries in the shuffles masks. Radar 8597790.
Also fix some 80-column violations.
llvm-svn: 117719
it isn't unreachable and should not be zapped. The check for the entry block
was missing in one case: a block containing a unwind instruction. While there,
do some small cleanups: "M" is not a great name for a Function* (it would be
more appropriate for a Module*), change it to "Fn"; use Fn in more places.
llvm-svn: 117224
does normal initialization and normal chaining. Change the default
AliasAnalysis implementation to NoAlias.
Update StandardCompileOpts.h and friends to explicitly request
BasicAliasAnalysis.
Update tests to explicitly request -basicaa.
llvm-svn: 116720
logic to use the new APInt methods. Among other things this
implements rdar://8501501 - llvm.smul.with.overflow.i32 should constant fold
which comes from "clang -ftrapv", originally brought to my attention from PR8221.
llvm-svn: 116457
Anyone interested in more general PRE would be better served by implementing it separately, to get real
anticipation calculation, etc.
llvm-svn: 115337
Because of this, we cannot use the Simplify* APIs, as they can assert-fail on unreachable code. Since it's not easy to determine
if a given threading will cause a block to become unreachable, simply defer simplifying simplification to later InstCombine and/or
DCE passes.
llvm-svn: 115082
Usually we wouldn't do this anyway because llvm_fenv_testexcept would return an
exception, but we have seen some cases where neither errno nor fenv detect an
exception on arm-linux.
llvm-svn: 114893
Splitting critical edges at the merge point only addressed part of the issue; it is also possible for non-post-domination
to occur when the path from the load to the merge has branches in it. Unfortunately, full anticipation analysis is
time-consuming, so for now approximate it. This is strictly more conservative than real anticipation, so we will miss
some cases that real PRE would allow, but we also no longer insert loads into paths where they didn't exist before. :-)
This is a very slight net positive on SPEC for me (0.5% on average). Most of the benchmarks are largely unaffected, but
when it pays off it pays off decently: 181.mcf improves by 4.5% on my machine.
llvm-svn: 114785
a Constant into a ConstantRange. Handle this conservatively for now, rather than asserting. The testcase is
more complex that I would like, but the manifestation of the problem is sensitive to iteration orders and the state of the
LVI cache, and I have not been able to reproduce it with manually constructed or simplified cases.
Fixes PR8162.
llvm-svn: 114103
to expose greater opportunities for store narrowing in codegen. This patch fixes a potential
infinite loop in instcombine caused by one of the introduced transforms being overly aggressive.
llvm-svn: 113763
This can result in increased opportunities for store narrowing in code generation. Update a number of
tests for this change. This fixes <rdar://problem/8285027>.
Additionally, because this inverts the order of ors and ands, some patterns for optimizing or-of-and-of-or
no longer fire in instances where they did originally. Add a simple transform which recaptures most of these
opportunities: if we have an or-of-constant-or and have failed to fold away the inner or, commute the order
of the two ors, to give the non-constant or a chance for simplification instead.
llvm-svn: 113679
unrolling threshold to the optimize-for-size threshold. Basically, for loops containing calls, unrolling
can still be profitable as long as the loop is REALLY small.
llvm-svn: 113439
turning (fptrunc (sqrt (fpext x))) -> (sqrtf x) is great, but we have
to delete the original sqrt as well. Not doing so causes us to do
two sqrt's when building with -fmath-errno (the default on linux).
llvm-svn: 113260
in the duplicated block instead of duplicating them.
Duplicating them into the end of the loop and the preheader
means that we got a phi node in the header of the loop,
which prevented LICM from hoisting them. GVN would
usually come around later and merge the duplicated
instructions so we'd get reasonable output... except that
anything dependent on the shoulda-been-hoisted value can't
be hoisted. In PR5319 (which this fixes), a memory value
didn't get promoted.
llvm-svn: 113134
location is being re-stored to the memory location. We would get
a dangling pointer from the SSAUpdate data structure and miss a
use. This fixes PR8068
llvm-svn: 113042
on llvmdev: SRoA is introducing MMX datatypes like <1 x i64>,
which then cause random problems because the X86 backend is
producing mmx stuff without inserting proper emms calls.
In the short term, force off MMX datatypes. In the long term,
the X86 backend should not select generic vector types to MMX
registers. This is being worked on, but won't be done in time
for 2.8. rdar://8380055
llvm-svn: 112696
I have not been able to find a way to test each in isolation, for a few reasons:
1) The ability to look-through non-i1 BinaryOperator's requires the ability to look through non-constant
ICmps in order for it to ever trigger.
2) The ability to do LVI-powered PHI value determination only matters in cases that ProcessBranchOnPHI
can't handle. Since it already handles all the cases without other instructions in the def-use chain
between the PHI and the branch, it requires the ability to look through ICmps and/or BinaryOperators
as well.
llvm-svn: 112611
This actually exposed an infinite recursion bug in ComputeValueKnownInPredecessors which theoretically already existed (in JumpThreading's
handling of and/or of i1's), but never manifested before. This patch adds a tracking set to prevent this case.
llvm-svn: 112589
A = shl x, 42
...
B = lshr ..., 38
which can be transformed into:
A = shl x, 4
...
iff we can prove that the would-be-shifted-in bits
are already zero. This eliminates two shifts in the testcase
and allows eliminate of the whole i128 chain in the real example.
llvm-svn: 112314
framework, which is good at ripping through bitfield
operations. This generalize a bunch of the existing
xforms that instcombine does, such as
(x << c) >> c -> and
to handle intermediate logical nodes. This is useful for
ripping up the "promote to large integer" code produced by
SRoA.
llvm-svn: 112304
by the SRoA "promote to large integer" code, eliminating
some type conversions like this:
%94 = zext i16 %93 to i32 ; <i32> [#uses=2]
%96 = lshr i32 %94, 8 ; <i32> [#uses=1]
%101 = trunc i32 %96 to i8 ; <i8> [#uses=1]
This also unblocks other xforms from happening, now clang is able to compile:
struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }
into:
_foo: ## @foo
## BB#0: ## %entry
pshufd $1, %xmm0, %xmm2
addss %xmm0, %xmm2
movdqa %xmm1, %xmm3
addss %xmm2, %xmm3
pshufd $1, %xmm1, %xmm0
addss %xmm3, %xmm0
ret
on x86-64, instead of:
_foo: ## @foo
## BB#0: ## %entry
movd %xmm0, %rax
shrq $32, %rax
movd %eax, %xmm2
addss %xmm0, %xmm2
movapd %xmm1, %xmm3
addss %xmm2, %xmm3
movd %xmm1, %rax
shrq $32, %rax
movd %eax, %xmm0
addss %xmm3, %xmm0
ret
This seems pretty close to optimal to me, at least without
using horizontal adds. This also triggers in lots of other
code, including SPEC.
llvm-svn: 112278
from the LHS should disable reconsidering that pred on the
RHS. However, knowing something about the pred on the RHS
shouldn't disable subsequent additions on the RHS from
happening.
llvm-svn: 111349
- Eliminate redundant successors.
- Convert an indirectbr with one successor into a direct branch.
Also, generalize SimplifyCFG to be able to be run on a function entry block.
It knows quite a few simplifications which are applicable to the entry
block, and it only needs a few checks to avoid trouble with the entry block.
llvm-svn: 111060
it inserted rather than using LoopInfo::getCanonicalInductionVariable to
rediscover it, since that doesn't work on non-canonical loops. This fixes
infinite recurrsion on such loops; PR7562.
llvm-svn: 109419
mutated by recursive simplification. This also enhances
ReplaceAndSimplifyAllUses to actually do a real RAUW
at the end of it, which updates any value handles
pointing to "From" to start pointing to "To". This
seems useful for debug info and random other VH users.
llvm-svn: 108415
by a return that returns a constant, while elsewhere in the function
another return instruction returns a different constant. This is a
special case of accumulator recursion, so just generalize the existing
logic a bit.
llvm-svn: 108241
(X >s -1) ? C1 : C2 and (X <s 0) ? C2 : C1
into ((X >>s 31) & (C2 - C1)) + C1, avoiding the conditional.
This optimization could be extended to take non-const C1 and C2 but we better
stay conservative to avoid code size bloat for now.
for
int sel(int n) {
return n >= 0 ? 60 : 100;
}
we now generate
sarl $31, %edi
andl $40, %edi
leal 60(%rdi), %eax
instead of
testl %edi, %edi
movl $60, %ecx
movl $100, %eax
cmovnsl %ecx, %eax
llvm-svn: 107866
such a way that debug info for symbols preserved even if symbols are
optimized away by the optimizer.
Add new special pass to remove debug info for such symbols.
llvm-svn: 107416
the returned value after the tail call if it differs from other return
values. The optimal thing to do would be to introduce a phi node for
the return value, but for the moment just fix the miscompile.
llvm-svn: 106947
when it detects undefined behavior. llvm.trap generally codegens into some
thing really small (e.g. a 2 byte ud2 instruction on x86) and debugging this
sort of thing is "nontrivial". For example, we now compile:
void foo() { *(int*)0 = 42; }
into:
_foo:
pushl %ebp
movl %esp, %ebp
ud2
Some may even claim that this is a security hole, though that seems dubious
to me. This addresses rdar://7958343 - Optimizing away null dereference
potentially allows arbitrary code execution
llvm-svn: 103356
with a vector input and output into a shuffle vector. This sort of
sequence happens when the input code stores with one type and reloads
with another type and then SROA promotes to i96 integers, which make
everyone sad.
This fixes rdar://7896024
llvm-svn: 103354
values passed to llvm.dbg.value were not valid for the intrinsic, it
might have caused trouble one day if the verifier ever started checking
for valid debug info.
llvm-svn: 103038
RAUW of a global variable with a local variable in function F,
if function local metadata M in function G was using the global
then M would become function-local to both F and G, which is not
allowed. See the testcase for an example. Fixed by detecting
this situation and zapping the metadata operand when it occurs.
llvm-svn: 103007
halting analysis, it is illegal to delete a call to a read-only function.
The correct solution is almost certainly to add a "must halt" attribute and
only allow deletions in its presence.
XFAIL the relevant testcase for now.
llvm-svn: 102831
that can have a big effect :). The first is to enable the
iterative SCC passmanager juice that kicks in when the
scc passmgr detects that a function pass has devirtualized
a call. In this case, it will rerun all the passes it
manages on the SCC, up to the iteration count limit (4). This
is useful because a function pass may devirualize a call, and
we want the inliner to inline it, or pruneeh to infer stuff
about it, etc.
The second patch is to add *all* call sites to the
DevirtualizedCalls list the inliner uses. This list is
about to get renamed, but the jist of this is that the
inliner now reconsiders *all* inlined call sites as candidates
for further inlining. The intuition is this that in cases
like this:
f() { g(1); } g(int x) { h(x); }
We analyze this bottom up, and may decide that it isn't
profitable to inline H into G. Next step, we decide that it is
profitable to inline G into F, and do so, which means that F
now calls H. Even though the call from G -> H may not have been
profitable to inline, the call from F -> H may be (in this case
because a constant allows folding etc).
In my spot checks, this doesn't have a big impact on code. For
example, the LLC output for 252.eon grew from 0.02% (from
317252 to 317308) and 176.gcc actually shrunk by .3% (from 1525612
to 1520964 bytes). 252.eon never iterated in the SCC Passmgr,
176.gcc iterated at most 1 time.
llvm-svn: 102823
that appear due to inlining a callee as candidates for
futher inlining, but a recent patch made it do this if
those call sites were indirect and became direct.
Unfortunately, in bizarre cases (see testcase) doing this
can cause us to infinitely inline mutually recursive
functions into callers not in the cycle. Fix this by
keeping track of the inline history from which callsite
inline candidates got inlined from.
This shouldn't affect any "real world" code, but is required
for a follow on patch that is coming up next.
llvm-svn: 102822
were still inlining self-recursive functions into other functions.
Inlining a recursive function into itself has the potential to
reduce recursion depth by a factor of 2, inlining a recursive
function into something else reduces recursion depth by exactly
1. Since inlining a recursive function into something else is a
weird form of loop peeling, turn this off.
The deleted testcase was added by Dale in r62107, since then
we're leaning towards not inlining recursive stuff ever. In any
case, if we like inlining recursive stuff, it should be done
within the recursive function itself to get the algorithm
recursion depth win.
llvm-svn: 102798
that appear in the SCC as a result of inlining as candidates
for inlining. Change this so that it *does* consider call
sites that change from being indirect to being direct as a
result of inlining. This allows it to completely
"devirtualize" the testcase.
llvm-svn: 102146
Fix RefreshCallGraph to use CGN->replaceCallEdge instead of hand
rolling its own loop. replaceCallEdge properly maintains the
reference counts of the nodes, fixing a crash exposed by the
iterative callgraph stuff.
llvm-svn: 102120
we have RefreshCallGraph detect when a function pass devirtualizes
a call, and have CGSCCPassMgr iterate (up to a count) when this
happens. This allows (in the example) GVN to devirtualize the
call in foo, then the inliner to inline it away.
This is not currently enabled because I haven't done any analysis
on the (potentially substantial) code size or performance impact of
doing this, and guess what, it exposes callgraph updating bugs in
various passes. This is progress though, and you can play with it
by passing -max-cg-scc-iterations=5 to opt.
llvm-svn: 101973
recursive callsites, inlining can reduce the number of calls by
exponential factors, as it does in
MultiSource/Benchmarks/Olden/treeadd. More involved heuristics
will be needed.
llvm-svn: 101969
condition we're unswitching on. In this case, don't try to
simplify the second copy of the loop which may be dead or not,
but is probably a constant now. This fixes PR6879
llvm-svn: 101870
Arg promotion was deleting call graph nodes that still had references
from the 'indirect' CGN. Like the inliner, it should only delete the
function if all references are gone.
llvm-svn: 101845
just ask ScalarEvolution for it on demand. This helps IVUsers be more robust
in the case of expressions changing underneath it. This fixes PR6862.
llvm-svn: 101819
to determine where to place PHIs by iteratively comparing reaching definitions
at each block. That was just plain wrong. This version now computes the
dominator tree within the subset of the CFG where PHIs may need to be placed,
and then places the PHIs in the iterated dominance frontier of each definition.
The rest of the patch is mostly the same, with a few more performance
improvements added in.
llvm-svn: 101612
numerator is an induction variable. For example, with code like this:
for (i=0;i<n;++i)
x[i%n] = 0;
IndVarSimplify will now recognize that i is always less than n inside
the loop, and eliminate the remainder.
llvm-svn: 101113
expression is a UDiv and it doesn't appear that the UDiv came from
the user's source.
ScalarEvolution has recently figured out how to compute a tripcount
expression for the inner loop in
SingleSource/Benchmarks/Shootout/sieve.c, using a udiv. Emitting a
udiv instruction dramatically slows down the enclosing loop.
llvm-svn: 101068
the loop exit test. This usually doesn't come up for a variety of
reasons, but it isn't impossible, so make IndVarSimplify handle it
conservatively.
llvm-svn: 101008
variables. For example, with code like this:
for (i=0;i<n;++i)
if (i<n)
x[i] = 0;
IndVarSimplify will now recognize that i is always less than n inside
the loop, and eliminate the if.
llvm-svn: 101000
into adjacent loops. Also, ensure that the insert position is
dominated by the loop latch of any loop in the post-inc set which
has a latch.
llvm-svn: 100906
forced constant is changed to a constant, we would end
up adding the instruction to the wrong worklist,
preventing it from being properly revisited. This fixes
rdar://7832370
llvm-svn: 100837
explicitly split into stride-and-offset pairs. Also, add the
ability to track multiple post-increment loops on the same expression.
This refines the concept of "normalizing" SCEV expressions used for
to post-increment uses, and introduces a dedicated utility routine for
normalizing and denormalizing expressions.
This fixes the expansion of expressions which are post-increment users
of more than one loop at a time. More broadly, this takes LSR another
step closer to being able to reason about more than one loop at a time.
llvm-svn: 100699
Added support for address spaces and added a isVolatile field to memcpy, memmove, and memset,
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)
llvm-svn: 100304
checker. Amusingly, we already had tests that we should
have rejects because they would be miscompiled in the
testsuite.
The remaining issue with this is that we don't check that
the branch causes us to exit the loop if it fails, so we
don't actually know if we remain in bounds.
llvm-svn: 100284
this cleans up a bunch of code and also fixes several crashes and
miscompiles. More to come unfortunately, this optimization
is quite broken.
llvm-svn: 100270
(what was I thinking?) and there's also a problem with LCSSA. I'll try again
later with fixes.
--- Reverse-merging r100263 into '.':
U lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100177 into '.':
G lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100148 into '.':
G lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100147 into '.':
U include/llvm/Transforms/Utils/SSAUpdater.h
G lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100131 into '.':
G include/llvm/Transforms/Utils/SSAUpdater.h
G lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100130 into '.':
G lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100126 into '.':
G include/llvm/Transforms/Utils/SSAUpdater.h
G lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100050 into '.':
D test/Transforms/GVN/2010-03-31-RedundantPHIs.ll
--- Reverse-merging r100047 into '.':
G include/llvm/Transforms/Utils/SSAUpdater.h
G lib/Transforms/Utils/SSAUpdater.cpp
llvm-svn: 100264
Added support for address spaces and added a isVolatile field to memcpy, memmove, and memset,
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)
llvm-svn: 100191
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)
A update of langref will occur in a subsequent checkin.
llvm-svn: 99928
for the noinline attribute, and make the inliner refuse to
inline a call site when the call site is marked noinline even
if the callee isn't. This fixes PR6682.
llvm-svn: 99341
out the remainder of the calls that we should lower in some way and
move the tests to the new correct directory. Fix up tests that are now
optimized more than they were before by -instcombine.
llvm-svn: 97875
Log:
Transform @llvm.objectsize to integer if the argument is a result of malloc of known size.
Modified:
llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
llvm/trunk/test/Transforms/InstCombine/objsize.ll
It appears to be causing swb and nightly test failures.
llvm-svn: 97866
parts of the cmp|cmp and cmp&cmp folding logic wasn't prepared for vectors
(unrelated to the bug but noticed while in the code) and the code was
*definitely* not safe to use by the (cast icmp)|(cast icmp) handling logic
that I added in r95855. Fix all this up by changing the various routines
to more consistently use IRBuilder and not pass in the I which had the wrong
type.
llvm-svn: 97801
transformation much more careful. Truncating binary '01' to '1' sounds like it's
safe until you realize that it switched from positive to negative under a signed
interpretation, and that depends on the icmp predicate.
Also a few miscellaneous cleanups.
llvm-svn: 97721
long test(long x) { return (x & 123124) | 3; }
Currently compiles to:
_test:
orl $3, %edi
movq %rdi, %rax
andq $123127, %rax
ret
This is because instruction and DAG combiners canonicalize
(or (and x, C), D) -> (and (or, D), (C | D))
However, this is only profitable if (C & D) != 0. It gets in the way of the
3-addressification because the input bits are known to be zero.
llvm-svn: 97616
payloads. APFloat's internal folding routines always make QNaNs now,
instead of sometimes making QNaNs and sometimes SNaNs depending on the
type.
llvm-svn: 97364
which branch on undef to branch on a boolean constant for the edge
exiting the loop. This helps ScalarEvolution compute trip counts for
loops.
Teach ScalarEvolution to recognize single-value PHIs, when safe, and
ForgetSymbolicName to forget such single-value PHI nodes as apprpriate
in ForgetSymbolicName.
llvm-svn: 97126
getelementptr. Despite only doing so in the case where x is a known
array object and c can be converted to an index within range, this
could still be invalid if c is actually the address of an object
allocated outside of LLVM. Also, SCEVExpander, the original motivation
for this code, has since been improved to avoid inttoptr+ptroint in
more cases.
llvm-svn: 96950
true or false as its exit condition. These are usually eliminated by
SimplifyCFG, but the may be left around during a pass which wishes
to preserve the CFG.
llvm-svn: 96683
2. don't bother trying to merge globals in non-default sections,
doing so is quite dubious at best anyway.
3. fix a bug reported by Arnaud de Grandmaison where we'd try to
merge two globals in different address spaces.
llvm-svn: 95995
bug fixes, and with improved heuristics for analyzing foreign-loop
addrecs.
This change also flattens IVUsers, eliminating the stride-oriented
groupings, which makes it easier to work with.
llvm-svn: 95975
what it does. Enhance it to return false to optimizing vector
sign extensions from vector comparisions, which is the idiom used
to get a splatted vector for a vector comparison.
Doing this breaks vector-casts.ll, add some compensating
transformations to handle the important case they cover without
depending on this canonicalization.
This fixes rdar://7434900 a serious pessimization of vector compares.
llvm-svn: 95855
block. Other blocks may have pointer cycles that will crash
basicaa and other alias analyses. In any case, there is no
point wasting cycles optimizing dead blocks. This fixes
rdar://7635088
llvm-svn: 95852
Initial skeleton and SCEVUnknown lowering implemented,
the rest should come relatively quickly. Move testcase
to new directory.
Move pass to right before SimplifyLibCalls - which is
moved down a bit so we can take advantage of a few opts.
llvm-svn: 95628
xform it is checking to actually pass. There is no need to match
m_SelectCst<0, -1> since instcombine canonicalizes that into not(sext).
Add matches for sext(not(x)) in addition to not(sext(x)).
llvm-svn: 95420
Fix bugs where we would compute out of bounds as in bounds, and where
we couldn't know that the linker could override the size of an array.
Add a few new testcases, change existing testcase to use a private
global array instead of extern.
llvm-svn: 95283