SimplifyDemandedBits. The idea is that some operations can be simplified if
not all of the computed elements are needed. Some targets (like x86) have a
large number of intrinsics that operate on a single element, but pass other
elts through unmodified. If those other elements are not needed, the
intrinsics can be simplified to scalar operations, and insertelement ops can
be removed.
This turns (f.e.):
ushort %Convert_sse(float %f) {
%tmp = insertelement <4 x float> undef, float %f, uint 0 ; <<4 x float>> [#uses=1]
%tmp10 = insertelement <4 x float> %tmp, float 0.000000e+00, uint 1 ; <<4 x float>> [#uses=1]
%tmp11 = insertelement <4 x float> %tmp10, float 0.000000e+00, uint 2 ; <<4 x float>> [#uses=1]
%tmp12 = insertelement <4 x float> %tmp11, float 0.000000e+00, uint 3 ; <<4 x float>> [#uses=1]
%tmp28 = tail call <4 x float> %llvm.x86.sse.sub.ss( <4 x float> %tmp12, <4 x float> < float 1.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00 > ) ; <<4 x float>> [#uses=1]
%tmp37 = tail call <4 x float> %llvm.x86.sse.mul.ss( <4 x float> %tmp28, <4 x float> < float 5.000000e-01, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00 > ) ; <<4 x float>> [#uses=1]
%tmp48 = tail call <4 x float> %llvm.x86.sse.min.ss( <4 x float> %tmp37, <4 x float> < float 6.553500e+04, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00 > ) ; <<4 x float>> [#uses=1]
%tmp59 = tail call <4 x float> %llvm.x86.sse.max.ss( <4 x float> %tmp48, <4 x float> zeroinitializer ) ; <<4 x float>> [#uses=1]
%tmp = tail call int %llvm.x86.sse.cvttss2si( <4 x float> %tmp59 ) ; <int> [#uses=1]
%tmp69 = cast int %tmp to ushort ; <ushort> [#uses=1]
ret ushort %tmp69
}
into:
ushort %Convert_sse(float %f) {
entry:
%tmp28 = sub float %f, 1.000000e+00 ; <float> [#uses=1]
%tmp37 = mul float %tmp28, 5.000000e-01 ; <float> [#uses=1]
%tmp375 = insertelement <4 x float> undef, float %tmp37, uint 0 ; <<4 x float>> [#uses=1]
%tmp48 = tail call <4 x float> %llvm.x86.sse.min.ss( <4 x float> %tmp375, <4 x float> < float 6.553500e+04, float undef, float undef, float undef > ) ; <<4 x float>> [#uses=1]
%tmp59 = tail call <4 x float> %llvm.x86.sse.max.ss( <4 x float> %tmp48, <4 x float> < float 0.000000e+00, float undef, float undef, float undef > ) ; <<4 x float>> [#uses=1]
%tmp = tail call int %llvm.x86.sse.cvttss2si( <4 x float> %tmp59 ) ; <int> [#uses=1]
%tmp69 = cast int %tmp to ushort ; <ushort> [#uses=1]
ret ushort %tmp69
}
which improves codegen from:
_Convert_sse:
movss LCPI1_0, %xmm0
movss 4(%esp), %xmm1
subss %xmm0, %xmm1
movss LCPI1_1, %xmm0
mulss %xmm0, %xmm1
movss LCPI1_2, %xmm0
minss %xmm0, %xmm1
xorps %xmm0, %xmm0
maxss %xmm0, %xmm1
cvttss2si %xmm1, %eax
andl $65535, %eax
ret
to:
_Convert_sse:
movss 4(%esp), %xmm0
subss LCPI1_0, %xmm0
mulss LCPI1_1, %xmm0
movss LCPI1_2, %xmm1
minss %xmm1, %xmm0
xorps %xmm1, %xmm1
maxss %xmm1, %xmm0
cvttss2si %xmm0, %eax
andl $65535, %eax
ret
This is just a first step, it can be extended in many ways. Testcase here:
Transforms/InstCombine/vec_demanded_elts.ll
llvm-svn: 30752
Remove the Function pointer cast in these calls, converting it to
a cast of argument.
%tmp60 = tail call int cast (int (ulong)* %str to int (int)*)( int 10 )
%tmp60 = tail call int cast (int (ulong)* %str to int (int)*)( uint %tmp51 )
llvm-svn: 28953
When doing the initial pass of constant folding, if we get a constantexpr,
simplify the constant expr like we would do if the constant is folded in the
normal loop.
This fixes the missed-optimization regression in
Transforms/InstCombine/getelementptr.ll last night.
llvm-svn: 28224
1. Implement InstCombine/deadcode.ll by not adding instructions in unreachable
blocks (due to constants in conditional branches/switches) to the worklist.
This causes them to be deleted before instcombine starts up, leading to
better optimization.
2. In the prepass over instructions, do trivial constprop/dce as we go. This
has the effect of improving the effectiveness of #1. In addition, it
*significantly* speeds up instcombine on test cases with large amounts of
constant folding code (for example, that produced by code specialization
or partial evaluation). In one example, it speeds up instcombine from
0.0589s to 0.0224s with a release build (a 2.6x speedup).
llvm-svn: 28215
Make the "fold (and (cast A), (cast B)) -> (cast (and A, B))" transformation
only apply when both casts really will cause code to be generated. If one or
both doesn't, then this xform doesn't remove a cast.
This fixes Transforms/InstCombine/2006-05-06-Infloop.ll
llvm-svn: 28141
%tmp = cast <4 x uint> %tmp to <4 x int> ; <<4 x int>> [#uses=1]
%tmp = cast <4 x int> %tmp to <4 x float> ; <<4 x float>> [#uses=1]
into:
%tmp = cast <4 x uint> %tmp to <4 x float> ; <<4 x float>> [#uses=1]
llvm-svn: 27355
%tmp = cast <4 x uint>* %testData to <4 x int>* ; <<4 x int>*> [#uses=1]
%tmp = load <4 x int>* %tmp ; <<4 x int>> [#uses=1]
to this:
%tmp = load <4 x uint>* %testData ; <<4 x uint>> [#uses=1]
%tmp = cast <4 x uint> %tmp to <4 x int> ; <<4 x int>> [#uses=1]
llvm-svn: 27353
the pointer is known to come from either a global variable, alloca or
malloc. This allows us to compile this:
P = malloc(28);
memset(P, 0, 28);
into explicit stores on PPC instead of a memset call.
llvm-svn: 26577
Make this code more powerful by using ComputeMaskedBits instead of looking
for an AND operand. This lets us fold this:
int %test23(int %a) {
%tmp.1 = and int %a, 1
%tmp.2 = seteq int %tmp.1, 0
%tmp.3 = cast bool %tmp.2 to int ;; xor tmp1, 1
ret int %tmp.3
}
into: xor (and a, 1), 1
llvm-svn: 26396
1. Teach GetConstantInType to handle boolean constants.
2. Teach instcombine to fold (compare X, CST) when X has known 0/1 bits.
Testcase here: set.ll:test22
3. Improve the "(X >> c1) & C2 == 0" folding code to allow a noop cast
between the shift and and. More aggressive bitfolding for other reasons
was turning signed shr's into unsigned shr's, leaving the noop cast in
the way.
llvm-svn: 26131
This allows us to simplify on conditions where bits are not known, but they
are not demanded either! This also fixes a couple of bugs in
ComputeMaskedBits that were exposed during this work.
In the future, swaths of instcombine should be removed, as this code
subsumes a bunch of ad-hockery.
llvm-svn: 26122
1. Teach it new tricks: in particular how to propagate through signed shr and sexts.
2. Teach it to return a bitset of known-1 and known-0 bits, instead of just zero.
3. Teach instcombine (AND X, C) to fold when we know all C bits of X.
This implements Regression/Transforms/InstCombine/bittest.ll, and allows
future things to be simplified.
llvm-svn: 26087
instruction onto the worklist (in case they are now dead).
Add a really trivial local DSE implementation to help out bitfield code.
We now fold this:
struct S {
unsigned char a : 1, b : 1, c : 1, d : 2, e : 3;
S();
};
S::S() : a(0), b(0), c(1), d(0), e(6) {}
to this:
void %_ZN1SC1Ev(%struct.S* %this) {
entry:
%tmp.1 = getelementptr %struct.S* %this, int 0, uint 0
store ubyte 38, ubyte* %tmp.1
ret void
}
much earlier (in gccas instead of only in gccld after DSE runs).
llvm-svn: 26050
mask. This allows the code to be simpler and more efficient.
Also, generalize some of the cases in MVIZ a bit, making it slightly more aggressive.
llvm-svn: 26035
'demanded bits', inspired by Nate's work in the dag combiner. This isn't
complete, but needs to unrelated instcombiner changes to continue.
llvm-svn: 26033
the shifts.
This allows us to fold this (which is the 'integer add a constant' sequence
from cozmic's scheme compmiler):
int %x(uint %anf-temporary776) {
%anf-temporary777 = shr uint %anf-temporary776, ubyte 1
%anf-temporary800 = cast uint %anf-temporary777 to int
%anf-temporary804 = shl int %anf-temporary800, ubyte 1
%anf-temporary805 = add int %anf-temporary804, -2
%anf-temporary806 = or int %anf-temporary805, 1
ret int %anf-temporary806
}
into this:
int %x(uint %anf-temporary776) {
%anf-temporary776 = cast uint %anf-temporary776 to int
%anf-temporary776.mask1 = add int %anf-temporary776, -2
%anf-temporary805 = or int %anf-temporary776.mask1, 1
ret int %anf-temporary805
}
note that instcombine already knew how to eliminate the AND that the two
shifts fold into. This is tested by InstCombine/shift.ll:test26
-Chris
llvm-svn: 25128
Add support for specifying alignment and size of setjmp jmpbufs.
No targets currently do anything with this information, nor is it presrved
in the bytecode representation. That's coming up next.
llvm-svn: 24196
a few times in crafty:
OLD: %tmp.36 = div int %tmp.35, 8 ; <int> [#uses=1]
NEW: %tmp.36 = div uint %tmp.35, 8 ; <uint> [#uses=0]
OLD: %tmp.19 = div int %tmp.18, 8 ; <int> [#uses=1]
NEW: %tmp.19 = div uint %tmp.18, 8 ; <uint> [#uses=0]
OLD: %tmp.117 = div int %tmp.116, 8 ; <int> [#uses=1]
NEW: %tmp.117 = div uint %tmp.116, 8 ; <uint> [#uses=0]
OLD: %tmp.92 = div int %tmp.91, 8 ; <int> [#uses=1]
NEW: %tmp.92 = div uint %tmp.91, 8 ; <uint> [#uses=0]
Which all turn into shrs.
llvm-svn: 24190
8 times in vortex, allowing the srems to be turned into shrs:
OLD: %tmp.104 = rem int %tmp.5.i37, 16 ; <int> [#uses=1]
NEW: %tmp.104 = rem uint %tmp.5.i37, 16 ; <uint> [#uses=0]
OLD: %tmp.98 = rem int %tmp.5.i24, 16 ; <int> [#uses=1]
NEW: %tmp.98 = rem uint %tmp.5.i24, 16 ; <uint> [#uses=0]
OLD: %tmp.91 = rem int %tmp.5.i19, 8 ; <int> [#uses=1]
NEW: %tmp.91 = rem uint %tmp.5.i19, 8 ; <uint> [#uses=0]
OLD: %tmp.88 = rem int %tmp.5.i14, 8 ; <int> [#uses=1]
NEW: %tmp.88 = rem uint %tmp.5.i14, 8 ; <uint> [#uses=0]
OLD: %tmp.85 = rem int %tmp.5.i9, 1024 ; <int> [#uses=2]
NEW: %tmp.85 = rem uint %tmp.5.i9, 1024 ; <uint> [#uses=0]
OLD: %tmp.82 = rem int %tmp.5.i, 512 ; <int> [#uses=2]
NEW: %tmp.82 = rem uint %tmp.5.i1, 512 ; <uint> [#uses=0]
OLD: %tmp.48.i = rem int %tmp.5.i.i161, 4 ; <int> [#uses=1]
NEW: %tmp.48.i = rem uint %tmp.5.i.i161, 4 ; <uint> [#uses=0]
OLD: %tmp.20.i2 = rem int %tmp.5.i.i, 4 ; <int> [#uses=1]
NEW: %tmp.20.i2 = rem uint %tmp.5.i.i, 4 ; <uint> [#uses=0]
it also occurs 9 times in gcc, but with odd constant divisors (1009 and 61)
so the payoff isn't as great.
llvm-svn: 24189
one use (but one is a cast). This handles the very common case of:
X = alloc [n x byte]
Y = cast X to somethingbetter
seteq X, null
In order to avoid infinite looping when there are multiple casts, we only
allow this if the xform is strictly increasing the alignment of the
allocation.
llvm-svn: 23961
where the second has less alignment required. If we had explicit alignment
support in the IR, we could handle this case, but we can't until we do.
llvm-svn: 23960
if () { store A -> P; } else { store B -> P; }
into a PHI node with one store, in the most trival case. This implements
load.ll:test10.
llvm-svn: 23324
load are exactly consequtive. This is picked up by other passes, but this
triggers thousands of times in fortran programs that use static locals
(and is thus a compile-time speedup).
llvm-svn: 23320
BasicBlock's removePredecessor routine. This requires shuffling around
the definition and implementation of hasContantValue from Utils.h,cpp into
Instructions.h,cpp
llvm-svn: 22664
Because the instcombine has to scan the entire function when it starts up
to begin with, we might as well do it in DFO so we can nuke unreachable code.
This fixes: Transforms/InstCombine/2005-07-07-DeadPHILoop.ll
llvm-svn: 22348
the result, turn signed shift rights into unsigned shift rights if possible.
This leads to later simplification and happens *often* in 176.gcc. For example,
this testcase:
struct xxx { unsigned int code : 8; };
enum codes { A, B, C, D, E, F };
int foo(struct xxx *P) {
if ((enum codes)P->code == A)
bar();
}
used to be compiled to:
int %foo(%struct.xxx* %P) {
%tmp.1 = getelementptr %struct.xxx* %P, int 0, uint 0 ; <uint*> [#uses=1]
%tmp.2 = load uint* %tmp.1 ; <uint> [#uses=1]
%tmp.3 = cast uint %tmp.2 to int ; <int> [#uses=1]
%tmp.4 = shl int %tmp.3, ubyte 24 ; <int> [#uses=1]
%tmp.5 = shr int %tmp.4, ubyte 24 ; <int> [#uses=1]
%tmp.6 = cast int %tmp.5 to sbyte ; <sbyte> [#uses=1]
%tmp.8 = seteq sbyte %tmp.6, 0 ; <bool> [#uses=1]
br bool %tmp.8, label %then, label %UnifiedReturnBlock
Now it is compiled to:
%tmp.1 = getelementptr %struct.xxx* %P, int 0, uint 0 ; <uint*> [#uses=1]
%tmp.2 = load uint* %tmp.1 ; <uint> [#uses=1]
%tmp.2 = cast uint %tmp.2 to sbyte ; <sbyte> [#uses=1]
%tmp.8 = seteq sbyte %tmp.2, 0 ; <bool> [#uses=1]
br bool %tmp.8, label %then, label %UnifiedReturnBlock
which is the difference between this:
foo:
subl $4, %esp
movl 8(%esp), %eax
movl (%eax), %eax
shll $24, %eax
sarl $24, %eax
testb %al, %al
jne .LBBfoo_2
and this:
foo:
subl $4, %esp
movl 8(%esp), %eax
movl (%eax), %eax
testb %al, %al
jne .LBBfoo_2
This occurs 3243 times total in the External tests, 215x in povray,
6x in each f2c'd program, 1451x in 176.gcc, 7x in crafty, 20x in perl,
25x in gap, 3x in m88ksim, 25x in ijpeg.
Maybe this will cause a little jump on gcc tommorow :)
llvm-svn: 21715
This implements set.ll:test20.
This triggers 2x on povray, 9x on mesa, 11x on gcc, 2x on crafty, 1x on eon,
6x on perlbmk and 11x on m88ksim.
It allows us to compile these two functions into the same code:
struct s { unsigned int bit : 1; };
unsigned foo(struct s *p) {
if (p->bit)
return 1;
else
return 0;
}
unsigned bar(struct s *p) { return p->bit; }
llvm-svn: 21690
Completely rework the 'setcc (cast x to larger), y' code. This code has
the advantage of implementing setcc.ll:test19 (being more general than
the previous code) and being correct in all cases.
This allows us to unxfail 2004-11-27-SetCCForCastLargerAndConstant.ll,
and close PR454.
llvm-svn: 21491
* Properly compile this:
struct a {};
int test() {
struct a b[2];
if (&b[0] != &b[1])
abort ();
return 0;
}
to 'return 0', not abort().
llvm-svn: 19875
The second folds operations into selects, e.g. (select C, (X+Y), (Y+Z))
-> (Y+(select C, X, Z)
This occurs a few times across spec, e.g.
select add/sub
mesa: 83 0
povray: 5 2
gcc 4 2
parser 0 22
perlbmk 13 30
twolf 0 3
llvm-svn: 19706
Disable the xform for < > cases. It turns out that the following is being
miscompiled:
bool %test(sbyte %S) {
%T = cast sbyte %S to uint
%V = setgt uint %T, 255
ret bool %V
}
llvm-svn: 19628
* We can now fold cast instructions into select instructions that
have at least one constant operand.
* We now optimize expressions more aggressively based on bits that are
known to be zero. These optimizations occur a lot in code that uses
bitfields even in simple ways.
* We now turn more cast-cast sequences into AND instructions. Before we
would only do this if it if all types were unsigned. Now only the
middle type needs to be unsigned (guaranteeing a zero extend).
* We transform sign extensions into zero extensions in several cases.
This corresponds to these test/Regression/Transforms/InstCombine testcases:
2004-11-22-Missed-and-fold.ll
and.ll: test28-29
cast.ll: test21-24
and-or-and.ll
cast-cast-to-and.ll
zeroext-and-reduce.ll
llvm-svn: 19220
successor block. This turns cases like this:
x = a op b
if (c) {
use x
}
into:
if (c) {
x = a op b
use x
}
This triggers 3965 times in spec, and is tested by
Regression/Transforms/InstCombine/sink_instruction.ll
This appears to expose a bug in the X86 backend for 177.mesa, which I'm
looking in to.
llvm-svn: 18677
If this happens, detect it early instead of relying on instcombine to notice
it later. This can be a big speedup, because PHI nodes can have many
incoming values.
llvm-svn: 17741
%X = alloca ...
%Y = alloca ...
X == Y
into false. This allows us to simplify some stuff in eon (and probably
many other C++ programs) where operator= was checking for self assignment.
Folding this allows us to SROA several additional structs.
llvm-svn: 17735
for (X * C1) + (X * C2) (where * can be mul or shl), allowing us to fold:
Y+Y+Y+Y+Y+Y+Y+Y
into
%tmp.8 = shl long %Y, ubyte 3 ; <long> [#uses=1]
instead of
%tmp.4 = shl long %Y, ubyte 2 ; <long> [#uses=1]
%tmp.12 = shl long %Y, ubyte 2 ; <long> [#uses=1]
%tmp.8 = add long %tmp.4, %tmp.12 ; <long> [#uses=1]
This implements add.ll:test25
Also add support for (X*C1)-(X*C2) -> X*(C1-C2), implementing sub.ll:test18
llvm-svn: 17704
* SubOne/AddOne functions always return ConstantInt, declare them as such
* Pull code for handling setcc X, cst, where cst is at the end of the range,
or cc is LE or GE up earlier in visitSetCondInst. This reduces #iterations
in some cases.
* Fold: (div X, C1) op C2 -> range check, implementing div.ll:test6 - test9.
llvm-svn: 16588
This takes something like this:
%A = phi int [ 3, %cond_false.0 ], [ 2, %endif.0.i ], [ 2, %endif.1.i ]
%B = div int %tmp.243, 4
and turns it into:
%A = phi int [ 3/4, %cond_false.0 ], [ 2/4, %endif.0.i ], [ 2/4, %endif.1.i ]
which is later simplified (in this case) into %A = 0.
This triggers thousands of times in spec, for example, 269 times in 176.gcc.
This is tested by InstCombine/add.ll:test23 and set.ll:test18.
llvm-svn: 16582
Instcombine (setcc (truncate X), C1).
This occurs THOUSANDS of times in many benchmarks. Particularlly common
seem to be things like (seteq (cast bool X to int), int 0)
This turns it into (seteq bool %X, false), which then becomes (not %X).
llvm-svn: 16567
This is important for several reasons:
1. Benchmarks have lots of code that looks like this (perlbmk in particular):
%tmp.2.i = setne int %tmp.0.i, 128 ; <bool> [#uses=1]
%tmp.6343 = seteq int %tmp.0.i, 1 ; <bool> [#uses=1]
%tmp.63 = and bool %tmp.2.i, %tmp.6343 ; <bool> [#uses=1]
we now fold away the setne, a clear improvement.
2. In the more important cases, such as (X >= 10) & (X < 20), we now produce
smaller code: (X-10) < 10.
3. Perhaps the nicest effect of this patch is that it really helps out the
code generators. In particular, for a 'range test' like the above,
instead of generating this on X86 (the difference on PPC is even more
pronounced):
cmp %EAX, 50
setge %CL
cmp %EAX, 100
setl %AL
and %CL, %AL
cmp %CL, 0
we now generate this:
add %EAX, -50
cmp %EAX, 50
Furthermore, this causes setcc's to be folded into branches more often.
These combinations trigger dozens of times in the spec benchmarks, particularly
in 176.gcc, 186.crafty, 253.perlbmk, 254.gap, & 099.go.
llvm-svn: 16559
Implement (setcc (shl X, C1), C2) folding.
The second one occurs several dozen times in spec. The first was added
just in case. :)
These are tested by shift.ll:test2[12], and div.ll:test5
llvm-svn: 16549
This latent bug was exposed by recent changes, and is tested as:
llvm/test/Regression/Transforms/InstCombine/2004-09-28-BadShiftAndSetCC.llx
llvm-svn: 16546
triggers often, for example:
6x in povray, 1x in gzip, 279x in gcc, 1x in crafty, 8x in eon, 11x in perlbmk,
362x in gap, 4x in vortex, 14 in m88ksim, 211x in 126.gcc, 1x in compress,
11x in ijpeg, and 4x in 147.vortex.
llvm-svn: 16521
Move include/Config and include/Support into include/llvm/Config,
include/llvm/ADT and include/llvm/Support. From here on out, all LLVM
public header files must be under include/llvm/.
llvm-svn: 16137
assumed that a constant on the RHS of a multiplication was either an
IntConstant or an FPConstant. It checked for an IntConstant and then,
if it did not find one, did a hard cast to an FPConstant. That code
would crash if the RHS were a ConstantExpr that was neither an
IntConstant nor an FPConstant. This version replaces the hard cast
with a dyn_cast. It performs the same way for IntConstants and
FPConstants but does nothing, instead of crashing, for constant
expressions.
The regression test for this change is 2004-07-27-ConstantExprMul.ll.
llvm-svn: 15291
* Test for whether bits are shifted out during the optzn.
If so, the fold is illegal, though it can be handled explicitly for setne/seteq
This fixes the miscompilation of 254.gap last night, which was a latent bug
exposed by other optimizer improvements.
llvm-svn: 15085
actually care about. Someday when the cast instruction is gone, we can do
better here, but this will do for now. This implements
instcombine/cast.ll:test17/18 as well.
llvm-svn: 15018
"load (cast foo)". This allows us to compile C++ code like this:
class Bclass {
public: virtual int operator()() { return 666; }
};
class Dclass: public Bclass {
public: virtual int operator()() { return 667; }
} ;
int main(int argc, char** argv) {
Dclass x;
return x();
}
Into this:
int %main(int %argc, sbyte** %argv) {
entry:
call void %__main( )
ret int 667
}
Instead of this:
int %main(int %argc, sbyte** %argv) {
entry:
%x = alloca "struct.std::bad_typeid" ; <"struct.std::bad_typeid"*> [#uses=3]
call void %__main( )
%tmp.1.i.i = getelementptr "struct.std::bad_typeid"* %x, uint 0, uint 0, uint 0 ; <int (...)***> [#uses=1]
store int (...)** getelementptr ([3 x int (...)*]* %vtable for Bclass, int 0, long 2), int (...)*** %tmp.1.i.i
%tmp.3.i = getelementptr "struct.std::bad_typeid"* %x, int 0, uint 0, uint 0 ; <int (...)***> [#uses=1]
store int (...)** getelementptr ([3 x int (...)*]* %vtable for Dclass, int 0, long 2), int (...)*** %tmp.3.i
%tmp.5 = load int ("struct.std::bad_typeid"*)** cast (int (...)** getelementptr ([3 x int (...)*]* %vtable for Dclass, int 0, long 2) to int
("struct.std::bad_typeid"*)**) ; <int ("struct.std::bad_typeid"*)*> [#uses=1]
%tmp.6 = call int %tmp.5( "struct.std::bad_typeid"* %x ) ; <int> [#uses=1]
ret int %tmp.6
ret int 0
}
In order words, we now resolve the virtual function call.
llvm-svn: 14783
186.crafty, fhourstones and 132.ijpeg.
Bugpoint makes really nasty miscompilations embarassingly easy to find. It
narrowed it down to the instcombiner and this testcase (from fhourstones):
bool %l7153_l4706_htstat_loopentry_2E_4_no_exit_2E_4(int* %i, [32 x int]* %works, int* %tmp.98.out) {
newFuncRoot:
%tmp.96 = load int* %i ; <int> [#uses=1]
%tmp.97 = getelementptr [32 x int]* %works, long 0, int %tmp.96 ; <int*> [#uses=1]
%tmp.98 = load int* %tmp.97 ; <int> [#uses=2]
%tmp.99 = load int* %i ; <int> [#uses=1]
%tmp.100 = and int %tmp.99, 7 ; <int> [#uses=1]
%tmp.101 = seteq int %tmp.100, 7 ; <bool> [#uses=2]
%tmp.102 = cast bool %tmp.101 to int ; <int> [#uses=0]
br bool %tmp.101, label %codeRepl4.exitStub, label %codeRepl3.exitStub
codeRepl4.exitStub: ; preds = %newFuncRoot
store int %tmp.98, int* %tmp.98.out
ret bool true
codeRepl3.exitStub: ; preds = %newFuncRoot
store int %tmp.98, int* %tmp.98.out
ret bool false
}
... which only has one combination performed on it:
$ llvm-as < t.ll | opt -instcombine -debug | llvm-dis
IC: Old = %tmp.101 = seteq int %tmp.100, 7 ; <bool> [#uses=1]
New = setne int %tmp.100, 0 ; <bool>:<badref> [#uses=0]
IC: MOD = br bool %tmp.101, label %codeRepl3.exitStub, label %codeRepl4.exitStub
IC: MOD = %tmp.97 = getelementptr [32 x int]* %works, uint 0, int %tmp.96 ; <int*> [#uses=1]
It doesn't get much better than this. :)
llvm-svn: 14109
collapse this:
bool %le(int %A, int %B) {
%c1 = setgt int %A, %B
%tmp = select bool %c1, int 1, int 0
%c2 = setlt int %A, %B
%result = select bool %c2, int -1, int %tmp
%c3 = setle int %result, 0
ret bool %c3
}
into:
bool %le(int %A, int %B) {
%c3 = setle int %A, %B ; <bool> [#uses=1]
ret bool %c3
}
which is handy, because the Java FE makes these sequences all over the place.
This is tested as: test/Regression/Transforms/InstCombine/JavaCompare.ll
llvm-svn: 14086
This code hadn't been updated after the "structs with more than 256 elements"
related changes to the GEP instruction. Also it was not handling the
ConstantAggregateZero class.
Now it does!
llvm-svn: 13834
into (X & (C2 << C1)) != (C3 << C1), where the shift may be either left or
right and the compare may be any one.
This triggers 1546 times in 176.gcc alone, as it is a common pattern that
occurs for bitfield accesses.
llvm-svn: 13740
in the size calculation.
This is not something you want to see:
Loop Unroll: F[main] Loop %no_exit Loop Size = 2 Trip Count = 2147483648 - UNROLLING!
The problem was that 2*2147483648 == 0.
Now we get:
Loop Unroll: F[main] Loop %no_exit Loop Size = 2 Trip Count = 2147483648 - TOO LARGE: 4294967296>100
Thanks to some anonymous person playing with the demo page that repeatedly
caused zion to go into swapping land. That's one way to ensure you'll get
a quick bugfix. :)
Testcase here: Transforms/LoopUnroll/2004-05-13-DontUnrollTooMuch.ll
llvm-svn: 13564
%tmp.0 = getelementptr [50 x sbyte]* %ar, uint 0, int 5 ; <sbyte*> [#uses=2]
%tmp.7 = getelementptr sbyte* %tmp.0, int 8 ; <sbyte*> [#uses=1]
together. This patch actually allows us to simplify and generalize the code.
llvm-svn: 13415
is only used by a cast, and the casted type is the same size as the original
allocation, it would eliminate the cast by folding it into the allocation.
Unfortunately, it was placing the new allocation instruction right before
the cast, which could pull (for example) alloca instructions into the body
of a function. This turns statically allocatable allocas into expensive
dynamically allocated allocas, which is bad bad bad.
This fixes the problem by placing the new allocation instruction at the same
place the old one was, duh. :)
llvm-svn: 13289
This transforms code like this:
%C = or %A, %B
%D = select %cond, %C, %A
into:
%C = select %cond, %B, 0
%D = or %A, %C
Since B is often a constant, the select can often be eliminated. In any case,
this reduces the usage count of A, allowing subsequent optimizations to happen.
This xform applies when the operator is any of:
add, sub, mul, or, xor, and, shl, shr
llvm-svn: 12800
#1 is to unconditionally strip constantpointerrefs out of
instruction operands where they are absolutely pointless and inhibit
optimization. GRRR!
#2 is to implement InstCombine/getelementptr_const.ll
llvm-svn: 12519
(A <setcc1> B) logicalop (A <setcc2> B) -> (A <setcc3> B) or true or false
Where setcc[123] is one of the 6 setcc instructions, and logicalop is one of: And, Or, Xor
llvm-svn: 7828
of codes. For example,
short kernel (short t1) {
t1 >>= 8; t1 <<= 8;
return t1;
}
became:
short %kernel(short %t1.1) {
%tmp.3 = shr short %t1.1, ubyte 8 ; <short> [#uses=1]
%tmp.5 = cast short %tmp.3 to int ; <int> [#uses=1]
%tmp.7 = shl int %tmp.5, ubyte 8 ; <int> [#uses=1]
%tmp.8 = cast int %tmp.7 to short ; <short> [#uses=1]
ret short %tmp.8
}
before, now it becomes:
short %kernel(short %t1.1) {
%tmp.3 = shr short %t1.1, ubyte 8 ; <short> [#uses=1]
%tmp.8 = shl short %tmp.3, ubyte 8 ; <short> [#uses=1]
ret short %tmp.8
}
which will become:
short %kernel(short %t1.1) {
%tmp.3 = and short %t1.1, 0xFF00
ret short %tmp.3
}
This implements cast-set.ll:test4 and test5
llvm-svn: 7290
IC: (X ^ C1) | C2 --> (X | C2) ^ (C1&~C2)
We are now guaranteed that all 'or's will be inside of 'and's, and all 'and's
will be inside of 'xor's, if the second operands are constants.
llvm-svn: 7264
This hunk:
- } else if (Src->getNumOperands() == 2 && Src->use_size() == 1) {
+ } else if (Src->getNumOperands() == 2) {
Allows GEP folding to be more aggressive, which reduces the number of instructions
and can dramatically speed up BasicAA in some cases.
llvm-svn: 6286
allows optimization of this:
int %test4(int %A, int %B) {
%a = xor int %A, -1
%c = and int %a, 5 ; 5 = ~c2
%d = xor int %c, -1
ret int %d
}
into this:
int %test4(int %A, int %B) { ; No predecessors!
%c.demorgan = or int %A, -6 ; <int> [#uses=1]
ret int %c.demorgan
}
llvm-svn: 5736
* A & ~A == 0
* A / (2^c) == A >> c if unsigned
* 0 / A == 0
* 1.0 * A == A
* A * (2^c) == A << c
* A ^ ~A == -1
* A | ~A == -1
* 0 % X = 0
* A % (2^c) == A & (c-1) if unsigned
* A - (A & B) == A & ~B
* -1 - A == ~A
llvm-svn: 5587
* Renamed StatisticReporter.h/cpp to Statistic.h/cpp
* Broke constructor to take two const char * arguments instead of one, so
that indendation can be taken care of automatically.
* Sort the list by pass name when printing
* Make sure to print all statistics as a group, instead of randomly when
the statistics dtors are called.
* Updated ProgrammersManual with new semantics.
llvm-svn: 4002
* New ReplaceInstUsesWith function to factor out tons of common code
This needs to be used more in the future still, but it's a good start
* New InsertNewInstBefore to allow multi-instruction replacements
* Change getMaxValue functions to isAllOnesValue function, which doesn't
have to CREATE/lookup a new constant. Also the name is accurate
* Add new isMaxValue, isMinValue, isMaxValueMinusOne, isMinValuePlusOne
functions: This should be moved to Constant* classes eventually
* Implement xor X, ALLONES -> not X
* Fold ALL setcc's of booleans away
* Handle various SetCC's for integers against values at the end of their
ranges, possibly off by one. This implements the setcc-strength-reduce.ll
testcase.
llvm-svn: 3286
- Reenable gep (gep x) -> x
- Make instcombine do dead instruction elimination where it's really
easy. Now visitors don't have to ensure they aren't not processing
dead instructions.
llvm-svn: 3222
* Add new RegisterOpt/RegisterAnalysis templates for registering passes that
are to show up in opt or analyze
* Register Analyses now
* Change optimizations to use RegisterOpt instead of RegisterPass
* Add support for different "PassType's"
* Add new RegisterOpt/RegisterAnalysis templates for registering passes that
are to show up in opt or analyze
* Register Analyses now
* Change optimizations to use RegisterOpt instead of RegisterPass
* Remove getPassName implementations from various subclasses
llvm-svn: 3113
"This testcase caused instcombine to fail because it got the same instruction on
it's worklist more than once (which is ok), but then deleted the instruction.
Since the inst stayed on the worklist, as soon as it came back up to be
processed, bad things happened, and opt asserted."
llvm-svn: 2623
- Rename runOnMethod to runOnFunction
* Transform getAnalysisUsageInfo into getAnalysisUsage
- Method is now const
- It now takes one AnalysisUsage object to fill in instead of 3 vectors
to fill in
- Pass's now specify which other passes they _preserve_ not which ones
they modify (be conservative!)
- A pass can specify that it preserves all analyses (because it never
modifies the underlying program)
* s/Method/Function/g in other random places as well
llvm-svn: 2333
We now use an InstVisitor to delegate to different cases that we are
interested in handling. We also fix the FIXME's by adding users to the
worklist when appropriate.
llvm-svn: 2292
llvm/Support/CFG.h
* Make pred & succ iterators for intervals global functions
* Add #includes that are now neccesary because BasicBlock.h doesn't include
InstrTypes.h anymore
llvm-svn: 1750
Method::inst_* is now in llvm/Support/InstIterator.h
GraphTraits specializations for BasicBlock and Methods are now in llvm/Support/CFG.h
llvm-svn: 1746