Commit Graph

7610 Commits

Chris Lattner 9491dee24e Enhance SRoA to be more aggressive about scalarization of aggregate allocas
that have PHI or select uses of their element pointers.  This can often happen
when instcombine sinks two loads into a successor, inserting a phi or select.

With this patch, we can scalarize the alloca, but the pinned elements are not
yet promoted.  This is still a win for large aggregates where only one element
is used.  This fixes rdar://8904039 and part of rdar://7339113 (poor codegen
on stringswitch).

llvm-svn: 124070
2011-01-23 08:27:54 +00:00
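
A minimal IR sketch of the pattern in question (illustrative only, not from the commit; names are invented): the element pointers of an aggregate alloca feed a select, as can happen when instcombine sinks two loads into a common successor.

define i32 @select_elt(i1 %c, i32 %x, i32 %y) {
  %agg = alloca { i32, i32 }
  %p0 = getelementptr { i32, i32 }* %agg, i32 0, i32 0
  %p1 = getelementptr { i32, i32 }* %agg, i32 0, i32 1
  store i32 %x, i32* %p0
  store i32 %y, i32* %p1
  ; the element pointers have a select user; SRoA can now still scalarize
  ; %agg, though the pinned elements are not yet promoted
  %p = select i1 %c, i32* %p0, i32* %p1
  %v = load i32* %p
  ret i32 %v
}
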
Cameron Zwarich 07d6fe34b3 Convert two std::vectors to SmallVectors for a 3.4% speedup running -scalarrepl
on test-suite + SPEC2000 & SPEC2006.

llvm-svn: 124068
2011-01-23 08:03:04 +00:00
Chris Lattner 8acbb79506 have AllocaInfo store the alloca being inspected, simplifying callers.
No functionality change.

llvm-svn: 124067
2011-01-23 07:29:29 +00:00
Chris Lattner 3e56c29068 Rearrange some code a bit. Change MarkUnsafe to
handle the "Transformation preventing inst" printing, 
so that -scalarrepl -debug will always print the rejected
instruction.  No functionality change.

llvm-svn: 124066
2011-01-23 07:05:44 +00:00
Chris Lattner a587ab7b94 remove an old hack that avoided creating MMX datatypes. The
X86 backend has been fixed.

llvm-svn: 124064
2011-01-23 06:40:33 +00:00
Dan Gohman 19e30d5a7d Actually check memcpy lengths, instead of just commenting about
how they should be checked.

llvm-svn: 123999
2011-01-21 22:07:57 +00:00
Owen Anderson a834200dbe Just because we have determined that an (fcmp | fcmp) is true for A < B,
A == B, and A > B, does not mean we can fold it to true.  We still need to
check for A ? B (A unordered B).

llvm-svn: 123993
2011-01-21 19:39:42 +00:00
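
A sketch of why the fold is unsafe (illustrative, not from the commit; names are invented): the or of the three ordered comparisons is still false when either operand is NaN.

define i1 @not_always_true(double %a, double %b) {
  %lt = fcmp olt double %a, %b
  %eq = fcmp oeq double %a, %b
  %gt = fcmp ogt double %a, %b
  %t0 = or i1 %lt, %eq
  %t1 = or i1 %t0, %gt
  ; not foldable to true: %t1 is false in the unordered case
  ret i1 %t1
}
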
Nick Lewycky ae0275e018 SCCP doesn't actually preserve the CFG. It will delete and insert terminator
instructions.

llvm-svn: 123973
2011-01-21 08:38:09 +00:00
Chris Lattner b5e15d1907 fix PR9013, an infinite loop in instcombine.
llvm-svn: 123968
2011-01-21 05:29:50 +00:00
Chris Lattner f4ca47bda8 update obsolete comment.
llvm-svn: 123965
2011-01-21 05:08:26 +00:00
Nick Lewycky 6a083cf820 Don't try to pull vector bitcasts that change the number of elements through
a select. A vector select is pairwise on each element so we'd need a new
condition with the right number of elements to select on. Fixes PR8994.

llvm-svn: 123963
2011-01-21 02:30:43 +00:00
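
An illustrative sketch of the pattern that must be left alone (not from the commit; names are invented):

define <4 x i32> @keep_bitcast(<2 x i1> %c, <2 x i64> %a, <2 x i64> %b) {
  %s = select <2 x i1> %c, <2 x i64> %a, <2 x i64> %b
  ; pulling the bitcast above the select would need a <4 x i1> condition,
  ; which does not exist here, so the bitcast stays below the select
  %bc = bitcast <2 x i64> %s to <4 x i32>
  ret <4 x i32> %bc
}
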
Duncan Sands 8fb2c3827c At -O123 the early-cse pass is run before instcombine has run. According to my
auto-simplifier the transform most missed by early-cse is (zext X) != 0 -> X != 0.
This patch adds this transform and some related logic to InstructionSimplify
and removes some of the logic from instcombine (unfortunately not all because
there are several situations in which instcombine can improve things by making
new instructions, whereas instsimplify is not allowed to do this).  At -O2 this
often results in more than 15% more simplifications by early-cse, and results in
hundreds of lines of bitcode being eliminated from the testsuite.  I did see some
small negative effects in the testsuite, for example a few additional instructions
in three programs.  One program, 483.xalancbmk, got an additional 35 instructions,
which seems to be due to a function getting an additional instruction and then
being inlined all over the place.

llvm-svn: 123911
2011-01-20 13:21:55 +00:00
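
A minimal sketch of the i1 case (illustrative; names are invented), where the simplified result is an existing value and therefore within instsimplify's reach:

define i1 @zext_ne_zero(i1 %x) {
  %z = zext i1 %x to i32
  %c = icmp ne i32 %z, 0
  ; (zext %x) != 0 simplifies to %x != 0, which for i1 is just %x
  ret i1 %c
}
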
Rafael Espindola fc355bc070 Add unnamed_addr when we can show that address of a global is not used.
llvm-svn: 123834
2011-01-19 16:32:21 +00:00
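
A sketch of the kind of global this covers (illustrative, not from the commit; names are invented): its address is only used for loads, never compared or otherwise escaped, so it can safely be marked unnamed_addr.

@g = internal constant i32 7

define i32 @get() {
  %v = load i32* @g
  ret i32 %v
}
; after the pass: @g = internal unnamed_addr constant i32 7
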
Chris Lattner 86d56c651d fix rdar://8878965, a regression I introduced with the recent
llvm.objectsize changes.

llvm-svn: 123771
2011-01-18 20:53:04 +00:00
Cameron Zwarich fc210c79b7 Convert a std::map to a DenseMap for another 1.7% speedup on -scalarrepl.
llvm-svn: 123732
2011-01-18 04:50:38 +00:00
Cameron Zwarich 6968c41ac8 Make a std::vector a SmallVector<*, 32> like the other vectors in the same
function. This seems to be about a 1.5% speedup of -scalarrepl on test-suite
with SPEC2000 and SPEC2006.

llvm-svn: 123731
2011-01-18 04:41:32 +00:00
Rafael Espindola ecd5b9abe9 Reduce indentation and remove commented out code.
llvm-svn: 123729
2011-01-18 04:36:06 +00:00
Cameron Zwarich b703654edc Remove code for updating dominance frontiers and some outdated references to
dominance and post-dominance frontiers.

llvm-svn: 123725
2011-01-18 04:11:31 +00:00
Cameron Zwarich 4694e69540 Remove outdated references to dominance frontiers.
llvm-svn: 123724
2011-01-18 03:53:26 +00:00
Owen Anderson 459e079912 Remove dead code that I apparently wrote a while back. We seem to be doing well enough
without whatever this was trying to do.  When/if someone has the time to do some empirical
evaluations, it might be worth it to figure out what this code was trying to do and see if
it's worth resurrecting/fixing.

llvm-svn: 123684
2011-01-17 22:39:54 +00:00
Cameron Zwarich b410858a5f Roll r123609 back in with two changes that fix test failures with expensive
checks enabled:

1) Use '<' to compare integers in a comparison function rather than '<='.

2) Use the uniqued set DefBlocks rather than Info.DefiningBlocks to initialize
the priority queue.

The speedup of scalarrepl on test-suite + SPEC2000 + SPEC2006 is a bit less, at
just under 16% rather than 17%.

llvm-svn: 123662
2011-01-17 17:38:41 +00:00
Cameron Zwarich 67431d7943 Roll out r123609 due to failures on the llvm-x86_64-linux-checks bot.
llvm-svn: 123618
2011-01-17 07:26:51 +00:00
Cameron Zwarich 814cd9233e Eliminate the use of dominance frontiers in PromoteMemToReg. In addition to
eliminating a potentially quadratic data structure, this also gives a 17%
speedup when running -scalarrepl on test-suite + SPEC2000 + SPEC2006. My initial
experiment gave a greater speedup of around 25%, but I moved the dominator tree
level computation from dominator tree construction to PromoteMemToReg.

Since this approach to computing IDFs has a much lower overhead than the old
code using precomputed DFs, it is worth looking at using this new code for the
second scalarrepl pass as well.

llvm-svn: 123609
2011-01-17 01:08:59 +00:00
Anders Carlsson d3db83349e Teach DAE to look for functions whose arguments are unused, and change all callers to pass in an undefvalue instead.
llvm-svn: 123596
2011-01-16 21:25:33 +00:00
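
A small sketch of the rewrite at call sites (illustrative, not from the commit; names are invented):

define internal i32 @callee(i32 %used, i32 %unused) {
  ret i32 %used
}

define i32 @caller() {
  ; %unused is never read, so DAE can change the call to pass undef for
  ; the second argument: call i32 @callee(i32 1, i32 undef)
  %r = call i32 @callee(i32 1, i32 2)
  ret i32 %r
}
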
Chris Lattner 7c9f4c9c2b tidy up a comment, as suggested by duncan
llvm-svn: 123590
2011-01-16 17:46:19 +00:00
Rafael Espindola 751677a040 Don't merge two constants if we care about the address of both.
This fixes the original testcase in PR8927. It also causes a clang
binary built with a patched clang to increase in size by 0.21%.

We can probably get some of the size back by writing a pass that
detects that a global never has its pointer compared and adds
unnamed_addr to it (maybe extend global opt). It is also possible that
there are some other cases clang could add unnamed_addr to.

I will investigate extending globalopt next.

llvm-svn: 123584
2011-01-16 17:05:09 +00:00
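
A sketch of the case being protected (illustrative, not from the commit; names are invented): two identical constants whose addresses are both observed, so merging them would change the result of the comparison.

@x = internal constant i32 5
@y = internal constant i32 5

define i1 @distinct() {
  ; merging @x and @y would make this comparison fold to false
  %cmp = icmp ne i32* @x, @y
  ret i1 %cmp
}
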
Chris Lattner e5f8de8639 fix PR8932, a case where arg promotion could infinitely promote.
llvm-svn: 123574
2011-01-16 08:09:24 +00:00
Chris Lattner ed1fb92cfe simplify a little
llvm-svn: 123573
2011-01-16 07:11:21 +00:00
Chris Lattner 6fab2e9418 if an alloca is only ever accessed as a unit, and is accessed with load/store instructions,
then don't try to decimate it into its individual pieces.  This will just make a mess of the
IR and is pointless if none of the elements are individually accessed.  This was generating
really terrible code for std::bitset (PR8980) because it happens to be lowered by clang
as an {[8 x i8]} structure instead of {i64}.

The testcase is now optimized to:

define i64 @test2(i64 %X) {
  br label %L2

L2:                                               ; preds = %0
  ret i64 %X
}

before we generated:

define i64 @test2(i64 %X) {
  %sroa.store.elt = lshr i64 %X, 56
  %1 = trunc i64 %sroa.store.elt to i8
  %sroa.store.elt8 = lshr i64 %X, 48
  %2 = trunc i64 %sroa.store.elt8 to i8
  %sroa.store.elt9 = lshr i64 %X, 40
  %3 = trunc i64 %sroa.store.elt9 to i8
  %sroa.store.elt10 = lshr i64 %X, 32
  %4 = trunc i64 %sroa.store.elt10 to i8
  %sroa.store.elt11 = lshr i64 %X, 24
  %5 = trunc i64 %sroa.store.elt11 to i8
  %sroa.store.elt12 = lshr i64 %X, 16
  %6 = trunc i64 %sroa.store.elt12 to i8
  %sroa.store.elt13 = lshr i64 %X, 8
  %7 = trunc i64 %sroa.store.elt13 to i8
  %8 = trunc i64 %X to i8
  br label %L2

L2:                                               ; preds = %0
  %9 = zext i8 %1 to i64
  %10 = shl i64 %9, 56
  %11 = zext i8 %2 to i64
  %12 = shl i64 %11, 48
  %13 = or i64 %12, %10
  %14 = zext i8 %3 to i64
  %15 = shl i64 %14, 40
  %16 = or i64 %15, %13
  %17 = zext i8 %4 to i64
  %18 = shl i64 %17, 32
  %19 = or i64 %18, %16
  %20 = zext i8 %5 to i64
  %21 = shl i64 %20, 24
  %22 = or i64 %21, %19
  %23 = zext i8 %6 to i64
  %24 = shl i64 %23, 16
  %25 = or i64 %24, %22
  %26 = zext i8 %7 to i64
  %27 = shl i64 %26, 8
  %28 = or i64 %27, %25
  %29 = zext i8 %8 to i64
  %30 = or i64 %29, %28
  ret i64 %30
}

In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough
PHIs are in play that instcombine backs off.  It's better to not generate this stuff
in the first place.

llvm-svn: 123571
2011-01-16 06:18:28 +00:00
Chris Lattner 7cd8cf7d24 Use an irbuilder to get some trivial constant folding when doing a store
of a constant.

llvm-svn: 123570
2011-01-16 05:58:24 +00:00
Chris Lattner adb1a233b1 remove a dead check; this was needed before we had an explicit veto on uses of phis.
llvm-svn: 123569
2011-01-16 05:37:55 +00:00
Chris Lattner d55581ded8 enhance FoldOpIntoPhi in instcombine to try harder when a phi has
multiple uses.  In some cases, all the uses are the same operation,
so instcombine can go ahead and promote the phi.  In the testcase
this pushes an add out of the loop.

llvm-svn: 123568
2011-01-16 05:28:59 +00:00
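
A sketch of the multi-use case (illustrative; names are invented — the commit's own testcase hoists an add out of a loop):

define i32 @fold_into_phi(i1 %c) {
entry:
  br i1 %c, label %a, label %b
a:
  br label %merge
b:
  br label %merge
merge:
  %p = phi i32 [ 1, %a ], [ 2, %b ]
  ; both users of %p are the same add, so instcombine can fold the add
  ; into the phi's incoming values: phi i32 [ 11, %a ], [ 12, %b ]
  %u1 = add i32 %p, 10
  %u2 = add i32 %p, 10
  %r = mul i32 %u1, %u2
  ret i32 %r
}
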
Chris Lattner ea7131a062 remove the AllowAggressive argument to FoldOpIntoPhi. It is forced to false in the
first line of the function because it isn't a good idea, even for compares.

llvm-svn: 123566
2011-01-16 05:14:26 +00:00
Chris Lattner ff2e737714 more cleanups: use the IR builder.
llvm-svn: 123565
2011-01-16 05:08:00 +00:00
Chris Lattner 25ce280511 tidy up code.
llvm-svn: 123564
2011-01-16 04:37:29 +00:00
Owen Anderson 4e54efd625 Improve the safety of my globalopt enhancement by ensuring that the bitcast
of the stored value to the new store type is always.  Also, add a testcase.

llvm-svn: 123563
2011-01-16 04:33:33 +00:00
Chris Lattner 8b4952fcf7 simplify this code; it is still broken, but I will follow up on llvm-commits.
llvm-svn: 123558
2011-01-16 02:05:10 +00:00
Chris Lattner 1e209b87ad remove the partial specialization pass. It is unmaintained and has bugs.
llvm-svn: 123554
2011-01-16 00:27:10 +00:00
Nick Lewycky 4a1ff16b29 Add missing whitespace.
llvm-svn: 123543
2011-01-15 18:42:52 +00:00
Nick Lewycky 0296a481f9 Make constmerge a two-pass algorithm so that it won't miss merging
opportunities. Fixes PR8978.

llvm-svn: 123541
2011-01-15 18:14:21 +00:00
Benjamin Kramer ed5f2e504e Try to unbreak selfhost.
llvm-svn: 123537
2011-01-15 11:25:34 +00:00
Nick Lewycky 540f9536c8 Add a cache that protects mergefunc's internals from more surprises in DenseSet.
Also, replace tabs with spaces. Yes, it's 2011.

llvm-svn: 123535
2011-01-15 10:16:23 +00:00
Chris Lattner af26390790 temporarily revert r123526. While working on a follow-on patch I
realized that ConstantFoldTerminator doesn't preserve dominfo.

llvm-svn: 123527
2011-01-15 07:51:19 +00:00
Chris Lattner 8df83c4a24 fix rdar://8785296 - -fcatch-undefined-behavior generates inefficient code
The basic issue is that isel (very reasonably!) expects conditional branches
to be folded, so CGP leaving around a bunch of dead computation feeding
conditional branches isn't such a good idea.  Just fold branches on constants
into unconditional branches.

llvm-svn: 123526
2011-01-15 07:36:13 +00:00
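
A minimal sketch of the rewrite (illustrative, not from the commit; names are invented):

define i32 @fold_branch() {
entry:
  ; the condition has already been folded to a constant, so CGP can turn
  ; this into an unconditional br to %taken, leaving nothing dead feeding
  ; a conditional branch for isel to trip over
  br i1 true, label %taken, label %dead
taken:
  ret i32 1
dead:
  ret i32 0
}
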
Chris Lattner ee588defc6 simplify code, no functionality change.
llvm-svn: 123525
2011-01-15 07:29:01 +00:00
Chris Lattner 1b93be501d Now that instruction optzns can update the iterator as they go, we can
have objectsize folding recursively simplify away their result when it
folds.  It is important to catch this here, because otherwise we won't
eliminate the cross-block values at isel and other times.

llvm-svn: 123524
2011-01-15 07:25:29 +00:00
Chris Lattner 7a2771440f make the current instruction iterator an ivar, allowing xforms that
potentially invalidate it (like inline asm lowering) to be sunk into
their proper place, cleaning up a ton of code.

llvm-svn: 123523
2011-01-15 07:14:54 +00:00
Chris Lattner 9c10d587f6 implement an instcombine xform that canonicalizes casts outside of and-with-constant operations.
This fixes rdar://8808586 which observed that we used to compile:


union xy {
        struct x { _Bool b[15]; } x;
        __attribute__((packed))
        struct y {
                __attribute__((packed)) unsigned long b0to7;
                __attribute__((packed)) unsigned int b8to11;
                __attribute__((packed)) unsigned short b12to13;
                __attribute__((packed)) unsigned char b14;
        } y;
};

struct x
foo(union xy *xy)
{
        return xy->x;
}

into:

_foo:                                   ## @foo
	movq	(%rdi), %rax
	movabsq	$1095216660480, %rcx    ## imm = 0xFF00000000
	andq	%rax, %rcx
	movabsq	$-72057594037927936, %rdx ## imm = 0xFF00000000000000
	andq	%rax, %rdx
	movzbl	%al, %esi
	orq	%rdx, %rsi
	movq	%rax, %rdx
	andq	$65280, %rdx            ## imm = 0xFF00
	orq	%rsi, %rdx
	movq	%rax, %rsi
	andq	$16711680, %rsi         ## imm = 0xFF0000
	orq	%rdx, %rsi
	movl	%eax, %edx
	andl	$-16777216, %edx        ## imm = 0xFFFFFFFFFF000000
	orq	%rsi, %rdx
	orq	%rcx, %rdx
	movabsq	$280375465082880, %rcx  ## imm = 0xFF0000000000
	movq	%rax, %rsi
	andq	%rcx, %rsi
	orq	%rdx, %rsi
	movabsq	$71776119061217280, %r8 ## imm = 0xFF000000000000
	andq	%r8, %rax
	orq	%rsi, %rax
	movzwl	12(%rdi), %edx
	movzbl	14(%rdi), %esi
	shlq	$16, %rsi
	orl	%edx, %esi
	movq	%rsi, %r9
	shlq	$32, %r9
	movl	8(%rdi), %edx
	orq	%r9, %rdx
	andq	%rdx, %rcx
	movzbl	%sil, %esi
	shlq	$32, %rsi
	orq	%rcx, %rsi
	movl	%edx, %ecx
	andl	$-16777216, %ecx        ## imm = 0xFFFFFFFFFF000000
	orq	%rsi, %rcx
	movq	%rdx, %rsi
	andq	$16711680, %rsi         ## imm = 0xFF0000
	orq	%rcx, %rsi
	movq	%rdx, %rcx
	andq	$65280, %rcx            ## imm = 0xFF00
	orq	%rsi, %rcx
	movzbl	%dl, %esi
	orq	%rcx, %rsi
	andq	%r8, %rdx
	orq	%rsi, %rdx
	ret

We now compile this into:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	movzwl	12(%rdi), %eax
	movzbl	14(%rdi), %ecx
	shlq	$16, %rcx
	orl	%eax, %ecx
	shlq	$32, %rcx
	movl	8(%rdi), %edx
	orq	%rcx, %rdx
	movq	(%rdi), %rax
	ret

A small improvement :-)

llvm-svn: 123520
2011-01-15 06:32:33 +00:00
Chris Lattner e20dd530d0 one more instcombine variant that is needed to work with future changes,
no functionality change currently.

llvm-svn: 123517
2011-01-15 05:50:18 +00:00
Chris Lattner 497459d5fd fix typo
llvm-svn: 123516
2011-01-15 05:42:47 +00:00