define void @f() {
...
call i32 @g()
...
}
define void @g() {
...
}
The hazards are:
- @f and @g both have a GC, but they differ. Inlining is invalid. This
may never occur.
- @f has no GC, but @g does. @g's GC must be propagated to @f.
The other scenarios are safe:
- @f and @g have the same GC.
- @f and @g have no GC.
- @g has no GC.
This patch adds inliner checks for the two hazardous scenarios above.
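A minimal sketch of those two checks, using the Function collector accessors that
appear later in this log (hasCollector/getCollector/setCollector); the helper name
and its exact placement in the inliner are illustrative, not the actual patch:

#include "llvm/Function.h"
using namespace llvm;

// Hedged sketch only -- not the inliner's actual code.
static bool canInlineWithGC(Function &Caller, Function &Callee) {
  if (!Callee.hasCollector())
    return true;                                  // @g has no GC: always safe
  if (!Caller.hasCollector()) {
    Caller.setCollector(Callee.getCollector());   // propagate @g's GC to @f
    return true;
  }
  // Both have a GC: inlining is only valid if they agree.
  return Caller.getCollector() == Callee.getCollector();
}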
llvm-svn: 45351
calls 'nounwind'. It is important for correct C++
exception handling that nounwind markings do not get
lost, so this transformation is actually needed for
correctness.
llvm-svn: 45218
calls. Remove special casing of inline asm from the
inliner. There is a potential problem: the verifier
rejects invokes of inline asm (not sure why). If an
asm call is not marked "nounwind" in some .ll, and
instcombine is not run, but the inliner is run, then
an illegal module will be created. This is bad but
I'm not sure what the best approach is. I'm tempted
to remove the check in the verifier...
llvm-svn: 45073
Reimplement the xform in Analysis/ConstantFolding.cpp where we can use
targetdata to validate that it is safe. While I'm in there, fix some const
correctness issues and generalize the interface to the "operand folder".
llvm-svn: 44817
methods are new to Function:
bool hasCollector() const;
const std::string &getCollector() const;
void setCollector(const std::string &);
void clearCollector();
The assembly representation is as such:
define void @f() gc "shadow-stack" { ...
The implementation uses an on-the-side table to map Functions to
collector names, such that there is no overhead. A StringPool is
further used to unique collector names, which are extremely
likely to be unique per process.
llvm-svn: 44769
throw exceptions", just mark intrinsics with the nounwind
attribute. Likewise, mark intrinsics as readnone/readonly
and get rid of special aliasing logic (which didn't use
anything more than this anyway).
llvm-svn: 44544
the function type, instead they belong to functions
and function calls. This is an updated and slightly
corrected version of Reid Spencer's original patch.
The only known problem is that auto-upgrading of
bitcode files doesn't seem to work properly (see
test/Bitcode/AutoUpgradeIntrinsics.ll). Hopefully
a bitcode guru (who might that be? :) ) will fix it.
llvm-svn: 44359
The meaning of getTypeSize was not clear - clarifying it is important
now that we have x86 long double and arbitrary precision integers.
The issue with long double is that it requires 80 bits, and this is
not a multiple of its alignment. This gives a primitive type for
which getTypeSize differed from getABITypeSize. For arbitrary precision
integers it is even worse: there is the minimum number of bits needed to
hold the type (eg: 36 for an i36), the maximum number of bits that will
be overwritten when storing the type (40 bits for i36) and the ABI size
(i.e. the storage size rounded up to a multiple of the alignment; 64 bits
for i36).
This patch removes getTypeSize (not really - it is still there but
deprecated to allow for a gradual transition). Instead there is:
(1) getTypeSizeInBits - a number of bits that suffices to hold all
values of the type. For a primitive type, this is the minimum number
of bits. For an i36 this is 36 bits. For x86 long double it is 80.
This corresponds to gcc's TYPE_PRECISION.
(2) getTypeStoreSizeInBits - the maximum number of bits that is
written when storing the type (or read when reading it). For an
i36 this is 40 bits, for an x86 long double it is 80 bits. This
is the size alias analysis is interested in (getTypeStoreSize
returns the number of bytes). There doesn't seem to be anything
corresponding to this in gcc.
(3) getABITypeSizeInBits - this is getTypeStoreSizeInBits rounded
up to a multiple of the alignment. For an i36 this is 64, for an
x86 long double this is 96 or 128 depending on the OS. This is the
spacing between consecutive elements when you form an array out of
this type (getABITypeSize returns the number of bytes). This is
TYPE_SIZE in gcc.
Since successive elements in a SequentialType (arrays, pointers
and vectors) need to be aligned, the spacing between them will be
given by getABITypeSize. This means that the size of an array
is the length times the getABITypeSize. It also means that GEP
computations need to use getABITypeSize when computing offsets.
Furthermore, if an alloca allocates several elements at once then
these too need to be aligned, so the size of the alloca has to be
the number of elements multiplied by getABITypeSize. Logically
speaking this doesn't have to be the case when allocating just
one element, but it is simpler to also use getABITypeSize in this
case. So alloca's and mallocs should use getABITypeSize. Finally,
since gcc's only notion of size is that given by getABITypeSize, if
you want to output assembler etc. the same as gcc then getABITypeSize
is the size you want.
Since a store will overwrite no more than getTypeStoreSize bytes,
and a read will read no more than that many bytes, this is the
notion of size appropriate for alias analysis calculations.
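As a concrete illustration of the three queries for an i36 (a hedged sketch; the
method names are those introduced above, while the surrounding function and header
paths are assumptions):

#include "llvm/DerivedTypes.h"
#include "llvm/Target/TargetData.h"
using namespace llvm;

static void showI36Sizes(const TargetData &TD) {
  const Type *I36 = IntegerType::get(36);
  uint64_t Bits  = TD.getTypeSizeInBits(I36);      // 36: bits needed to hold any value
  uint64_t Store = TD.getTypeStoreSizeInBits(I36); // 40: at most this many bits touched by a store
  uint64_t ABI   = TD.getABITypeSizeInBits(I36);   // 64: spacing of consecutive array elements
  // An [8 x i36] array therefore occupies 8 * TD.getABITypeSize(I36) bytes.
}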
In this patch I have corrected all type size uses except some of
those in ScalarReplAggregates, lib/Codegen, lib/Target (the hard
cases). I will get around to auditing these too at some point,
but I could do with some help.
Finally, I made one change which I think is wise but others might
consider pointless and suboptimal: in an unpacked struct the
amount of space allocated for a field is now given by the ABI
size rather than getTypeStoreSize. I did this because every
other place that reserves memory for a type (eg: alloca) now
uses getABITypeSize, and I didn't want to make an exception
for unpacked structs, i.e. I did it to make things more uniform.
This only affects structs containing long doubles and arbitrary
precision integers. If someone wants to pack these types more
tightly they can always use a packed struct.
llvm-svn: 43620
In the old way, we computed and inserted phi nodes for the whole IDF of
the definitions of the alloca, then computed which ones were dead and
removed them.
In the new method, we first compute the region where the value is live,
and use that information to only insert phi nodes that are live. This
eliminates the need to compute liveness later, and stops the algorithm
from inserting a bunch of phis which it then later removes.
This speeds up the testcase in PR1432 from 2.00s to 0.15s (14x) in a
release build and 6.84s->0.50s (14x) in a debug build.
llvm-svn: 40825
stored value was a non-instruction value. Doh.
This increases the number of single-store allocas from 8982 to 9026, and
speeds up mem2reg on the testcase in PR1432 from 2.17s to 2.13s.
llvm-svn: 40813
1. Check for revisiting a block before checking domination, which is faster.
2. If the stored value isn't an instruction, we don't have to check for domination.
3. If we have a value used in the same block more than once, make sure to remove the
block from the UsingBlocks vector. Not doing so forces us to go through the slow
path for the alloca.
The combination of these improvements increases the number of allocas on the fastpath
from 8935 to 8982 on PR1432. This speeds it up from 2.90s to 2.20s (31%).
llvm-svn: 40811
a using block from the list if we handle it. Not doing this prevented us
from promoting (with the fast path) allocas which have uses (whoops).
This increases the # allocas hitting this fastpath from 4042 to 8935 on the
testcase in PR1432, speeding up mem2reg by 2.6x
llvm-svn: 40809
llvm-gcc build to succeed. Without this change it fails in libstdc++
compilation. This causes no regressions in dejagnu tests. However,
someone who knows this code better might want to review it.
llvm-svn: 39924
BBNumbers. Instead of using a bi-directional mapping, just use a single
DenseMap. This speeds up mem2reg on 176.gcc by 8%, from 1.3489s to
1.2485s.
llvm-svn: 33940
the Transforms library. This reduces debug library size by 132 KB, debug
binary size by 376 KB, and reduces link time for llvm tools slightly.
llvm-svn: 33939
This patch replaces the SymbolTable class with ValueSymbolTable which does
not support types planes. This means that all symbol names in LLVM must now
be unique. The patch addresses the necessary changes to deal with this and
removes code no longer needed as a result. This completes the bulk of the
changes for this PR. Some cleanup patches will follow.
llvm-svn: 33918
Make the Module's dependent library list use a std::vector instead of a SetVector,
and adjust #includes in .cpp files because SetVector.h is no longer included.
llvm-svn: 33855
updating. These were exposed by Devang's recent pass manager changes (with
non-default pass orderings) because now the inliner can be interleaved with
the LCSSA pass.
llvm-svn: 33760
ConstantFoldInstOperands/ConstantFoldCall to take a pointer to an array
of operands + size, instead of an std::vector.
In some cases, switch to using a SmallVector instead of a vector.
This allows us to get rid of some special case gross code that was there
to avoid the cost of constructing a vector.
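A hedged sketch of what a call site might look like after this change; the exact
parameter order, and whether a trailing TargetData* argument is expected, are
assumptions rather than the verified signature:

#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Constants.h"
#include "llvm/Instruction.h"
using namespace llvm;

// Hedged sketch only -- not taken from the patch.
static bool tryFoldOperands(Instruction *I, Constant *Op0, Constant *Op1) {
  Constant *Ops[] = { Op0, Op1 };                 // operands already known to be constant
  if (Constant *C = ConstantFoldInstOperands(I, Ops, 2)) {
    I->replaceAllUsesWith(C);                     // fold succeeded, rewrite uses
    return true;
  }
  return false;
}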
llvm-svn: 33670
The Module::setEndianness and Module::setPointerSize methods have been
removed. Instead you can get/set the DataLayout. Adjust these accordingly.
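A hedged sketch of the replacement usage; the data layout string here is only an
example value, not a required one:

#include "llvm/Module.h"
using namespace llvm;

static void configureLayout(Module &M) {
  // Instead of setEndianness/setPointerSize, describe both via the layout string.
  M.setDataLayout("e-p:32:32");                  // example: little-endian, 32-bit pointers
  const std::string &DL = M.getDataLayout();     // read it back when needed
  (void)DL;
}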
llvm-svn: 33530
This is the final patch for this PR. It implements some minor cleanup
in the use of IntegerType, to wit:
1. Type::getIntegerTypeMask -> IntegerType::getBitMask
2. Type::Int*Ty changed to IntegerType* from Type*
3. ConstantInt::getType() returns IntegerType* now, not Type*
This also fixes PR1120.
Patch by Sheng Zhou.
llvm-svn: 33370
rename Type::getIntegralTypeMask to Type::getIntegerTypeMask.
This makes naming much more consistent. For example, there are no longer any
instances of IntegerType that are not considered isInteger! :)
llvm-svn: 33225
recommended that getBoolValue be replaced with getZExtValue and that
get(bool) be replaced by get(const Type*, uint64_t). This implements
those changes.
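A hedged before/after sketch of that substitution; the exact name of the boolean
type constant at this revision is an assumption:

#include "llvm/Constants.h"
#include "llvm/Type.h"
using namespace llvm;

// Hedged sketch of the recommended replacement calls.
static bool makeAndTestTrue() {
  // was: ConstantInt::get(true)
  Constant *C = ConstantInt::get(Type::Int1Ty, 1);       // get(const Type*, uint64_t)
  // was: cast<ConstantInt>(C)->getBoolValue()
  return cast<ConstantInt>(C)->getZExtValue() != 0;
}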
llvm-svn: 33110
Take an incremental step towards type plane elimination. This change
separates types from values in the symbol tables by finally making use
of the TypeSymbolTable class. This yields more natural interfaces for
dealing with types and unclutters the SymbolTable class.
llvm-svn: 32956
This patch replaces signed integer types with signless ones:
1. [US]Byte -> Int8
2. [U]Short -> Int16
3. [U]Int -> Int32
4. [U]Long -> Int64.
5. Removal of isSigned, isUnsigned, getSignedVersion, getUnsignedVersion
and other methods related to signedness. In a few places this warranted
identifying the signedness information from other sources.
llvm-svn: 32785
This patch removes the SetCC instructions and replaces them with the ICmp
and FCmp instructions. The SetCondInst instruction has been removed and
been replaced with ICmpInst and FCmpInst.
llvm-svn: 32751
rework the hacks that had us passing OStream in. We pass in std::ostream*
instead, check for null, and then dispatch to the correct print() method.
llvm-svn: 32636
The long awaited CAST patch. This introduces 12 new instructions into LLVM
to replace the cast instruction. Corresponding changes throughout LLVM are
provided. This passes llvm-test, llvm/test, and SPEC CPUINT2000 with the
exception of 175.vpr which fails only on a slight floating point output
difference.
llvm-svn: 31931
This patch converts the old SHR instruction into two instructions,
AShr (Arithmetic) and LShr (Logical). The Shr instructions now are not
dependent on the sign of their operands.
llvm-svn: 31542
Turn on -Wunused and -Wno-unused-parameter. Clean up most of the resulting
fall out by removing unused variables. Remaining warnings have to do with
unused functions (I didn't want to delete code without review) and unused
variables in generated code. Maintainers should clean up the remaining
issues when they see them. All changes pass DejaGnu tests and Olden.
llvm-svn: 31380
This patch implements the first increment for the Signless Types feature.
All changes pertain to removing the ConstantSInt and ConstantUInt classes
in favor of just using ConstantInt.
llvm-svn: 31063
The critical edge block dominates the dest block if the dest block dominates
all edges other than the one incoming from the critical edge.
llvm-svn: 30696
or when splitting loops with a common header into multiple loops. In particular
the old code would always insert the preheader before the old loop header. This
is disastrous in cases where the loop hasn't been rotated. For example, it can
produce code like:
.. outside the loop...
jmp LBB1_2 #bb13.outer
LBB1_1: #bb1
movsd 8(%esp,%esi,8), %xmm1
mulsd (%edi), %xmm1
addsd %xmm0, %xmm1
addl $24, %edi
incl %esi
jmp LBB1_3 #bb13
LBB1_2: #bb13.outer
leal (%edx,%eax,8), %edi
pxor %xmm1, %xmm1
xorl %esi, %esi
LBB1_3: #bb13
movapd %xmm1, %xmm0
cmpl $4, %esi
jl LBB1_1 #bb1
Note that the loop body is actually LBB1_1 + LBB1_3, which means that the
loop now contains an uncond branch WITHIN it to jump around the inserted
loop header (LBB1_2). Doh.
This patch changes the preheader insertion code to insert it in the right
spot, producing this code:
... outside the loop, fall into the header ...
LBB1_1: #bb13.outer
leal (%edx,%eax,8), %esi
pxor %xmm0, %xmm0
xorl %edi, %edi
jmp LBB1_3 #bb13
LBB1_2: #bb1
movsd 8(%esp,%edi,8), %xmm0
mulsd (%esi), %xmm0
addsd %xmm1, %xmm0
addl $24, %esi
incl %edi
LBB1_3: #bb13
movapd %xmm0, %xmm1
cmpl $4, %edi
jl LBB1_2 #bb1
Totally crazy, no branch in the loop! :)
llvm-svn: 30587
reachable, making it general purpose enough for use by InsertPreheaderForLoop.
Eliminate custom dominfo updating code in InsertPreheaderForLoop, using
UpdateDomInfoForRevectoredPreds instead.
llvm-svn: 30586
Not only will this take huge amounts of compile time, the resultant loop nests
won't be useful for optimization. This reduces loopsimplify time on
Transforms/LoopSimplify/2006-08-11-LoopSimplifyLongTime.ll from ~32s to ~0.4s
with a debug build of llvm on a 2.7GHz G5.
llvm-svn: 29647
blocks that target loop blocks.
Before, the code was run once per loop, and depended on the number of
predecessors each block in the loop had. Unfortunately, scanning preds can
be really slow when huge numbers of phis exist or when phis with huge numbers
of inputs exist.
Now, the code is run once per function and scans successors instead of preds,
which is far faster. In addition, the new code is simpler and is goto free,
woo.
This change speeds up a nasty testcase Duraid provided me from taking hours to
taking ~72s with a debug build. The functionality this implements is already
tested in the testsuite as Transforms/CodeExtractor/2004-03-13-LoopExtractorCrash.ll.
llvm-svn: 29644
down approach, inspired by discussions with Tanya.
This approach is significantly faster, because it does not need dominator
frontiers and it does not insert extraneous unused PHI nodes. For example, on
252.eon, in a release-asserts build, this speeds up LCSSA (which is the slowest
pass in gccas) from 9.14s to 0.74s on my G5. This code is also slightly smaller
and significantly simpler than the old code.
Amusingly, in a normal Release build (which includes the
"assert(L->isLCSSAForm());" assertion), asserting that the result of LCSSA
is in LCSSA form is actually slower than the LCSSA transformation pass
itself on 252.eon. I will see if Loop::isLCSSAForm can be sped up next.
llvm-svn: 29463
target CG node. This allows the inliner to properly update the callgraph
when using the pruning inliner. The pruning inliner may not copy over all
call sites from a callee to a caller, so the edges corresponding to those
call sites should not be copied over either.
This fixes PR827 and Transforms/Inline/2006-07-12-InlinePruneCGUpdate.ll
llvm-svn: 29120
not handling PHI nodes correctly when determining if a value was live-out.
This patch reduces the number of detected live-out variables in the testcase
from 6565 to 485.
llvm-svn: 28771
If a single exit block has multiple predecessors within the loop, it will
appear in the exit blocks list more than once. LCSSA needs to take that into
account so that it doesn't double process that exit block.
llvm-svn: 28750
code (while cloning) it often gets the branch/switch instructions. Since it
knows that edges of the CFG are dead, it need not clone (or even look) at
the obviously dead blocks. This should speed up the inliner substantially on
code where there are lots of inlinable calls to functions with constant
arguments. On C++ code in particular, this kicks in.
llvm-svn: 28641
the iterated Dominance Frontier of the loop-closure Phi's. This is the
second phase of the LCSSA pass. The third phase (coming soon) will be to
update all uses of loop variables to use the loop-closure Phi's instead.
llvm-svn: 28524
makes it so that it constant folds instructions on the fly. This is good
for several reasons:
0. Many instructions are constant foldable after inlining, particularly if
inlining a call with constant arguments.
1. Without this, the inliner has to allocate memory for all of the instructions
that can be constant folded, then a subsequent pass has to delete them. This
gets the job done without this extra work.
2. This makes the inliner *pass* a bit more aggressive: in particular, it
partially solves a phase order issue where the inliner would inline lots
of code that folds away to nothing, but think that the resultant function
is big because of this code that will be gone. Now the code never exists.
This is the first part of a 2-step process. The second part will be smart
enough to see when this implicit constant folding propagates a constant into
a branch or switch instruction, making CFG edges dead.
This implements Transforms/Inline/inline_constprop.ll
llvm-svn: 28521
nondeterminism being bad) could cause some trivial missed optimizations (dead
phi nodes being left around for later passes to clean up).
With this, llvm-gcc4 now bootstraps and correctly compares. I don't know
why I never tried to do it before... :)
llvm-svn: 27984
This patch is an incremental step towards supporting a flat symbol table.
It de-overloads the intrinsic functions by providing type-specific intrinsics
and arranging for automatic upgrading from the old overloaded name to
the new non-overloaded name. Specifically:
llvm.isunordered -> llvm.isunordered.f32, llvm.isunordered.f64
llvm.sqrt -> llvm.sqrt.f32, llvm.sqrt.f64
llvm.ctpop -> llvm.ctpop.i8, llvm.ctpop.i16, llvm.ctpop.i32, llvm.ctpop.i64
llvm.ctlz -> llvm.ctlz.i8, llvm.ctlz.i16, llvm.ctlz.i32, llvm.ctlz.i64
llvm.cttz -> llvm.cttz.i8, llvm.cttz.i16, llvm.cttz.i32, llvm.cttz.i64
New code should not use the overloaded intrinsic names. Warnings will be
emitted if they are used.
llvm-svn: 25366
it doesn't contain any calls. This is a fairly common case for C++ code,
so it will probably speed up the inliner marginally in these cases.
llvm-svn: 25284
function was not an alloca, we wouldn't check the entry block for any allocas,
leading to increased stack space in some cases. In practice, allocas are almost
always at the top of the block, so this was never noticed.
llvm-svn: 25280
has a single def. In this case, look for uses that are dominated by the def
and attempt to rewrite them to directly use the stored value.
This speeds up mem2reg on these values and reduces the number of phi nodes
inserted. This should address PR665.
llvm-svn: 24411
into the LLVMAnalysis library.
This allows LLVMTransform and LLVMTransformUtils to be archives and linked
with LLVMAnalysis.a, which provides any missing definitions.
llvm-svn: 24036
SparcV9 JIT.
2. Make LLVMTransformUtils a relinked object file and always link it before
LLVMAnalysis.a. These two libraries have circular dependencies on each
other, which creates problems when building the SparcV9 JIT. This change
fixes the dependency problems on all platforms with a minimum of fuss.
llvm-svn: 24023
not define a value that is used outside of its block. This catches many
more simplifications, e.g. 854 in 176.gcc, 137 in vpr, etc.
This implements branch-phi-thread.ll:test3.ll
llvm-svn: 23397
Instead, just update the BB in-place. This is both faster and prevents
split-critical-edges from shuffling the PHI argument list unnecessarily.
llvm-svn: 22765
BasicBlock's removePredecessor routine. This requires shuffling around
the definition and implementation of hasConstantValue from Utils.h,cpp into
Instructions.h,cpp
llvm-svn: 22664
consideration the case where a reference in an unreachable block could
occur. This fixes Transforms/SimplifyCFG/2005-08-01-PHIUpdateFail.ll,
something I ran into while bugpoint'ing another pass.
llvm-svn: 22584
The optimization for locally used allocas was not safe for allocas that
were read before they were written. This change disables that optimization
in that case.
llvm-svn: 22318
sinh, cosh, etc.
* Make the name comparisons for the fp libcalls a little more efficient by
switching on the first character of the name before doing comparisons.
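A generic sketch of the dispatch-on-first-character idea; the helper name and the
exact set of library functions are illustrative, not the pass's actual code:

#include <string>

// Switch on the first character so that most names are rejected with a
// single comparison before any full string compare runs.
static bool isConstantFoldableFPCall(const std::string &Name) {
  if (Name.empty()) return false;
  switch (Name[0]) {
  case 'a': return Name == "acos" || Name == "asin" || Name == "atan";
  case 'c': return Name == "cos"  || Name == "cosh";
  case 's': return Name == "sin"  || Name == "sinh" || Name == "sqrt";
  default:  return false;
  }
}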
llvm-svn: 21611
This does a simple form of "jump threading", which eliminates CFG edges that
are provably dead. This triggers 90 times in the external tests, and
eliminating CFG edges is always a good thing! :)
llvm-svn: 20300
SimplifyCFG is one of those passes that we use for final cleanup: it should
not rely on other passes to clean up its garbage. This fixes the "why are
trivially dead setcc's in the output of gccas" problem.
llvm-svn: 19212
if (x) {
code
...
} else {
code
...
}
Turn it into:
code
if (x) {
...
} else {
...
}
This reduces code size and in some common cases allows us to completely
eliminate the conditional. This turns several if/then/else blocks in loops
into straightline code in 179.art, turning the loops into single basic blocks
(good for modsched even!).
Maybe now brg will leave me alone ;-)
llvm-svn: 18366
unnecessary. This allows us to delete several hundred phi nodes of the
form PHI(x,x,x,undef) from 253.perlbmk and probably other programs as well.
This implements Mem2Reg/UndefValuesMerge.ll
llvm-svn: 17098
whose addresses were used by trivial phi nodes and select instructions. This
is now performed by the instcombine pass, which is more powerful, is much
simpler, and is faster. This allows the deletion of a bunch of code, two
FIXME's and two gotos.
llvm-svn: 16406
Move include/Config and include/Support into include/llvm/Config,
include/llvm/ADT and include/llvm/Support. From here on out, all LLVM
public header files must be under include/llvm/.
llvm-svn: 16137
block (common in a switch), make sure to remove extra edges in successor
blocks. This fixes CodeExtractor/2004-08-12-BlockExtractPHI.ll and should
be pulled into LLVM 1.3 (though the regression test need not be, as that
would require pulling in the LoopExtract.cpp changes).
llvm-svn: 15717
since May 1st. In this code, the pred iterator was sometimes being invalidated,
causing the wrong entries to be added to PHI nodes.
The fix is to dereference and save the *PI value before we hack on
branch instructions, which changes use/def chains, which SOMETIMES invalidates
the iterator.
llvm-svn: 14278
non-deterministic things like the ordering of blocks in the dominance
frontier of a BB. Unfortunately, I don't know of a better way to solve
this problem than to explicitly sort the BB's in function-order before
processing them. This is guaranteed to slow the pass down a bit, but
is absolutely necessary to get usable diffs between two different tools
executing the mem2reg or scalarrepl pass.
Before this, bazillions of spurious diff failures occurred all over the
place due to the different order of processing PHIs:
- %tmp.111 = getelementptr %struct.Connector_struct* %upcon.0.0, uint 0, uint 0
+ %tmp.111 = getelementptr %struct.Connector_struct* %upcon.0.1, uint 0, uint 0
Now, the diffs match.
llvm-svn: 14244
nondeterministic results that depend on where these objects land in memory.
Instead, sort by the value of the constant, which is stable.
Before this patch, the -simplifycfg pass run from two different compilers
could cause different code to be generated, though it was semantically the
same:
@@ -12258,8 +12258,8 @@
%s_addr.1 = phi sbyte* [ %s, %entry ], [ %inc.0, %no_exit ] ; <sbyte*> [#uses=5]
%tmp.1 = load sbyte* %s_addr.1 ; <sbyte> [#uses=1]
switch sbyte %tmp.1, label %no_exit [
- sbyte 0, label %loopexit
sbyte 46, label %loopexit
+ sbyte 0, label %loopexit
]
We need to stomp all of this stuff out.
llvm-svn: 14243
is write an autoconf macro that checks whether __isnan or isnan actually works
**using the C++ compiler after #include <cmath>**, instead of doing it the easy
way with AC_CHECK_FUNCS().
llvm-svn: 14171
Add support for acos/asin/atan. 188.ammp contains three calls to acos with
constant arguments. Constant folding it allows elimination of those 3 calls
and three FP divisions of the results.
llvm-svn: 13821
CloneTrace, and because it is primarily an operation on ValueMaps. It
is now a global (non-static) function which can be pulled in using
ValueMapper.h.
llvm-svn: 13600
PHI node entries from multiple outside-the-region blocks. This also fixes
extraction of the entry block in a function. Yaay.
This has successfully block extracted all (but one) block from the score_move
function in obsequi (out of 33). Hrm, I wonder which block the bug is in. :)
llvm-svn: 13489
* Add a stub for the severSplitPHINodes which will allow us to bbextract
bb's with PHI nodes in them soon.
* Remove unused arguments from findInputsOutputs
* Dramatically simplify the code in findInputsOutputs. In particular,
nothing really cares whether or not a PHI node is using something.
* Move moveCodeToFunction to after emitCallAndSwitchStatement as that's the
order they get called.
* Fix a bug where we would code extract a region that included a call to
vastart. Like 'alloca', calls to vastart must stay in the function that
they are defined in.
* Add some comments.
llvm-svn: 13482
from the extracted region. If the region has 0 or 1 exit blocks, the new
function returns void. If it has 2 exits, it returns bool, otherwise it
returns a ushort as before.
This allows us to use a conditional branch instruction when there are two
exit blocks, as often happens during block extraction.
llvm-svn: 13481
1. Get rid of the silly abort block. When doing bb extraction, we get one
abort block for every block extracted, which is kinda annoying.
2. If the switch ends up having a single destination, turn it into an
unconditional branch.
I would like to add support for conditional branches, but to do this we will
want to have the function return a bool instead of a ushort.
llvm-svn: 13478
Basically we were using SimplifyCFG as a huge sledgehammer for a simple
optimization. Because simplifycfg does so many things, we can't use it
for this purpose.
llvm-svn: 12977
1. Names were not put on the new arguments created (ok, this just helps sanity :)
2. Fix outgoing pointer values
3. Do not insert stores for values that had not been computed
4. Fix some weird problems with the outset calculation
This fixes CodeExtractor/2004-03-14-DominanceProblem.ll, making the extractor
work on at least one simple case!
llvm-svn: 12484
Simplify the input/output finder. All elements of a basic block are
instructions. Any used arguments are also inputs. An instruction can only
be used by another instruction.
llvm-svn: 12405
* Don't insert a branch to the switch instruction after the call, just
make it a single block.
* Insert the new alloca instructions in the entry block of the original
function instead of having them execute dynamically
* Don't make the default edge of the switch instruction go back to the switch.
The loop extractor shouldn't create new loops!
* Give meaningful names to the alloca slots and the reload instructions
* Some minor code simplifications
llvm-svn: 12402
This also implements two minor improvements:
* Don't insert live-out stores IN the region, insert them on the code path
that exits the region
* If the region is exited to the same block from multiple paths, share the
switch statement entry, live-out store code, and the basic block.
llvm-svn: 12401
a member of the class. While we're at it, turn the collection into a set
instead of a vector to improve efficiency and make queries simpler.
llvm-svn: 12400
This case occurs many times in various benchmarks, especially when combined
with the previous patch. This allows it to get stuff like:
if (X == 4 || X == 3)
if (X == 5 || X == 8)
and
switch (X) {
case 4: case 5: case 6:
if (X == 4 || X == 5)
llvm-svn: 11797
allowed in invoke instructions. Thus, if we are inlining a call to an intrinsic
function into an invoke site, we don't need to turn the call into an invoke!
llvm-svn: 11384
Having a proper 'select' instruction would allow the elimination of a lot
of the special case cruft in this patch, but we don't have one yet.
llvm-svn: 11307
1. Don't scan to the end of alloca instructions in the caller function to
insert inlined allocas, just insert at the top. This saves a lot of
time inlining into functions with a lot of allocas.
2. Use splice to move the alloca instructions over, instead of remove/insert.
This allows us to transfer a block at a time, and eliminates a bunch of
silly symbol table manipulations.
This speeds up the inliner on the testcase in PR209 from 1.73s -> 1.04s (67%)
llvm-svn: 11118
and that basic block ends with a return instruction. In this case, we can just splice
the cloned "body" of the function directly into the source basic block, avoiding a lot
of rearrangement and splitBasicBlock's linear scan over the split block. This speeds up
the inliner on the testcase in PR209 from 2.3s to 1.7s, a 35% reduction.
llvm-svn: 11116
process. The only optimization we did so far is to avoid creating a
PHI node, then immediately destroying it in the common case where the
callee has one return statement. Instead, we just don't create the return
value. This has no noticeable performance impact, but paves the way for
future improvements.
llvm-svn: 11108
to add the cloned block to. This allows the block to be added to the function
immediately, and all of the instructions to be immediately added to the function
symbol table, which speeds up the inliner from 3.7s -> 3.38s on the testcase in PR209.
llvm-svn: 11107
Added assert() to ensure symbol table is well formed.
Added code to remember the value that was found; resolving types can change
the symbol table and invalidate the value of the iterator.
Added comments to the ResolveTypes() function (mainly for my own benefit).
Please feel free to correct the comments if they are not accurate.
llvm-svn: 9693
PHI node entries for unwind instructions just like for call instructions which
became invokes! This fixes PR57, tested by
Inline/2003-10-26-InlineInvokeExceptionDestPhi.ll
llvm-svn: 9526
break dominance relationships, and is otherwise bad. This fixes bug:
Inline/2003-10-13-AllocaDominanceProblem.ll. This also fixes miscompilation
of 3 176.gcc source files (reload1.c, global.c, flow.c)
llvm-svn: 9109
Running the inliner on 252.eon used to take 48.4763s, now it takes 14.4148s.
In release mode, it went from taking 25.8741s to taking 11.5712s.
This also fixes a FIXME.
llvm-svn: 8890
"minimal" SSA form (in other words, it doesn't insert dead PHIs). This
speeds up the mem2reg pass very significantly because it doesn't have to
do a lot of frivolous work in many common cases.
In the 252.eon function I have been playing with, this doesn't even insert
the 120 PHI nodes that it used to which were trivially dead (in the process
of promoting 356 alloca instructions overall). This speeds up the mem2reg
pass from 1.2459s to 0.1284s. More significantly, the DCE pass used to take
2.4138s to remove the 120 dead PHI nodes that mem2reg constructed, now it
takes 0.0134s (which is the time to scan the function and decide that there
is nothing dead). So overall, on this one function, we speed things up a
total of 3.5179s, which is a 24.8x speedup! :)
This change is tested by the Mem2Reg/2003-10-05-DeadPHIInsertion.ll test,
which now passes.
llvm-svn: 8884
basic block. This is amazingly common in code generated by the C/C++ front-ends.
This change makes it not have to insert ANY phi nodes, whereas before it would insert
a ton of dead ones which DCE would have to clean up.
Thus, this fix improves compile-time performance of these trivial allocas in two ways:
1. It doesn't have to do the walking and book-keeping for renaming
2. It does not insert dead phi nodes for them which would have to
subsequently be cleaned up.
On my favorite testcase from 252.eon, this special case handles 305 out of
356 promoted allocas in the function. It speeds up the mem2reg pass from 7.5256s
to 1.2505s. It inserts 677 fewer dead PHI nodes, which speeds up a subsequent
-dce pass from 18.7524s to 2.4806s.
There are still 120 trivially dead PHI nodes being inserted for variables used
in multiple basic blocks, but they are not handled by this patch.
llvm-svn: 8881
*** Revamp the code which handled unreachable code in the function. Now the
code is much more efficient for high-degree basic blocks, such as those
that occur in the 252.eon SPEC benchmark.
For the interested, the time to promote a SINGLE alloca in _ZN7mrScene4ReadERSi
function used to be > 3.5s. Now it is < .075s. The function has a LOT of
allocas in it, so it appeared to be infinitely looping; this should make it much
nicer. :)
llvm-svn: 8863
work-list of value definitions. This allows elimination of the explicit
'iterative' step of the algorithm, and also reuses temporary memory better.
llvm-svn: 8861
* Do not insert a new entry into NewPhiNodes during the rename pass if there are no PHIs in a block.
* Do not compute WriteSets in parallel
llvm-svn: 8858
* Eliminate the KillList instance variable, instead, just delete loads and
stores as they are "renamed", and delete allocas when they are done
* Make the 'visited' set an instance variable to avoid passing it on the stack.
llvm-svn: 8857
... by making sure to update PHI nodes to take into consideration the
extra edges we get if we inline a call instruction through an invoke.
llvm-svn: 8664
* Make Mem2Reg assign version numbers now for renamed variables instead of
.mem2reg suffixes. This produces what people think of as SSA.
llvm-svn: 5771
* Renamed StatisticReporter.h/cpp to Statistic.h/cpp
* Broke constructor to take two const char * arguments instead of one, so
that indentation can be taken care of automatically.
* Sort the list by pass name when printing
* Make sure to print all statistics as a group, instead of randomly when
the statistics dtors are called.
* Updated ProgrammersManual with new semantics.
llvm-svn: 4002
in the old destination block to indicate that the value flows from the new
edge splitting block, not from the original multi-successor block.
llvm-svn: 3590
* Correctly delete TypeHandles in AsmParser. In addition to not leaking
memory, this prevents a bug that could have occurred when a type that
the constexpr was using got resolved
* Check for errors in the AsmParser instead of hitting assertion failures
deep in the code
* Simplify the interface to the ConstantExpr class, removing unnecessary
parameters to the ::get* methods.
* Rename the 'getelementptr' version of ConstantExpr::get to
ConstantExpr::getGetElementPtr
llvm-svn: 3160
methods
* Eliminate AnalysisID: Now it is just a typedef for const PassInfo*
* Simplify how AnalysisID's are initialized
* Eliminate Analysis/Writer.cpp/.h: incorporate printing functionality into
the analyses themselves.
llvm-svn: 3115
* Add new RegisterOpt/RegisterAnalysis templates for registering passes that
are to show up in opt or analyze
* Register Analyses now
* Change optimizations to use RegisterOpt instead of RegisterPass
* Add support for different "PassType's"
* Remove getPassName implementations from various subclasses
llvm-svn: 3113
* Add new RegisterOpt/RegisterAnalysis templates for registering passes that
are to show up in opt or analyze
* Register Analyses now
* Change optimizations to use RegisterOpt instead of RegisterPass
* Add support for different "PassType's"
* Remove getPassName implementations from various subclasses
llvm-svn: 3112
PromoteInstance. Make them local variables that are passed around as
appropriate. Especially in the case of CurrentValue, this makes the
code simpler.
llvm-svn: 2374
- Rename runOnMethod to runOnFunction
* Transform getAnalysisUsageInfo into getAnalysisUsage
- Method is now const
- It now takes one AnalysisUsage object to fill in instead of 3 vectors
to fill in
- Pass's now specify which other passes they _preserve_ not which ones
they modify (be conservative!)
- A pass can specify that it preserves all analyses (because it never
modifies the underlying program)
* s/Method/Function/g in other random places as well
llvm-svn: 2333
Method::inst_* is now in llvm/Support/InstIterator.h
GraphTraits specializations for BasicBlock and Methods are now in llvm/Support/CFG.h
llvm-svn: 1746
* Renamed getOpcode to getOpcodeName
* Changed getOpcodeName to return a const char * instead of string
* Added a getOpcode method to replace getInstType
* Changed code to use getOpcode instead of getInstType
llvm-svn: 152