This has the obvious advantage of being commutable and is always a win on x86 because
const - x wastes a register there. On less weird architectures this may lead to
a regression because other arithmetic doesn't fuse with it anymore. I'll address that
problem in a followup.
llvm-svn: 147254
performance regressions (both execution-time and compile-time) on our
nightly testers.
Original commit message:
Fix for bug #11429: Wrong behaviour for switches. Small improvement for code
size heuristics.
llvm-svn: 147131
into Analysis as a standalone function, since there's no need for
it to be in VMCore. Also, update it to use isKnownNonZero and
other goodies available in Analysis, making it more precise,
enabling more aggressive optimization.
llvm-svn: 146610
This should always be done as a matter of principal. I don't have a
case that exposes the problem. I just noticed this recently while
scanning the code and realized I meant to fix it long ago.
llvm-svn: 146438
subdirectories to traverse into.
- Originally I wanted to avoid this and just autoscan, but this has one key
flaw in that new subdirectories can not automatically trigger a rerun of the
llvm-build tool. This is particularly a pain when switching back and forth
between trees where one has added a subdirectory, as the dependencies will
tend to be wrong. This will also eliminates FIXME implicitly.
llvm-svn: 146436
detected in the forward-CFG DFS. This prevents the reverse-CFG from
visiting blocks inside loops after blocks that dominate them in the
case where loops have multiple exits.
No testcase, because this fixes a bug which in practice only shows
up in a full optimizer run, due to the use-list order.
This fixes rdar://10422791 and others.
llvm-svn: 146408
indicates whether the intrinsic has a defined result for a first
argument equal to zero. This will eventually allow these intrinsics to
accurately model the semantics of GCC's __builtin_ctz and __builtin_clz
and the X86 instructions (prior to AVX) which implement them.
This patch merely sets the stage by extending the signature of these
intrinsics and establishing auto-upgrade logic so that the old spelling
still works both in IR and in bitcode. The upgrade logic preserves the
existing (inefficient) semantics. This patch should not change any
behavior. CodeGen isn't updated because it can use the existing
semantics regardless of the flag's value.
Note that this will be followed by API updates to Clang and DragonEgg.
Reviewed by Nick Lewycky!
llvm-svn: 146357
Since we're not rewriting IVs in other loops, there's not much reason
to consider their stride when generating formulae.
This should reduce the number of useless formulas considered by LSR.
llvm-svn: 146302
Patch by Brendon Cahoon!
This extends the existing LoopUnroll and LoopUnrollPass. Brendon
measured no regressions in the llvm test suite with -unroll-runtime
enabled. This implementation works by using the existing loop
unrolling code to unroll the loop by a power-of-two (default 8). It
generates an if-then-else sequence of code prior to the loop to
execute the extra iterations before entering the unrolled loop.
llvm-svn: 146245
- Walking over pred_begin/pred_end is an expensive operation.
- PHINodes contain a value for each predecessor anyway.
- While it may look like we used to save a few iterations with the set,
be aware that getIncomingValueForBlock does a linear search on
the values of the phi node.
- Another -5% on ARMDisassembler.cpp (Release build). This was the last
entry in the profile that was obviously wasting time.
llvm-svn: 145937
It's always good to prune early, but formulae that are unsatisfactory
in their own right need to be removed before running any other pruning
heuristics. We easily avoid generating such formulae, but we need them
as an intermediate basis for forming other good formulae.
llvm-svn: 145906
- Calling getUser in a loop is much more expensive than iterating over a few instructions.
- Use it instead of the open-coded loop in AddrModeMatcher.
- 5% speedup on ARMDisassembler.cpp Release builds.
llvm-svn: 145810
The callee is usually smaller than the caller, too. This reduces the compile
time of ARMDisassembler.cpp by 32% (Release build). It still takes ages to
compile though.
llvm-svn: 145690
weak variable are compiled by different compilers, such as GCC and LLVM, while
LLVM may increase the alignment to the preferred alignment there is no reason to
think that GCC will use anything more than the ABI alignment. Since it is the
GCC version that might end up in the final program (as the linkage is weak), it
is wrong to increase the alignment of loads from the global up to the preferred
alignment as the alignment might only be the ABI alignment.
Increasing alignment up to the ABI alignment might be OK, but I'm not totally
convinced that it is. It seems better to just leave the alignment of weak
globals alone.
llvm-svn: 145413
gcc, though I thought it was older (my gcc 4.4 has it as a local patch. Whoops!)
This fixes PR10589.
Also add some debugging statements.
Remove GcnoFiles, the mapping from CompilationUnit to raw_ostream. Now that we
start by iterating over each CU and descending into them, there's no need to
maintain a mapping.
llvm-svn: 145208
The right way to check for a binary operation is
cast<BinaryOperator>. The original check: cast<Instruction> &&
numOperands() == 2 would match phi "instructions", leading to an
infinite loop in extreme corner case: a useless phi with operands
[self, constant] that prior optimization passes failed to remove,
being used in the loop by another useless phi, in turn being used by an
lshr or udiv.
Fixes PR11350: runaway iteration assertion.
llvm-svn: 144935
lead to it trying to re-mark a value marked as a constant with a different value. It also appears to trigger very rarely.
Fixes PR11357.
llvm-svn: 144352
Size of data being pointed to wasn't always being checked so some small writes were killing big writes
Fixes <rdar://problem/10426753>
llvm-svn: 144312
Currently checks alignment and killing stores on a power of 2 boundary as this is likely
to trim the size of the earlier store without breaking large vector stores into scalar ones.
Fixes <rdar://problem/10140300>
llvm-svn: 144239
Only currently done if the later store is writing to a power of 2 address or
has the same alignment as the earlier store as then its likely to not break up
large stores into smaller ones
Fixes <rdar://problem/10140300>
llvm-svn: 143630
We've been hitting asserts in this code due to the many supported
combintions of modes (iv-rewrite/no-iv-rewrite) and IV types. This
second rewrite of the code attempts to deal with these cases systematically.
llvm-svn: 143546
instructions.
This doesn't introduce any optimizations we weren't doing before (except
potentially due to pass ordering issues), now passes will eliminate them sooner
as part of their own cleanups.
llvm-svn: 142787
element types, even though the element extraction code does. It is surprising
that this bug has been here for so long. Fixes <rdar://problem/10318778>.
llvm-svn: 142740
combining of the landingpad instruction. The ObjC personality function acts
almost identically to the C++ personality function. In particular, it uses
"null" as a "catch-all" value.
llvm-svn: 142256
Some code want to check that *any* call within a function has the 'returns
twice' attribute, not just that the current function has one.
llvm-svn: 142221
profile metadata at the same time. Use it to preserve metadata attached
to a branch when re-writing it in InstCombine.
Add metadata to the canonicalize_branch InstCombine test, and check that
it is tranformed correctly.
Reviewed by Nick Lewycky!
llvm-svn: 142168
I rewrote the algorithm a while back so it doesn't require map lookup,
but neglected to change the data structure. This was caught by
llvm-gcc self host, not because there's anything special about
llvm-gcc, but because it is the only test for nondeterminism we
currently have. Unit tests don't work well for everything; we should
always try to have a nondeterminism stress test running.
Fixes PR11133: llvm-gcc self host .o mismatch after enable-iv-rewrite=false
llvm-svn: 142036
Someone more familiar with LSR should double-check that the extra cast is actually doing the right thing in the overflow cases; I'm not completely confident that's that case.
llvm-svn: 141916
would have never worked, since the element type of a vector type is never a
vector type. Also fix the conditional to be more direct in checking whether
EltTy is a vector type.
llvm-svn: 141713
IVs.
Indvars previously chose randomly between congruent IVs. Now it will
bias the decision toward IVs that SCEVExpander likes to create. This
was not done to fix any problem, it's just a welcome side effect of
factoring code.
llvm-svn: 141633
promoting allocas to preferred alignments that exceed the natural
alignment. This avoids some potentially expensive dynamic stack realignments.
The natural stack alignment is set in target data strings via the "S<size>"
option. Size is in bits and must be a multiple of 8. The natural stack alignment
defaults to "unspecified" (represented by a zero value), and the "unspecified"
value does not prevent any alignment promotions. Target maintainers that care
about avoiding promotions should explicitly add the "S<size>" option to their
target data strings.
llvm-svn: 141599
switch (n) {
case 27:
do_something(x);
...
}
the call do_something(x) will be replaced with do_something(27). In
gcc-as-one-big-file this results in the removal of about 500 lines of
bitcode (about 0.02%), so has about 1/10 of the effect of propagating
branch conditions.
llvm-svn: 141360
While I'm here, fix the related issue with strncmp, add some actual tests for strcmp and strncmp, and start using StringRef::compare for constant folding instead of using strcmp/strncmp so that the optimized IR isn't dependent on the host's implementation of strcmp.
llvm-svn: 141227
Just pull the instruction name, but don't change the order of anything
else. That keeps --debug happy and non-crashing, but doesn't change
how the worklist gets built.
llvm-svn: 141210
When updating the worklist for InstCombine, the Add/AddUsersToWorklist
functions may access the instruction(s) being added, for debug output for
example. If the instructions aren't yet added to the basic block, this
can result in a crash. Finish the instruction transformation before
adjusting the worklist instead.
rdar://10238555
llvm-svn: 141203
branch "br i1 %x, label %if_true, label %if_false" then it replaces
"%x" with "true" in places only reachable via the %if_true arm, and
with "false" in places only reachable via the %if_false arm. Except
that actually it doesn't: if value numbering shows that %y is equal
to %x then, yes, %y will be turned into true/false in this way, but
any occurrences of %x itself are not transformed. Fix this. What's
more, it's often the case that %x is an equality comparison such as
"%x = icmp eq %A, 0", in which case every occurrence of %A that is
only reachable via the %if_true arm can be replaced with 0. Implement
this and a few other variations on this theme. This reduces the number
of lines of LLVM IR in "GCC as one big file" by 0.2%. It has a bigger
impact on Ada code, typically reducing the number of lines of bitcode
by around 0.4% by removing repeated compiler generated checks. Passes
the LLVM nightly testsuite and the Ada ACATS testsuite.
llvm-svn: 141177
it's OK for the false/true destination to have multiple
predecessors as long as the extra ones are dominated by
the branch destination.
llvm-svn: 141176
This handles the case in which LSR rewrites an IV user that is a phi and
splits critical edges originating from a switch.
Fixes <rdar://problem/6453893> LSR is not splitting edges "nicely"
llvm-svn: 141059
We want heuristics to be based on accurate data, but more importantly
we don't want llvm to behave randomly. A benign trunc inserted by an
upstream pass should not cause a wild swings in optimization
level. See PR11034. It's a general problem with threshold-based
heuristics, but we can make it less bad.
llvm-svn: 140919
catch or repeated filter clauses. Teach instcombine a bunch
of tricks for simplifying landingpad clauses. Currently the
code only recognizes the GNU C++ and Ada personality functions,
but that doesn't stop it doing a bunch of "generic" transforms
which are hopefully fine for any real-world personality function.
If these "generic" transforms turn out not to be generic, they
can always be conditioned on the personality function. Probably
someone should add the ObjC++ personality function. I didn't as
I don't know anything about it.
llvm-svn: 140852
Rewriting the entire loop nest now requires -enable-lsr-nested.
See PR11035 for some performance data.
A few unit tests specifically test nested LSR, and are now under a flag.
llvm-svn: 140762
The minor bug heuristic was noticed by inspection. I added the
isLoser/isValid helpers because they will become more
important with subsequent checkins.
llvm-svn: 140580
The landing pad must accompany the invoke when it's extracted. However, if it
does, then the loop isn't properly extracted. I.e., the resulting extraction has
a loop in it. The extracted function is then extracted, etc. resulting in an
infinite loop.
llvm-svn: 140193
extract its associated landing pad block as well. However, that landing pad
block may have more than one predecessor. So split the landing pad block so that
individual landing pads have only one predecessor.
This type of transformation may produce a false positive with bugpoint.
llvm-svn: 140173
extract the landing pad block. Otherwise, there will be a situation where the
invoke's unwind edge lands on a non-landing pad.
We also forbid the user from extracting the landing pad block by itself. Again,
this is not a valid transformation.
llvm-svn: 140083
No tests; these changes aren't really interesting in the sense that the logic is the same for volatile and atomic.
I believe this completes all of the changes necessary for the optimizer to handle loads and stores correctly. I'm going to try and come up with some additional testing, though.
llvm-svn: 139533
better.
Don't immediately give up when an add operation can't be trivially
sign/zero-extended within a loop. If it has NSW/NUW flags, generate a
new expression with sign extended (non-recurrent) operand. As before,
if SCEV says that all sign extends are loop invariant, then we can
widen the operation.
llvm-svn: 139453
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC. While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust trampoline is needed
in a different (nested) function. To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function. Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!). Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC. Patch mostly by Sanjoy Das.
llvm-svn: 139140
This changes loop unrolling to use the same mechanism for trip count
computation as indvars. This is a stronger check that tends to unroll
more loops. A very common side-effect is that many single iteration
loops will be removed sooner. The real goal was simply to remove
dependence on canonical IVs.
x86 is break even.
ARM performance changes to expect (+ is good):
External/SPEC/CFP2000/183.equake/183.equake +13%
SingleSource/Benchmarks/Dhrystone/fldry +21%
MultiSource/Applications/spiff/spiff +3%
SingleSource/Benchmarks/Stanford/Puzzle -14%
The Puzzle regression is actually an improvement in loop optimization
that defeats GVN: rdar://problem/10065079.
llvm-svn: 139009
The landingpad instruction is required in the landing pad block. Because we're
not deleting terminating instructions, the invoke may still jump to here (see
Transforms/SCCP/2004-11-16-DeadInvoke.ll). Remove all uses of the landingpad
instruction, but keep it around until code-gen can remove the basic block.
llvm-svn: 138890
In theory this could be extended to other instructions, eg. division by zero, but it's likely that it will "miscompile" some code because people depend on div by zero not trapping. NULL pointer dereference usually leads to a crash so we should be on the safe side.
This shrinks the size of a Release clang by 16k on x86_64.
llvm-svn: 138618
We have to be careful when splitting the landing pad block, because the
landingpad instruction is required to remain as the first non-PHI of an invoke's
unwind edge. To retain this, we split the block into two blocks, moving the
predecessors within the loop to one block and the remaining predecessors to the
other. The landingpad instruction is cloned into the new blocks.
llvm-svn: 138015
SplitLandingPadPredecessors is similar to SplitBlockPredecessors in that it
splits the current block and attaches a set of predecessors to the new basic
block. However, it differs from SplitBlockPredecessors in that it's specifically
designed to handle landing pad blocks.
Two new basic blocks are created: one that is has the vector of predecessors as
its predecessors and one that has the remaining predecessors as its
predecessors. Those two new blocks then receive a cloned copy of the landingpad
instruction from the original block. The landingpad instructions are joined in a
PHI, etc. Like SplitBlockPredecessors, it updates the LLVM IR, AliasAnalysis,
DominatorTree, DominanceFrontier, LoopInfo, and LCCSA analyses.
llvm-svn: 138014
PRE needs the landing pads to have their critical edges split. Doing this for a
landing pad is non-trivial. Abandon the attempt to perform PRE when we come
across a landing pad. (Reviewed by Owen!)
llvm-svn: 137876
One way to exit the loop is through an unwind edge. However, that may involve
splitting the critical edge of the landing pad, which is non-trivial. Prevent
the transformation from rewriting the landing pad exit loop block.
llvm-svn: 137871
making random bad assumptions about instructions which are not explicitly listed.
Includes fix for rdar://9956541, a version of "undef ^ undef should return
0 because it's easier than arguing with users".
llvm-svn: 137777
This commit includes a mention of the landingpad instruction, but it's not
changing the behavior around it. I think the current behavior is correct,
though. Bill, can you double-check that?
llvm-svn: 137691
This builds off of the current scheme, but instead of llvm.eh.exception and
llvm.eh.selector, it uses the landingpad instruction. And instead of
llvm.eh.resume, it uses the resume instruction.
Because of the invariants in the landing pad instruction, a lot of code that's
currently needed to find the appropriate intrinsic calls for an invoke
instruction won't be needed once we go to the new EH scheme. The "FIXME"s tell
us what to remove after we switch.
llvm-svn: 137576
This implements the 'landingpad' instruction. It's used to indicate that a basic
block is a landing pad. There are several restrictions on its use (see
LangRef.html for more detail). These restrictions allow the exception handling
code to gather the information it needs in a much more sane way.
This patch has the definition, implementation, C interface, parsing, and bitcode
support in it.
llvm-svn: 137501
the retains and releases all use the same SSA pointer value.
Also, don't let CFG hazards disrupt nested retain+release pair
optimizations.
llvm-svn: 137399
SCEV unrolling can unroll loops with arbitrary induction variables. It
is a prerequisite for -disable-iv-rewrite performance. It is also
easily handles loops of arbitrary structure including multiple exits
and is generally more robust.
This is under a temporary option to avoid affecting default
behavior for the next couple of weeks. It is needed so that I can
checkin unit tests for updateUnloop.
llvm-svn: 137384
based on ScalarEvolution without changing the induction variable phis.
This utility is the main tool of IndVarSimplifyPass, but the pass also
restructures induction variables in strange ways that are sensitive to
pass ordering. This provides a way for other loop passes to simplify
new uses of induction variables created during transformation. The
utility may be used by any pass that preserves ScalarEvolution. Soon
LoopUnroll will use it.
The net effect in this checkin is to cleanup the IndVarSimplify pass
by factoring out the SimplifyIndVar algorithm into a standalone utility.
llvm-svn: 137197
These are not individual bug fixes. I had to rewrite a good chunk of
the unroller to make it sane. I think it was getting lucky on trivial
completely unrolled loops with no early exits. I included some fairly
simple unit tests for partial unrolling. I didn't do much stress
testing, so it may not be perfect, but should be usable now.
llvm-svn: 137190
The 'unwind' instruction was acting essentially as a placeholder, because it
would be replaced at the end of this function by a branch to the "unwind
handler". The 'unwind' instruction is going away, so use 'unreachable' instead,
which serves the same purpose as a placeholder.
llvm-svn: 137098
recurrence, the initial values low bits can sometimes be ignored.
To take advantage of this, added FoldIVUser to IndVarSimplify to fold
an IV operand into a udiv/lshr if the operator doesn't affect the
result.
-indvars -disable-iv-rewrite now transforms
i = phi i4
i1 = i0 + 1
idx = i1 >> (2 or more)
i4 = i + 4
into
i = phi i4
idx = i0 >> ...
i4 = i + 4
llvm-svn: 137013
inlined variable, based on the discussion in PR10542.
This explodes the runtime of several passes down the pipeline due to
a large number of "copies" remaining live across a large function. This
only shows up with both debug and opt, but when it does it creates
a many-minute compile when self-hosting LLVM+Clang. There are several
other cases that show these types of regressions.
All of this is tracked in PR10542, and progress is being made on fixing
the issue. Once its addressed, the re-instated, but until then this
restores the performance for self-hosting and other opt+debug builds.
Devang, let me know if this causes any trouble, or impedes fixing it in
any way, and thanks for working on this!
llvm-svn: 136953
- use SmallVectorImpl& for the function argument.
- ignore the operands on the GEP, even if they aren't constant! Much as we
pretend the malloc succeeds, we pretend that malloc + whatever-you-GEP'd-by
is not null. It's magic!
llvm-svn: 136757
Don't replace a gep/bitcast with 'undef' because that will form a "free(undef)"
which in turn means "unreachable". What we wanted was a no-op. Instead, analyze
the whole tree and look for all the instructions we need to delete first, then
delete them second, not relying on the use_list to stay consistent.
llvm-svn: 136752
This adds the 'resume' instruction class, IR parsing, and bitcode reading and
writing. The 'resume' instruction resumes propagation of an existing (in-flight)
exception whose unwinding was interrupted with a 'landingpad' instruction (to be
added later).
llvm-svn: 136589
working on x86 (at least for trivial testcases); other architectures will
need more work so that they actually emit the appropriate instructions for
orderings stricter than 'monotonic'. (As far as I can tell, the ARM, PPC,
Mips, and Alpha backends need such changes.)
llvm-svn: 136457
specified in the same file that the library itself is created. This is
more idiomatic for CMake builds, and also allows us to correctly specify
dependencies that are missed due to bugs in the GenLibDeps perl script,
or change from compiler to compiler. On Linux, this returns CMake to
a place where it can relably rebuild several targets of LLVM.
I have tried not to change the dependencies from the ones in the current
auto-generated file. The only places I've really diverged are in places
where I was seeing link failures, and added a dependency. The goal of
this patch is not to start changing the dependencies, merely to move
them into the correct location, and an explicit form that we can control
and change when necessary.
This also removes a serialization point in the build because we don't
have to scan all the libraries before we begin building various tools.
We no longer have a step of the build that regenerates a file inside the
source tree. A few other associated cleanups fall out of this.
This isn't really finished yet though. After talking to dgregor he urged
switching to a single CMake macro to construct libraries with both
sources and dependencies in the arguments. Migrating from the two macros
to that style will be a follow-up patch.
Also, llvm-config is still generated with GenLibDeps.pl, which means it
still has slightly buggy dependencies. The internal CMake
'llvm-config-like' macro uses the correct explicitly specified
dependencies however. A future patch will switch llvm-config generation
(when using CMake) to be based on these deps as well.
This may well break Windows. I'm getting a machine set up now to dig
into any failures there. If anyone can chime in with problems they see
or ideas of how to solve them for Windows, much appreciated.
llvm-svn: 136433
The new EH is more simple in many respects. Mainly, we don't have to worry about
the "llvm.eh.exception" and "llvm.eh.selector" calls being in weird places.
llvm-svn: 136339
This takes the new 'resume' instruction and turns it into a direct jump to the
caller's landing pad code. The caller's landingpad instruction is merged with
the landingpad instructions of the callee. This is a bit rough and makes some
assumptions in how the code works. But it passes a simple test.
llvm-svn: 136313
size but different element types, so that it filters out the cases
that CreateShuffleVectorCast doesn't handle. This fixes rdar://9786827.
llvm-svn: 135721
For -disable-iv-rewrite, perform LFTR without generating a new
"canonical" induction variable. Instead find the "best" existing
induction variable for use in the loop exit test and compute the final
value of that IV for use in the new loop exit test. In short,
convert to a simple eq/ne exit test as long as it's cheap to do so.
llvm-svn: 135420
is named after a common idiom (i.e., memset/memcpy). Otherwise, we can run into
infinite recursion. Ideally, the user should use the correct -fno-builtin flag,
but in case they don't we should play nicely.
rdar://9763412
llvm-svn: 135286
an assert on Darwin llvm-gcc builds.
Assertion failed: (castIsValid(op, S, Ty) && "Invalid cast!"), function Create, file /Users/buildslave/zorg/buildbot/smooshlab/slave-0.8/build.llvm-gcc-i386-darwin9-RA/llvm.src/lib/VMCore/Instructions.cpp, li\
ne 2067.
etc.
http://smooshlab.apple.com:8013/builders/llvm-gcc-i386-darwin9-RA/builds/2354
--- Reverse-merging r134893 into '.':
U include/llvm/Target/TargetData.h
U include/llvm/DerivedTypes.h
U tools/bugpoint/ExtractFunction.cpp
U unittests/Support/TypeBuilderTest.cpp
U lib/Target/ARM/ARMGlobalMerge.cpp
U lib/Target/TargetData.cpp
U lib/VMCore/Constants.cpp
U lib/VMCore/Type.cpp
U lib/VMCore/Core.cpp
U lib/Transforms/Utils/CodeExtractor.cpp
U lib/Transforms/Instrumentation/ProfilingUtils.cpp
U lib/Transforms/IPO/DeadArgumentElimination.cpp
U lib/CodeGen/SjLjEHPrepare.cpp
--- Reverse-merging r134888 into '.':
G include/llvm/DerivedTypes.h
U include/llvm/Support/TypeBuilder.h
U include/llvm/Intrinsics.h
U unittests/Analysis/ScalarEvolutionTest.cpp
U unittests/ExecutionEngine/JIT/JITTest.cpp
U unittests/ExecutionEngine/JIT/JITMemoryManagerTest.cpp
U unittests/VMCore/PassManagerTest.cpp
G unittests/Support/TypeBuilderTest.cpp
U lib/Target/MBlaze/MBlazeIntrinsicInfo.cpp
U lib/Target/Blackfin/BlackfinIntrinsicInfo.cpp
U lib/VMCore/IRBuilder.cpp
G lib/VMCore/Type.cpp
U lib/VMCore/Function.cpp
G lib/VMCore/Core.cpp
U lib/VMCore/Module.cpp
U lib/AsmParser/LLParser.cpp
U lib/Transforms/Utils/CloneFunction.cpp
G lib/Transforms/Utils/CodeExtractor.cpp
U lib/Transforms/Utils/InlineFunction.cpp
U lib/Transforms/Instrumentation/GCOVProfiling.cpp
U lib/Transforms/Scalar/ObjCARC.cpp
U lib/Transforms/Scalar/SimplifyLibCalls.cpp
U lib/Transforms/Scalar/MemCpyOptimizer.cpp
G lib/Transforms/IPO/DeadArgumentElimination.cpp
U lib/Transforms/IPO/ArgumentPromotion.cpp
U lib/Transforms/InstCombine/InstCombineCompares.cpp
U lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
U lib/Transforms/InstCombine/InstCombineCalls.cpp
U lib/CodeGen/DwarfEHPrepare.cpp
U lib/CodeGen/IntrinsicLowering.cpp
U lib/Bitcode/Reader/BitcodeReader.cpp
llvm-svn: 134949
LinearFunctionTestReplace rewrite. No functionality.
I've been wanting to group the indvar subphases into sections and
order them by their logical sequence. My next checkin adds functions
related to LFTR, and doing the reorg now should help reviewers. Since,
most of the code in IndVarSimplify.cpp has recently been replaced or
will be replaced soon, obscuring blame should not be an issue. This
seems like an ideal time to shuffle the code around.
I'm happy to take more suggestions for cleaning up the code. Or if
you've been wanting to cleanup anything in this file yourself, now is
a good time.
llvm-svn: 134941
patch brings numerous advantages to LLVM. One way to look at it
is through diffstat:
109 files changed, 3005 insertions(+), 5906 deletions(-)
Removing almost 3K lines of code is a good thing. Other advantages
include:
1. Value::getType() is a simple load that can be CSE'd, not a mutating
union-find operation.
2. Types a uniqued and never move once created, defining away PATypeHolder.
3. Structs can be "named" now, and their name is part of the identity that
uniques them. This means that the compiler doesn't merge them structurally
which makes the IR much less confusing.
4. Now that there is no way to get a cycle in a type graph without a named
struct type, "upreferences" go away.
5. Type refinement is completely gone, which should make LTO much MUCH faster
in some common cases with C++ code.
6. Types are now generally immutable, so we can use "Type *" instead
"const Type *" everywhere.
Downsides of this patch are that it removes some functions from the C API,
so people using those will have to upgrade to (not yet added) new API.
"LLVM 3.0" is the right time to do this.
There are still some cleanups pending after this, this patch is large enough
as-is.
llvm-svn: 134829
The promotion code lost any alignment information, when hoisting loads and
stores out of the loop. This lead to incorrect aligned memory accesses. We now
use the largest alignment we can prove to be correct.
llvm-svn: 134520
alloca that only holds a copy of a global and we're going to replace the users
of the alloca with that global, just nuke the lifetime intrinsics. Part of
PR10121.
llvm-svn: 133905
"Reinstate r133435 and r133449 (reverted in r133499) now that the clang
self-hosted build failure has been fixed (r133512)."
Due to some additional warnings.
llvm-svn: 133700
ops.
This is a rewrite of the IV simplification algorithm used by
-disable-iv-rewrite. To avoid perturbing the default mode, I
temporarily split the driver and created SimplifyIVUsersNoRewrite. The
idea is to avoid doing opcode/pattern matching inside
IndVarSimplify. SCEV already does it. We want to optimize with the
full generality of SCEV, but optimize def-use chains top down on-demand rather
than rewriting the entire expression bottom-up. This was easy to do
for operations that SCEV can prove are identity function. So we're now
eliminating bitmasks and zero extends this way.
A result of this rewrite is that indvars -disable-iv-rewrite no longer
requires IVUsers.
llvm-svn: 133502
Change PHINodes to store simple pointers to their incoming basic blocks,
instead of full-blown Uses.
Note that this loses an optimization in SplitCriticalEdge(), because we
can no longer walk the use list of a BasicBlock to find phi nodes. See
the comment I removed starting "However, the foreach loop is slow for
blocks with lots of predecessors".
Extend replaceAllUsesWith() on a BasicBlock to also update any phi
nodes in the block's successors. This mimics what would have happened
when PHINodes were proper Users of their incoming blocks. (Note that
this only works if OldBB->replaceAllUsesWith(NewBB) is called when
OldBB still has a terminator instruction, so it still has some
successors.)
llvm-svn: 133435
Change various bits of code to make better use of the existing PHINode
API, to insulate them from forthcoming changes in how PHINodes store
their operands.
llvm-svn: 133434
all over the place in different styles and variants. Standardize on two
preferred entrypoints: one that takes a StructType and ArrayRef, and one that
takes StructType and varargs.
In cases where there isn't a struct type convenient, we now add a
ConstantStruct::getAnon method (whose name will make more sense after a few
more patches land).
It would be "really really nice" if the ConstantStruct::get and
ConstantVector::get methods didn't make temporary std::vectors.
llvm-svn: 133412
In cases such as the attached test, where the case value for a switch
destination is used in a phi node that follows the destination, it
might be better to replace that value with the condition value of the
switch, so that more blocks can be folded away with
TryToSimplifyUncondBranchFromEmptyBlock because there are less
conflicts in the phi node.
llvm-svn: 133344
type's bitwidth matches the (allocated) size of the alloca. This severely
pessimizes vector scalar replacement when the only vector type being used is
something like <3 x float> on x86 or ARM whose allocated size matches a
<4 x float>.
I hope to fix some of the flawed assumptions about allocated size throughout
scalar replacement and reenable this in most cases.
llvm-svn: 133338
spartan right now, but I plan to encode more information in this enum to improve
the correctness and reliability of SRoA. At least this first pass makes it
possible to make VectorTy an actual VectorType.
llvm-svn: 132937
might overflow. Re-typing the alloca to a larger type (e.g. double)
hoists a shift into the alloca, potentially exposing overflow in the
expression. rdar://problem/9265821
llvm-svn: 132926
intrinsics. In fact, we'll optimize a bitcast to that when possible. Detect it
when looking for the lifetime intrinsics.
No test case, noticed by inspection.
llvm-svn: 132906
pad, separating the exception and selector calls from the new lpad. Teaching
it not to do that, or to properly adjust the CFG afterwards, is out of
scope because it would require the other edges to the landing pad to be split
as well (effectively). Instead, just recover from the most likely cases
during inlining. The best long-term solution is to change the exception
representation and commit to either requiring or not requiring the more
complex edge-splitting logic; this is just a shorter-term hack.
llvm-svn: 132799
assuming that all offsets are legal vector accesses, and thus trying to access
the float member of { <2 x float>, float } as the 3rd element of the first
member.
llvm-svn: 132766
former was using the size of the entire alloca, whereas the latter was correctly using
the allocated size of the immediate type being converted (which may differ from the size
of the alloca). This fixes PR10082.
llvm-svn: 132759
then we don't want to set the destination in the indirect branch to the
destination. This is because the indirect branch needs its destinations to have
had their block addresses taken. This isn't so of the new critical edge that's
split during this process. If it turns out that the destination block has only
one predecessor, and that being a BB with an indirect branch, then it won't be
marked as 'used' and may be removed.
PR10072
llvm-svn: 132638
which edge to split by pred/succ pair, which means that we can end up splitting
the wrong edge (by case value) in the switch statement entirely. Fixes PR10031!
llvm-svn: 132535
variable. Noticed by inspection.
Simulate memset in EvaluateFunction where the target of the memset and the
value we're setting are both the null value. Fixes PR10047!
llvm-svn: 132288
transformed by the inliner into a branch to the enclosing landing pad
(when inlined through an invoke). If not so optimized, it is lowered
DWARF EH preparation into a call to _Unwind_Resume (or _Unwind_SjLj_Resume
as appropriate). Its chief advantage is that it takes both the
exception value and the selector value as arguments, meaning that there
is zero effort in recovering these; however, the frontend is required
to pass these down, which is not actually particularly difficult.
Also document the behavior of landing pads a bit better, and make it
clearer that it's okay that personality functions don't always land at
landing pads. This is just a fact of life. Don't write optimizations that
rely on pushing things over an unwind edge.
llvm-svn: 132253
- the selector for the landing pad must provide all available information
about the handlers, filters, and cleanups within that landing pad
- calls to _Unwind_Resume must be converted to branches to the enclosing
lpad so as to avoid re-entering the unwinder when the lpad claimed it
was going to handle the exception in some way
This is quite specific to libUnwind-based unwinding. In an effort to not
interfere too badly with other unwinders, and with existing hacks in frontends,
this only triggers on _Unwind_Resume (not _Unwind_Resume_or_Rethrow) and does
nothing with selectors if it cannot find a selector call for either lpad.
llvm-svn: 132200
This looks like it flagged an actual bug. Devang, please review. I added
the parentheses that change behavior, but make the behavior more closely
match commit log's intent.
llvm-svn: 132165
Use a proper worklist for use-def traversal without holding onto an
iterator. Now that we process all IV uses, we need complete logic for
resusing existing derived IV defs. See HoistStep.
llvm-svn: 132103
case of a switch instruction. Back off this optimization when this would
eliminate all of the predecessors to the latch.
Sorry, I am unable to reduce a reasonably sized test case.
rdar://9486843
llvm-svn: 132022
aligned.
Teach memcpyopt to not give up all hope when confonted with an underaligned
memcpy feeding an overaligned byval. If the *source* of the memcpy can be
determined to be adequeately aligned, or if it can be forced to be, we can
eliminate the memcpy.
This addresses PR9794. We now compile the example into:
define i32 @f(%struct.p* nocapture byval align 8 %q) nounwind ssp {
entry:
%call = call i32 @g(%struct.p* byval align 8 %q) nounwind
ret i32 %call
}
in both x86-64 and x86-32 mode. We still don't get a tailcall though,
because tailcalls apparently can't handle byval.
llvm-svn: 131884
result is non-zero. Implement an example optimization (PR9814), which allows us to
transform:
A / ((1 << B) >>u 2)
into:
A >>u (B-2)
which we compile into:
_divu3: ## @divu3
leal -2(%rsi), %ecx
shrl %cl, %edi
movl %edi, %eax
ret
instead of:
_divu3: ## @divu3
movb %sil, %cl
movl $1, %esi
shll %cl, %esi
shrl $2, %esi
movl %edi, %eax
xorl %edx, %edx
divl %esi, %eax
ret
llvm-svn: 131860
failing to form a memset, then having to delete it" but my approximation
isn't safe for self recurrent loops. Instead of doign a hack, just
do it the right way.
llvm-svn: 131858
I also changed -simplifycfg, -jump-threading and -codegenprepare to use this to produce slightly better code without any extra cleanup passes (AFAICT this was the only place in -simplifycfg where now-dead conditions of replaced terminators weren't being cleaned up). The only other user of this function is -sccp, but I didn't read that thoroughly enough to figure out whether it might be holding pointers to instructions that could be deleted by this.
llvm-svn: 131855
No functionality enabled by default. Use -disable-iv-rewrite.
Extended IVUsers to keep track of the phi that represents the users' IV.
Added the WidenIV transform to replace a narrow IV with a wide IV
by doing a one-for-one replacement of IV users instead of expanding the
SCEV expressions. [sz]exts are removed and truncs are inserted.
llvm-svn: 131744
As an example, the change to InstCombineCalls catches a common case where a call to a bitcast of a function is rewritten.
Chris, does this approach look reasonable?
llvm-svn: 131516