perform initialization without static constructors AND without explicit initialization
by the client. For the moment, passes are required to initialize both their
(potential) dependencies and any passes they preserve. I hope to be able to relax
the latter requirement in the future.
llvm-svn: 116334
formulae which become illegal as a result of the offset updating don't
escape.
This is for rdar://8529692. No testcase yet, because the given cases
hit use-list ordering differences.
llvm-svn: 116093
This doesn't usually matter, because the other heuristics usually
succeed regardless, but it's good to keep the register use
bookkeeping consistent.
llvm-svn: 116005
Anyone interested in more general PRE would be better served by implementing it separately, to get real
anticipation calculation, etc.
llvm-svn: 115337
The x86_mmx type is used for MMX intrinsics, parameters and
return values where these use MMX registers, and is also
supported in load, store, and bitcast.
Only the above operations generate MMX instructions, and optimizations
do not operate on or produce MMX intrinsics.
MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
smaller pieces. Optimizations may occur on these forms and the
result casted back to x86_mmx, provided the result feeds into a
previous existing x86_mmx operation.
The point of all this is prevent optimizations from introducing
MMX operations, which is unsafe due to the EMMS problem.
llvm-svn: 115243
Because of this, we cannot use the Simplify* APIs, as they can assert-fail on unreachable code. Since it's not easy to determine
if a given threading will cause a block to become unreachable, simply defer simplifying simplification to later InstCombine and/or
DCE passes.
llvm-svn: 115082
register pressure and thus excess spills, which we don't currently recover from well. This should
be re-evaluated in the future if our ability to generate good spills/splits improves.
Partial fix for <rdar://problem/7635585>.
llvm-svn: 114919
This reverts revision 114633. It was breaking llvm-gcc-i386-linux-selfhost.
It seems there is a downstream bug that is exposed by
-cgp-critical-edge-splitting=0. When that bug is fixed, this patch can go back
in.
Note that the changes to tailcallfp2.ll are not reverted. They were good are
required.
llvm-svn: 114859
Splitting critical edges at the merge point only addressed part of the issue; it is also possible for non-post-domination
to occur when the path from the load to the merge has branches in it. Unfortunately, full anticipation analysis is
time-consuming, so for now approximate it. This is strictly more conservative than real anticipation, so we will miss
some cases that real PRE would allow, but we also no longer insert loads into paths where they didn't exist before. :-)
This is a very slight net positive on SPEC for me (0.5% on average). Most of the benchmarks are largely unaffected, but
when it pays off it pays off decently: 181.mcf improves by 4.5% on my machine.
llvm-svn: 114785
"external" even when doing lazy bitcode loading. This was broken because
a function that is not materialized fails the !isDeclaration() test.
llvm-svn: 114666
truncates are free only in the case where the extended type is legal but the
load type is not. If both types are illegal, such as when they are too big,
the load may not be legalized into an extended load.
llvm-svn: 114568
load when the type of the load is not legal, even if truncates are not free.
The load is going to be legalized to an extending load anyway.
llvm-svn: 114488
walking the asm arguments once and stashing their Values. This is
wrong because the same memory location can be in the list twice, and
if the first one has a sunkaddr substituted, the stashed value for the
second one will be wrong (use-after-free). PR 8154.
llvm-svn: 114104
to expose greater opportunities for store narrowing in codegen. This patch fixes a potential
infinite loop in instcombine caused by one of the introduced transforms being overly aggressive.
llvm-svn: 113763
This can result in increased opportunities for store narrowing in code generation. Update a number of
tests for this change. This fixes <rdar://problem/8285027>.
Additionally, because this inverts the order of ors and ands, some patterns for optimizing or-of-and-of-or
no longer fire in instances where they did originally. Add a simple transform which recaptures most of these
opportunities: if we have an or-of-constant-or and have failed to fold away the inner or, commute the order
of the two ors, to give the non-constant or a chance for simplification instead.
llvm-svn: 113679
not unrolling loops that contain calls that would be better off getting inlined. This mostly
comes up when an interleaved devirtualization pass has devirtualized a call which the inliner
will inline on a future pass. Thus, rather than blocking all loops containing calls, add
a metric for "inline candidate calls" and block loops containing those instead.
llvm-svn: 113535
unrolling threshold to the optimize-for-size threshold. Basically, for loops containing calls, unrolling
can still be profitable as long as the loop is REALLY small.
llvm-svn: 113439
The threshold value of 50 is arbitrary, and I chose it simply by analogy to the inlining thresholds, where
the baseline unrolling threshold is slightly smaller than the baseline inlining threshold. This could
undoubtedly use some tuning.
llvm-svn: 113306
turning (fptrunc (sqrt (fpext x))) -> (sqrtf x) is great, but we have
to delete the original sqrt as well. Not doing so causes us to do
two sqrt's when building with -fmath-errno (the default on linux).
llvm-svn: 113260
Switch from isWeakForLinker to mayBeOverridden which is more accurate.
Add more statistics and debugging info. Add comments. Move static function
outside anonymous namespace.
llvm-svn: 113190
in the duplicated block instead of duplicating them.
Duplicating them into the end of the loop and the preheader
means that we got a phi node in the header of the loop,
which prevented LICM from hoisting them. GVN would
usually come around later and merge the duplicated
instructions so we'd get reasonable output... except that
anything dependent on the shoulda-been-hoisted value can't
be hoisted. In PR5319 (which this fixes), a memory value
didn't get promoted.
llvm-svn: 113134
Loop::hasLoopInvariantOperands method. Remove
a useless and confusing Loop::isLoopInvariant(Instruction)
method, which didn't do what you thought it did.
No functionality change.
llvm-svn: 113133
location is being re-stored to the memory location. We would get
a dangling pointer from the SSAUpdate data structure and miss a
use. This fixes PR8068
llvm-svn: 113042
I'm sure it is harmless. Original commit message:
If PrototypeValue is erased in the middle of using the SSAUpdator
then the SSAUpdator may access freed memory. Instead, simply pass
in the type and name explicitly, which is all that was used anyway.
llvm-svn: 112810
on llvmdev: SRoA is introducing MMX datatypes like <1 x i64>,
which then cause random problems because the X86 backend is
producing mmx stuff without inserting proper emms calls.
In the short term, force off MMX datatypes. In the long term,
the X86 backend should not select generic vector types to MMX
registers. This is being worked on, but won't be done in time
for 2.8. rdar://8380055
llvm-svn: 112696
two are weak, we make them thunks to a new strong function) so don't iterate
through the function list as we're modifying it.
Also add back the outermost loop which got removed during the cleanups.
llvm-svn: 112595
This actually exposed an infinite recursion bug in ComputeValueKnownInPredecessors which theoretically already existed (in JumpThreading's
handling of and/or of i1's), but never manifested before. This patch adds a tracking set to prevent this case.
llvm-svn: 112589
instead of PromoteMemToReg. This allows it to stop using DF and DT,
eliminating a computation of DT and DF from clang -O3. Clang is now
down to 2 runs of DomFrontier.
llvm-svn: 112457
assertingvh so we get a violent explosion if the pointer dangles.
2) Fix AliasSetTracker::deleteValue to remove call sites with
by-pointer comparisons instead of by-alias queries. Using
findAliasSetForCallSite can cause alias sets to get merged
when they shouldn't, and can also miss alias sets when the
call is readonly.
#2 fixes PR6889, which only repros with a .c file :(
llvm-svn: 112452
LICM correctly. When sinking an instruction, it should not add
entries for the sunk instruction to the AST, it should remove
the entry for the sunk instruction. The blocks being sunk to
are not in the loop, so their instructions shouldn't be in the
AST (yet)!
llvm-svn: 112447
keeping them around until the pass is destroyed, keep them
around a) just when useful (not for outer loops) and b) destroy
them right after we use them. This should reduce memory use
and fixes potential bugs where a loop is deleted and another
loop gets allocated to the same address.
llvm-svn: 112446
LSRInstance data structures up to date. This fixes some
pessimizations caused by stale data which will be exposed
in an upcoming change.
llvm-svn: 112440
A = shl x, 42
...
B = lshr ..., 38
which can be transformed into:
A = shl x, 4
...
iff we can prove that the would-be-shifted-in bits
are already zero. This eliminates two shifts in the testcase
and allows eliminate of the whole i128 chain in the real example.
llvm-svn: 112314
framework, which is good at ripping through bitfield
operations. This generalize a bunch of the existing
xforms that instcombine does, such as
(x << c) >> c -> and
to handle intermediate logical nodes. This is useful for
ripping up the "promote to large integer" code produced by
SRoA.
llvm-svn: 112304
by the SRoA "promote to large integer" code, eliminating
some type conversions like this:
%94 = zext i16 %93 to i32 ; <i32> [#uses=2]
%96 = lshr i32 %94, 8 ; <i32> [#uses=1]
%101 = trunc i32 %96 to i8 ; <i8> [#uses=1]
This also unblocks other xforms from happening, now clang is able to compile:
struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }
into:
_foo: ## @foo
## BB#0: ## %entry
pshufd $1, %xmm0, %xmm2
addss %xmm0, %xmm2
movdqa %xmm1, %xmm3
addss %xmm2, %xmm3
pshufd $1, %xmm1, %xmm0
addss %xmm3, %xmm0
ret
on x86-64, instead of:
_foo: ## @foo
## BB#0: ## %entry
movd %xmm0, %rax
shrq $32, %rax
movd %eax, %xmm2
addss %xmm0, %xmm2
movapd %xmm1, %xmm3
addss %xmm2, %xmm3
movd %xmm1, %rax
shrq $32, %rax
movd %eax, %xmm0
addss %xmm3, %xmm0
ret
This seems pretty close to optimal to me, at least without
using horizontal adds. This also triggers in lots of other
code, including SPEC.
llvm-svn: 112278
fix: add a flag to MapValue and friends which indicates whether
any module-level mappings are being made. In the common case of
inlining, no module-level mappings are needed, so MapValue doesn't
need to examine non-function-local metadata, which can be very
expensive in the case of a large module with really deep metadata
(e.g. a large C++ program compiled with -g).
This flag is a little awkward; perhaps eventually it can be moved
into the ClonedCodeInfo class.
llvm-svn: 112190
which does the same thing. This eliminates redundant code and
handles MDNodes better. MDNode linking still doesn't fully
work yet though.
llvm-svn: 111941
that it avoids a lot of unnecessary cloning by avoiding remapping
MDNode cycles when none of the nodes in the cycle actually need to
be remapped. Also it uses the new temporary MDNode mechanism.
llvm-svn: 111922
from the LHS should disable reconsidering that pred on the
RHS. However, knowing something about the pred on the RHS
shouldn't disable subsequent additions on the RHS from
happening.
llvm-svn: 111349
uninteresting, just put all the operands on one list and make
GenerateReassociations make the decision about what's interesting.
This is simpler, and it avoids an extra ScalarEvolution::getAddExpr call.
llvm-svn: 111133
- Eliminate redundant successors.
- Convert an indirectbr with one successor into a direct branch.
Also, generalize SimplifyCFG to be able to be run on a function entry block.
It knows quite a few simplifications which are applicable to the entry
block, and it only needs a few checks to avoid trouble with the entry block.
llvm-svn: 111060
patterns generated by clang for transpose of a matrix in generic vectors. This is made
of two parts:
1) Propagating vector extracts of hi/lo half into their users
2) Recognizing an insertion of even elements followed by the odd elements as an unpack.
Testcase to come, but this shrinks the # of shuffle instructions generated on x86 from ~40 to the minimal 8.
llvm-svn: 110734
Further clean up the comparison function by removing overly generalized
"domains".
Remove all understanding of ELF aliases and simplify folding code and comments.
llvm-svn: 110434
eliminate several const_casts.
Make CallSite implicitly convertible to ImmutableCallSite.
Rename the getModRefBehavior for intrinsic IDs to
getIntrinsicModRefBehavior to avoid overload ambiguity with CallSite,
which happens to be implicitly convertible to bool.
llvm-svn: 110155
Start cleaning up MergeFunctions to look more like the rest of LLVM. The
primary change here is to move the methods responsible for comparison into the
new FunctionComparator object. Some comments added. There's more to do.
llvm-svn: 110021
exactly what bugpoint expected it to do.
There was also only one user of
BlockExtractorPass(const std::vector<BasicBlock*> &B), so just remove it and
make BlockExtractorPass read BlockFile.
This fixes bugpoint's block extraction.
Nick, please review.
llvm-svn: 109936
alloca instructions (constrained by their internal encoding),
and add error checking for it. Fix an instcombine bug which
generated huge alignment values (null is infinitely aligned).
This fixes undefined behavior noticed by John Regehr.
llvm-svn: 109643
dependence on DominanceFrontier. Instead, add an explicit DominanceFrontier
pass in StandardPasses.h to ensure that it gets scheduled at the right
time.
Declare that loop unrolling preserves ScalarEvolution, and shuffle some
getAnalysisUsages.
This eliminates one LoopSimplify and one LCCSA run in the standard
compile opts sequence.
llvm-svn: 109413
different widths. In a use with a narrower fixup, formulae
may be wider than the fixup, in which case the high bits
aren't necessarily meaningful, so it isn't safe to reuse
them for uses with wider fixups.
This fixes PR7618, though the testcase is too large for a
reasonable regression test, since it heavily dependes on
hitting LSR's heuristics in a certain way.
llvm-svn: 108455
the corresponding or-icmp-and pattern. This has the added benefit of doing
the matching earlier, and thus being less susceptible to being confused by
earlier transforms.
llvm-svn: 108429
it *changing* the things it replaces, not just causing them
to drop to null. There is no functionality change yet, but
this is required for a subsequent patch.
llvm-svn: 108414
"bonus" instruction to be speculatively executed. Add a heuristic to
ensure we're not tripping up out-of-order execution by checking that this bonus
instruction only uses values that were already guaranteed to be available.
This allows us to eliminate the short circuit in (x&1)&&(x&2).
llvm-svn: 108351
by a return that returns a constant, while elsewhere in the function
another return instruction returns a different constant. This is a
special case of accumulator recursion, so just generalize the existing
logic a bit.
llvm-svn: 108241
operation, but the way it's implemented requires the operation to also be
commutative. So add a check for commutativity (and tweak the corresponding
comments). This makes no difference in practice since every associative
LLVM instruction is also commutative! Here's an example to show the need
for commutativity: the accum_recursion.ll testcase calculates the factorial
function. Before the transformation the result of a call is
((((1*1)*2)*3)...)*x
while afterwards it is
(((1*x)*(x-1))...*2)*1
which clearly requires both associativity and commutativity of * to be equal
to the original.
llvm-svn: 108056
(X >s -1) ? C1 : C2 and (X <s 0) ? C2 : C1
into ((X >>s 31) & (C2 - C1)) + C1, avoiding the conditional.
This optimization could be extended to take non-const C1 and C2 but we better
stay conservative to avoid code size bloat for now.
for
int sel(int n) {
return n >= 0 ? 60 : 100;
}
we now generate
sarl $31, %edi
andl $40, %edi
leal 60(%rdi), %eax
instead of
testl %edi, %edi
movl $60, %ecx
movl $100, %eax
cmovnsl %ecx, %eax
llvm-svn: 107866
builds to "Release". The default build is unchanged (optimization on,
assertions on), however it is now called Release+Asserts. The intent
is that future LLVM releases released via llvm.org will be Release builds
in the new sense, i.e. will have assertions disabled (currently they have
assertions enabled, for a more than 20% slowdown). This will bring them
in line with MacOS releases, which ship with assertions disabled. It also
means that "Release" now means the same things in make and cmake builds:
cmake already disables assertions for "Release" builds AFAICS.
llvm-svn: 107758
have any effect, and second, deleting stores can potentially invalidate
an AliasAnalysis, and there's currently no notification for this.
llvm-svn: 107496
Objective-C metadata types which should be marked as "weak", but which the
linker will remove upon final linkage. However, this linkage isn't specific to
Objective-C.
For example, the "objc_msgSend_fixup_alloc" symbol is defined like this:
.globl l_objc_msgSend_fixup_alloc
.weak_definition l_objc_msgSend_fixup_alloc
.section __DATA, __objc_msgrefs, coalesced
.align 3
l_objc_msgSend_fixup_alloc:
.quad _objc_msgSend_fixup
.quad L_OBJC_METH_VAR_NAME_1
This is different from the "linker_private" linkage type, because it can't have
the metadata defined with ".weak_definition".
Currently only supported on Darwin platforms.
llvm-svn: 107433
such a way that debug info for symbols preserved even if symbols are
optimized away by the optimizer.
Add new special pass to remove debug info for such symbols.
llvm-svn: 107416
metadata types which should be marked as "weak", but which the linker will
remove upon final linkage. For example, the "objc_msgSend_fixup_alloc" symbol is
defined like this:
.globl l_objc_msgSend_fixup_alloc
.weak_definition l_objc_msgSend_fixup_alloc
.section __DATA, __objc_msgrefs, coalesced
.align 3
l_objc_msgSend_fixup_alloc:
.quad _objc_msgSend_fixup
.quad L_OBJC_METH_VAR_NAME_1
This is different from the "linker_private" linkage type, because it can't have
the metadata defined with ".weak_definition".
llvm-svn: 107205
is stripped off. Currently set unconditionally, since the API
does not provide a way of working out if anything was actually
stripped off.
llvm-svn: 107142