* merge two weak functions by making them both alias a third non-weak fn
* don't reimplement CallSite::hasArgument
* whitelist the safe linkage types
llvm-svn: 58568
This triggers only 60 times in llvm-test (look at .llvm.bc, not .linked.rbc)
and so it probably wont be turned on by default. Also, may of those are likely
to go away when PR2973 is fixed.
llvm-svn: 58557
function.
- This explicitly models the costs for functions which should
"always" or "never" be inlined. This fixes bugs where such costs
were not previously respected.
llvm-svn: 58450
LargeBlockInfo, we can now dramatically simplify their implementation
and speed them up at the same time. Now the code has time proportional
to the number of uses of the alloca, not the size of the block.
This also eliminates code that tried to batch up different allocas which
are used in the same blocks, and eliminates the 'retry list' logic which
was baroque and no unneccesary. In addition to being a speedup for crazy
cases, this is also a nice cleanup:
PromoteMemoryToRegister.cpp | 270 +++++++++++++++-----------------------------
1 file changed, 96 insertions(+), 174 deletions(-)
llvm-svn: 58229
a trivial dense map. Use this in RewriteSingleStoreAlloca to
avoid aggressively rescanning blocks over and over again. This
fixes PR2925, speeding up mem2reg on the testcase in that bug
from 4.56s to 0.02s in a debug build on my machine.
llvm-svn: 58227
LoopPass*.
- Although less precise, this means they can be used in clients
without RTTI (who would otherwise need to include LoopPass.h, which
eventually includes things using dynamic_cast). This was the
simplest solution that presented itself, but I am happy to use a
better one if available.
llvm-svn: 58010
to find opportunities for store-to-load forwarding or load CSE,
in the same way that visitStore scans back to do DSE. Also, define
a new helper function for testing whether the addresses of two
memory accesses are known to have the same value, and use it in
both visitStore and visitLoad.
These two changes allow instcombine to eliminate loads in code
produced by front-ends that frequently emit obviously redundant
addressing for memory references.
llvm-svn: 57608
This includes not marking a GEP involving a vector as unsafe, but only when it
has all zero indices. This allows scalarrepl to work in a few more cases.
llvm-svn: 57177
shifting and masking inside a bswap expr. This allows it to handle
the cases from PR2842, which involve the intermediate 'or'
expressions being shifted, not just the input value.
llvm-svn: 57095
when deciding whether to mark a function readnone/readonly.
Since the pass is currently run before SROA, this may be
quite helpful. Requested by Chris on IRC.
llvm-svn: 57050
pointer bitcasts and GEP's", and centralize the
logic in Value::getUnderlyingObject. The
difference with stripPointerCasts is that
stripPointerCasts only strips GEPs if all
indices are zero, while getUnderlyingObject
strips GEPs no matter what the indices are.
llvm-svn: 56922
- return attributes - inreg, zext and sext
- parameter attributes
- function attributes - nounwind, readonly, readnone, noreturn
Return attributes use 0 as the index.
Function attributes use ~0U as the index.
This patch requires corresponding changes in llvm-gcc and clang.
llvm-svn: 56704
s/ParamAttr/Attribute/g
s/PAList/AttrList/g
s/FnAttributeWithIndex/AttributeWithIndex/g
s/FnAttr/Attribute/g
This sets the stage
- to implement function notes as function attributes and
- to distinguish between function attributes and return value attributes.
This requires corresponding changes in llvm-gcc and clang.
llvm-svn: 56622
Unfortunately this means removing one regression test
of GlobalsModRef because I couldn't work out how to
perform it without MarkModRef.
llvm-svn: 56342
can get the readnone/readonly attributes, and gives them it.
The plan is to remove markmodref (which did the same thing
by querying GlobalsModRef) and delete the analogous
functionality from GlobalsModRef.
llvm-svn: 56341
- Recognize expressions like "x > -1 ? x : 0" as min/max and turn them
into expressions like "x < 0 ? 0 : x", which is easily recognizable
as a min/max operation.
- Refrain from folding expression like "y/2 < 1" to "y < 2" when the
comparison is being used as part of a min or max idiom, like
"y/2 < 1 ? 1 : y/2". In that case, the division has another use, so
folding doesn't eliminate it, and obfuscates the min/max, making it
harder to recognize as a min/max operation.
These benefit ScalarEvolution, CodeGen, and anything else that wants to
recognize integer min and max.
llvm-svn: 56246
cases. See the comment above OptimizeSMax for the full story, and
the testcase for an example. This cancels out a pessimization
commonly attributed to indvars, and will allow us to lift some of
the artificial throttles in indvars, rather than add new ones.
llvm-svn: 56230
users, and teach it about shufflevector instructions.
Also, fix a subtle bug in SimplifyDemandedVectorElts'
insertelement code.
This is a patch that was originally written by Eli Friedman,
with some fixes and cleanup by me.
llvm-svn: 55995
call (thus changing the call site) it didn't
inform the callgraph about this. But the
call site does matter - as shown by the testcase,
the callgraph become invalid after the inliner
ran (with an edge between two functions simply
missing), resulting in wrong deductions by
GlobalsModRef.
llvm-svn: 55872
because it does not maintain a correct list
of callsites. I discovered (see following
commit) that the inliner will create a wrong
callgraph if it is fed a callgraph with
correct edges but incorrect callsites. These
were created by Prune-EH, and while it wasn't
done via removeCallEdgeTo, it could have been
done via removeCallEdgeTo, which is an accident
waiting to happen. Use removeCallEdgeFor
instead.
llvm-svn: 55859
attributes on functions, based on the result of
alias analysis. It's not hardwired to use
GlobalsModRef even though this is the only (AFAIK)
alias analysis that results in this pass actually
doing something. Enable as follows:
opt ... -globalsmodref-aa -markmodref ...
Advantages of this pass: (1) records the result
of globalsmodref in the bitcode, meaning it is
available for use by later passes (currently
the pass manager isn't smart enough to magically
make an advanced alias analysis available to all
later passes), which may expose more optimization
opportunities; (2) hopefully speeds up compilation
when code is optimized twice, for example when a
file is compiled to bitcode, then later LTO is done
on it: marking functions readonly/readnone when
producing the initial bitcode should speed up alias
analysis during LTO; (3) good for discovering that
globalsmodref doesn't work very well :)
Not currently turned on by default.
llvm-svn: 55604
use raw_ostream instead of std::ostream. Among other goodness,
this speeds up llvm-dis of kc++ with a release build from 0.85s
to 0.49s (88% faster).
Other interesting changes:
1) This makes Value::print be non-virtual.
2) AP[S]Int and ConstantRange can no longer print to ostream directly,
use raw_ostream instead.
3) This fixes a bug in raw_os_ostream where it didn't flush itself
when destroyed.
4) This adds a new SDNode::print method, instead of only allowing "dump".
A lot of APIs have both std::ostream and raw_ostream versions, it would
be useful to go through and systematically anihilate the std::ostream
versions.
This passes dejagnu, but there may be minor fallout, plz let me know if
so and I'll fix it.
llvm-svn: 55263
In particular, Collector was confusing to implementors. Several
thought that this compile-time class was the place to implement
their runtime GC heap. Of course, it doesn't even exist at runtime.
Specifically, the renames are:
Collector -> GCStrategy
CollectorMetadata -> GCFunctionInfo
CollectorModuleMetadata -> GCModuleInfo
CollectorRegistry -> GCRegistry
Function::getCollector -> getGC (setGC, hasGC, clearGC)
Several accessors and nested types have also been renamed to be
consistent. These changes should be obvious.
llvm-svn: 54899
returning an std::string by value, it fills in a SmallString/SmallVector
passed in. This significantly reduces string thrashing in some cases.
More specifically, this:
- Adds an operator<< and a print method for APInt that allows you to
directly send them to an ostream.
- Reimplements APInt::toString to be much simpler and more efficient
algorithmically in addition to not thrashing strings quite as much.
This speeds up llvm-dis on kc++ by 7%, and may also slightly speed up the
asmprinter. This also fixes a bug I introduced into the asmwriter in a
previous patch w.r.t. alias printing.
llvm-svn: 54873
invalidating the iterator by deleting the current use. This fixes a segfault on
64 bit linux reported in PR2675.
Also remove an unneeded if.
llvm-svn: 54778
do for scalars. Patch contributed by Nicolas Capens
This also generalizes the previous xforms to work on long double, now that
isExactlyValue works for long double.
llvm-svn: 54653
that says "unconditional loads from this argument are safe", we now keep track
of the safety per set of indices from which loads happen. This prevents
ArgPromotion from promoting loads that aren't really valid. As an added effect,
this will now disregard the the type of the indices passed to a GEP, so
"load GEP %A, i32 1" and "load GEP %A, i64 1" will result in a single argument,
not two.
This fixes PR2598, for which a testcase has been added as well.
llvm-svn: 54159
command-line option, and disable it by default. It introduced performance
regressions because CodeGen is currently not able to remat such loads.
llvm-svn: 53997
case for this.
This allows instructions like loads from global variables declared to
be constant to be moved out of loops."
Patch by Stefanus Du Toit!
llvm-svn: 53945
Remove the GetResultInst instruction. It is still accepted in LLVM assembly
and bitcode, where it is now auto-upgraded to ExtractValueInst. Also, remove
support for return instructions with multiple values. These are auto-upgraded
to use InsertValueInst instructions.
The IRBuilder still accepts multiple-value returns, and auto-upgrades them
to InsertValueInst instructions.
llvm-svn: 53941
leads into a cycle involving a different PHI, LSR got stuck running
around that cycle looking for the original PHI. To avoid this, keep
track of visited PHIs and stop searching if we see one more than once.
This fixes PR2570.
llvm-svn: 53879
return value as a whole in deadargelim is really not needed now that we simply
rebuild the old return value and actually prevents some canonicalization from
taking place.
This revert stops deadargelim from changing {i32} into i32 for now, but I'll
fix that next.
llvm-svn: 53609
return values that are still (partially) live. Instead of updating all uses of
a call instruction after removing some elements, it now just rebuilds the
original struct (With undef gaps where the unused values were) and leaves it to
instcombine to clean this up.
The added testcase still fails currently, but this is due to instcombine which
isn't good enough yet. I will fix that part next.
llvm-svn: 53608
only the liveness of partial return values (for functions returning a struct).
This is more explicit to prevent unwanted changes in the return value.
In particular, deadargelim now canonicalizes a function returning {i32} to
returning i32 and {} to void, if the struct returned is not used in its
entirety, but only the single element is used.
llvm-svn: 53606
the min/max values for an integer type, compare against the min/max
values we can prove contain the input. This might be a tighter bound,
so this is general goodness.
llvm-svn: 53446
was using the algorithm for folding unsigned comparisons which is
completely wrong. This has been broken since the signless types change.
llvm-svn: 53444
This cause a regression in InstCombine/JavaCompare, which was doing the right
thing on accident. To handle the missed case, generalize the comparisons based
on masked bits a little bit to handle comparisons against the max value. For
example, we can now xform (slt i32 (and X, 4), 4) -> (setne i32 (and X, 4), 4)
llvm-svn: 53443
Rewrite the DeadArgumentElimination pass, to use a more explicit tracking of
dependencies between return values and/or arguments. Also make the handling of
arguments and return values the same.
The pass now looks properly inside returned structs, but only at the first
level (ie, not inside nested structs).
This version fixed a few more bugs and was cleaned up a bit. It now passes all
of LLVM's testing, and should still pass SPEC2006. There is still a minor bug
with regard to returning nested structs. Since there is currently nothing that
emits such IR, I will fix that in a seperate commit (partly because it requires
a non-trivial fix).
llvm-svn: 53400
into phis. This is actually the same bug as PR2262 /
2008-04-29-VolatileLoadDontMerge.ll, but I missed checking the first
predecessor for multiple successors. Testcase here:
InstCombine/2008-07-08-VolatileLoadMerge.ll
llvm-svn: 53240
1. LSR runOnLoop is always returning false regardless if any transformation is made.
2. AddUsersIfInteresting can create new instructions that are added to DeadInsts. But there is a later early exit which prevents them from being freed.
llvm-svn: 53193
Move GetConstantStringInfo to lib/Analysis. Remove
string output routine from Constant. Update all
callers. Change debug intrinsic api slightly to
accomodate move of routine, these now return values
instead of strings.
This unbreaks llvm-gcc bootstrap.
llvm-svn: 52884
string output routine from Constant. Update all
callers. Change debug intrinsic api slightly to
accomodate move of routine, these now return values
instead of strings.
llvm-svn: 52748
in the presence of out-of-loop users of in-loop values and the trip
count is not a known multiple of the unroll count, and to be a bit
simpler overall. This fixes PR2253.
llvm-svn: 52645
structures. Its default threshold is to promote things that are
smaller than 128 bytes, which is sane. However, it is not sane
to do this for things that turn into 128 *registers*. Add a cap
on the number of registers introduced, defaulting to 128/4=32.
llvm-svn: 52611
DeadArgumentElimination and assert that the function type does not change if
nothing was changed. This should catch subtle changes in function type that are
not intended.
llvm-svn: 52536
This is a fixed version that no longer uses multimap::equal_range, which
resulted in a pointer invalidation problem.
Also, DAE::InspectedFunctions was not really necessary, so it got removed.
Lastly, this version no longer applies the extra arg hack on functions who did
not have any arguments to start with.
llvm-svn: 52532
dependencies between return values and/or arguments. Also make the handling of
arguments and return values the same.
The pass now looks properly inside returned structs, but only at the first
level (ie, not inside nested structs).
Also add a testcase for testing various variations of (multiple) dead rerturn
values.
llvm-svn: 52459
speaking these are not constant values. However, when a function always returns
one of its arguments, then from the point of view of each caller the return
value is constant (or at least a known value) and can be replaced.
llvm-svn: 52397
individually.
Also learn IPConstProp how returning first class aggregates work, in addition
to old style multiple return instructions.
Modify the return-constants testscase to confirm this behaviour.
llvm-svn: 52396
when changing the stride of a comparison so that it's slightly
more precise, by having it scan the instruction list to determine
if there is a use of the condition after the point where the
condition will be inserted.
llvm-svn: 52371
I'm at it, rename it to FindInsertedValue.
The only functional change is that newly created instructions are no longer
added to instcombine's worklist, but that is not really necessary anyway (and
I'll commit some improvements next that will completely remove the need).
llvm-svn: 52315
of apint codegen failure is the DAG combiner doing
the wrong thing because it was comparing MVT's using
< rather than comparing the number of bits. Removing
the < method makes this mistake impossible to commit.
Instead, add helper methods for comparing bits and use
them.
llvm-svn: 52098
and better control the abstraction. Rename the type
to MVT. To update out-of-tree patches, the main
thing to do is to rename MVT::ValueType to MVT, and
rewrite expressions like MVT::getSizeInBits(VT) in
the form VT.getSizeInBits(). Use VT.getSimpleVT()
to extract a MVT::SimpleValueType for use in switch
statements (you will get an assert failure if VT is
an extended value type - these shouldn't exist after
type legalization).
This results in a small speedup of codegen and no
new testsuite failures (x86-64 linux).
llvm-svn: 52044
work and how to replace them into individual values. Also, when trying to
replace an aggregrate that is used by load or store with a single (large)
integer, don't crash (but don't replace the aggregrate either).
Also adds a testcase for both structs and arrays.
llvm-svn: 51997
are the same as in unpacked structs, only field
positions differ. This only matters for structs
containing x86 long double or an apint; it may
cause backwards compatibility problems if someone
has bitcode containing a packed struct with a
field of one of those types.
The issue is that only 10 bytes are needed to
hold an x86 long double: the store size is 10
bytes, but the ABI size is 12 or 16 bytes (linux/
darwin) which comes from rounding the store size
up by the alignment. Because it seemed silly not
to pack an x86 long double into 10 bytes in a
packed struct, this is what was done. I now
think this was a mistake. Reserving the ABI size
for an x86 long double field even in a packed
struct makes things more uniform: the ABI size is
now always used when reserving space for a type.
This means that developers are less likely to
make mistakes. It also makes life easier for the
CBE which otherwise could not represent all LLVM
packed structs (PR2402).
Front-end people might need to adjust the way
they create LLVM structs - see following change
to llvm-gcc.
llvm-svn: 51928
out of instcombine into a new file in libanalysis. This also teaches
ComputeNumSignBits about the number of sign bits in a constantint.
llvm-svn: 51863
the conditions for performing the transform when only the
function declaration is available: no longer allow turning
i32 into i64 for example. Only allow changing between
pointer types, and between pointer types and integers of
the same size. For return values ptr -> intptr was already
allowed; I added ptr -> ptr and intptr -> ptr while there.
As shown by a recent objc testcase, changing the way
parameters/return values are passed can be fatal when calling
code written in assembler that directly manipulates call
arguments and return values unless the transform has no
impact on the way they are passed at the codegen level.
While it is possible to imagine an ABI that treats integers
of pointer size differently to pointers, I don't think LLVM
supports any so the transform should now be safe while still
being useful.
llvm-svn: 51834
the one case that ADCE catches that normal DCE doesn't: non-induction variable
loop computations.
This implementation handles this problem without using postdominators.
llvm-svn: 51668
the section or the visibility from one global
value to another: copyAttributesFrom. This is
particularly useful for duplicating functions:
previously this was done by explicitly copying
each attribute in turn at each place where a
new function was created out of an old one, with
the result that obscure attributes were regularly
forgotten (like the collector or the section).
Hopefully now everything is uniform and nothing
is forgotten.
llvm-svn: 51567
Analysis/ConstantFolding to fold ConstantExpr's, then make instcombine use it
to try to use targetdata to fold constant expressions on void instructions.
Also extend the icmp(inttoptr, inttoptr) folding to handle the case where
int size != ptr size.
llvm-svn: 51559
The SimplifyCFG pass looks at basic blocks that contain only phi nodes,
followed by an unconditional branch. In a lot of cases, such a block (BB) can
be merged into their successor (Succ).
This merging is performed by TryToSimplifyUncondBranchFromEmptyBlock. It does
this by taking all phi nodes in the succesor block Succ and expanding them to
include the predecessors of BB. Furthermore, any phi nodes in BB are moved to
Succ and expanded to include the predecessors of Succ as well.
Before attempting this merge, CanPropagatePredecessorsForPHIs checks to see if
all phi nodes can be properly merged. All functional changes are made to
this function, only comments were updated in
TryToSimplifyUncondBranchFromEmptyBlock.
In the original code, CanPropagatePredecessorsForPHIs looks quite convoluted
and more like stack of checks added to handle different kinds of situations
than a comprehensive check. In particular the first check in the function did
some value checking for the case that BB and Succ have a common predecessor,
while the last check in the function simply rejected all cases where BB and
Succ have a common predecessor. The first check was still useful in the case
that BB did not contain any phi nodes at all, though, so it was not completely
useless.
Now, CanPropagatePredecessorsForPHIs is restructured to to look a lot more
similar to the code that actually performs the merge. Both functions now look
at the same phi nodes in about the same order. Any conflicts (phi nodes with
different values for the same source) that could arise from merging or moving
phi nodes are detected. If no conflicts are found, the merge can happen.
Apart from only restructuring the checks, two main changes in functionality
happened.
Firstly, the old code rejected blocks with common predecessors in most cases.
The new code performs some extra checks so common predecessors can be handled
in a lot of cases. Wherever common predecessors still pose problems, the
blocks are left untouched.
Secondly, the old code rejected the merge when values (phi nodes) from BB were
used in any other place than Succ. However, it does not seem that there is any
situation that would require this check. Even more, this can be proven.
Consider that BB is a block containing of a single phi node "%a" and a branch
to Succ. Now, since the definition of %a will dominate all of its uses, BB
will dominate all blocks that use %a. Furthermore, since the branch from BB to
Succ is unconditional, Succ will also dominate all uses of %a.
Now, assume that one predecessor of Succ is not dominated by BB (and thus not
dominated by Succ). Since at least one use of %a (but in reality all of them)
is reachable from Succ, you could end up at a use of %a without passing
through it's definition in BB (by coming from X through Succ). This is a
contradiction, meaning that our original assumption is wrong. Thus, all
predecessors of Succ must also be dominated by BB (and thus also by Succ).
This means that moving the phi node %a from BB to Succ does not pose any
problems when the two blocks are merged, and any use checks are not needed.
llvm-svn: 51478
ScalarEvolution::deleteValueFromRecords on it before doing the
replaceAllUsesWith, because ScalarEvolution looks at the instruction's
users to find SCEV references to the instruction's SCEV object in its
internal maps.
Move all of LSR's loop-related state clearing after processing the loop
and before cleaning up dead PHI nodes. This eliminates all of LSR's SCEV
references just before the calls to ScalarEvolution::deleteValueFromRecords
so that when ScalarEvolution drops its own SCEV references, the reference
counts will reach zero and the SCEVs will be deleted immediately.
These changes fix some compiler aborts involving ScalarEvolution holding
onto and reusing SCEV objects for instructions that have been deleted.
No regression test unfortunately; because the symptoms were due to
dangling pointers, reduced testcases ended up being fairly arbitrary.
llvm-svn: 51359
replaced is a PHI. This prevents it from inserting uses before defs
in the case that it isn't a PHI and it depends on other instructions
later in the block. This fixes the 447.dealII regression on x86-64.
llvm-svn: 51292
type and the other operand is a constant into integer comparisons.
This happens surprisingly frequently (e.g. 10 times in 471.omnetpp),
which are things like this:
%tmp8283 = sitofp i32 %tmp82 to double
%tmp1013 = fcmp ult double %tmp8283, 0.0
Clearly comparing tmp82 against i32 0 is cheaper here.
this also triggers 8 times in gobmk, including this one:
%tmp375376 = sitofp i32 %tmp375 to double
%tmp377 = fcmp ogt double %tmp375376, 8.150000e+01
which is comparing an integer against 81.5 :).
llvm-svn: 51268
intersecting bits. This triggers all over the place, for example in lencode,
with adds of stuff like:
%tmp580 = mul i32 %tmp579, 2
%tmp582 = and i32 %b8, 1
and
%tmp28 = shl i32 %abs.i, 1
%sign.0 = select i1 %tmp23, i32 1, i32 0
and
%tmp344 = shl i32 %tmp343, 2
%tmp346 = and i32 %tmp96, 3
etc.
llvm-svn: 51263
replaced at linktime with a body that throws, even
if the body in this file does not. Make PruneEH
be more conservative in this case.
g++.dg/eh/weak1.C
llvm-svn: 51207
use-before-def. The problem comes up in code with multiple PHIs where
one PHI is being rewritten in terms of the other, but the other needs
to be casted first. LLVM rules requre the cast instruction to be
inserted after any PHI instructions, but when instructions were
inserted to replace the second PHI value with a function of the first,
they were ended up going before the cast instruction. Avoid this
problem by remembering the location of the cast instruction, when one
is needed, and inserting the expansion of the new value after it.
This fixes a bug that surfaced in 255.vortex on x86-64 when
instcombine was removed from the middle of the loop optimization
passes.
llvm-svn: 51169
is bitcast to return a floating point value. The result of the instruction may
not be used by the program afterwards, and LLVM will happily remove all
instructions except the call. But, on some platforms, if a value is returned as
a floating point, it may need to be removed from the stack (like x87). Thus, we
can't get rid of the bitcast even if there isn't a use of the value.
llvm-svn: 51134
bug as well as a missed optimization. We weren't properly checking for local
dependencies before moving on to non-local ones when doing non-local read-only
call CSE.
llvm-svn: 51082
address of the PassInfo directly instead of calling getPassInfo.
This eliminates a bunch of dynamic initializations of static data.
Also, fold RegisterPassBase into PassInfo, make a bunch of its
data members const, and rearrange some code to initialize data
members in constructors instead of using setter member functions.
llvm-svn: 51022
method. DOUT statements are disabled when assertions are off, but the
side effects of getName() are still evaluated. Just call getNameSTart,
which is close enough and doesn't cause heap traffic.
llvm-svn: 50958
a FunctionPass. This makes it simpler, fixes dozens of bugs, adds
a couple of minor features, and shrinks is considerably: from
2214 to 1437 lines.
llvm-svn: 50520
we were checking for it in the wrong order. This caused a miscompilation because the
return slot optimization assumes that the call it is dealing with is NOT a memcpy.
llvm-svn: 50444
generalizes the previous code to handle the case when the string is not
an immediate to the strlen call (for example, crazy stuff like
strlen(c ? "foo" : "bart"+1) -> 3). This implements
gcc.c-torture/execute/builtins/strlen-2.c. I will generalize other
cases in simplifylibcalls to use the same routine later.
llvm-svn: 50408
ComputeMaskedBits knows about cttz, ctlz, and ctpop. Teach
SelectionDAG's ComputeMaskedBits what InstCombine's knows
about SRem. And teach them both some things about high bits
in Mul, UDiv, URem, and Sub. This allows instcombine and
dagcombine to eliminate sign-extension operations in
several new cases.
llvm-svn: 50358
When choosing between constraints with multiple options,
like "ir", test to see if we can use the 'i' constraint and
go with that if possible. This produces more optimal ASM in
all cases (sparing a register and an instruction to load it),
and fixes inline asm like this:
void test () {
asm volatile (" %c0 %1 " : : "imr" (42), "imr"(14));
}
Previously we would dump "42" into a memory location (which
is ok for the 'm' constraint) which would cause a problem
because the 'c' modifier is not valid on memory operands.
Isn't it great how inline asm turns 'missed optimization'
into 'compile failed'??
Incidentally, this was the todo in
PowerPC/2007-04-24-InlineAsm-I-Modifier.ll
Please do NOT pull this into Tak.
llvm-svn: 50315
to the block that defines their operands. This doesn't work in the
case that the operand is an invoke, because invoke is a terminator
and must be the last instruction in a block.
Replace it with support in SelectionDAGISel for copying struct values
into sequences of virtual registers.
llvm-svn: 50279
getelementptr-seteq.ll into:
define i1 @test(i64 %X, %S* %P) {
%C = icmp eq i64 %X, -1 ; <i1> [#uses=1]
ret i1 %C
}
instead of:
define i1 @test(i64 %X, %S* %P) {
%A.idx.mask = and i64 %X, 4611686018427387903 ; <i64> [#uses=1]
%C = icmp eq i64 %A.idx.mask, 4611686018427387903 ; <i1> [#uses=1]
ret i1 %C
}
And fixes the second half of PR2235. This speeds up the insertion sort
case by 45%, from 1.12s to 0.77s. In practice, this will significantly
speed up for loops structured like:
for (double *P = Base + N; P != Base; --P)
...
Which happens frequently for C++ iterators.
llvm-svn: 50079
as a global helper function. At the same type, switch it from taking
a vector of predecessors to an arbitrary sequential input. This allows
us to switch LoopSimplify to use a SmallVector for various temporary
vectors that it passed into SplitBlockPredecessors.
llvm-svn: 50020
in addition to integer expressions. Rewrite GetOrEnforceKnownAlignment
as a ComputeMaskedBits problem, moving all of its special alignment
knowledge to ComputeMaskedBits as low-zero-bits knowledge.
Also, teach ComputeMaskedBits a few basic things about Mul and PHI
instructions.
This improves ComputeMaskedBits-based simplifications in a few cases,
but more noticeably it significantly improves instcombine's alignment
detection for loads, stores, and memory intrinsics.
llvm-svn: 49492
needs to be fixed here - a previous commit made sure
that intrinsics always get the right attributes.
So remove no-longer needed code, and while there use
Intrinsic::getDeclaration rather than getOrInsertFunction.
llvm-svn: 49337
2. Do not use # of basic blocks as part of the cost computation since it doesn't really figure into function size.
3. More aggressively inline function with vector code.
llvm-svn: 49061
not marked nounwind, or for all functions when -enable-eh
is set, provided the target supports Dwarf EH.
llvm-gcc generates nounwind in the right places; other FEs
will need to do so also. Given such a FE, -enable-eh should
no longer be needed.
llvm-svn: 49006
when something changes, instead of moving forward. This allows us to
simplify memset lowering, inserting the memset at the end of the range of
stuff we're touching instead of at the start.
This, in turn, allows us to make use of the addressing instructions already
used in the function instead of inserting our own. For example, we now
codegen:
%tmp41 = getelementptr [8 x i8]* %ref_idx, i32 0, i32 0 ; <i8*> [#uses=2]
call void @llvm.memset.i64( i8* %tmp41, i8 -1, i64 8, i32 1 )
instead of:
%tmp20 = getelementptr [8 x i8]* %ref_idx, i32 0, i32 7 ; <i8*> [#uses=1]
%ptroffset = getelementptr i8* %tmp20, i64 -7 ; <i8*> [#uses=1]
call void @llvm.memset.i64( i8* %ptroffset, i8 -1, i64 8, i32 1 )
llvm-svn: 48940
memsets that initialize "structs of arrays" and other store sequences
that are not sequential. This is still only enabled if you pass
-form-memset-from-stores. The flag is not heavily tested and I haven't
analyzed the perf regressions when -form-memset-from-stores is passed
either, but this causes no make check regressions.
llvm-svn: 48909
Furthermore, double the limit when more than 10% of the callee instructions are vector instructions. Multimedia kernels tend to love inlining.
llvm-svn: 48725
This fires dozens of times across spec and multisource, but I don't know
if it actually speeds stuff up. Hopefully the testers will show something
nice :)
llvm-svn: 48680
the type instead of the byte size. This was causing troublesome mis-compilations.
True to form, this took 2 days to find and is a one-line fix. :-P
llvm-svn: 48354
1. There is now a "PAListPtr" class, which is a smart pointer around
the underlying uniqued parameter attribute list object, and manages
its refcount. It is now impossible to mess up the refcount.
2. PAListPtr is now the main interface to the underlying object, and
the underlying object is now completely opaque.
3. Implementation details like SmallVector and FoldingSet are now no
longer part of the interface.
4. You can create a PAListPtr with an arbitrary sequence of
ParamAttrsWithIndex's, no need to make a SmallVector of a specific
size (you can just use an array or scalar or vector if you wish).
5. All the client code that had to check for a null pointer before
dereferencing the pointer is simplified to just access the
PAListPtr directly.
6. The interfaces for adding attrs to a list and removing them is a
bit simpler.
Phase #2 will rename some stuff (e.g. PAListPtr) and do other less
invasive changes.
llvm-svn: 48289
safer (when the passed pointer might be invalid). Thanks to Duncan and Chris for the idea behind this,
and extra thanks to Duncan for helping me work out the trap-safety.
llvm-svn: 48280
before trying to merge the block into its predecessors.
This allows two-entry-phi-return.ll to be simplified
into a single basic block.
llvm-svn: 48252