gcc exception handling: if an exception unwinds through
an invoke, then execution must branch to the invoke's
unwind target. We previously tried to enforce this by
appending a cleanup action to every selector; however,
this does not always work correctly due to an optimization
in the C++ unwinding runtime: if only cleanups would be
run while unwinding an exception, then the program simply
terminates without actually executing the cleanups that
invoke semantics would require. I was hoping this
wouldn't be a problem, but in fact it turns out to be the
cause of all the remaining failures in the LLVM testsuite
(these also fail with -enable-correct-eh-support, so turning
on -enable-eh didn't make things worse!). Instead we need
to append a full-blown catch-all to the end of each
selector. The correct way of doing this depends on the
personality function, i.e. it is language dependent, so
can only be done by gcc. Thus this patch which generalizes
the eh.selector intrinsic so that it can handle all possible
kinds of action table entries (before it didn't accommodate
cleanups): now 0 indicates a cleanup, and filters have to be
specified using the number of type infos plus one rather than
the number of type infos. Related gcc patches will cause
Ada to pass a cleanup (0) to force the selector to always
fire, while C++ will use a C++ catch-all (null).
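As a rough illustration only (later-era IR syntax and made-up symbol names; the
intrinsic's exact signature at the time may differ slightly), a C++ front end
now ends each selector with a null catch-all, while an Ada front end ends it
with a cleanup (0):
%sel = call i32 (i8*, i8*, ...) @llvm.eh.selector(
           i8* %exn,
           i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*),
           i8* bitcast (i8** @_ZTIi to i8*),   ; catch int
           i8* null)                           ; C++ catch-all appended last
With the trailing catch-all (or cleanup) present, the unwinder can no longer
decide that only cleanups would run and terminate without entering the landing
pads.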
llvm-svn: 41484
- *Always* round up the size of the allocation to a multiple of the stack
alignment to ensure the stack pointer is never left in an invalid state after a dynamic_stackalloc.
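For illustration, assuming a 16-byte stack alignment, the rounding amounts to
(size + align - 1) & ~(align - 1); in IR terms, something like:
%t       = add i32 %size, 15
%rounded = and i32 %t, -16
so the adjusted stack pointer always remains a multiple of the alignment.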
llvm-svn: 41132
This also changes the syntax for llvm.bswap, llvm.part.set, llvm.part.select, and llvm.ct* intrinsics. They are automatically upgraded by both the LLVM ASM reader and the bitcode reader. The test cases have been updated, with special tests added to ensure the automatic upgrading is supported.
llvm-svn: 40807
This patch fills in the last necessary bits to enable exception
handling in LLVM. Currently only on x86-32/linux.
In fact, this patch adds the necessary intrinsics (and their lowering) which
represent really weird target-specific gcc builtins used inside the unwinder.
After the corresponding llvm-gcc patch lands (easy), exceptions should be
more or less workable. However, exception handling support should not be
thought of as 'finished': I expect many small and not-so-small glitches
everywhere.
llvm-svn: 39855
register ordering, for both physical and virtual registers. Update the PPC
target lowering for calls to expect registers for the call result to
already be in target order.
llvm-svn: 38471
so must be lowered to a value, not nothing at all.
Subtle point: I made eh_selector return 0 and
eh_typeid_for return 1. This means that only
cleanups (destructors) will be run as the exception
unwinds [if eh_typeid_for returned 0 then it would
be as if the first catch always matched, and the
corresponding handler would be run], which is
probably what you want in the CBE.
llvm-svn: 37947
endian swapping should be done, and update the code to use it. This fixes
some register ordering issues on big-endian systems, such as PowerPC,
introduced by the recent illegal by-val arguments changes.
llvm-svn: 37921
refactored getCopyFromParts and getCopyToParts, which are more general.
This effectively adds support for lowering illegal by-val vector call
arguments.
llvm-svn: 37843
illegal value type will be transformed to, for code that needs the
register type after all transformations instead of just after the first
transformation.
Factor out the code that uses this information to do copy-from-regs and
copy-to-regs for various purposes into separate functions so that they
are done consistently.
llvm-svn: 37781
extended vector types. Remove the special SDNode opcodes used for pre-legalize
vector operations, and the special MVT::Vector type used with them. Adjust
lowering and legalize to work with the normal SDNode kinds instead, and to
use the normal MVT functions to work with vector types instead of using the
two special operands that the pre-legalize nodes held.
This allows pre-legalize and post-legalize DAGs, and the code that operates
on them, to be more consistent. Pre-legalize vector operators can be handled
more consistently with scalar operators. And, -view-dag-combine1-dags and
-view-legalize-dags now look prettier for vector code.
llvm-svn: 37719
TargetLowering to SelectionDAG so that they have more convenient
access to the current DAG, in preparation for the ValueType routines
being changed from standalone functions to members of SelectionDAG for
the pre-legalize vector type changes.
llvm-svn: 37704
VCONCAT_VECTORS. Use these for CopyToReg and CopyFromReg legalizing in
the case that the full register is to be split into subvectors instead
of scalars. This replaces uses of VBIT_CONVERT to present values as
vector-of-vector types in order to make whole subvectors accessible via
BUILD_VECTOR and EXTRACT_VECTOR_ELT.
This is in preparation for adding extended ValueType values, where
having vector-of-vector types is undesirable.
llvm-svn: 37569
crashing but breaks exception handling. The problem
described in PR1224 is that invoke is a terminator that
can produce a value. The value may be needed in other
blocks. The code that copies values needed in other
blocks into registers runs before terminators (in this
case the invoke) are lowered, so it asserted because the
value was not yet available. The fix that was applied was to do invoke
lowering earlier, before writing values to registers.
The problem this causes is that the code to copy values
to registers can be output after the invoke call. If
an exception is raised and control is passed to the
landing pad then this copy-code will never execute. If
the value is needed in some code path reached via the
landing pad then that code will get something bogus.
So revert the original fix and simply skip invoke values
in the general copying to registers code. Instead copy
the invoke value to a register in the invoke lowering code.
llvm-svn: 37567
simplifies the code in DwarfWriter, allows for multiple filters and
makes it trivial to specify filters accompanied by cleanups or catch-all
specifications (see next patch). What a deal! Patch blessed by Anton.
llvm-svn: 37398
register, preallocate all input registers and the early-clobbered output.
This fixes PR1357 and CodeGen/PowerPC/2007-04-30-InlineAsmEarlyClobber.ll
llvm-svn: 36599
before the copies into physregs are done. This avoids having flag operands
skip the store, causing cycles in the dag at sched time. This fixes infinite
loops on these tests:
test/CodeGen/Generic/2007-04-08-MultipleFrameIndices.ll for PR1308
test/CodeGen/PowerPC/2007-01-29-lbrx-asm.ll
test/CodeGen/PowerPC/2007-01-31-InlineAsmAddrMode.ll
test/CodeGen/X86/2006-07-12-InlineAsmQConstraint.ll for PR828
llvm-svn: 36547
If the operand is not already an indirect operand, spill it to a constant
pool entry or a stack slot.
This fixes PR1356 and CodeGen/X86/2007-04-27-InlineAsm-IntMemInput.ll
llvm-svn: 36536
class supports. In the case of vectors, this means we often get the wrong
type (e.g. we get v4f32 instead of v8i16). Make sure to convert the vector
result to the right type. This fixes CodeGen/X86/2007-04-11-InlineAsmVectorResult.ll
llvm-svn: 35944
1. Fix some bugs in the jump table lowering threshold
2. Implement much better metric for optimal pivot selection
3. Tune thresholds for different lowering methods
4. Implement the shift-and trick for lowering small (< machine word
length) cases with few destinations (see the sketch below). A good testcase will follow.
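A rough sketch of the shift-and trick (conceptual only; the real transform is
done during selection-DAG lowering and the case values here are made up): when
the case values are all smaller than the machine word length and share a
destination, a single shift and mask replaces a chain of comparisons. For
cases {0, 2, 5, 7} all branching to %dest:
%bit  = shl i32 1, %x
%hit  = and i32 %bit, 165            ; 0b10100101: bits 0, 2, 5 and 7 set
%take = icmp ne i32 %hit, 0
br i1 %take, label %dest, label %otherwise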
llvm-svn: 35816
1) Take address scale into consideration. e.g. i32* -> scale 4.
2) Examine all the users of GEP.
3) Generalize to inter-block GEP's (no longer uses loopinfo).
4) Don't do xform if GEP has other variable index(es).
llvm-svn: 35403
This feature is needed in order to support shifts of more than 255 bits
on large integer types. This changes the syntax for llvm assembly to
make shl, ashr and lshr instructions look like a binary operator:
shl i32 %X, 1
instead of
shl i32 %X, i8 1
Additionally, this should help a few passes perform additional optimizations.
llvm-svn: 33776
1. New parameter attribute called 'inreg'. It means "place this
parameter in registers, if possible". This is a generalization of
gcc's regparm(n) attribute. It's currently used only in the X86-32 backend.
2. Completely rewritten CC handling/lowering code inside X86 backend.
Merged stdcall + c CCs and fastcall + fast CC.
3. Dropped the CSRET CC. We cannot add a struct-return variant for each
target-specific CC (e.g. stdcall + csretcc and so on).
4. Instead of the CSRET CC, introduced the 'sret' parameter attribute (see the
sketch below). Setting it on the first parameter means 'this is a hidden
pointer to the structure return value; handle it gently'.
5. Fixed a small bug in llvm-extract and added a new feature to the
FunctionExtraction pass, which relinks all internal-linkage callees
of the deleted function to external linkage. This will allow everything
to be linked together afterwards.
NOTEs: 1. Documentation will be updated soon.
2. llvm-upgrade should be improved to translate csret => sret.
Until then, there will be some unexpected test failures.
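As a rough illustration of the two attributes (later-style IR syntax; the
struct type and function name are made up):
%struct.S = type { i32, i32 }
declare void @make_s(%struct.S* sret %result, i32 inreg %x)
Here the first parameter is the hidden struct-return pointer and the second is
a candidate for being passed in a register.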
llvm-svn: 33597
things: (1) preventing PR1071 and (2) working around missing parameter
attributes for bool type. (2) will be fixed shortly. When PR1071 is fixed,
this patch should be undone.
llvm-svn: 32831
1. Switch expression and cases are compared signed and are sign extended.
2. For function results needing extension, do SIGN_EXTEND if the SExtAttribute
is set and ZERO_EXTEND if the ZExtAttribute is set, otherwise just let
the Legalizer do ANY_EXTEND.
This fixes the recent regression in kimwitu++ and probably the llvm-gcc
bootstrap issue we had today.
llvm-svn: 32830
Three changes:
1. Convert signed integer types to signless versions.
2. Implement the @sext and @zext parameter attributes (see the sketch below
this list). Previously the type of a function parameter was used to determine whether it should
be sign extended or zero extended before the call. This information is
now communicated via the function type's parameter attributes.
3. The interface to LowerCallTo had to be changed in order to accommodate
the parameter attribute information. Although it would have been
convenient to pass in the FunctionType itself, there isn't always one
present in the caller. Consequently, a signedness indication for the
result type and for each parameter was provided in the interface
to this method. All implementations were changed to make the necessary
adjustment.
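For illustration, using the later signext/zeroext spellings of these
attributes (the exact syntax at the time differed):
declare i32 @f(i8 signext %c, i16 zeroext %u)
The caller extends %c with SIGN_EXTEND and %u with ZERO_EXTEND based on the
attributes rather than on the parameter types.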
llvm-svn: 32788
This patch removes the SetCC instructions and replaces them with the ICmp
and FCmp instructions. The SetCondInst instruction has been removed and
replaced with ICmpInst and FCmpInst.
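For example (modern syntax), the predicate is now carried by the instruction
rather than encoded in the opcode name:
%c1 = icmp slt i32 %a, %b          ; signed less-than, roughly the old setlt
%c2 = fcmp olt double %x, %y       ; ordered less-than on floats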
llvm-svn: 32751
by producing target constants instead of constants. Constants can get
selected to li/movri instructions, which causes the scheduler to explode.
llvm-svn: 32633
The long awaited CAST patch. This introduces 12 new instructions into LLVM
to replace the cast instruction. Corresponding changes throughout LLVM are
provided. This passes llvm-test, llvm/test, and SPEC CPUINT2000 with the
exception of 175.vpr, which fails only due to a slight floating point output
difference.
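A few of the new casts, for illustration (modern syntax); previously all of
these were spelled with the single 'cast' instruction:
%a = trunc i32 %x to i8
%b = sitofp i32 %x to double
%c = bitcast i32* %p to i8*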
llvm-svn: 31931
First in a series of patches to convert SetCondInst into ICmpInst and
FCmpInst using only two opcodes and having the instructions contain their
predicate value. Nothing uses these classes yet. More patches to follow.
llvm-svn: 31867
This patch converts the old SHR instruction into two instructions,
AShr (Arithmetic) and LShr (Logical). The new shift instructions no longer
depend on the signedness of their operands.
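For example (modern syntax):
%a = ashr i32 %x, 1        ; arithmetic: the sign bit is replicated
%l = lshr i32 %x, 1        ; logical: zero-filled
Previously a single 'shr' picked one behaviour or the other from the
signedness of the operand type.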
llvm-svn: 31542
Turn on -Wunused and -Wno-unused-parameter. Clean up most of the resulting
fall out by removing unused variables. Remaining warnings have to do with
unused functions (I didn't want to delete code without review) and unused
variables in generated code. Maintainers should clean up the remaining
issues when they see them. All changes pass DejaGnu tests and Olden.
llvm-svn: 31380
no fixed physreg. Treat this as permission to use any register in the register
class. When this happens and it is safe, allow the llvm register allocator to
allocate the register instead of doing it at isel time. This eliminates a ton
of copies around common inline asms. For example:
int test2(int Y, int X) {
asm("foo %0, %1" : "=r"(X): "r"(X));
return X;
}
now compiles to:
_test2:
foo r3, r4
blr
instead of:
_test2:
mr r2, r4
foo r2, r2
mr r3, r2
blr
GCC produces:
_test2:
foo r4, r4
mr r3,r4
blr
llvm-svn: 31366
edges whose destinations are not phi nodes don't bother us. Also, share
split edges, since the split edge can't have a phi. This significantly
reduces the complexity of generated code in some cases.
llvm-svn: 31274
being inserted on unsplit critical edges, which introduces (sometimes large
amounts of) partially dead spill code.
This also fixes PR925 + CodeGen/Generic/switch-crit-edge-constant.ll
llvm-svn: 31260
Add many fewer CFG edges and PHI node entries. If there is a switch which has
the same block as multiple destinations, only add that block once as a successor/phi
node (in the jumptable case)
llvm-svn: 31242
Make necessary changes to support DIV -> [SUF]Div. This changes llvm to
have three division instructions: signed, unsigned, floating point. The
bytecode and assembler are backwards compatible, however.
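For example (modern syntax):
%q1 = sdiv i32 %a, %b
%q2 = udiv i32 %a, %b
%q3 = fdiv double %x, %y
The old 'div' chose among these based on the operand type.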
llvm-svn: 31195
movl 32(%esp), %eax
cmpl $1, %eax
je LBB1_1 #bb
LBB1_4: #entry
cmpl $2, %eax
je LBB1_2 #bb2
jmp LBB1_3 #UnifiedReturnBlock
LBB1_1: #bb
notice that we would miss the fall through and emit this instead:
movl 32(%esp), %eax
cmpl $2, %eax
je LBB1_2 #bb2
LBB1_4: #entry
cmpl $1, %eax
jne LBB1_3 #UnifiedReturnBlock
LBB1_1: #bb
llvm-svn: 31130
This patch implements the first increment for the Signless Types feature.
All changes pertain to removing the ConstantSInt and ConstantUInt classes
in favor of just using ConstantInt.
llvm-svn: 31063
blocks into the basic block list when lowering the switch inst. into a
binary tree of if-then statements. This allows the "visitSwitchCase" function
to take advantage of fall-through behavior.
llvm-svn: 31057
in a specific BB, don't undo this!). This allows us to compile
CodeGen/X86/loop-hoist.ll into:
_foo:
xorl %eax, %eax
*** movl L_Arr$non_lazy_ptr, %ecx
movl 4(%esp), %edx
LBB1_1: #cond_true
movl %eax, (%ecx,%eax,4)
incl %eax
cmpl %edx, %eax
jne LBB1_1 #cond_true
LBB1_2: #return
ret
instead of:
_foo:
xorl %eax, %eax
movl 4(%esp), %ecx
LBB1_1: #cond_true
*** movl L_Arr$non_lazy_ptr, %edx
movl %eax, (%edx,%eax,4)
incl %eax
cmpl %ecx, %eax
jne LBB1_1 #cond_true
LBB1_2: #return
ret
This was noticed in 464.h264ref. This doesn't usually affect PPC,
but strikes X86 all the time.
llvm-svn: 30290
due to switch cases going to the same place, it makes #pred != #phi entries,
breaking live interval analysis.
This fixes 458.sjeng on x86 with llc.
llvm-svn: 30236
in the start of an array and a count of operands where applicable. In many
cases, the number of operands is known, so this static array can be allocated
on the stack, avoiding the heap. In many other cases, a SmallVector can be
used, which has the same benefit in the common cases.
I updated a lot of code calling getNode that takes a vector, but ran out of
time. The rest of the code should be updated, and these methods should be
removed.
We should also do the same thing to eliminate the methods that take a
vector of MVT::ValueTypes.
It would be extra nice to convert the dagiselemitter to avoid creating vectors
for operands when calling getTargetNode.
llvm-svn: 29566
2. Added argument to instruction scheduler creators so the creators can do
special things.
3. Repaired target hazard code.
4. Misc.
More to follow.
llvm-svn: 29450
VBIT_VECTOR nodes. There was some confusion about the semantics of
getPackedTypeBreakdown(): e.g. for <4 x f32> it returns 1 and v4f32, not 4
and f32.
llvm-svn: 28352
This code should be emitted after legalize, so it can't be in sdisel.
Note that the EmitFunctionEntryCode hook should be updated to operate on the
DAG. The X86 backend is the only one currently using this hook.
llvm-svn: 28315
of cross-block live ranges, and allows the bb-at-a-time selector to always
coalesce these away, at isel time.
This reduces the load on the coalescer and register allocator. For example
on a codec on X86, we went from:
1643 asm-printer - Number of machine instrs printed
419 liveintervals - Number of loads/stores folded into instructions
1144 liveintervals - Number of identity moves eliminated after coalescing
1022 liveintervals - Number of interval joins performed
282 liveintervals - Number of intervals after coalescing
1304 liveintervals - Number of original intervals
86 regalloc - Number of times we had to backtrack
1.90232 regalloc - Ratio of intervals processed over total intervals
40 spiller - Number of values reused
182 spiller - Number of loads added
121 spiller - Number of stores added
132 spiller - Number of register spills
6 twoaddressinstruction - Number of instructions commuted to coalesce
360 twoaddressinstruction - Number of two-address instructions
to:
1636 asm-printer - Number of machine instrs printed
403 liveintervals - Number of loads/stores folded into instructions
1155 liveintervals - Number of identity moves eliminated after coalescing
1033 liveintervals - Number of interval joins performed
279 liveintervals - Number of intervals after coalescing
1312 liveintervals - Number of original intervals
76 regalloc - Number of times we had to backtrack
1.88998 regalloc - Ratio of intervals processed over total intervals
1 spiller - Number of copies elided
41 spiller - Number of values reused
191 spiller - Number of loads added
114 spiller - Number of stores added
128 spiller - Number of register spills
4 twoaddressinstruction - Number of instructions commuted to coalesce
356 twoaddressinstruction - Number of two-address instructions
On this testcase, this change provides a modest reduction in spill code,
regalloc iterations, and total instructions emitted. It increases the number
of registers coalesced.
llvm-svn: 28115
not be 100% dense. Increase the minimum threshold for the number of cases
in a switch statement from 4 to 6 in order to create a jump table.
llvm-svn: 28079
x86 and ppc for 100% dense switch statements when relocations are non-PIC.
This support will be extended and enhanced in the coming days to support
PIC, and less dense forms of jump tables.
llvm-svn: 27947
manner that the LowerSwitch LLVM to LLVM pass does: emitting a binary
search tree of basic blocks. The new approach has several advantages:
it is faster, it generates significantly smaller code in many cases, and
it paves the way for implementing dense switch statements as jump tables by
handling switches directly in the instruction selector.
This functionality is currently only enabled on x86, but should be safe for
every target. In anticipation of making it the default, the cfg is now
properly updated in the x86, ppc, and sparc select lowering code.
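A conceptual sketch of the shape of the emitted code (modern syntax, made-up
case values; the real lowering operates on the selection DAG): for a switch
over {1, 10, 100, 1000}, the range is split around a pivot and each half is
tested separately, instead of comparing against every case in turn:
  %lo = icmp slt i32 %x, 100
  br i1 %lo, label %left, label %right
left:                                   ; cases 1 and 10
  %is1 = icmp eq i32 %x, 1
  br i1 %is1, label %case1, label %left.next
left.next:
  %is10 = icmp eq i32 %x, 10
  br i1 %is10, label %case10, label %default
right:                                  ; cases 100 and 1000
  %is100 = icmp eq i32 %x, 100
  br i1 %is100, label %case100, label %right.next
right.next:
  %is1000 = icmp eq i32 %x, 1000
  br i1 %is1000, label %case1000, label %default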
llvm-svn: 27156
Make the PPC backend not dependent on BRTWOWAY_CC and make the branch
selector smarter about the code it generates, fixing a case in the
readme.
llvm-svn: 26814
function of the top-down scheduler are completely bogus currently, and
having (future) PPC-specific code in this file is also wrong, but this is a
small incremental step.
llvm-svn: 26552
We do not want to emit "Loop: ... brcond Out; br Loop", as it adds an extra
instruction in the loop. Instead, invert the condition and emit
"Loop: ... br!cond Loop; br Out".
Generalize the fix by moving it from PPCDAGToDAGISel to SelectionDAGLowering.
llvm-svn: 26231
int %rlwnm(int %A, int %B) {
%C = call int asm "rlwnm $0, $1, $2, $3, $4", "=r,r,r,n,n"(int %A, int %B, int 4, int 17)
ret int %C
}
into:
_rlwnm:
or r2, r3, r3
or r3, r4, r4
rlwnm r2, r2, r3, 4, 17 ;; note the immediates :)
or r3, r2, r2
blr
llvm-svn: 25955
%C = call int asm "xyz $0, $1, $2, $3", "=r,r,r,0"(int %A, int %B, int 4)
and get:
xyz r2, r3, r4, r2
note that the r2's are pinned together. Yaay for 2-address instructions.
llvm-svn: 25893
preference to determine which scheduler to use. SchedulingForLatency ==
Breadth first; SchedulingForRegPressure == bottom up register reduction list
scheduler.
llvm-svn: 25599
This patch is an incremental step towards supporting a flat symbol table.
It de-overloads the intrinsic functions by providing type-specific intrinsics
and arranging for automatically upgrading from the old overloaded name to
the new non-overloaded name. Specifically:
llvm.isunordered -> llvm.isunordered.f32, llvm.isunordered.f64
llvm.sqrt -> llvm.sqrt.f32, llvm.sqrt.f64
llvm.ctpop -> llvm.ctpop.i8, llvm.ctpop.i16, llvm.ctpop.i32, llvm.ctpop.i64
llvm.ctlz -> llvm.ctlz.i8, llvm.ctlz.i16, llvm.ctlz.i32, llvm.ctlz.i64
llvm.cttz -> llvm.cttz.i8, llvm.cttz.i16, llvm.cttz.i32, llvm.cttz.i64
New code should not use the overloaded intrinsic names. Warnings will be
emitted if they are used.
llvm-svn: 25366
1. Only forward subst offsets into loads and stores, not into arbitrary
things, where it will likely become a load.
2. If the source is a cast from pointer, forward subst the cast as well,
allowing us to fold the cast away (improving cases when the cast is
from an alloca or global).
This hasn't been fully tested, but does appear to further reduce register
pressure and improve code. Let's let the testers grind on it a bit. :)
llvm-svn: 24640
changes allow us to generate the following code:
_foo:
li r2, 0
lvx v0, r2, r3
vaddfp v0, v0, v0
stvx v0, r2, r3
blr
for this llvm:
void %foo(<4 x float>* %a) {
entry:
%tmp1 = load <4 x float>* %a
%tmp2 = add <4 x float> %tmp1, %tmp1
store <4 x float> %tmp2, <4 x float>* %a
ret void
}
llvm-svn: 24534
file to become corrupted due to interactions between mmap'd memory segments
and file descriptors closing. The problem is completely avoided by using
a third temporary file.
Patch provided by Evan Jones
llvm-svn: 24527
generates it. Make MVT::Vector expand-only, and remove the code in
Legalize that attempts to legalize it.
The plan for supporting N x Type is to continually expand it in ExpandOp
until it gets down to 2 x Type, where it will be scalarized into a pair of
scalars.
llvm-svn: 24482
packed types with an element count of 1, although more generic support is
coming. This allows LLVM to turn the following code:
void %foo(<1 x float> * %a) {
entry:
%tmp1 = load <1 x float> * %a;
%tmp2 = add <1 x float> %tmp1, %tmp1
store <1 x float> %tmp2, <1 x float> *%a
ret void
}
Into:
_foo:
lfs f0, 0(r3)
fadds f0, f0, f0
stfs f0, 0(r3)
blr
llvm-svn: 24416
alignment information appropriately. Includes code for PowerPC to support
fixed-size allocas with alignment larger than the stack. Support for
arbitrarily aligned dynamic allocas coming soon.
llvm-svn: 24224
a special case hack for X86, make the hack more general: if an incoming argument
register is not used in any block other than the entry block, don't copy it to
a vreg. This helps us compile code like this:
%struct.foo = type { int, int, [0 x ubyte] }
int %test(%struct.foo* %X) {
%tmp1 = getelementptr %struct.foo* %X, int 0, uint 2, int 100
%tmp = load ubyte* %tmp1 ; <ubyte> [#uses=1]
%tmp2 = cast ubyte %tmp to int ; <int> [#uses=1]
ret int %tmp2
}
to:
_test:
lbz r3, 108(r3)
blr
instead of:
_test:
lbz r2, 108(r3)
or r3, r2, r2
blr
The (dead) copy emitted to copy r3 into a vreg for extra-block uses was
increasing the live range of r3 past the load, preventing the coalescing.
This implements CodeGen/PowerPC/reg-coallesce-simple.ll
llvm-svn: 24115
allows us to lower legal return types to something else, to meet ABI
requirements (such as that i64 be returned in two i32 regs on Darwin/ppc).
llvm-svn: 23802
Though I have done extensive testing, it is possible that this will break
things in configs I can't test. Please let me know if this causes a problem
and I'll fix it ASAP.
llvm-svn: 23504
instead of ZERO_EXTEND to eliminate extraneous extensions. This eliminates
dead zero extensions on formal arguments and other cases on PPC, implementing
the newly tightened up test/Regression/CodeGen/PowerPC/small-arguments.ll test.
llvm-svn: 23205
Nate noticed in yacr2 (and I know it occurs in other places as well).
This is still rough, as the critical edge blocks are not intelligently placed,
but it is added to get some idea of whether this improves performance.
llvm-svn: 22825
used to tack a register number onto the node.
Instead of doing this, make a new node, RegisterSDNode, which is a leaf
containing a register number. These three operations just become normal
DAG nodes now, instead of requiring special handling.
Note that with this change, it is no longer correct to make illegal
CopyFromReg/CopyToReg nodes. The legalizer will not touch them, and this
is bad, so don't do it. :)
llvm-svn: 22806
CC out of the SetCC operation, making SETCC a standard ternary operation and
CC's a standard DAG leaf. This will make it possible for other nodes to use
CC's as operands in the future...
llvm-svn: 22728
1. Pass Value*'s into lowering methods so that the proper pointers can be
added to load/stores from the valist
2. Intrinsics that return void should only return a token chain, not a token
chain/retval pair.
3. Rename LowerVAArgNext -> LowerVAArg, because VANext is long gone.
llvm-svn: 22338
population (ctpop). Generic lowering is implemented; however, only promotion
is implemented for SelectionDAG at the moment.
More coming soon.
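For reference, a generic expansion of ctpop boils down to the standard
parallel bit-count; a sketch for i32 in modern IR syntax (the actual expansion
is performed on selection-DAG nodes):
%t1 = lshr i32 %x, 1
%t2 = and i32 %t1, 1431655765        ; 0x55555555
%s1 = sub i32 %x, %t2                ; 2-bit sums
%t3 = and i32 %s1, 858993459         ; 0x33333333
%t4 = lshr i32 %s1, 2
%t5 = and i32 %t4, 858993459
%s2 = add i32 %t3, %t5               ; 4-bit sums
%t6 = lshr i32 %s2, 4
%s3 = add i32 %s2, %t6
%s4 = and i32 %s3, 252645135         ; 0x0f0f0f0f, 8-bit sums
%t7 = mul i32 %s4, 16843009          ; 0x01010101
%pop = lshr i32 %t7, 24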
llvm-svn: 21676
(TRUNC)Stores and (EXT|ZEXT|SEXT)Loads have an extra SDOperand which is a SrcValueSDNode which contains the Value*. Note that if the operation is introduced by the backend, it will still have the operand, but the Value* will be null.
llvm-svn: 21599
returned integer values all of the way to 64-bits (we only did it to 32-bits
leaving the top bits undefined). This causes problems for targets like Alpha,
whose ABIs define the top bits too.
llvm-svn: 20926
do it. This results in better code on X86 for floats (because if strict
precision is not required, we can elide some more expensive double -> float
conversions like the old isel did), and allows other targets to emit
CopyFromRegs that are not legal for arguments.
llvm-svn: 19668
X86/reg-pressure.ll again, and allows us to do nice things in other cases.
For example, we now codegen this sort of thing:
int %loadload(int *%X, int* %Y) {
%Z = load int* %Y
%V = load int* %X ;; load between %Z and store
%Q = add int %Z, 1
store int %Q, int* %Y
ret int %V
}
Into this:
loadload:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EAX, DWORD PTR [%EAX]
mov %ECX, DWORD PTR [%ESP + 8]
inc DWORD PTR [%ECX]
ret
where we weren't able to form the 'inc [mem]' before. This also lets the
instruction selector emit loads in any order it wants to, which can be good
for register pressure as well.
llvm-svn: 19644
the basic block that uses them if possible. This is a big win on X86, as it
lets us fold the argument loads into instructions and reduce register pressure
(by not loading all of the arguments in the entry block).
For this (contrived to show the optimization) testcase:
int %argtest(int %A, int %B) {
%X = sub int 12345, %A
br label %L
L:
%Y = add int %X, %B
ret int %Y
}
we used to produce:
argtest:
mov %ECX, DWORD PTR [%ESP + 4]
mov %EAX, 12345
sub %EAX, %ECX
mov %EDX, DWORD PTR [%ESP + 8]
.LBBargtest_1: # L
add %EAX, %EDX
ret
now we produce:
argtest:
mov %EAX, 12345
sub %EAX, DWORD PTR [%ESP + 4]
.LBBargtest_1: # L
add %EAX, DWORD PTR [%ESP + 8]
ret
This also fixes the FIXME in the code.
BTW, this occurs in real code. 164.gzip shrinks from 8623 to 8608 lines of
.s file. The stack frame in huft_build shrinks from 1644->1628 bytes,
inflate_codes shrinks from 116->108 bytes, and inflate_block from 2620->2612,
due to fewer spills.
Take that alkis. :-)
llvm-svn: 19639