This function has to deal with a lot of special cases, and the old
version got it wrong sometimes. In particular, it would sometimes leave
multiple uses in the stack interval in a single block. That causes bad
code with multiple reloads in the same basic block.
The new version handles block entry and exit in a single pass. It first
eliminates all the easy cases, and then goes on to create a local
interval for the blocks with difficult interference. Previously, we
would only create the local interval for completely isolated blocks.
It can happen that the stack interval becomes completely empty because
we could allocate a register in all edge bundles, and the new local
intervals deal with the interference. The empty stack interval is
harmless, but we need to remove a SplitKit assertion that checks for
empty intervals.
llvm-svn: 134047
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Changed
TargetInstrInfo so it's based off MCInstrInfo.
llvm-svn: 134021
Removed the check that peeks past EXTRACT_SUBREG, which I don't think
makes sense any more. Instead, treat it as a normal register def. No
significant effect on x86 or ARM benchmarks.
llvm-svn: 133917
Both become <earlyclobber> defs on the INLINEASM MachineInstr, but we
now use two different asm operand kinds.
The new Kind_Clobber is treated identically to the old
Kind_RegDefEarlyClobber for now, but x87 floating point stack inline
assembly does care about the difference.
This will pop a register off the stack:
asm("fstp %st" : : "t"(x) : "st");
While this will pop the input and push an output:
asm("fst %st" : "=&t"(r) : "t"(x));
We need to know if ST0 was a clobber or an output operand, and we can't
depend on <dead> flags for that.
llvm-svn: 133902
The INLINEASM MachineInstrs have an immediate operand describing each
original inline asm operand. Decode the bits in MachineInstr::print() so
it is easier to read:
INLINEASM <es:rorq $1,$0>, $0:[regdef], %vreg0<def>, %vreg1<def>, $1:[imm], 1, $2:[reguse] [tiedto:$0], %vreg2, %vreg3, $3:[regdef-ec], %EFLAGS<earlyclobber,imp-def>
llvm-svn: 133901
target machine from those that are only needed by codegen. The goal is to
sink the essential target description into MC layer so we can start building
MC based tools without needing to link in the entire codegen.
The first step is to refactor TargetRegisterInfo. This patch adds a base class,
MCRegisterInfo, from which TargetRegisterInfo is derived. TableGen was changed to
separate the register description from the rest of the target data.
llvm-svn: 133782
register allocation if it has an indirectbr or if we can duplicate it to
every predecessor.
This fixes the SingleSource/Benchmarks/Shootout-C++/matrix.cpp regression but
keeps the previous improvements to sunspider.
llvm-svn: 133682
If the linker supports it, this will hold the CIE and FDE information in a
compact format. The implementation of the compact unwinding emission is coming
soon.
llvm-svn: 133658
be one with only one unconditional branch and no phis. Duplicating the phis in this case
is possible, but requires liveness analysis or breaking edges.
llvm-svn: 133607
1. (((x) & 0xFF00) >> 8) | (((x) & 0x00FF) << 8)
=> (bswap x) >> 16
2. ((x&0xff)<<8)|((x&0xff00)>>8)|((x&0xff000000)>>8)|((x&0x00ff0000)<<8)
=> (rotl (bswap x) 16)
This allows us to eliminate most of the def : Pat patterns for ARM rev16
and revsh instructions. It catches many more cases for ARM and x86.
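As a source-level illustration (my own example, not part of the commit), pattern 2 is the common halfword byte-swap idiom; with this combine it maps to (rotl (bswap x) 16) and can be selected as a single rev16 on ARM:
#include <cstdint>
// Swap the bytes within each 16-bit half of a 32-bit value (pattern 2 above).
std::uint32_t halfword_bswap(std::uint32_t x) {
  return ((x & 0x000000ffu) << 8) | ((x & 0x0000ff00u) >> 8) |
         ((x & 0xff000000u) >> 8) | ((x & 0x00ff0000u) << 8);
}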
rdar://9609108
llvm-svn: 133503
* Don't introduce a duplicated bb in the CFG
* When making a branch unconditional, clear the PredCond array so that it
is really unconditional.
llvm-svn: 133432
dragonegg buildbots back to life. Original commit message:
Teach early dup how to duplicate basic blocks with one successor and only phi instructions
into more complex blocks.
llvm-svn: 133430
all over the place in different styles and variants. Standardize on two
preferred entrypoints: one that takes a StructType and ArrayRef, and one that
takes StructType and varargs.
In cases where there isn't a struct type convenient, we now add a
ConstantStruct::getAnon method (whose name will make more sense after a few
more patches land).
It would be "really really nice" if the ConstantStruct::get and
ConstantVector::get methods didn't make temporary std::vectors.
llvm-svn: 133412
The LSDA is a bit difficult for the non-initiated to read. Even with comments,
it's not always clear what's going on. This wraps the ASM streamer in a class
that retains the LSDA and then emits a human-readable description of what's
going on in it.
So instead of having to make sense of:
Lexception1:
.byte 255
.byte 155
.byte 168
.space 1
.byte 3
.byte 26
Lset0 = Ltmp7-Leh_func_begin1
.long Lset0
Lset1 = Ltmp812-Ltmp7
.long Lset1
Lset2 = Ltmp913-Leh_func_begin1
.long Lset2
.byte 3
Lset3 = Ltmp812-Leh_func_begin1
.long Lset3
Lset4 = Leh_func_end1-Ltmp812
.long Lset4
.long 0
.byte 0
.byte 1
.byte 0
.byte 2
.byte 125
.long __ZTIi@GOTPCREL+4
.long __ZTIPKc@GOTPCREL+4
you can read this instead:
## Exception Handling Table: Lexception1
## @LPStart Encoding: omit
## @TType Encoding: indirect pcrel sdata4
## @TType Base: 40 bytes
## @CallSite Encoding: udata4
## @Action Table Size: 26 bytes
## Action 1:
## A throw between Ltmp7 and Ltmp812 jumps to Ltmp913 on an exception.
## For type(s): __ZTIi@GOTPCREL+4 __ZTIPKc@GOTPCREL+4
## Action 2:
## A throw between Ltmp812 and Leh_func_end1 does not have a landing pad.
llvm-svn: 133286
* We should change the generated code because of a debug use.
* Avoid creating debug uses of undef, as they become a kill.
Test to follow.
llvm-svn: 133255
Also switch the return type to ArrayRef<unsigned> which works out nicely
for ARM's implementation of this function because of the clever ArrayRef
constructors.
The name change indicates that the returned allocation order may contain
reserved registers as has been the case for a while.
llvm-svn: 133216
In Thumb mode we cannot handle GPR virtual registers, even though some
instructions can. When isel is lowering a CopyFromReg, it should limit
itself to subclasses of getRegClassFor(VT).
<rdar://problem/9624323>
llvm-svn: 133210
I think PBQP could use RegisterClassInfo, but it didn't fit neatly with
the external interfaces that PBQP uses, so I'll leave that to Lang.
llvm-svn: 133186
BranchProbabilityInfo (except setEdgeWeight, which is not available here).
Branch Weights are kept in MachineBasicBlocks. To turn off this analysis
set -use-mbpi=false.
llvm-svn: 133184
This is intended to support using REG_SEQUENCE SDNode's with type MVT::untyped, and is part of the long road to eliminating some of the hacks we currently use to support register pairs and other strange constraints, particularly on ARM NEON.
llvm-svn: 133178
This virtual function will replace allocation_order_begin/end as the one
to override when implementing custom allocation orders. It is simpler to
have one function return an ArrayRef than having two virtual functions
computing different ends of the same array.
Use getRawAllocationOrder() in place of allocation_order_begin() where
it makes sense, but leave some clients that look like they really want
the filtered allocation orders from RegisterClassInfo.
llvm-svn: 133170
GetDemandedBits (which must operate on the vector element type).
Fix a usage of getZeroExtendInReg, which must also be done on scalar types.
llvm-svn: 133052
converted to add x,x if x is an undef. add undef, undef does not guarantee
that the resulting low order bit is zero, since each use of undef may take a
different value.
Fixes <rdar://problem/9453156> and <rdar://problem/9487392>.
llvm-svn: 133022
Dan noted that this would work on the case shown in the commit message. I think
the case that was failing was a bb ending with a redundant conditional jump:
...
jne foo
foo:
...
I was unable to find any such case in the tests or in a debug build of clang,
so I will revert this part of the patch and watch the bots.
llvm-svn: 133004
types (with power of two types such as 8,16,32 .. 512).
Fix a bug in the integer promotion of bitcast nodes. Enable integer expansion
only if the target of the conversion is an integer (when the type action is
scalarize).
Add handling to the legalization of vector load/store in cases where the saved
vector is integer-promoted.
llvm-svn: 132985
In particular, don't spill dirty registers only to satisfy a hint. It is
not worth it.
The attached test case provides an example where the fast allocator
would spill a register when other registers are available.
llvm-svn: 132900
Instead of scalarizing and doing an element-by-element truncate, use vector
truncate.
Add support for scalarization of vectors: i8 -> <1 x i1> (from Duncan's
testcase).
llvm-svn: 132892
we try to branch to them.
Before we were creating successor lists with duplicated entries. Fixing that
found a bug in isBlockOnlyReachableByFallthrough that would cause it to
return the wrong answer for
-----------
...
jne foo
jmp bar
foo:
----------
llvm-svn: 132882
and definitions when emitting global variables. This was causing global
declarations to be emitted as if they were definitions.
Fixes <rdar://problem/9429892>.
llvm-svn: 132825
With this I am able to bootstrap clang with early tail duplication enabled
for any small bb and setting tail-dup-size to a relatively large value (8) to
stress this code.
llvm-svn: 132816
The potential DAGCombine which enforces this more generally messes up some other very fragile patterns, so I'm leaving that alone, at least for now.
llvm-svn: 132809
I've been sitting on this long enough trying to find a test case. I
think the fix should go in now, but I'll keep working on the test case.
llvm-svn: 132701
When local live range splitting creates a live range with the same
number of instructions as the old range, mark it as RS_Local. When such
a range is seen again, require that it be split in a way that reduces
the number of instructions. That guarantees we are making progress while
still being able to perform 3 -> 2+3 splits as required by PR10070.
This also means that the PrevSlot map is no longer needed. This was also
used to estimate new spill weights, but that is no longer necessary
after SlotIndexes::insertMachineInstrInMaps() got the extra Late
insertion argument.
llvm-svn: 132697
Only target-dependent hints require callbacks. The RCI allocation order
has CSR aliases last according to their order of appearance in the
getCalleeSavedRegs list. This can depend on the calling convention.
This way, AllocationOrder::next doesn't have to check for reserved
registers, and CSRs are always allocated last, even with weird calling
conventions.
llvm-svn: 132690
The order of registers returned by getCalleeSavedRegs is used to lay out
the fixed stack slots for CSRs. Some targets like their CSRs used from
one end, and some targets want them used from the other end.
When computing an allocation order, simply preserve the relative
ordering of CSRs that the target specifies in its allocation order.
Reordering CSRs would break some targets, ARM in particular.
We still place volatiles before the CSRs, providing slightly better
results with different calling conventions.
llvm-svn: 132680
(only happens when using the -promote-elements option).
The correct legalization order is to first try to promote elements. Next, we try
to widen vectors.
llvm-svn: 132648
of reserved registers.
Use RegisterClassInfo in RABasic as well. This slightly changes some
allocation orders because RegisterClassInfo puts CSR aliases last.
llvm-svn: 132581
When compiling a program with lots of small functions like
483.xalancbmk, this makes RAFast 11% faster.
Add some comments to clarify the difference between unallocatable and
reserved registers. It's quite subtle.
The fast register allocator depends on EFLAGS' not being allocatable on
x86. That way it can completely avoid tracking liveness, and it won't
mind when there are multiple uses of a single def.
llvm-svn: 132514
Some register classes are only used for instruction operand constraints.
They should never be used for virtual registers. Previously, those
register classes were given an empty allocation order, but now you can
say 'let isAllocatable=0' in the register class definition.
TableGen calculates if a register is part of any allocatable register
class, and makes that information available in TargetRegisterDesc::inAllocatableClass.
The goal here is to eliminate use cases for overriding allocation_order_*
methods.
llvm-svn: 132508
I was confused about whether new uint8_t[] would zero-initialize the returned
array, and it seems gcc-4.0 was too.
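For reference, a short C++ sketch of the distinction in question (illustrative only, not the actual code):
#include <cstddef>
#include <cstdint>
void example(std::size_t n) {
  std::uint8_t *a = new std::uint8_t[n];    // default-initialized: element values are indeterminate
  std::uint8_t *b = new std::uint8_t[n]();  // value-initialized: every element is zero
  delete[] a;
  delete[] b;
}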
This should fix the test failures on darwin 9.
llvm-svn: 132500
Instead, use a simpler approach and let DBG_VALUE follow its predecessor instruction. After the live debug value analysis pass, all DBG_VALUE instructions are placed at the right place. Thanks Jakob for the hint!
llvm-svn: 132483
register classes.
It provides information for each register class that cannot be
determined statically, like:
- The number of allocatable registers in a class after filtering out the
reserved and invalid registers.
- The preferred allocation order with registers that overlap callee-saved
registers last.
- The last callee-saved register that overlaps a given physical register.
This information usually doesn't change between functions, so it is
reused for compiling multiple functions when possible. The many
possible combinations of reserved and callee-saved registers make it
unfeasible to compute this information statically in TableGen.
Use RegisterClassInfo to count available registers in various heuristics
in SimpleRegisterCoalescing, making the pass run 4% faster.
llvm-svn: 132450
patch we add a flag to enable a new type legalization decision - to promote
integer elements in vectors. Currently, the rest of the codegen does not support
this kind of legalization. This flag will be removed when the transition is
complete.
llvm-svn: 132394
For targets with no itinerary (x86) it is a nop by default. For
targets with issue width already expressed in the itinerary (ARM) it
bypasses a scoreboard check but otherwise does not affect the
schedule. It does make the code more consistent and complete and
allows new targets to specify their issue width in an arbitrary way.
llvm-svn: 132385
turns out that it could cause an infinite loop in some situations. If this code
is triggered and it converts a cleanup into a catchall, but that cleanup was
already in a cleanup, then the _Unwind_SjLj_Resume could infinite loop. I.e.,
the code doesn't consume the exception object and passes it on to
_Unwind_SjLj_Resume. But _USjLjR expects it to be consumed (since it's landing
at a catchall instead of a cleanup). So it uses the values that are presently
there, which are the values that tell it to jump to the fake landing pad.
<rdar://problem/9508402>
llvm-svn: 132381
When assigned ranges are evicted, they are put in the RS_Evicted stage and are
not allowed to evict anything else. That prevents looping automatically.
When evicting ranges just to get a cheaper register, use only spill weights to
find the possible candidates. Avoid breaking hints for this purpose, it is not
worth it.
Start implementing more complex eviction heuristics, guarded by the temporary
-complex-eviction flag. The initial version permits a heavier range to be
evicted if it doesn't have any uses where the evicting range is live. This makes
it a good candidate for live range splitting.
llvm-svn: 132358
This only affects targets like Mips where branch instructions may kill virtual
registers. Most other targets branch on flag values, so virtual registers are
not involved.
The problem is that MachineBasicBlock::updateTerminator deletes branches and
inserts new ones while LiveVariables keeps a list of pointers to instructions
that kill virtual registers. That list wasn't properly updated in
MBB::SplitCriticalEdge.
llvm-svn: 132298
handler.
At this moment, only GCC-style exceptions are supported. Other kinds
of exceptions, including "traditional" SEH and Microsoft Visual C++ exceptions,
need more work--and a compiler exception model that isn't specific to
GCC-style exceptions!
In particular, I imagine that it would be possible to mix "traditional" SEH
with GCC-style EH or Microsoft C++ EH. Currently LLVM has no way (beyond some
target-specific defaults and whole-module compiler switches) of knowing which
scheme to use when.
llvm-svn: 132283
This patch does not change the behavior of the type legalizer. The codegen
produces the same code.
This infrastructural change is needed in order to enable complex decisions
for vector types (needed by the vector-select patch).
llvm-svn: 132263
transformed by the inliner into a branch to the enclosing landing pad
(when inlined through an invoke). If not so optimized, it is lowered by
DWARF EH preparation into a call to _Unwind_Resume (or _Unwind_SjLj_Resume
as appropriate). Its chief advantage is that it takes both the
exception value and the selector value as arguments, meaning that there
is zero effort in recovering these; however, the frontend is required
to pass these down, which is not actually particularly difficult.
Also document the behavior of landing pads a bit better, and make it
clearer that it's okay that personality functions don't always land at
landing pads. This is just a fact of life. Don't write optimizations that
rely on pushing things over an unwind edge.
llvm-svn: 132253
Delete the Kill and Def markers in BlockInfo. They are no longer
necessary when BlockInfo describes a continuous live range.
This only affects the relatively rare kind of basic block where a live
range looks like this:
|---x o---|
Now live range splitting can pretend that it is looking at two blocks:
|---x
o---|
This allows the code to be simplified a bit.
llvm-svn: 132245
It is important that this function returns the same number of live blocks as
countLiveBlocks(CurLI) because live range splitting uses the number of live
blocks to ensure it is making progress.
This is in preparation for supporting duplicate UseBlock entries for basic blocks
that have a virtual register live-in and live-out, but not live-through.
llvm-svn: 132244
There was no way to check if a given register/mode pair was valid. We now return
an error code (-2) instead of asserting. If anyone thinks that an assert
at this point is really needed, we can autogen a hasValidDwarfRegNum instead.
llvm-svn: 132236
subregisters:
When a value is in a subregister, at least report the location as being
the superregister. We should extend the .td files to encode the bit
range so that we can produce a DW_OP_bit_piece.
llvm-svn: 132224
This doesn't change functionality (much), but it allows for a more fine-grained
eviction policy. The current policy only compares spill weights, and that is not
always the best thing to do. Spill weights are designed to serve linear scan,
and they don't consider live range splitting.
Add a mechanism so canEvict() can request that a live range be evicted and
split/spilled. This is to avoid infinite eviction loops.
llvm-svn: 132101
The practical effects here are that x86-64 fast-isel can now handle trunc from i8 to i1, and ARM fast-isel can handle many more constructs involving integers narrower than 32 bits (including loads, stores, and many integer casts).
rdar://9437928 .
llvm-svn: 132099
non-zero.
- Teach X86 cmov optimization to eliminate the cmov from ctlz, cttz extension
when the source of X86ISD::BSR / X86ISD::BSF is proven to be non-zero.
rdar://9490949
llvm-svn: 131948
on CodeGen/X86/2007-05-07-InvokeSRet.ll. There is probably a bug here that was
fixed by r128961, but since there is no test or reference to a source file I have
to revert it.
llvm-svn: 131618
LiveInterval::shrinkToUses recomputes the live range from scratch instead of
removing snippets. This should avoid the problem with dangling live ranges.
Leave physreg identity copies alone. They can be created when joining a virtreg
with a physreg. They don't affect register allocation, and they will be removed
by the rewriter.
llvm-svn: 131521
The greedy register allocator has live range splitting and register class
inflation, so it can actually fully undo this join, including restoring the
original register classes.
We still don't want to do this for long live ranges, mostly because of the high
register pressure when there are many constrained live ranges overlapping.
llvm-svn: 131466
When instructions are deleted, they leave tombstone SlotIndex entries.
The isZeroLength method should ignore these null indexes.
This causes RABasic to sometimes spill a callee-saved register in the
abi-isel.ll test, so don't run that test with -regalloc=basic. Prioritizing
register allocation according to spill weight can cause more registers to be
used.
llvm-svn: 131436
by non-CMP expressions. The executable test case (129821) would test
this as well, if we had an "-O0 -disable-arm-fast-isel" LLVM-GCC
tester. Alas, the ARM assembly would be very difficult to check with
FileCheck.
The thumb2-cbnz.ll test is affected; it generates larger code (tst.w
vs. cmp #0), but I believe the new version is correct.
rdar://problem/9298790
llvm-svn: 131261
about to be spilled.
This can only happen when two extra snippet registers are included in the spill,
and there is a copy between them. Hoisting the spill creates problems because
the hoist will mark the copy for later dead code elimination, and spilling the
second register will turn the copy into a spill.
<rdar://problem/9420853>
llvm-svn: 131192
If there is a store after the load node, then there is a chain, which means
that there is another user. Thus, asking hasOneUser would fail. Instead we
ask hasNUsesOfValue on the 'data' value.
llvm-svn: 131183
intrinsic call. This prevents it from being reordered so that it appears
*before* the setjmp intrinsic (thus making it completely useless).
<rdar://problem/9409683>
llvm-svn: 131174
at the start of basic blocks to their common predecessor. It's actually quite
common (e.g. about 50 times in JM/lencod) and has been shown to be a nice code size
benefit. e.g.
pushq %rax
testl %edi, %edi
jne LBB0_2
## BB#1:
xorb %al, %al
popq %rdx
ret
LBB0_2:
xorb %al, %al
callq _foo
popq %rdx
ret
=>
pushq %rax
xorb %al, %al
testl %edi, %edi
je LBB0_2
## BB#1:
callq _foo
LBB0_2:
popq %rdx
ret
rdar://9145558
llvm-svn: 131172
this clang will use .debug_frame in, for example,
clang -g -c -m32 test.c
This matches gcc's behaviour. It looks like .debug_frame is a bit bigger
than .eh_frame, but has the big advantage of not being allocated.
llvm-svn: 131140
It can happen that a live debug variable is the last use of a sub-register, and
the register allocator will pick a larger register class for the virtual
register. If the allocated register doesn't support the sub-register index,
just use %noreg for the debug variables instead of asserting.
In PR9872, a debug variable ends up in the sub_8bit_hi part of a GR32_ABCD
register. The register is split and one part is inflated to GR32 and assigned
%ESI because there are no more normal uses of sub_8bit_hi.
Since %ESI doesn't have that sub-register, substPhysReg asserted. Now it will
simply insert a %noreg instead, and the debug variable will be marked
unavailable in that range.
We don't currently have a way of saying: !"value" is in bits 8-15 of %ESI, I
don't know if DWARF even supports that.
llvm-svn: 131073
This can't be just an assertion, users can always write impossible inline
assembly. Such an assembly statement should be included in the error message.
llvm-svn: 131024
After a virtual register is split, update any debug user variables that resided
in the old register. This ensures that the LiveDebugVariables are still correct
after register allocation.
This may create DBG_VALUE instructions that place a user variable in a register
in parts of the function and in a stack slot in other parts. DwarfDebug
currently doesn't support that.
llvm-svn: 130998
The post-ra scheduler was explicitly updating the depth of a node's
successors after scheduling it, regardless of whether the successor
was ready. This is quadratic for DAGs with transitively redundant
edges. I simply removed the useless update of depth, which is lazily
computed later.
Fixes <rdar://problem/9044332> compiler takes way too long to build TextInput.
llvm-svn: 130992
BuildSchedGraph was quadratic in the number of calls in the basic
block. After this fix, it keeps only a single call at the top of the
DefList so compile time doesn't blow up on large blocks. This reduces
postRA sched time on an external test case from 81s to 0.3s. Although
r130800 (reduced ARM register alias defs) also partially fixes the
issue by reducing the constant overhead of checking call interference
by an order of magnitude.
Fixes <rdar://problem/7662664> very poor compile time with post RA scheduling.
llvm-svn: 130943
who used this flag, and it now emits CFI and doesn't emit this anymore. All
other targets left this flag "false".
<rdar://problem/8486371>
llvm-svn: 130918
Joining physregs is inherently dangerous because it uses a heuristic to avoid
creating invalid code. Linear scan had an emergency spilling mechanism to deal
with those rare cases. The new greedy allocator does not.
The greedy register allocator is much better at taking hints, so this has almost
no impact on code size and quality. The few cases where it matters show up as
unit tests that now have -join-physregs enabled explicitly.
llvm-svn: 130896
landing pad as its successor.
SjLj exception handling jumps to the correct landing pad via a switch statement
that's generated right before code-gen. Loosen the constraint in the machine
instruction verifier to allow for this. Note, this isn't the most rigorous check
since we cannot determine where that switch statement came from. But it's
marginally better than turning this check off when SjLj exceptions are used.
<rdar://problem/9187612>
llvm-svn: 130881
Original message:
Teach MachineCSE how to do simple cross-block CSE involving physregs. This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 .
llvm-svn: 130877
it is both inefficient and unexpected by dwarfdump. Change to
a DW_FORM_data4.
While in here, change the predicate name to reflect that the position
is not really absolute (it is an offset), just that the linker needs a
relocation.
llvm-svn: 130846
Register coalescing can sometimes create live ranges that end in the middle of a
basic block without any killing instruction. When SplitKit detects this, it will
repair the live range by shrinking it to its uses.
Live range splitting also needs to know about this. When the range shrinks so
much that it becomes allocatable, live range splitting fails because it can't
find a good split point. It is paranoid about making progress, so an allocatable
range is considered an error.
The coalescer should really not be creating these bad live ranges. They appear
when coalescing dead copies.
llvm-svn: 130787
Def operands may also have an <undef> flag, but that just means that a
sub-register redef doesn't actually read the super-register. For physical
registers, it has no meaning.
llvm-svn: 130714
This works around a limitation in gdb which is reported by following inherit.exp test failures from gdb testsuite.
gdb.cp/inherit.exp: print g_vB.vB::vb
gdb.cp/inherit.exp: print g_vB.vB::vx
gdb.cp/inherit.exp: print g_vC.vC::vc
gdb.cp/inherit.exp: print g_vC.vC::vx
gdb.cp/inherit.exp: print g_vD.vB::vb
...
llvm-svn: 130702
When an interfering live range ends at a dead slot index between two
instructions, make sure that the inserted copy instruction gets a slot index
after the dead ones. This makes it possible to avoid the interference.
Ideally, there shouldn't be interference ending at a deleted instruction, but
physical register coalescing can sometimes do that to sub-registers.
This fixes PR9823.
llvm-svn: 130687
give it a bit more responsibility. Also implement it for MachO.
If hacked to use cfi, 32 bit MachO will produce
.cfi_personality 155, L___gxx_personality_v0$non_lazy_ptr
and 64 bit will produce
.cfi_personality ___gxx_personality_v0
The general idea is that .cfi_personality gets passed the final symbol. It is
up to codegen to produce it if using indirect representation (like 32 bit
MachO), but it is up to MC to decide which relocations to create.
llvm-svn: 130341
successors) and use inverse depth first search to traverse the BBs. However
that doesn't work when the CFG has infinite loops. Simply doing a linear
traversal of all BBs works just fine.
rdar://9344645
llvm-svn: 130324
more callee-saved registers and introduce copies. Only allows it if scheduling
a node above calls would end up lessening register pressure.
Call operands also have added ABI restrictions for register allocation, so be
extra careful with hoisting them above calls.
rdar://9329627
llvm-svn: 130245
This has two effects: 1. We never inflate to a larger register class than what
the sub-target can handle. 2. Completely unconstrained virtual registers get the
largest possible register class.
llvm-svn: 130229
fix bugs exposed by the gcc dejagnu testsuite:
1. The load may actually be used by a dead instruction, which
would cause an assert.
2. The load may not be used by the current chain of instructions,
and we could move it past a side-effecting instruction. Change
how we process uses to define the problem away.
llvm-svn: 130018
On x86 this allows a load to be folded into the cmp, greatly reducing register pressure.
movzbl (%rdi), %eax
cmpl $47, %eax
->
cmpb $47, (%rdi)
This shaves 8k off gcc.o on i386. I'll leave applying the patch in README.txt to Chris :)
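For illustration (my own example, not from the commit), source like the following gives rise to that pattern:
#include <cstdint>
// A byte load compared against a constant: previously movzbl + cmpl,
// now foldable into a single cmpb against memory.
bool isSlash(const std::uint8_t *p) {
  return *p == 47;   // '/' is ASCII 47
}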
llvm-svn: 130005
An exception is thrown via a call to _cxa_throw, which we don't expect to
return. Therefore, the "true" part of the invoke goes to a BB that has
'unreachable' as its only instruction. This is lowered into an empty MachineBB.
The landing pad for this invoke, however, is directly after the "true" MBB.
When the empty MBB is removed, the landing pad is directly below the BB with the
invoke call. The unconditional branch is removed and then the two blocks are
merged together.
The testcase is too big for a regression test.
<rdar://problem/9305728>
llvm-svn: 129965
These intervals are allocatable immediately after splitting, but they may be
evicted because of later splitting. This is rare, but when it happens they
should be split again.
The remainder intervals that cannot be allocated after splitting still move
directly to spilling.
SplitEditor::finish can optionally provide a mapping from new live intervals
back to the original interval indexes returned by openIntv().
Each original interval index can map to multiple new intervals after connected
components have been separated. Dead code elimination may also add existing
intervals to the list.
The reverse mapping allows the SplitEditor client to treat the new intervals
differently depending on the split region they came from.
llvm-svn: 129925
TII::isTriviallyReMaterializable() shouldn't depend on any properties of the
register being defined by the instruction. Rematerialization is going to create
a new virtual register anyway.
llvm-svn: 129882
On the x86-64 and thumb2 targets, some registers are more expensive to encode
than others in the same register class.
Add a CostPerUse field to the TableGen register description, and make it
available from TRI->getCostPerUse. This represents the cost of a REX prefix or a
32-bit instruction encoding required by choosing a high register.
Teach the greedy register allocator to prefer cheap registers for busy live
ranges (as indicated by spill weight).
llvm-svn: 129864
manually and pass all (now) 4 arguments to the mul libcall. Add a new
ExpandLibCall for just this (copied gratuitously from type legalization).
Fixes rdar://9292577
llvm-svn: 129842
- There is a minor semantic change here (evidenced by the test change) for
Darwin triples that have no version component. I debated changing the default
behavior of isOSVersionLT, but decided it made more sense for triples to be
explicit.
llvm-svn: 129802
Add a avoidWriteAfterWrite() target hook to identify register classes that
suffer from write-after-write hazards. For those register classes, try to avoid
writing the same register in two consecutive instructions.
This is currently disabled by default. We should not spill to avoid hazards!
The command line flag -avoid-waw-hazard can be used to enable waw avoidance.
llvm-svn: 129772
registers for fast allocation a different way. This has us updating
used registers only when we're using that exact register.
Fixes rdar://9207598
llvm-svn: 129711
2. implement rdar://9289501 - fast isel should fold trivial multiplies to shifts
3. teach tblgen to handle shift immediates that are different sizes than the
shifted operands, eliminating some code from the X86 fast isel backend.
4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function
instead of FastEmit_ri to simplify code.
llvm-svn: 129666
less trivial things) into a dummy lea. Before we generated:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
leaq (%rax), %rax
ret
now we produce:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
ret
This is part of rdar://9289558
llvm-svn: 129662
The basic issue here is that bottom-up isel is matching the branch
and compare, and was failing to fold the load into the branch/compare
combo. Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:
cmpb $0, 52(%rax)
je LBB4_2
instead of:
movb 52(%rax), %cl
cmpb $0, %cl
je LBB4_2
This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and
generating less code.
This was one of the biggest classes of missing load folding. Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.
llvm-svn: 129656
The transferValues() function can now handle both singly and multiply defined
values, as long as the resulting live range is known. Only rematerialized values
have their live range recomputed by extendRange().
The updateSSA() function can now insert PHI values in bulk across multiple
values in multiple target registers in one pass. The list of blocks received
from transferValues() is in layout order which seems to work well for the
iterative algorithm. Blocks from extendRange() are still in reverse BFS order,
but this function is used so rarely now that it doesn't matter.
llvm-svn: 129580
Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.
llvm-svn: 129571
This is done by pushing physical register definitions close to their
use, which happens to handle flag definitions if they're not glued to
the branch. This seems to be generally a good thing though, so I
didn't need to add a target hook yet.
The primary motivation is to generate code closer to what people
expect and rule out missed opportunity from enabling macro-op
fusion. As a side benefit, we get several 2-5% gains on x86
benchmarks. There is one regression:
SingleSource/Benchmarks/Shootout/lists slows down by 10%. But this is
an independent scheduler bug that will be tracked separately.
See rdar://problem/9283108.
Incidentally, pre-RA scheduling is only half the solution. Fixing the
later passes is tracked by:
<rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump
Fixes:
<rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion
llvm-svn: 129508
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handling node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.
Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the node latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
llvm-svn: 129421
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make these heuristic adjustments to node latency work,
I also needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
llvm-svn: 129383
This merges the behavior of splitSingleBlocks into splitAroundRegion, so the
RS_Region and RS_Block register stages can be coalesced. That means the leftover
intervals after region splitting go directly to spilling instead of a second
pass of per-block splitting.
llvm-svn: 129379
mean that it has to be ConstantArray of ConstantStruct. We might have
ConstantAggregateZero, at either level, so don't crash on that.
Also, semi-deprecate the sentinel value. The linker isn't aware of sentinels, so
we end up with the two lists appended, each with their "sentinels" on them.
Different parts of LLVM treated sentinels differently, so make them all just
ignore the single entry and continue on with the rest of the list.
llvm-svn: 129307
the 'unwind' instruction. However, later on that instruction was converted into
a jump to the basic block it was located in, causing an infinite loop when we
get there.
It turns out, we get there if the _Unwind_Resume_or_Rethrow call returns (which
it's not supposed to do). It returns if it cannot find a place to unwind
to. Thus we would get what appears to be a "hang" when in reality it's just that
the EH couldn't be propagated further along.
Instead of infinitely looping (or calling `unwind', which none of our back-ends
support (it's lowered into nothing...)), call the @llvm.trap() intrinsic.
This may not conform to the specific rules of a particular language, but
it's rather better than infinitely looping.
<rdar://problem/9175843&9233582>
llvm-svn: 129302
Both coalescing and register allocation already check aliases for interference,
so these extra segments are only slowing us down.
This speeds up both linear scan and the greedy register allocator.
llvm-svn: 129283
It is common for large live ranges to have few basic blocks with register uses
and many live-through blocks without any uses. This approach grows the Hopfield
network incrementally around the use blocks, completely avoiding checking
interference for some through blocks.
llvm-svn: 129188
If the lower bound is greater than the upper bound, consider it an unbounded array.
An array is unbounded if a non-zero lower bound is the same as the upper bound.
If the lower bound and upper bound are both zero, the array has one element.
llvm-svn: 129156
induction variable. The preRA scheduler is unaware of induction vars,
so we look for potential "virtual register cycles" instead.
Fixes <rdar://problem/8946719> Bad scheduling prevents coalescing
llvm-svn: 129100
About 90% of the relevant blocks are live-through without uses, and the only
information required about them is their number. This saves memory and enables
later optimizations that need to look at only the use-blocks.
llvm-svn: 128985
There can be multiple defs for a single virtual register when they are defining
sub-registers.
The missing <dead> flag was stopping the inline spiller from eliminating dead
code after rematerialization.
llvm-svn: 128888
This allows us to always keep the smaller slot for an instruction which is what
we want when a register has early clobber defines.
Drop the UsingInstrs set and the UsingBlocks map. They are no longer needed.
llvm-svn: 128886
inlined path for the common case.
Most basic blocks don't contain a call that may throw, so the last split point
is simply the first terminator.
llvm-svn: 128874
It needed to be moved closer to the setjmp statement, because the code directly
after the setjmp needs to know about values that are on the stack. Also, the
'bitcast' of the function context was causing a dead load. This wouldn't be too
horrible, except that at -O0 it wasn't optimized out, and because it wasn't
using the correct base pointer (if there is a VLA), it would try to access a
value from a garbage address.
<rdar://problem/9130540>
llvm-svn: 128873
When a virtual register has a single value that is defined as a copy of a
reserved register, permit that copy to be joined. These virtual register are
usually copies of the stack pointer:
%vreg75<def> = COPY %ESP; GR32:%vreg75
MOV32mr %vreg75, 1, %noreg, 0, %noreg, %vreg74<kill>
MOV32mi %vreg75, 1, %noreg, 8, %noreg, 0
MOV32mi %vreg75<kill>, 1, %noreg, 4, %noreg, 0
CALLpcrel32 ...
Coalescing these virtual registers early decreases register pressure.
Previously, they were coalesced by RALinScan::attemptTrivialCoalescing after
register allocation was completed.
The lower register pressure causes the mcinst-lowering-cmp0.ll test case to fail
because it depends on linear scan spilling a particular register.
I am deleting 2008-08-05-SpillerBug.ll because it is counting the number of
instructions emitted, and its revision history shows the 'correct' count being
edited many times.
llvm-svn: 128845
When the greedy register allocator is splitting multiple global live ranges, it
tends to look at the same interference data many times. The InterferenceCache
class caches queries for unaltered LiveIntervalUnions.
llvm-svn: 128764
transformations in target-specific DAG combines without causing DAGCombiner to
delete the same node twice. If you know of a better way to avoid this (see my
next patch for an example), please let me know.
llvm-svn: 128758
This way, shrinkToUses() will ignore the instruction that is about to be
deleted, and we avoid leaving invalid live ranges that SplitKit doesn't like.
Fix a misunderstanding in MachineVerifier about <def,undef> operands. The
<undef> flag is valid on def operands where it has the same meaning as <undef>
on a use operand. It only applies to sub-register defines which also read the
full register.
llvm-svn: 128642
We can't expect the real "powf()" to be present on some hosts (while powf() would be available on other hosts).
For consistency, std::pow(double,double) may be called instead.
Otherwise, precision issues might bite us, showing up as unstable regalloc and stack coloring.
llvm-svn: 128629
The rematerialized instruction may require a more constrained register class
than the register being spilled. In the test case, the spilled register has been
inflated to the DPR register class, but we are rematerializing a load of the
ssub_0 sub-register which only exists for DPR_VFP2 registers.
The register class is reinflated after spilling, so the conservative choice is
only temporary.
llvm-svn: 128610
The rewriter can keep track of multiple stack slots in the same register if they
happen to have the same value. When an instruction modifies a stack slot by
defining a register that is mapped to a stack slot, other stack slots in that
register are no longer valid.
This is a very rare problem, and I don't have a simple test case. I get the
impression that VirtRegRewriter knows it is about to be deleted, inventing a
last opaque problem.
<rdar://problem/9204040>
llvm-svn: 128562
When DCE clones a live range because it separates into connected components,
make sure that the clones enter the same register allocator stage as the
register they were cloned from.
For instance, clones may be split even when they were created during spilling.
Other registers created during spilling are not candidates for splitting or even
(re-)spilling.
llvm-svn: 128524
The instruction to be rematerialized may not be the one defining the register
that is being spilled. The traceSiblingValue() function sees through sibling
copies to find the remat candidate.
llvm-svn: 128449
The reassignment phase was able to move interference with a higher spill weight,
but it didn't happen very often and it was fairly expensive.
The existing interference eviction picks up the slack.
llvm-svn: 128397
The main register class may have been inflated by live range splitting, so that
register class is not necessarily valid for the snippet instructions.
Use the original register class for the stack slot interval.
llvm-svn: 128351
It couldn't be used outside of the file because SDISelAsmOperandInfo
is local to SelectionDAGBuilder.cpp. Making it a static function avoids
a weird linkage dance.
llvm-svn: 128342
Correctly terminate the range of register DBG_VALUEs when the register is
clobbered or when the basic block ends.
The code is now ready to deal with variables that are sometimes in a register
and sometimes on the stack. We just need to teach emitDebugLoc to say 'stack
slot'.
llvm-svn: 128327
The .dot directives don't need labels, that is a leftover from when we created
line number info manually.
Instructions following a DBG_VALUE can share its label since the DBG_VALUE
doesn't produce any code.
llvm-svn: 128284
so the scheduler can't create new interferences on the copies
themselves. Prior to this fix the scheduler could get stuck in a loop
creating copies.
Fixes PR9509.
llvm-svn: 128164
Each of these instructions may have a RegsClobberInsn entry that can't be
ignored. Consecutive ranges are coalesced later when DwarfDebug::emitDebugLoc
merges entries.
llvm-svn: 128155
I'm tired of doing this manually for each checkout.
If anyone knows a better way to debug isel for non-trivial tests, feel
free to revert and let me know how to do it.
llvm-svn: 128132
This will extend the ranges of debug info variables in registers until they are
clobbered.
Fix 1: Don't mistake DBG_VALUE instructions referring to incoming arguments on
the stack with DBG_VALUE instructions referring to variables in the frame
pointer. This fixes the gdb test-suite failure.
Fix 2: Don't trace through copies to physical registers setting up call
arguments. These registers are call clobbered, and the source register is more
likely to be a callee-saved register that can be extended through the call
instruction.
llvm-svn: 128114
These ranges get completely jumbled by the post-ra scheduler, and it is not
really reasonable to expect it to make sense of them.
Instead, teach DwarfDebug to notice when user variables in registers are
clobbered, and terminate the ranges there.
llvm-svn: 128045
the alias of an InstAlias instead of the thing being aliased. Because we need to
know the features that are valid for an InstAlias.
This is part of a work-in-progress.
llvm-svn: 127986
not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.
llvm-svn: 127951
Proof-of-concept code that code-gens a module to an in-memory MachO object.
This will be hooked up to a run-time dynamic linker library (see: llvm-rtdyld
for similarly conceptual work for that part) which will take the compiled
object and link it together with the rest of the system, providing back to the
JIT a table of available symbols which will be used to respond to the
getPointerTo*() queries.
llvm-svn: 127916
The llvm.dbg.value intrinsic refers to SSA values, not virtual registers, so we
should be able to extend the range of a value by tracking that value through
register copies. This greatly improves the debug value tracking for function
arguments that for some reason are copied to a second virtual register at the
end of the entry block.
We only extend the debug value range where its register is killed. All original
llvm.dbg.value locations are still respected.
Copies from physical registers are ignored. That should not be a problem since
the entry block already adds DBG_VALUE instructions for the virtual registers
holding the function arguments.
llvm-svn: 127912
Stack slot real estate is virtually free compared to registers, so it is
advantageous to spill earlier even though the same value is now kept in both a
register and a stack slot.
Also eliminate redundant spills by extending the stack slot live range
underneath reloaded registers.
This can trigger a dead code elimination, removing copies and even reloads that
were only feeding spills.
llvm-svn: 127868
I have convinced myself that it can only happen when a phi value dies. When it
happens, allocate new virtual registers for the components.
llvm-svn: 127827
rather than an int. Thankfully, this only causes LLVM to miss optimizations, not
generate incorrect code.
This just fixes the zext at the return. We still insert an i32 ZextAssert when
reading a function's arguments, but it is followed by a truncate and another i8
ZextAssert so it is not optimized.
llvm-svn: 127766
After live range splitting, an original value may be available in multiple
registers. Tracing back through the registers containing the same value, find
the best place to insert a spill, determine if the value has already been
spilled, or discover a reaching def that may be rematerialized.
This is only the analysis part. The information is not used for anything yet.
llvm-svn: 127698
v2 = bitcast v1
...
v3 = bitcast v2
...
= v3
=>
v2 = bitcast v1
...
= v1
if v1 and v3 are in the same register class.
Bitcasts between i32 and fp (and others) are often not nops since they
are in different register classes. These bitcast instructions are often
left behind because they are in different basic blocks and cannot be
eliminated by dag combine.
rdar://9104514
llvm-svn: 127668
and then go kablooie. The problem was that it was tracking the PHI nodes anew
each time it recursed into this function. But it didn't need to. And because the
recursion didn't know that a PHINode had been visited before, it would go ahead
and call itself.
There is a testcase, but unfortunately it's too big to add. This problem will go
away with the EH rewrite.
<rdar://problem/8856298>
llvm-svn: 127640
Remove the unused reserved_ bit vector, no functional change intended.
This doesn't break 'svn blame', this file really is all my fault.
llvm-svn: 127607
llvm-gcc-i386-linux-selfhost and llvm-x86_64-linux-checks buildbots.
The original log entry:
Remove the optimization emitting a reference instead of a label difference, since
it can create more relocations. Removed isBaseAddressKnownZero method,
because it is no longer used.
llvm-svn: 127540
Live range splitting can create a number of small live ranges containing only a
single real use. Spill these small live ranges along with the large range they
are connected to with copies. This enables memory operand folding and maximizes
the spill to fill distance.
Work in progress with known bugs.
llvm-svn: 127529
There are too many compatibility problems with using mixed types in
std::upper_bound, and I don't want to spend 110 lines of boilerplate setting up
a call to a 10-line function. Binary search is not /that/ hard to implement
correctly.
I tried terminating the binary search with a linear search, but that actually
made the algorithm slower, against my expectation. Most live intervals have fewer
than 4 segments. The early test against endIndex() does pay off, and this version is
25% faster than plain std::upper_bound().
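As a rough sketch of a hand-rolled search like the one described (the types and names here are illustrative, not LLVM's):
#include <cstddef>
#include <vector>
struct Segment { unsigned start, end; };
// Return the first segment whose end is greater than pos, or the end iterator.
std::vector<Segment>::const_iterator
findSegment(const std::vector<Segment> &segs, unsigned pos) {
  if (segs.empty() || pos >= segs.back().end)
    return segs.end();                  // cheap early test against the last end
  std::size_t lo = 0, hi = segs.size();
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    if (segs[mid].end <= pos)
      lo = mid + 1;
    else
      hi = mid;
  }
  return segs.begin() + lo;
}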
llvm-svn: 127522
protector insertion not working correctly with unreachable code. Since that
revision was rolled out, this test doesn't actually fail before this fix.
llvm-svn: 127497
The existing CompEnd predicate does not define a strict weak order as required
by the C++03 standard; therefore, its use as a predicate to std::upper_bound
is invalid. For a discussion of this issue, see
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#270
This patch replaces the asymmetrical comparison with an iterator adaptor that
achieves the same effect while being strictly standard-conforming by ensuring
an apples-to-apples comparison.
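To illustrate the requirement (a sketch only; the actual patch uses an iterator adaptor rather than a wrapped key):
#include <algorithm>
#include <vector>
struct Seg { unsigned start, end; };
// An asymmetric predicate like bool cmp(unsigned pos, const Seg&) is what the
// old CompEnd did; a conforming comparison uses the same type on both sides.
struct CompareEnd {
  bool operator()(const Seg &a, const Seg &b) const { return a.end < b.end; }
};
std::vector<Seg>::const_iterator
upperBoundByEnd(const std::vector<Seg> &segs, unsigned pos) {
  Seg key = {0, pos};
  return std::upper_bound(segs.begin(), segs.end(), key, CompareEnd());
}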
llvm-svn: 127462
flexible.
If it returns a register class that's different from the input, then that's the
register class used for cross-register class copies.
If it returns a register class that's the same as the input, then no cross-
register class copies are needed (normal copies would do).
If it returns null, then it's not at all possible to copy registers of the
specified register class.
llvm-svn: 127368
The damage done by physreg coalescing only depends on the number of instructions
the extended physreg live range covers. This fixes PR9438.
The heuristic is still luck-based, and physreg coalescing really should be
disabled completely. We need a register allocator with better hinting support
before that is possible.
Convert a test to FileCheck and force spilling by inserting an extra call. The
previous spilling behavior was dependent on misguided physreg coalescing
decisions.
llvm-svn: 127351
LiveRangeEdit::eliminateDeadDefs() will eventually be used by coalescing,
splitting, and spilling for dead code elimination. It can delete chains of dead
instructions as long as there are no dependency loops.
llvm-svn: 127287
with this before since none of the register tracking or nightly tests
had unschedulable nodes.
This should probably be refixed with a special default Node that just
returns some "don't touch me" values.
Fixes PR9427
llvm-svn: 127263
The coalescer can in very rare cases leave too large live intervals around after
rematerializing cheap-as-a-move instructions.
Linear scan doesn't really care, but live range splitting gets very confused
when a live range is killed by a ghost instruction.
I will fix this properly in the coalescer after 2.9 branches.
llvm-svn: 127096
regs. This is the only change in this checkin that may affect the
default scheduler. With better register tracking and heuristics, it
doesn't make sense to artificially lower the register limit so much.
Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to
give the scheduler a way to account for div and sqrt on targets that
don't have an itinerary. It currently defaults to 10 (the actual
number doesn't matter much), but only takes effect on non-default
schedulers: list-hybrid and list-ilp.
Added several heuristics that can be individually disabled for the
non-default sched=list-ilp mode. This helps us determine how much
better we can do on a given benchmark than the default
scheduler. Certain compute intensive loops run much faster in this
mode with the right set of heuristics, and it doesn't seem to have
much negative impact elsewhere. Not all of the heuristics are needed,
but we still need to experiment to decide which should be disabled by
default for sched=list-ilp.
llvm-svn: 127067
This simplifies the code and makes it faster too.
The interference patterns are saved for each candidate register. It will be
reused for actually executing the split. Work in progress.
llvm-svn: 127054
Initially, slot indexes are quad-spaced. There is room for inserting up to 3
new instructions between the original instructions.
When we run out of indexes between two instructions, renumber locally using
double-spaced indexes. The original quad-spacing means that we catch up quickly,
and we only have to renumber a handful of instructions to get a monotonic
sequence. This is much faster than renumbering the whole function as we did
before.
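A toy sketch of the renumbering idea (not LLVM's SlotIndexes implementation; the container, names, and spacing constants are illustrative):
#include <cstddef>
#include <vector>
struct IndexList {
  std::vector<unsigned> idx;                 // one ascending index per instruction
  // Local renumbering with double spacing; stops as soon as it catches up.
  void renumberFrom(std::size_t i) {
    unsigned next = (i == 0) ? 0u : idx[i - 1] + 2;
    for (; i < idx.size() && idx[i] < next; ++i, next += 2)
      idx[i] = next;
  }
  // Insert a new instruction index between positions i and i+1.
  void insertAfter(std::size_t i) {
    if (idx[i + 1] - idx[i] < 2)             // gap exhausted between neighbours
      renumberFrom(i + 1);
    idx.insert(idx.begin() + i + 1, idx[i] + (idx[i + 1] - idx[i]) / 2);
  }
};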
llvm-svn: 127023
You can't really predict how many indexes will be needed from the number of
defs, so let's keep it simple.
Also remove an extra empty index that was inserted after each basic block. It
was intended for live-out ranges, but it was never used that way.
llvm-svn: 127014
type after type legalization has completed. Before then it may simply not be big
enough to hold the shift amount, particularly on x86 which uses a very small type
for shifts (this issue broke stuff in the past which is why LegalizeTypes carefully
uses a large type for shift amounts).
llvm-svn: 127000
Fix the PendingQueue, then disable it because it's not required for
the current schedulers' heuristics.
Fix the logic for the unused list-ilp scheduler.
llvm-svn: 126981
it. It's been assumed up til now that it would be in its immediate
successor. However, this isn't necessarily the case. It could be in one of its
successor's successors.
Modify the code to more thoroughly check for an 'eh.selector' call in
successors. It only looks at a successor if we get there as a result of an
unconditional branch.
Testcase ObjC/exceptions-4.m in r126968.
llvm-svn: 126969
There are probably much larger speedups to be had by renumbering locally instead
of looping over the whole function. For now, the greedy register allocator is
25% faster.
llvm-svn: 126926
This is much faster than using a pointer to a ManagedStatic object accessed with
a function call. The greedy register allocator is 5% faster overall just from
the SlotIndex default constructor savings.
llvm-svn: 126925
The SlotIndex created by the default construction does not represent a position
in the function, and it doesn't make sense to compare it to other indexes.
llvm-svn: 126924
We need to wait until we meet a PHIDef in its defining block before resurrecting
PHIKills in the predecessors.
This should unbreak the llvm-gcc-build-x86_64-darwin10-x-mingw32-x-armeabi bot.
llvm-svn: 126905
David Greene changed CannotYetSelect() to print the full DAG including multiple
copies of operands reached through different paths in the DAG. Unfortunately
this blows up exponentially in some cases. The depth limit of 100 is way too
high to prevent this -- I'm seeing a message string of 150MB with a depth of
only 40 in one particularly bad case, even though the DAG has less than 200
nodes. Part of the problem is that the printing code is following chain
operands, so if you fail to select an operation with a chain, the printer will
follow all the chained operations back to the entry node.
llvm-svn: 126899
Values that map to a single new value in a new interval after splitting don't
need new PHIDefs, and if the parent value was never rematerialized the live
range will be the same.
llvm-svn: 126894
Extract the updateSSA() method from the too long extendRange().
LiveOutCache can be shared among all the new intervals since there is at most
one of the new ranges live out from each basic block.
llvm-svn: 126818
This method could probably be used by LiveIntervalAnalysis::shrinkToUses, and
now it can use extendIntervalEndTo() which coalesces ranges.
llvm-svn: 126803
The value map is currently not used, all values are 'complex mapped' and
LiveIntervalMap::mapValue is used to dig them out.
This is the first step in a series of changes leading to the removal of
LiveIntervalMap. Its data structures can be shared among all the live intervals
created by a split, so it is wasteful to create a copy for each.
llvm-svn: 126800
This effectively disables the 'turbo' functionality of the greedy register
allocator where all new live ranges created by splitting would be reconsidered
as if they were originals.
There are two reasons for doing this: 1. It guarantees that the algorithm
terminates. Early versions were prone to infinite looping in certain corner
cases. 2. It is a 2x speedup. We can skip a lot of unnecessary interference
checks that won't lead to good splitting anyway.
The problem is that region splitting only gets one shot, so it should probably
be changed to target multiple physical registers at once.
Local live range splitting is still 'turbo' enabled. It only accounts for a
small fraction of compile time, so it is probably not necessary to do anything
about that.
llvm-svn: 126781
1. Inform users of ADDEs with two 0 operands that it never sets carry (see the carry sketch below)
2. Fold other ADDs or ADDCs into the ADDE if possible
It would be neat if we could do the same thing for SETCC+ADD eventually, but we can't do that in target independent code.
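The first point is plain carry arithmetic; a small C++ sketch (not the DAGCombiner code) of why a 0 + 0 + carry-in addition can never produce a carry-out:
unsigned adde(unsigned a, unsigned b, unsigned cin, unsigned &cout) {
  unsigned t = a + b;        // first partial sum
  unsigned s = t + cin;      // cin is 0 or 1
  cout = (t < a) | (s < t);  // with a == b == 0 both comparisons are false,
  return s;                  // so users of the carry output can assume 0
}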
llvm-svn: 126557
is possible to do better if the high bit is set in either KnownZero/KnownOne, but
in practice NumSignBits is always 1 when we are zero extending because nothing
is known about that register.
llvm-svn: 126465
New live ranges are assigned in long -> short order, but live ranges that have
been evicted at least once are deferred and assigned in short -> long order.
Also disable splitting and spilling for live ranges seen for the first time.
The intention is to create a realistic interference pattern from the heavy live
ranges before starting splitting and spilling around it.
llvm-svn: 126451
Limit the folding of any_ext and sext into the load operation to scalars.
Limit the active-bits trunc optimization to scalars.
Document vector trunc and vector sext in LangRef.
Similar to commit 126080 (for enabling zext).
llvm-svn: 126424
The problem was codegen guessing the wrong values and printing
.section .eh_frame,"aMS",@progbits,4
It is not clear at all whether Codegen should try to guess; MC is the
one that should know the default flags.
llvm-svn: 126421
registers at phis. This enables us to eliminate a lot of pointless zexts during
the DAGCombine phase. This fixes <rdar://problem/8760114>.
llvm-svn: 126380
When a large live range is evicted, it will usually be split when it comes
around again. By deferring evicted live ranges, the splitting happens at a time
when the interference pattern is more realistic. This prevents repeated
splitting and evictions.
llvm-svn: 126282
Use interval sizes instead of spill weights to determine if it is legal to evict
interference. A smaller interval can evict interference if all interfering live
ranges are larger.
Allow multiple interferences to be evicted as long as they are all larger than
the live range being allocated.
Spill weights are still used to select the preferred eviction candidate.
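A hedged sketch of that legality test; the function name and interface are illustrative, not the actual RAGreedy code:
bool canEvictInterference(const LiveInterval &VirtReg,
                          ArrayRef<LiveInterval*> Interferences) {
  for (LiveInterval *Intf : Interferences)
    if (Intf->getSize() <= VirtReg.getSize())
      return false;   // only evict interference that is strictly larger
  return true;        // spill weight then picks the preferred victim
}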
llvm-svn: 126276
This is based on the observation that long live ranges are more difficult to
allocate, so there is a better chance of solving the puzzle by handling the big
pieces first. The allocator will evict and split long live ranges when they get
in the way.
RABasic is still using spill weights for its priority queue, so the interface to
the queue has been virtualized.
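A rough sketch of the virtualized priority with invented class names: the basic allocator keeps spill-weight ordering, while the greedy allocator hands out long (big) ranges first.
struct QueuePolicy {                       // hypothetical base interface
  virtual float getPriority(const LiveInterval &LI) const {
    return LI.weight;                      // RABasic: spill weight ordering
  }
  virtual ~QueuePolicy() {}
};
struct GreedyPolicy : QueuePolicy {
  float getPriority(const LiveInterval &LI) const {
    return float(LI.getSize());            // greedy: longest ranges first
  }
};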
llvm-svn: 126259
share entries. Add a DenseSet to MachineConstantPool for the MachineCPVs that
it owns.
This will hopefully fix the MC/ARM/elf-reloc-01.ll failure on the leaks bots.
llvm-svn: 126218
In other words, do not keep track of an argument's location. The debugger (gdb) is not prepared to see line table entries for arguments. For the debugger, the "second" line table entry marks the beginning of the function body.
This requires some coordination with debugger to get this working.
- The debugger needs to be aware of prolog_end attribute attached with line table entries.
- The compiler needs to accurately mark prolog_end in line table entries (at -O0 and at -O1+)
llvm-svn: 126155
An original endpoint is an instruction that killed or defined the original live
range before any live ranges were split.
When splitting global live ranges, avoid creating local live ranges without any
original endpoints. We may still create global live ranges without original
endpoints, but such a range won't be split again, and live range splitting still
terminates.
llvm-svn: 126151
The DAGCombiner folds the zext into complex load instructions. This patch
prevents this optimization on vectors since none of the supported targets
knows how to perform load+vector_zext in one instruction.
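A minimal sketch of the guard, illustrative rather than the exact DAGCombiner check:
if (LoadedVT.isVector())
  return SDValue();   // no target folds load + vector zext, keep them separate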
llvm-svn: 126080
The rewriter works almost identically to -rewriter=trivial, except it also
eliminates any identity copies.
This makes the new register allocators independent of VirtRegRewriter.cpp which
will be going away at the same time as RegAllocLinearScan.
llvm-svn: 125967
A local live range is live in a single basic block. If such a range fails to
allocate, try to find a sub-range that would get a larger spill weight than its
interference.
llvm-svn: 125764
the time but presumably my email got lost). Examples where the previous logic
got it wrong: (1) a signed i8 multiply of 64 by 2 overflows, but the high part is
zero; (2) a signed i8 multiply of -128 by 2 overflows, but the high part is all
ones.
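Working the two examples through in i16 shows why looking only at the high part is not enough:
64 * 2   =  128 = 0x0080: high byte 0x00 (zero), but 128 does not fit in signed i8 (max 127).
-128 * 2 = -256 = 0xFF00: high byte 0xFF (all ones), but -256 does not fit in signed i8 (min -128).
In both cases the multiply overflows even though the high part is zero or all ones; the result
only fits when the high part equals the sign-extension of the low part.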
llvm-svn: 125748
transformation if we can't legally create a build vector of the correct
type. Check that we can make the transformation first, and add a TODO to
refactor this code with similar cases.
Fixes: PR9223 and rdar://9000350
llvm-svn: 125631
A machine instruction range consisting of only DBG_VALUE MIs contributes only consecutive labels in the assembly output, which is harmless, and an empty scope entry in DebugInfo, which confuses debugger tools.
llvm-svn: 125577
Simplify the spill weight calculation a bit by bypassing
getApproximateInstructionCount() and using LiveInterval::getSize() directly.
This changes the computed spill weights, but only by a constant factor in each
function. It should not affect how spill weights compare against each other, and
so it shouldn't affect code generation.
llvm-svn: 125530
have their low bits set to zero. This allows us to optimize
out explicit stack alignment code like in stack-align.ll:test4 when
it is redundant.
Doing this causes the code generator to start turning FI+cst into
FI|cst all over the place, which is general goodness (that is the
canonical form) except that various pieces of the code generator
don't handle OR aggressively. Fix this by introducing a new
SelectionDAG::isBaseWithConstantOffset predicate, and using it
in places that are looking for ADD(X,CST). The ARM backend in
particular was missing a lot of addressing mode folding opportunities
around OR.
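A hedged sketch of what such a predicate has to check, close in spirit to (but not guaranteed to match) the actual SelectionDAG implementation:
bool isBaseWithConstantOffset(SelectionDAG &DAG, SDValue Op) {
  if ((Op.getOpcode() != ISD::ADD && Op.getOpcode() != ISD::OR) ||
      !isa<ConstantSDNode>(Op.getOperand(1)))
    return false;
  // For OR, the constant may only touch bits known to be zero in the base,
  // so the OR really behaves like adding a constant offset.
  if (Op.getOpcode() == ISD::OR &&
      !DAG.MaskedValueIsZero(Op.getOperand(0),
          cast<ConstantSDNode>(Op.getOperand(1))->getAPIntValue()))
    return false;
  return true;
}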
llvm-svn: 125470
generating i8 shift amounts for things like i1024 types. Add
an assert in getNode to prevent this from occurring in the future,
fix the buggy transformation, revert my previous patch, and
document this gotcha in ISDOpcodes.h
llvm-svn: 125465
The DAGCombiner created illegal BUILD_VECTOR operations.
The patch added a check that either illegal operations are
allowed or that the created operation is legal.
llvm-svn: 125435
The bug happens when the DAGCombiner attempts to optimize one of the patterns
of the SUB opcode. It tries to create a zero of type v2i64. This type is legal
on 32-bit machines, but the initializer of this vector (i64) is target dependent.
Currently, the initializer attempts to create an i64 zero constant, which fails.
Added a flag to tell the DAGCombiner to create a legal zero if we require that
the pass generate legal types.
llvm-svn: 125391
Registers are not allocated strictly in spill weight order when live range
splitting and spilling has created new shorter intervals with higher spill
weights.
When one of the new heavy intervals conflicts with a single lighter interval,
simply evict the old interval instead of trying to split the heavy one.
The lighter interval is a better candidate for splitting; it has a smaller use
density.
llvm-svn: 125151
If a live range is used by a terminator instruction, and that live range needs
to leave the block on the stack or in a different register, it can be necessary
to have both sides of the split live at the terminator instruction.
Example:
%vreg2 = COPY %vreg1
JMP %vreg1
Becomes after spilling %vreg2:
SPILL %vreg1
JMP %vreg1
The spill doesn't kill the register as is normally the case.
llvm-svn: 125102
Avoid using the same register for two def operands or an earlyclobber
def and a use operand. This fixes PR8986 and improves on the prior fix
for rdar://problem/8959122.
llvm-svn: 125089
After uses of a live range are removed, recompute the live range to only cover
the remaining uses. This is necessary after rematerializing the value before
some (but not all) uses.
llvm-svn: 125058
<rdar://problem/8959122> illegal register operands for UMULL instruction in cfrac nightly test
I'm still working on a unit test, but the case is:
rx = movcc rx, r3
r2 = ldr
r2, r3 = umull r2, r2
The anti-dep breaker should not convert this into an illegal instruction:
r2, r2 = umull
llvm-svn: 124932