Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.
The test changes are needed to keep those spill-q tests from testing aligned
spills and restores. If the only aligned stack objects are spill slots, we
no longer realign the stack frame. Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.
llvm-svn: 122995
We were never generating any of these nodes with variable indices, and there
was one legalizer function asserting on a non-constant index. If we ever have
a need to support variable indices, we can add this back again.
llvm-svn: 122993
This pass precomputes CFG block frequency information that can be used by the
register allocator to find optimal spill code placement.
Given an interference pattern, placeSpills() will compute which basic blocks
should have the current variable enter or exit in a register, and which blocks
prefer the stack.
The algorithm is ready to consume block frequencies from profiling data, but for
now it gets by with the static estimates used for spill weights.
This is a work in progress and still not hooked up to RegAllocGreedy.
llvm-svn: 122938
up the FreeBSD bootloader. However, this doesn't make much sense for Darwin, whose
-Os is meant to optimize for size only if it doesn't hurt performance.
rdar://8821501
llvm-svn: 122936
The analysis will be needed by both the greedy register allocator and the
X86FloatingPoint pass. It only needs to be computed once when the CFG doesn't
change.
This pass is very fast, usually showing up as 0.0% wall time.
llvm-svn: 122832
This allows us to compile:
void test(char *s, int a) {
__builtin_memset(s, a, 15);
}
into 1 mul + 3 stores instead of 3 muls + 3 stores.
llvm-svn: 122710
We could implement a DAGCombine to turn x * 0x0101 back into logic operations
on targets that don't support the multiply, or where it is slow (e.g. the Pentium 4),
if someone cares enough.
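If that combine is ever added, the logic-op form it would recover looks roughly like the
sketch below (C++ for illustration only; the actual DAG lowering is target dependent):

// Splat a byte into all four bytes of a 32-bit value without a multiply.
unsigned splatByte(unsigned char C) {
  unsigned V = C;
  V |= V << 8;   // 0x0000XYXY
  V |= V << 16;  // 0xXYXYXYXY, equivalent to C * 0x01010101
  return V;
}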
Example code:
void test(char *s, int a) {
__builtin_memset(s, a, 4);
}
before:
_test: ## @test
movzbl 8(%esp), %eax
movl %eax, %ecx
shll $8, %ecx
orl %eax, %ecx
movl %ecx, %eax
shll $16, %eax
orl %ecx, %eax
movl 4(%esp), %ecx
movl %eax, 4(%ecx)
movl %eax, (%ecx)
ret
after:
_test: ## @test
movzbl 8(%esp), %eax
imull $16843009, %eax, %eax ## imm = 0x1010101
movl 4(%esp), %ecx
movl %eax, 4(%ecx)
movl %eax, (%ecx)
ret
llvm-svn: 122707
when running without the verifier, and I have not yet checked them to see if
the new results are still correct. There are more verifier failures, but they
all seem to be additional occurrences of verifier failures that occur with the
existing PHIElimination pass. There are a few obvious issues with the code:
1) It doesn't properly update the register equivalence classes during copy
insertion, and instead recomputes them before merging live intervals and
renaming registers. I wanted to keep this first patch simple for debugging
purposes, but it shouldn't be very hard to do this.
2) It doesn't mix the renaming and live interval merging with the copy insertion
process, which leads to a lot of virtual register churn. Virtual registers and
live intervals are created, only to later be merged into others. The code should
be smarter and only create a new virtual register if there is no existing
register in the same congruence class.
3) In one place the code uses a DenseMap per basic block, which is unnecessary
heap allocation. There should be an inline storage version of DenseMap.
I did a quick compile-time test of running llc on 403.gcc with and without
StrongPHIElimination. It is slightly slower with StrongPHIElimination, because
the small decrease in the coalescer runtime can't beat the increase in phi
elimination runtime. Perhaps fixing the above performance issues will narrow
the gap.
I also haven't yet run any tests of the quality of the generated code.
llvm-svn: 122582
valno verification. The "Different value live out of predecessor" check is
incorrect in the case of phi-def valnos, so just skip that check for phi-def
valnos and instead check that all of the valnos for predecessors have phi-kill.
Fixes PR8863.
llvm-svn: 122581
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.
Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.
Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.
Added SchedulingPriorityQueue::HasReadyFilter to allow gating entry
into the scheduler's available queue.
ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about its SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.
ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.
llvm-svn: 122541
In the bottom-up selection DAG scheduling, handle two-address
instructions that read/write unspillable registers. Treat
the entire chain of two-address nodes as a single live range.
llvm-svn: 122472
loads properly. We miscompiled the testcase into:
_test: ## @test
movl $128, (%rdi)
movzbl 1(%rdi), %eax
ret
Now we get a proper:
_test: ## @test
movl $128, (%rdi)
movsbl (%rdi), %eax
movzbl %ah, %eax
ret
This fixes PR8757.
llvm-svn: 122392
count operand. These should be the same but apparently are
not always, and this is cleaner anyway. This improves the
code in an existing test.
llvm-svn: 122354
of the problems with my last attempt were in the updating of LiveIntervals
rather than the coalescing itself. Therefore, I decided to get that right first
by essentially reimplementing the existing PHIElimination using LiveIntervals.
It works correctly, with only a few tests failing (which may not be legitimate
failures) and no new verifier failures (at least as far as I can tell, I didn't
count the number per file).
llvm-svn: 122321
Edge bundles are an annotation on the CFG that turns it into a bipartite directed
graph where each basic block is connected to an outgoing and an ingoing bundle.
These bundles are useful for identifying regions of the CFG for live range
splitting.
llvm-svn: 122301
ARM (and other 32-bit-only) targets support for i8 and i16 overflow
multiplies. The generated code isn't great, but this at least fixes
CodeGen/Generic/overflow.ll when running on ARM hosts.
llvm-svn: 122221
Imagine we see:
EFLAGS = inst1
EFLAGS = inst2 FLAGS
gpr = inst3 EFLAGS
Previously, we would refuse to schedule inst2 because it clobbers
the EFLAGS of the predecessor. However, it also uses the EFLAGS
of the predecessor, so it is safe to emit. SDep edges ensure that
the right order happens already anyway.
This fixes 2 testsuite crashes with the X86 patch I'm going to
commit next.
llvm-svn: 122211
alternative register allocator that does not require LiveIntervals by specifying
it on the command-line for a target that has StrongPHIElimination enabled by
default.
These checks are pretty meaningless anyway, since StrongPHIElimination and
PHIElimination are never used at the same time.
llvm-svn: 122176
use before rematerializing the load.
This allows us to produce:
addps LCPI0_1(%rip), %xmm2
Instead of:
movaps LCPI0_1(%rip), %xmm3
addps %xmm3, %xmm2
Saving a register and an instruction. The standard spiller already knows how to
do this.
llvm-svn: 122133
the loop predecessors.
The register can be live-out from a predecessor without being live-in to the
loop header if there is a critical edge from the predecessor.
llvm-svn: 122123
createMachineVerifierPass and MachineFunction::verify.
The banner is printed before the machine code dump, just like the printer pass.
llvm-svn: 122113
may be called. If the entry block is empty, the insertion point iterator will be
the "end()" value. Calling ->getParent() on it (among others) causes problems.
Modify materializeFrameBaseRegister to take the machine basic block and insert
the frame base register at the beginning of that block. (It's very similar to
what the code does already. The only difference is that it will always insert
at the beginning of the entry block instead of after a previous materialization
of the frame base register. I doubt that that matters here.)
<rdar://problem/8782198>
llvm-svn: 122104
BUILD_VECTOR operands where the element type is not legal. I had previously
changed this code to insert TRUNCATE operations, but that was just wrong.
llvm-svn: 122102
the operand uses the same register as a tied operand:
%r1 = add %r1, %r1
If add were a three-address instruction, kill flags would be required on at
least one of the uses. Since it is a two-address instruction, the tied use
operand must not have a kill flag.
This change makes the kill flag on the untied use operand optional.
llvm-svn: 122082
This is a three-way interval list intersection between a virtual register, a
live interval union, and a loop. It will be used to identify interference-free
loops for live range splitting.
llvm-svn: 122034
A MachineLoopRange contains the intervals of slot indexes covered by the blocks
in a loop. This representation of the loop blocks is more efficient to compare
against interfering registers during register coalescing.
llvm-svn: 121917
Bypass loops have the current live range live through, but contain no uses or
defs. Splitting around a bypass loop can free registers for other uses inside
the loop by spilling the split range.
llvm-svn: 121871
regB = move RCX
regA = op regB, regC
RAX = move regA
where both regB and regC are killed. If regB is constrained to non-compatible
physical registers but regC is not constrained at all, then it's better to
commute the instruction.
movl %edi, %eax
shlq $32, %rcx
leaq (%rcx,%rax), %rax
=>
movl %edi, %eax
shlq $32, %rcx
orq %rcx, %rax
rdar://8762995
llvm-svn: 121793
when the wider type is legal. This allows us to compile:
define zeroext i16 @test1(i16 zeroext %x) nounwind {
entry:
%div = udiv i16 %x, 33
ret i16 %div
}
into:
test1: # @test1
movzwl 4(%esp), %eax
imull $63551, %eax, %eax # imm = 0xF83F
shrl $21, %eax
ret
instead of:
test1: # @test1
movw $-1985, %ax # imm = 0xFFFFFFFFFFFFF83F
mulw 4(%esp)
andl $65504, %edx # imm = 0xFFE0
movl %edx, %eax
shrl $5, %eax
ret
Implementing rdar://8760399 and example #4 from:
http://blog.regehr.org/archives/320
We should implement the same thing for [su]mul_hilo, but I don't
have immediate plans to do this.
llvm-svn: 121696
for each constant pool entry. Using WriteTypeSymbolic here takes time
proportional to the size of the module, for each constant pool entry.
This speeds up -verbose-asm llc on 252.eon (a random testcase at my disposal)
from 4.4s to 2.137s. llc takes 2.11s with asm-verbose off, so this is now a
pretty reasonable cost for verbose comments.
llvm-svn: 121691
The spiller should only spill. The register allocator will drive live range
splitting, it has the needed information about register pressure and
interferences.
llvm-svn: 121590
registers for a given virtual register.
Reserved registers are filtered from the allocation order, and any valid hint is
returned as the first suggestion.
For target dependent hints, a number of arcane target hooks are invoked.
llvm-svn: 121497
references instead.
Similarly, IntervalMap::begin() is almost as expensive as find(), so use find(x)
instead of begin().advanceTo(x);
This makes RegAllocBasic run another 5% faster.
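A standalone sketch of the preferred usage, with unsigned keys for simplicity (insert,
find, begin, and advanceTo are the IntervalMap operations referred to above):

#include "llvm/ADT/IntervalMap.h"
using namespace llvm;

void example() {
  IntervalMap<unsigned, unsigned>::Allocator Alloc;
  IntervalMap<unsigned, unsigned> Map(Alloc);
  Map.insert(10, 20, 1);  // map the closed interval [10, 20] to value 1

  // Preferred: position the iterator in one step.
  IntervalMap<unsigned, unsigned>::iterator I = Map.find(15);

  // Slower equivalent: begin() does almost as much work as find().
  IntervalMap<unsigned, unsigned>::iterator J = Map.begin();
  J.advanceTo(15);
  (void)I; (void)J;
}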
llvm-svn: 121344
abstract priority queue interface in subclasses that want to override the
priority calculations.
Subclasses must provide a getPriority() implementation instead.
This approach requires less code as long as priorities are expressible as simple
floats, and it avoids the dangers of defining potentially expensive priority
comparison functions.
It also should speed up priority_queue operations since they no longer have to
chase pointers when comparing registers. This is not measurable, though.
Preferably, we shouldn't use floats to guide code generation. The use of floats
here is derived from the use of floats for spill weights. Spill weights have a
dynamic range that doesn't lend itself easily to a fixed-point implementation.
When someone invents a stable spill weight representation, it can be reused for
allocation priorities.
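A rough sketch of the pattern (the class and method names here are illustrative, not the
actual RegAllocBase interface):

#include <queue>
#include <utility>

class LiveInterval;

class PriorityAllocatorBase {
  // The queue stores (priority, interval) pairs, so comparisons only look at
  // floats and never chase pointers into the intervals.
  std::priority_queue<std::pair<float, LiveInterval*> > Queue;
public:
  virtual ~PriorityAllocatorBase() {}
  // Subclasses only say how important an interval is.
  virtual float getPriority(LiveInterval *LI) = 0;
  void enqueue(LiveInterval *LI) {
    Queue.push(std::make_pair(getPriority(LI), LI));
  }
  LiveInterval *dequeueHighest() {
    if (Queue.empty()) return 0;
    LiveInterval *LI = Queue.top().second;
    Queue.pop();
    return LI;
  }
};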
llvm-svn: 121294
both forward and backward scheduling. Rename it to
ScoreboardHazardRecognizer (Scoreboard is one word). Remove integer
division from the scoreboard's critical path.
llvm-svn: 121274
This new register allocator is initially identical to RegAllocBasic, but it will
receive all of the tricks that RegAllocBasic won't get.
RegAllocGreedy will eventually replace linear scan.
llvm-svn: 121234
zextOrTrunc(), and APSInt methods extend(), extOrTrunc() and new method
trunc(), to be const and to return a new value instead of modifying the
object in place.
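For example, with this change truncation produces a new value and leaves the original
untouched (a small illustration, not code from the patch):

#include "llvm/ADT/APInt.h"
using namespace llvm;

void example() {
  APInt Wide(32, 0x12345678);
  APInt Narrow = Wide.trunc(16);       // Narrow is a new 16-bit value 0x5678
  APInt Back = Narrow.zextOrTrunc(32); // Wide itself is never modified
  (void)Back;
}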
llvm-svn: 121120
The StrongPHIElimination pass did not work, and nobody has worked on it for two
years.
A rewrite is underway, so I am leaving this shell pass instead of deleting it
completely.
llvm-svn: 120830
Scan the MachineFunction for DBG_VALUE instructions, and replace them with a
data structure similar to LiveIntervals. The live range of a DBG_VALUE is
determined by propagating it down the dominator tree until a new DBG_VALUE is
found. When a DBG_VALUE lives in a register, its live range is confined to the
live range of the register's value.
LiveDebugVariables runs before coalescing, so DBG_VALUEs are not artificially
extended when registers are joined.
The missing half will recreate DBG_VALUE instructions from the intervals when
register allocation is complete.
The pass is disabled by default. It can be enabled with the temporary command
line option -live-debug-variables.
llvm-svn: 120636
legalization time. Since at legalization time there is no mapping from
SDNode back to the corresponding LLVM instruction and the return
SDNode is target specific, this requires a target hook to check for
eligibility. Only x86 and ARM support this form of sibcall optimization
right now.
rdar://8707777
llvm-svn: 120501
in favor of the widespread llvm style. Capitalize variables and add
newlines for visual parsing. Rename variables for readability.
And other cleanup.
llvm-svn: 120490
This analysis is going to run immediately after LiveIntervals. It will stay
alive during register allocation and keep track of user variables mentioned in
DBG_VALUE instructions.
When the register allocator is moving values between registers and the stack, it
is very hard to keep track of DBG_VALUE instructions. We usually get it wrong.
This analysis maintains a data structure that makes it easy to update DBG_VALUE
instructions.
llvm-svn: 120385
so don't claim they are. They are allocated using DAG.getNode, so attempts
to access MemSDNode fields result in reading off the end of the allocated
memory. This fixes crashes with "llc -debug" due to debug code trying to
print MemSDNode fields for these barrier nodes (since the crashes are not
deterministic, use valgrind to see this). Add some nasty checking to try
to catch this kind of thing in the future.
llvm-svn: 119901
MCStreamer instead of just MCObjectStreamer. Address changes cannot
be as efficient, since we have to use DW_LNE_set_address, but at least
most of the logic is shared.
This will be used so that, with CodeGen still using EmitDwarfLocDirective,
llvm-gcc is able to produce debug_line sections without needing an
assembler that supports .loc.
llvm-svn: 119777
if the extension types were not the same. The result was that if you
fed a select with sext and zext loads, as in the testcase, then it
would get turned into a zext (or sext) of the select, which is wrong
in the cases when it should have been an sext (resp. zext). Reported
and diagnosed by Sebastien Deldon.
llvm-svn: 119728
and testing is easier. A good example is the unknown-location.ll test that
now can just look for ".loc 1 0 0". We also don't use a DW_LNE_set_address for
every address change anymore.
llvm-svn: 119613
and xor. The 32-bit move immediates can be hoisted out of loops by machine
LICM but the isel hacks were preventing them.
Instead, let the peephole optimization pass recognize registers that are defined by
immediates and the ARM target hook will fold the immediates in.
Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ
instructions if there are multiple uses. This happens when the 'and' is live
out; machine sinking would have sunk the computation, and that ends up pessimizing
code. The peephole pass would recognize situations where the 'and' can be
toggled to define CPSR and eliminate the comparison anyway.
2) Move the peephole pass to after machine LICM, sinking, and CSE to avoid blocking
important optimizations.
rdar://8663787, rdar://8241368
llvm-svn: 119548
SrcMgrDiagHandler, we can improve clang diagnostics for inline asm:
instead of reporting them on a source line of the original line,
we can report it on the correct line wherever the string literal came
from. For something like this:
void foo() {
asm("push %rax\n"
".code32\n");
}
we used to get this: (note that the line in t.c isn't helpful)
t.c:4:7: error: warning: ignoring directive for now
asm("push %rax\n"
^
<inline asm>:2:1: note: instantiated into assembly here
.code32
^
now we get:
t.c:5:8: error: warning: ignoring directive for now
".code32\n"
^
<inline asm>:2:1: note: instantiated into assembly here
.code32
^
Note that we're pointing to line 5 properly now.
llvm-svn: 119488
cookie argument to the SourceMgr diagnostic stuff. This cleanly separates
LLVMContext's inlineasm handler from the sourcemgr error handling
definition, increasing type safety and cleaning things up.
llvm-svn: 119486
Always spill the full representative register at any point where any subregister
is live.
This fixes PR8620 which caused the old logic to get confused and not spill
anything at all.
The fundamental problem here is that the coalescer is too aggressive about
physical register coalescing. It sometimes makes it impossible to allocate
registers without these emergency spills.
llvm-svn: 119375
The live range of a register defined by an early clobber starts at the use slot,
not the def slot.
Except when it is an early clobber tied to a use operand. Then it starts at the
def slot like a standard def.
llvm-svn: 119305
live ranges for the spill register are also defined at the use slot instead of
the normal def slot.
This fixes PR8612 for the inline spiller. A use was being allocated to the same
register as a spilled early clobber def.
This problem exists in all the spillers. A fix for the standard spiller is
forthcoming.
llvm-svn: 119182
catastrophic compilation time in the event of unreasonable LLVM
IR. Code quality is a separate issue--someone upstream needs to do a
better job of reducing to llvm.memcpy. If the situation can be reproduced with
any supported frontend, then it will be a separate bug.
llvm-svn: 118904
it makes no sense for allocation_order iterators to visit reserved regs.
The inline spiller depends on AliasAnalysis.
Manage the Query state to avoid uninitialized or stale results.
llvm-svn: 118800
This is the first small step towards using closed intervals for liveness instead
of the half-open intervals we're using now.
We want to be able to distinguish between a SlotIndex that represents a variable
being live-out of a basic block, and an index representing a variable live-in to
its successor.
That requires two separate indexes between blocks. One for live-outs and one for
live-ins.
With this change, getMBBEndIdx(MBB).getPrevSlot() becomes stable so it stays
greater than any instructions inserted at the end of MBB.
llvm-svn: 118747
Whenever splitting wants to insert a copy, it checks if the value can be
rematerialized cheaply instead.
Missing features:
- Delete instructions when all uses have been rematerialized.
- Truncate live ranges to the remaining uses after rematerialization.
llvm-svn: 118702
benchmarks hitting an assertion.
Adds LiveIntervalUnion::collectInterferingVRegs.
Fixes "late spilling" by checking for any unspillable live vregs among
all physReg aliases.
llvm-svn: 118701
handle cases in which a register is unavailable for spill code.
Adds LiveIntervalUnion::extract. While processing interferences on a
live virtual register, reuses the same Query object for each
physical reg.
llvm-svn: 118423
to perform the copy, which may involve a lot of memory [*]. It would be good if the
fall-back code generated something reasonable, i.e. did the copy in a loop rather than
emitting vast numbers of loads and stores. Add a note about this. Currently, target
specific code seems to always kick in, so this is more of a theoretical issue than a
practical one now that X86 has been fixed.
[*] It's amazing how often people pass mega-byte long arrays by copy...
llvm-svn: 118275
This way, InlineSpiller does the same amount of splitting as the standard
spiller. Splitting should really be guided by the register allocator, and
doesn't belong in the spiller at all.
llvm-svn: 118216
value type, so there is no point in passing it around using
an EVT. Use the simpler MVT everywhere. Rather than trying
to propagate this information maximally in all the code that
uses the calling convention stuff, I chose to do a mainly
low impact change instead.
llvm-svn: 118167
1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
"optimize for latency". Call instructions don't have the right latency and
this is more likely to introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
not # of micro-ops, since multi-latency instructions are completely executed
even when the predicate is false. Also, some instructions will be "slower"
when they are predicated due to the register def becoming implicit input.
rdar://8598427
llvm-svn: 118135
BB#1: derived from LLVM BB %bb.nph28
Live Ins: %AL
Predecessors according to CFG: BB#0
TEST8rr %reg16384<kill>, %reg16384, %EFLAGS<imp-def>; GR8:%reg16384
JNE_4 <BB#2>, %EFLAGS<imp-use,kill>
JMP_4 <BB#2>
Successors according to CFG: BB#2 BB#2
These double CFG edges only ever occur in bugpoint-generated code, so there is
no need to attempt something clever.
llvm-svn: 117992
source, and let rewrite() clean it up.
This way, kill flags on the inserted copies are fixed as well during rewrite().
We can't just assume that all the copies we insert are going to be kills since
critical edges into loop headers sometimes require both source and dest to be
live out of a block.
llvm-svn: 117980
At least X86FloatingPoint requires correct kill flags after register allocation,
and targets using register scavenging benefit. Conservative kill flags are not
enough.
llvm-svn: 117960
at more than those which define CPSR. You can have this situation:
(1) subs ...
(2) sub r6, r5, r4
(3) movge ...
(4) cmp r6, 0
(5) movge ...
We cannot convert (2) to "subs" because (3) is using the CPSR set by
(1). There's an analogous situation here:
(1) sub r1, r2, r3
(2) sub r4, r5, r6
(3) cmp r4, ...
(5) movge ...
(6) cmp r1, ...
(7) movge ...
We cannot convert (1) to "subs" because of the intervening use of CPSR.
llvm-svn: 117950
looks like is happening:
Without the peephole optimizer:
(1) sub r6, r6, #32
orr r12, r12, lr, lsl r9
orr r2, r2, r3, lsl r10
(x) cmp r6, #0
ldr r9, LCPI2_10
ldr r10, LCPI2_11
(2) sub r8, r8, #32
(a) movge r12, lr, lsr r6
(y) cmp r8, #0
LPC2_10:
ldr lr, [pc, r10]
(b) movge r2, r3, lsr r8
With the peephole optimizer:
ldr r9, LCPI2_10
ldr r10, LCPI2_11
(1*) subs r6, r6, #32
(2*) subs r8, r8, #32
(a*) movge r12, lr, lsr r6
(b*) movge r2, r3, lsr r8
(1) is used by (x) for the conditional move at (a). (2) is used by (y) for the
conditional move at (b). After the peephole optimizer, the flags resulting
from (1*) are ignored and only the flags from (2*) are considered for both
conditional moves.
llvm-svn: 117876
operand and one of them has a single use that is a live out copy, favor the
one that is live out. Otherwise it will be difficult to eliminate the copy
if the instruction is a loop induction variable update. e.g.
BB:
sub r1, r3, #1
str r0, [r2, r3]
mov r3, r1
cmp
bne BB
=>
BB:
str r0, [r2, r3]
sub r3, r3, #1
cmp
bne BB
This fixed the recent 256.bzip2 regression.
llvm-svn: 117675
We don't want unused values forming their own equivalence classes, so we lump
them all together in one class, and then merge them with the class of the last
used value.
llvm-svn: 117670
in SSAUpdaterImpl.h
Verifying live intervals revealed that the old method was completely wrong, and
we need an iterative approach to calculating PHI placement. Fortunately, we have
MachineDominators available, so we don't have to compute that over and over
like SSAUpdaterImpl.h must.
Live-out values are cached between calls to mapValue() and computed in a greedy
way, so most calls will be working with very small block sets.
Thanks to Bob for explaining how this should work.
llvm-svn: 117599
proper SSA updating.
This doesn't cause MachineDominators to be recomputed since we are already
requiring MachineLoopInfo which uses dominators as well.
llvm-svn: 117598
There are currently 100 references to COFF::IMAGE_SCN in 6 files
and 11 different functions. Section to attribute mapping really
needs to happen in one place to avoid problems like this.
llvm-svn: 117473
Critical edges going into a loop are not as bad as critical exits. We can handle
them by splitting the critical edge, or by having both inside and outside
registers live out of the predecessor.
llvm-svn: 117423
the remainder register.
Example:
bb0:
x = 1
bb1:
use(x)
...
x = 2
jump bb1
When x is isolated in bb1, the inner part breaks into two components, x1 and x2:
bb0:
x0 = 1
bb1:
x1 = x0
use(x1)
...
x2 = 2
x0 = x2
jump bb1
llvm-svn: 117408
do not double-count the duplicate instructions by counting once from the
beginning and again from the end. Keep track of where the duplicates from
the beginning ended and don't go past that point when counting duplicates
at the end. Radar 8589805.
This change causes one of the MC/ARM/simple-fp-encoding tests to produce
different (better!) code without the vmovne instruction being tested.
I changed the test to produce vmovne and vmoveq instructions but moving
between register files in the opposite direction. That's not quite the same
but predicated versions of those instructions weren't being tested before,
so at least the test coverage is not any worse, just different.
llvm-svn: 117333
instructions separately from the count of non-predicated instructions. The
instruction count is used in places to determine how many instructions to
copy, predicate, etc. and things get confused if that count includes the
extra cost for microcoded ops.
llvm-svn: 117332
2) live-outs.
Previously the post-RA schedulers completely ignored these dependencies since
returns, branches, etc. are all scheduling barriers. This patch models the
latencies between instructions being scheduled and the barriers. It also
handles calls by marking their register uses.
llvm-svn: 117193
framework. Its purpose is not to improve register allocation per se,
but to make it easier to develop powerful live range splitting. I call
it the basic allocator because it is as simple as a global allocator
can be but provides the building blocks for sophisticated register
allocation with live range splitting.
A minimal implementation is provided that trivially spills whenever it
runs out of registers. I'm checking in now to get high-level design
and style feedback. I've only done minimal testing. The next step is
implementing a "greedy" allocation algorithm that does some register
reassignment and makes better splitting decisions.
llvm-svn: 117174
When a block has exactly two uses and the register is both live-in and live-out,
don't isolate the block. We would be inserting two copies, so we haven't really
made any progress.
If the live-in and live-out values separate into disconnected components after
splitting, we would be making progress. We can't detect that for now.
llvm-svn: 117169
An exit block with a critical edge must only have predecessors in the loop, or
just before the loop. This guarantees that the inserted copies in the loop
predecessors dominate the exit block.
llvm-svn: 117144
- Initial register pressure in the loop should be all the live defs into the
loop, not just those from the loop preheader, which is often empty.
- When an instruction is hoisted, update register pressure from loop preheader
to the original BB.
- Treat the only use of a virtual register as a kill, since the code is still in SSA form.
llvm-svn: 116956
operand, also check if subregisters are killed.
Add <imp-def> operands for subregisters that remain alive after a super register
is killed.
I don't have a testcase for this that reproduces on trunk. <rdar://problem/8441758>
llvm-svn: 116940
Pull an unsigned out of the Contents union such that it has the same size as two
pointers and no padding.
Arrange members such that the Contents union and all pointers can be 8-byte
aligned without padding.
This speeds up code generation by 0.8% on a 64-bit host. 32-bit hosts should be
unaffected.
llvm-svn: 116857
must be called in the pass's constructor. This function uses static dependency declarations to recursively initialize
the pass's dependencies.
Clients that only create passes through the createFooPass() APIs will require no changes. Clients that want to use the
CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h
before parsing commandline arguments.
I have tested this with all standard configurations of clang and llvm-gcc on Darwin. It is possible that there are problems
with the static dependencies that will only be visible with non-standard options. If you encounter any crash in pass
registration/creation, please send the testcase to me directly.
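As a sketch, the new pattern for a hypothetical pass looks like this (the pass name is
illustrative; the macro and the generated initialize function follow the usual naming
scheme):

INITIALIZE_PASS(MyMachinePass, "my-machine-pass", "My Machine Pass",
                false, false)

MyMachinePass::MyMachinePass() : MachineFunctionPass(ID) {
  // Must run in the constructor; it also recursively initializes the
  // statically declared dependencies of the pass.
  initializeMyMachinePassPass(*PassRegistry::getPassRegistry());
}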
llvm-svn: 116820
"long latency" enough to hoist even if it may increase spilling. Reloading
a value from a spill slot is often cheaper than performing an expensive
computation in the loop. For X86, that means machine LICM will hoist
SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON
instructions.
- Enable register pressure aware machine LICM by default.
llvm-svn: 116781
does normal initialization and normal chaining. Change the default
AliasAnalysis implementation to NoAlias.
Update StandardCompileOpts.h and friends to explicitly request
BasicAliasAnalysis.
Update tests to explicitly request -basicaa.
llvm-svn: 116720
All registers created during splitting or spilling are assigned to the same
stack slot as the parent register.
When splitting or rematting, we may not spill at all. In that case the stack
slot is still assigned, but it will be dead.
llvm-svn: 116546
splitting or spilling, and to help with rematerialization.
Use LiveRangeEdit in InlineSpiller and SplitKit. This will eventually make it
possible to share remat code between InlineSpiller and SplitKit.
llvm-svn: 116543
Before we would also split around a loop if any peripheral block had multiple
uses. This could cause repeated splitting when splitting a different live range
would insert uses into the periphery.
Now -spiller=inline passes the nightly test suite again.
llvm-svn: 116494
perform initialization without static constructors AND without explicit initialization
by the client. For the moment, passes are required to initialize both their
(potential) dependencies and any passes they preserve. I hope to be able to relax
the latter requirement in the future.
llvm-svn: 116334
LocalRewriter.
This is a bit of a hack that adds an implicit use operand to model the
read-modify-write nature of a partial redef. Uses and defs are rewritten in
separate passes, and a single operand would never be processed twice.
<rdar://problem/8518892>
llvm-svn: 116210
functions: computeRemainder and rewrite.
When the remainder breaks up into multiple components, remember to rewrite those
uses as well.
llvm-svn: 116121
Such a check does not make any sense in the presence of inlining and other compiler-dependent stuff.
This should fix a bunch of warnings on mingw32.
llvm-svn: 116113
implicit. e.g.
%D6<def>, %D7<def> = VLD1q16 %R2<kill>, 0, ..., %Q3<imp-def>
%Q1<def> = VMULv8i16 %Q1<kill>, %Q3<kill>, ...
The real definition indices are 0,1.
llvm-svn: 116080
connected components. These components should be allocated different virtual
registers because there is no reason for them to be allocated together.
Add the ConnectedVNInfoEqClasses class to calculate the connected components,
and move values to new LiveIntervals.
Use it from SplitKit::rewrite by creating new virtual registers for the
components.
llvm-svn: 116006
This function is intended to be used when inserting a machine instruction that
trivially restricts the legal registers, like LEA requiring a GR32_NOSP
argument.
llvm-svn: 115875
allow targets to correctly compute latency for cases where the static scheduling
itinerary isn't sufficient, e.g. variable_ops instructions such as
ARM::ldm.
This also allows targets without scheduling itineraries to compute operand
latencies. e.g. X86 can return (approximated) latencies for high latency
instructions such as division.
- Compute operand latencies for those defined by load multiple instructions,
e.g. ldm and those used by store multiple instructions, e.g. stm.
llvm-svn: 115755
never kept after splitting.
Keeping the original interval made sense when the split region doesn't modify
the register, and the original is spilled. We can get the same effect by
detecting reloaded values when spilling around copies.
llvm-svn: 115695
Insert copy after defining instruction.
Fix LiveIntervalMap::extendTo to properly handle live segments starting before
the current basic block.
Make sure the open live range is extended to the inserted copy's use slot.
llvm-svn: 115665
having to do a double cast (uint64_t --> double --> float). This is based on the algorithm from compiler_rt's __floatundisf
for X86-64.
llvm-svn: 115634
// %a = ...
// %b = and i32 %a, 2
// %c = srl i32 %b, 1
// brcond i32 %c ...
//
// into
//
// %a = ...
// %b = and i32 %a, 2
// %c = setcc eq %b, 0
// brcond %c ...
Make sure it restores local variable N1, which corresponds to the condition operand if it fails to match.
This apparently breaks TCE but since that backend isn't in the tree I don't have a test for it.
llvm-svn: 115571
scheduling change in svn 115121. The CriticalAntiDepBreaker had bad
liveness information. It was calculating the KillIndices for one scheduling
region in a basic block, rescheduling that region so the KillIndices were
no longer valid, and then using those wrong KillIndices to make decisions
for the next scheduling region. I've not been able to reduce a small
testcase for this. Radar 8502534.
llvm-svn: 115400
LiveInterval::MergeValueNumberInto instead of trying to extend LiveRanges and
getting it wrong.
This fixed PR8249 where a valno with a multi-segment live range was defined by
an identity copy created by RemoveCopyByCommutingDef. Some of the live
segments disappeared.
llvm-svn: 115385
stick with a constant estimate of 90% (branch predictors are good!), but we might find that we want to provide
more nuanced estimates in the future.
llvm-svn: 115364
The x86_mmx type is used for MMX intrinsics, parameters and
return values where these use MMX registers, and is also
supported in load, store, and bitcast.
Only the above operations generate MMX instructions, and optimizations
do not operate on or produce MMX intrinsics.
MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
smaller pieces. Optimizations may occur on these forms and the
result cast back to x86_mmx, provided the result feeds into a
pre-existing x86_mmx operation.
The point of all this is to prevent optimizations from introducing
MMX operations, which is unsafe due to the EMMS problem.
llvm-svn: 115243
edited during emission.
If the basic block ends in a switch that gets lowered to a jump table, any
phis at the default edge were getting updated wrong. The jump table data
structure keeps a pointer to the header blocks that wasn't getting updated
after the MBB is split.
This bug was exposed on 32-bit Linux when disabling critical edge splitting in
codegen prepare.
The fix is to update stale MBB pointers whenever a block is split during
emission.
llvm-svn: 115191
Rather than having arbitrary cutoffs, actually try to cost model the conversion.
For now, the constants are tuned to more or less match our existing behavior, but these will be
changed to reflect realistic values as this work proceeds.
llvm-svn: 114973
when the unconditional branch destination is the fallthrough block. The
canonicalization makes it easier to allow optimizations on DAGs to invert
conditional branches. The branch folding pass (and AnalyzeBranch) will clean up
the unnecessary unconditional branches later.
This is one of the patches leading up to disabling codegen prepare critical edge
splitting.
llvm-svn: 114630
Allocator instances can now be created by calling createPBQPRegisterAllocator.
Tidied up use of CoalescerPair as per Jakob's suggestions.
Made the new PBQPBuilder based construction process the default. The internal construction process
remains in-place and available via -pbqp-builder=false for now. It will be removed shortly if the new
process doesn't cause any regressions.
llvm-svn: 114626
creating it before and subtracting split ranges.
This way, the SSA update code in LiveIntervalMap can properly create and use new
phi values in dupli. Now it is possible to create split regions where a value
escapes along two different CFG edges, creating phi values outside the split
region.
This is a work in progress and probably quite broken.
llvm-svn: 114492
that complex patterns are matched after the entire pattern has
a structural match, therefore the NodeStack isn't in a useful
state when the actual call to the matcher happens.
llvm-svn: 114489
I think I've audited all uses, so it should be dependable for address spaces,
and the pointer+offset info should also be accurate when there.
llvm-svn: 114464
instead of calling lower_bound or upper_bound directly.
This cleans up the search logic a bit because {lower,upper}_bound compare
LR->start by default, and it is usually simpler to search LR->end.
Funnelling all searches through one function also makes it possible to replace
the search algorithm with something faster than binary search.
llvm-svn: 114448
into OptimizeCompareInstr.
This necessitates the passing of CmpValue around,
so widen the virtual functions to accommodate.
No functionality changes.
llvm-svn: 114428
"getFixedStack" on the MachinePointerInfo class. While
this isn't the problem I'm setting out to solve, it is the
right way to eliminate PseudoSourceValue, so let's go with it.
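A hedged sketch of what describing a fixed stack slot looks like with this change (the
surrounding getMachineMemOperand call and its exact parameters are assumptions for
illustration, not part of this patch):

// FI is a fixed frame index, e.g. a spill slot.
MachineMemOperand *MMO =
  MF.getMachineMemOperand(MachinePointerInfo::getFixedStack(FI),
                          MachineMemOperand::MOLoad, Size, Alignment);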
llvm-svn: 114406
MachinePointerInfo struct, no functionality change.
This also adds an assert to MachineMemOperand::MachineMemOperand
that verifies that the Value* is either null or is an IR pointer type.
llvm-svn: 114389
CombinerAA cannot assume that different FrameIndexes never alias, but can instead use
MachineFrameInfo to get the actual offsets of these slots and check for actual aliasing.
This fixes CodeGen/X86/2010-02-19-TailCallRetAddrBug.ll and CodeGen/X86/tailcallstack64.ll
when CombinerAA is enabled, modulo a different register allocation sequence.
llvm-svn: 114348
For now the allocator still uses the old (internal) construction mechanism by default. This will be phased out soon assuming
no issues with the builder system come up.
To invoke the new construction mechanism just pass '-regalloc=pbqp -pbqp-builder' to llc. To provide custom constraints a
Target just needs to extend PBQPBuilder and pass an instance of their derived builder to the RegAllocPBQP constructor.
llvm-svn: 114272
NO path to the destination containing side effects, not that SOME path contains no side effects.
In practice, this only manifests with CombinerAA enabled, because otherwise the chain has little
to no branching, so "any" is effectively equivalent to "all".
llvm-svn: 114268
1) Do forward copy propagation. This makes it easier to estimate the cost of the
instruction being sunk.
2) Break critical edges on demand, including cases where the value is used by
PHI nodes.
Critical edge splitting is not yet enabled by default.
llvm-svn: 114227
great deal because we don't have to worry about maintaining SSA form.
Unconditionally copy back to dupli when the register is live out of the split
range, even if the live-out value was defined outside the range. Skipping the
back-copy only makes sense when the live range is going to spill outside the
split range, and we don't know that it will. Besides, this was a hack to avoid
SSA update issues.
Clear up some confusion about the end point of a half-open LiveRange. Methinks
LiveRanges need to be closed so both start and end are included in the range.
The low bits of a SlotIndex are symbolic, so a half-open range doesn't really
make sense. This would be a pervasive change, though.
llvm-svn: 114043
iterator when an optimization took place. This allows us to do more insane
things with the code than just remove an instruction or two.
llvm-svn: 113640
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. The if-converter should treat them as non-micro-coded
simple instructions.
llvm-svn: 113570
LiveIntervals already adds <imp-def> operands for super-registers when a subreg
def defines the whole register. Thus, it is not necessary to do it again when
rewriting.
In fact, the super-register imp-defs caused miscompilations because the late
scheduler couldn't see that the super-register was read.
We still add super-reg <imp-use,kill> operands when rewriting virtuals to
physicals.
llvm-svn: 113299
Since mem2reg isn't run at -O0, we get a ton of reloads from the stack,
for example, before, this code:
int foo(int x, int y, int z) {
return x+y+z;
}
used to compile into:
_foo: ## @foo
subq $12, %rsp
movl %edi, 8(%rsp)
movl %esi, 4(%rsp)
movl %edx, (%rsp)
movl 8(%rsp), %edx
movl 4(%rsp), %esi
addl %edx, %esi
movl (%rsp), %edx
addl %esi, %edx
movl %edx, %eax
addq $12, %rsp
ret
Now we produce:
_foo: ## @foo
subq $12, %rsp
movl %edi, 8(%rsp)
movl %esi, 4(%rsp)
movl %edx, (%rsp)
movl 8(%rsp), %edx
addl 4(%rsp), %edx ## Folded load
addl (%rsp), %edx ## Folded load
movl %edx, %eax
addq $12, %rsp
ret
Fewer instructions and less register use = faster compiles.
llvm-svn: 113102
overload UserInInstr. Explicitly check Allocatable. The early exit in the
condition will mean the performance impact of the extra test should be
minimal.
llvm-svn: 113016
solve the root problem, but it corrects the bug in the code I added to
support legalizing in the case where the non-extended type is also legal.
llvm-svn: 112997
slot.
Teach it to also check for early clobbered aliases, and early clobber operands
following the current operand.
This fixes the miscompilation in PR8044 where EC registers eax and ecx were
being used for inputs.
llvm-svn: 112988
Original commit message:
Use the SSAUpdator to turn calls to eh.exception that are not in a
landing pad into uses of registers rather than loads from a stack
slot. Doesn't touch the 'orrible hack code - Bill needs to persuade
me harder :)
llvm-svn: 112952
there are clearly no stores between the load and the store. This fixes
this miscompile reported as PR7833.
This breaks the test/CodeGen/X86/narrow_op-2.ll optimization, which is
safe, but awkward to prove safe. Move it to X86's README.txt.
llvm-svn: 112861
landing pad into uses of registers rather than loads from a stack
slot. Doesn't touch the 'orrible hack code - Bill needs to persuade
me harder :)
llvm-svn: 112702
Reserved registers are unpredictable, and are treated as always live by machine
DCE.
Allocatable registers are never reserved, and can be used for virtual registers.
Unreserved, unallocatable registers can not be used for virtual registers, but
otherwise behave like a normal allocatable register. Most targets only have
the flag register in this set.
llvm-svn: 112649
1. Allocate them in the entry block of the function to enable function-wide
re-use. The instructions to create them should be re-materializable, so
there shouldn't be additional cost compared to creating them local
to the basic blocks where they are used.
2. Collect all of the frame index references for the function and sort them
by the local offset referenced. Iterate over the sorted list to
allocate the virtual base registers. This enables creation of base
registers optimized for positive-offset access of frame references.
(Note: This may be appropriate to later be a target hook to do the
sorting in a target appropriate manner. For now it's done here for
simplicity.)
llvm-svn: 112609
any more. I plan to reimplement alloca promotion using SSAUpdater later.
It looks like Bill's URoR logic really always needs domtree, so the pass
now always asks for domtree info.
llvm-svn: 112597
Eventually, we want to disable physreg coalescing completely, and let the
register allocator do its job using hints.
This option makes it possible to measure the impact of disabling physreg
coalescing.
llvm-svn: 112567
1) nuke ConstDataCoalSection, which is dead.
2) revise my previous patch for rdar://8018335,
which was completely wrong. Specifically, it doesn't
make sense to mark __TEXT,__const_coal as PURE_INSTRUCTIONS,
because it is for readonly data. Templates (it turns out)
go to const_coal_nt. The real fix for rdar://8018335 was
to give ConstTextCoalSection a section kind of ReadOnly
instead of Text.
llvm-svn: 112496
instead of PromoteMemToReg. This allows it to stop using DF and DT,
eliminating a computation of DT and DF from clang -O3. Clang is now
down to 2 runs of DomFrontier.
llvm-svn: 112457
expanding: e.g. <2 x float> -> <4 x float> instead of -> 2 floats. This
affects two places in the code: handling cross block values and handling
function return and arguments. Since vectors are already widened by
LegalizeTypes, this gives us much better code and unblocks x86-64 abi
and SPU abi work.
For example, this (which is a silly example of a cross-block value):
define <4 x float> @test2(<4 x float> %A) nounwind {
%B = shufflevector <4 x float> %A, <4 x float> undef, <2 x i32> <i32 0, i32 1>
%C = fadd <2 x float> %B, %B
br label %BB
BB:
%D = fadd <2 x float> %C, %C
%E = shufflevector <2 x float> %D, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
ret <4 x float> %E
}
Now compiles into:
_test2: ## @test2
## BB#0:
addps %xmm0, %xmm0
addps %xmm0, %xmm0
ret
previously it compiled into:
_test2: ## @test2
## BB#0:
addps %xmm0, %xmm0
pshufd $1, %xmm0, %xmm1
## kill: XMM0<def> XMM0<kill> XMM0<def>
insertps $0, %xmm0, %xmm0
insertps $16, %xmm1, %xmm0
addps %xmm0, %xmm0
ret
This implements rdar://8230384
llvm-svn: 112101
hierarchy with virtual methods and using llvm_unreachable to properly indicate
unreachable states which would otherwise leave variables uninitialized.
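The usual shape of such code, for reference (a generic sketch rather than the exact code
touched here):

#include "llvm/Support/ErrorHandling.h"

enum Kind { KindA, KindB };

int classify(Kind K) {
  switch (K) {
  case KindA: return 0;
  case KindB: return 1;
  }
  // Documents the impossible path, so no dummy return value or
  // uninitialized variable is needed to satisfy the compiler.
  llvm_unreachable("Unhandled Kind");
}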
llvm-svn: 111803
It's similar to "linker_private_weak", but it's known that the address of the
object is not taken. For instance, functions that had an inline definition, but
the compiler decided not to inline it. Note, unlike linker_private and
linker_private_weak, linker_private_weak_def_auto may have only default
visibility. The symbols are removed by the linker from the final linked image
(executable or dynamic library).
llvm-svn: 111684
it involves specific floating-point types, legalize should expand an
extending load to a non-extending load followed by a separate extend operation.
For example, we currently expand SEXTLOAD to EXTLOAD+SIGN_EXTEND_INREG (and
assert that EXTLOAD should always be supported). Now we can expand that to
LOAD+SIGN_EXTEND. This is needed to allow vector SIGN_EXTEND and ZERO_EXTEND
to be used for NEON.
llvm-svn: 111586
The Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01
implements parts of C++0x based on the draft standard. An old version of
the draft had a bug that makes std::pair<T1*, T2*>(something, 0) fail to
compile. This is because the template<class U, class V> pair(U&& x, V&& y)
constructor is selected, even though it later fails to implicitly convert
U and V to first_type and second_type.
This has been fixed in n3090, but it seems that Microsoft is not going to
update msvc.
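A minimal reproduction of the failing pattern (standalone illustration, not code from the
tree):

#include <utility>

void f(int *P) {
  // On the affected MSVC, overload resolution picks the
  // template<class U, class V> pair(U&&, V&&) constructor and then fails
  // to convert the literal 0 to 'int*'.
  std::pair<int *, int *> PR(P, 0);
  (void)PR;
}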
llvm-svn: 111535
base registers were required. This will allow for slightly better packing
of the locals when alignment padding is necessary after callee saved registers.
llvm-svn: 111508
frame index reference to an object in the local block is seen, check if
it's near enough to any previously allocated base register to re-use.
rdar://8277890
llvm-svn: 111443
We must complete the DFS, otherwise we might miss needed phi-defs, and
prematurely color live ranges with a non-dominating value.
This is not a big deal since we get to color more of the CFG and the next
mapValue call will be faster.
llvm-svn: 111397
LiveIntervalMap maps values from a parent LiveInterval to a child interval that
is a strict subset. It will create phi-def values as needed to preserve the
VNInfo SSA form in the child interval.
This leads to an algorithm very similar to the one in SSAUpdaterImpl.h, but with
enough differences that the code can't be reused:
- We don't need to manipulate PHI instructions.
- LiveIntervals have kills.
- We have MachineDominatorTree.
- We can use df_iterator.
llvm-svn: 111393
Nothing fancy, just ask the target if any currently available base reg
is in range for the instruction under consideration and use the first one
that is. Placeholder ARM implementation simply returns false for now.
ongoing saga of rdar://8277890
llvm-svn: 111374
the local block. Resolve references to those indices to a new base register.
For simplification and testing purposes, a new virtual base register is
allocated for each frame index being resolved. The result is truly horrible,
but correct, code that's good for exercising the new code paths.
Next up is adding thumb1 support, which should be very simple. Following that
will be adding base register re-use and implementing a reasonable ARM
heuristic for when a virtual base register should be generated at all.
llvm-svn: 111315
whether to allocate a virtual frame base register to resolve the frame
index reference in it. Implement a simple version for ARM to aid debugging.
In LocalStackSlotAllocation, scan the function for frame index references
to local frame indices and ask the target whether to allocate virtual
frame base registers for any it encounters. Purely infrastructural for
debug output. Next step is to actually allocate base registers, then add
intelligent re-use of them.
rdar://8277890
llvm-svn: 111262
mapping. Have the local block track its alignment requirement, and then
apply that when the block itself is allocated. Previously, offsets could
get adjusted in PEI to be different, relative to one another, than the
block allocation thought they would be, which defeats the point of doing
the allocation this way. Continuing rdar://8277890
llvm-svn: 111197
experimental pass that allocates locals relative to one another before
register allocation and then assigns them to actual stack slots as a block
later in PEI. This will eventually allow targets with limited index offset
range to allocate additional base registers (not just FP and SP) to
more efficiently reference locals, as well as handle situations where
locals cannot be referenced via SP or FP at all (dynamic stack realignment
together with variable sized objects, for example). It's currently
incomplete and almost certainly buggy. Work in progress.
Disabled by default and gated via the -enable-local-stack-alloc command
line option.
rdar://8277890
llvm-svn: 111059
The earliestStart argument is entirely specific to linear scan allocation, and
can be easily calculated by RegAllocLinearScan.
Replace std::vector with SmallVector.
llvm-svn: 111055
When a live range is contained a single block, we can split it around
instruction clusters. The current approach is very primitive, splitting before
and after the largest gap between uses.
llvm-svn: 111043
numbers match. The old check could accidentally leave holes in openli.
Also let useIntv add all ranges for the phi-def value inserted by
enterIntvAtEnd. This works as long at the value mapping is established in
enterIntvAtEnd.
llvm-svn: 110995
This can happen if the original interval has been broken into two disconnected
parts. Ideally, we should be able to detect when the graph is disconnected and
create separate intervals, but that code is not implemented yet.
Example:
Two basic blocks are both branching to a loop header. Our interval is defined in
both basic blocks, and live into the loop along both edges.
We decide to split the interval around the loop. The interval is split into an
inside part and an outside part. The outside part now has two disconnected
segments, one in each basic block.
If we later decide to split the outside interval into single blocks, we get one
interval per basic block and an empty dupli for the remainder.
llvm-svn: 110976
Before spilling a live range, we split it into a separate range for each basic
block where it is used. That way we only get one reload per basic block if the
new smaller ranges can allocate to a register.
This type of splitting is already present in the standard spiller.
llvm-svn: 110934
operands. We don't currently have a hook to provide "the largest super class of
A where all registers' getSubReg(subidx) is valid and in B".
llvm-svn: 110730
The live interval may be used for a spill slot as well, and that spill slot
could be shared by split registers. We cannot shrink it, even if we know the
current register won't need the spill slot in that range.
llvm-svn: 110721
When splitting a live range, the new registers have fewer uses and the
permissible register class may be less constrained. Recompute the register class
constraint from the uses of new registers created for a split. This may let them
be allocated from a larger set, possibly avoiding a spill.
llvm-svn: 110703
register at a time. This turns out to be slightly faster than iterating over
instructions, but more importantly, it allows us to compute spill weights for
new registers created after the spill weight pass has run.
Also compute the allocation hint at the same time as the spill weight. This
allows us to use the spill weight as a cost metric for copies, and choose the
most profitable hint if there is more than one possibility.
The new hints provide a very small (< 0.1%) but universal code size improvement.
llvm-svn: 110631
If we are emitting COPY instructions for the REG_SEQUENCE, make sure the kill
flag goes on the last COPY. Otherwise we may be using a killed register.
<rdar://problem/8287792>
llvm-svn: 110589
relatively expensive comparison analyzer on each instruction. Also rename the
comparison analyzer method to something more in line with what it actually does.
This pass will eventually be folded into the Machine CSE pass.
llvm-svn: 110539
necessary.
Sometimes, live range splitting doesn't shrink the current interval, but simply
changes some instructions to use a new interval. That makes the original more
suitable for spilling. In this case, we don't need to duplicate the original.
llvm-svn: 110481
After heavy editing of a live interval, it is much easier to simply renumber the
live values instead of trying to keep track of the unused ones.
llvm-svn: 110463
When a physical register is in use, some alias of that register has a live
interval with a relevant live range. That is the sad state of intervals after
physreg coalescing of subregs, and it is good enough for correct register
allocation.
llvm-svn: 110452
This pass tries to remove comparison instructions when possible. For instance,
if you have this code:
sub r1, 1
cmp r1, 0
bz L1
and "sub" either sets the same flag as the "cmp" instruction or could be
converted to set the same flag, then we can eliminate the "cmp" instruction all
together. This is important for ARM, where the ALU instructions could set the
CPSR flag, but need a special suffix ('s') to do so.
llvm-svn: 110423
LiveVariables becomes horribly wrong while the coalescer is running, but the
analysis is not zapped until after the coalescer pass has run. This causes tons
of false reports when calling verify from the coalescer.
llvm-svn: 110402
We verify that the LiveInterval is live at uses and defs, and that all
instructions have a SlotIndex.
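The liveness part of that check boils down to the following (a standalone
sketch with made-up types, not the verifier's real data structures):

#include <cstdio>
#include <vector>

// Hypothetical sketch: an interval is a set of half-open [Start, End)
// segments over instruction indices; every use or def of the register must
// fall inside one of them.
struct Segment { int Start, End; };

static bool liveAt(const std::vector<Segment> &Interval, int Index) {
  for (const Segment &S : Interval)
    if (Index >= S.Start && Index < S.End)
      return true;
  return false;
}

void checkOperand(const std::vector<Segment> &Interval, int InstrIndex, int Reg) {
  if (!liveAt(Interval, InstrIndex))
    std::printf("*** Bad machine code: reg %d not live at index %d ***\n",
                Reg, InstrIndex);
}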
Stuff we don't check yet:
- Is the LiveInterval minimal?
- Do all defs correspond to instructions or phis?
- Do all defs dominate all their live ranges?
- Are all live ranges continually reachable from their def?
llvm-svn: 110386
be killed before being redefined.
These checks are usually disabled, and usually fail when enabled. We de facto
allow live registers to be redefined without a kill; the corresponding
assertions in RegScavenger were removed long ago.
llvm-svn: 110362
When the normalizeSpillWeights function was introduced, I forgot to remove this
normalization.
This change could affect register allocation. Hopefully for the better.
llvm-svn: 110119
multiple defs, like t2LDRSB_POST.
The first def could accidentally steal the physreg that the second, tied def was
required to be allocated to.
Now, the tied use-def is treated more like an early clobber, and the physreg is
reserved before allocating the other defs.
This would never be a problem when the tied def was the only def which is the
usual case.
This fixes MallocBench/gs for thumb2 -O0.
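A minimal sketch of the fix under hypothetical types (not the register
allocator's actual data structures): reserve the physreg required by the tied
use-def before handing out registers to the instruction's other defs:

#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical sketch: two passes over an instruction's defs. The first pass
// reserves the physregs demanded by tied use-defs (like early clobbers); the
// second pass allocates the remaining defs from what is still free.
struct DefOp { int VirtReg; bool TiedToUse; int RequiredPhysReg; };

std::vector<std::pair<int, int>>
allocateDefs(const std::vector<DefOp> &Defs, std::set<int> FreePhysRegs) {
  std::vector<std::pair<int, int>> Assignments;   // (virtreg, physreg)
  for (const DefOp &D : Defs)
    if (D.TiedToUse)
      FreePhysRegs.erase(D.RequiredPhysReg);      // reserve it up front
  for (const DefOp &D : Defs) {
    if (D.TiedToUse) {
      Assignments.push_back({D.VirtReg, D.RequiredPhysReg});
    } else {
      assert(!FreePhysRegs.empty() && "no free physreg; a real allocator spills");
      int Reg = *FreePhysRegs.begin();
      FreePhysRegs.erase(FreePhysRegs.begin());
      Assignments.push_back({D.VirtReg, Reg});
    }
  }
  return Assignments;
}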
llvm-svn: 109715
protectors, to be near the stack protectors on the stack. Accomplish this by
tagging the stack object with a predicate that indicates that it would trigger
this. In the prolog-epilog inserter, assign these objects to the stack after the
stack protector but before the other objects.
llvm-svn: 109481
appropriate for targets without detailed instruction itineraries.
The scheduler schedules for increased instruction-level parallelism in low
register pressure situations; it schedules to reduce register pressure when
the register pressure becomes high.
On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2
by 16%.
llvm-svn: 109300
to be of a different register class. For example, in Thumb1 if the live-in is
a high register, we want the vreg to be a low register. rdar://8224931
llvm-svn: 109291
it's too late to start backing off aggressive latency scheduling when most
of the registers are in use, so the threshold should be a bit tighter.
- Correctly handle live-outs and extract_subreg, etc.
- Enable register pressure aware scheduling by default for hybrid scheduler.
For ARM, this is almost always a win on # of instructions. It's runtime
neutral for most of the tests. But for some kernels with high register
pressure it can be a huge win. e.g. 464.h264ref reduced number of spills by
54 and sped up by 20%.
llvm-svn: 109279
Make MDNode::destroy private.
Fix the one thing that used MDNode::destroy, outside of MDNode itself.
One should never delete or destroy an MDNode explicitly. MDNodes
implicitly go away when there are no references to them (implementation
details aside).
llvm-svn: 109028
This is a work in progress. So far we have some basic loop analysis to help
determine where it is useful to split a live range around a loop.
The actual loop splitting code from Splitter.cpp is also going to move in here.
llvm-svn: 108842
loop, for the reasons in the comments. This is a
major win on 253.perlbmk on ARM Darwin. I expect it
to be a good heuristic in general, but it's possible
some things will regress; I'll be watching.
7940152.
llvm-svn: 108792
- Unfortunate, but necessary for now to handle subtarget instruction matching. Eventually we should factor out the lower level target machine information so we don't need to do this.
llvm-svn: 108664
I am assured by people more knowledgeable than me that there are no rounding issues in eliminating this.
This fixed <rdar://problem/8197504>.
llvm-svn: 108639
Still very much under development. Comments and fixes will be forthcoming.
(This commit includes some small tweaks to LiveIntervals & LoopInfo to support the splitter)
llvm-svn: 108615
any command line parameter changed the register allocation produced by
PBQP.
Turns out variety is not the spice of life.
Fixed some comparators, added others. All good now.
llvm-svn: 108613
void func1() { __builtin_unreachable(); }
It will output the following on Darwin X86:
_func1:
Leh_func_begin0:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
Leh_func_end0:
This prolog adds a new Call Frame Information (CFI) row to the FDE with an
address that is not within the address range of the code it describes -- it is
equal to the end of the function -- and therefore results in an invalid EH
frame. If we emit a nop in this situation, the CFI row falls within the
function's address range.
llvm-svn: 108568
since it doesn't work for front-ends which don't emit column information
(which includes llvm-gcc in its present configuration), and doesn't
work for clang for K&R style variables where the variables are declared
in a different order from the parameter list.
Instead, make a separate pass through the instructions to collect the
llvm.dbg.declare instructions in order. This ensures that the debug
information for variables is emitted in this order.
llvm-svn: 108538
occasions, caused code to be generated in a different order.
All cases I've seen involved float softening in the type
legalizer, and this could perhaps be fixed there, but
it's better not to generate things differently in the first
place. 7797940 (6/29/2010..7/15/2010).
llvm-svn: 108484
the function. We'll just turn it into a "trap" instruction instead.
The problem with not handling this is that it might generate a prologue without
the equivalent epilogue to go with it:
$ cat t.ll
define void @foo() {
entry:
unreachable
}
$ llc -o - t.ll -relocation-model=pic -disable-fp-elim -unwind-tables
.section __TEXT,__text,regular,pure_instructions
.globl _foo
.align 4, 0x90
_foo: ## @foo
Leh_func_begin0:
## BB#0: ## %entry
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
Leh_func_end0:
...
The unwind tables then have bad data in them causing all sorts of problems.
Fixes <rdar://problem/8096481>.
llvm-svn: 108473
-enable-no-nans-fp-math and -enable-no-infs-fp-math. All of the current codegen
FP math optimizations only care whether the FP arithmetic's arguments and
results can never be NaN.
llvm-svn: 108465
to keep "Text" in sync with the "pure instructions" section attribute.
Lack of this attribute was preventing the assembler from emitting
multibyte noop instructions for templates (and inlines, and other
coalesced stuff) and was causing the assembler to mismatch .o files.
This fixes rdar://8018335
llvm-svn: 108461
independent of the order that isel happens to visit the dbg_declare
intrinsics. This fixes a bug in which the formal arguments were
being printed in reverse order, now that fast isel is going bottom up.
llvm-svn: 108369
LiveInterval::overlapsFrom dereferences end() if it is called on an empty
interval.
It would be reasonable to just return false - an empty interval doesn't overlap
anything, but I want to know who is doing it first.
llvm-svn: 108264
they already have one.
This fixes the himenobmtxpa miscompilation on ARM.
The PostRA scheduler got confused by the double memoperand and hoisted a stack
slot load above a store to the same slot.
llvm-svn: 108219
AggressiveAntiDepBreaker should not be using getPhysicalRegisterRegClass. An
instruction might be using a register that can only be replaced with one from
a subclass of getPhysicalRegisterRegClass.
With this patch we use getMinimalPhysRegClass. This is correct, but
conservative. We should check the uses of the register and select the
largest register class that can be used in all of them.
llvm-svn: 108122
physical register can be allocated in the class of the virtual are sufficient.
I think that the test for virtual registers is stricter than it needs to be;
it should be possible to coalesce two virtual registers when the class of one
is a subclass of the other.
llvm-svn: 108118
getMinimalPhysRegClass. It was used to produce spills, and it is better to
use the most specific class if possible.
Update getLoadStoreRegOpcode to handle GR32_AD.
llvm-svn: 108115
Targets must now implement TargetInstrInfo::copyPhysReg instead. There is no
longer a default implementation forwarding to copyRegToReg.
llvm-svn: 108095
The first one was used just to call isSafeToMoveRegClassDefs. In
general, using a more specific reg class is better; in practice, only
x86 implements that method and the results are always the same.
The second one is in FindFreeRegister and is used to check if a register
is in a register class. A much more direct call to contains is better, as
it should cover more cases and is faster.
llvm-svn: 108093
assert()s, switching to void-casts. Removed an unneeded Compiler.h include as
a result. There are two other uses in LLVM, but they're not due to assert()s,
so I've left them alone.
llvm-svn: 108088
correct alignment information, which simplifies ExpandRes_VAARG a bit.
The patch introduces a new alignment information to TargetLoweringInfo. This is
needed since the two natural candidates cannot be used:
* The 's' in target data: If this is set to the minimal alignment of any
argument, getCallFrameTypeAlignment would return 4 for doubles on ARM for
example.
* The getTransientStackAlignment method. It is possible for an architecture to
have arguments less aligned than the alignment we maintain for the stack
pointer.
llvm-svn: 108072
The remaining copyRegToReg calls actually check the return value (shock!), so we
cannot trivially replace them with COPY instructions.
llvm-svn: 108069
if a block is split (by a custom inserter), the insert point may be in a
different block than it was originally. This fixes 32-bit llvm-gcc
bootstrap builds, and I haven't been able to reproduce it otherwise.
llvm-svn: 108060
ScheduleDAGEmit, TwoAddressLowering, and PHIElimination.
This switches the bulk of register copies to using COPY, but many less used
copyRegToReg calls remain.
llvm-svn: 108050
- Check getBytesToPopOnReturn().
- Eschew ST0 and ST1 for return values.
- Fix the PIC base register initialization so that it doesn't ever
fail to end up the top of the entry block.
llvm-svn: 108039
inserted in a MBB, and return an already inserted MI.
This target API change is necessary to allow foldMemoryOperand to call
storeToStackSlot and loadFromStackSlot when folding a COPY to a stack slot
reference in a target independent way.
The foldMemoryOperandImpl hook is going to change in the same way, but I'll wait
until COPY folding is actually implemented. Most targets only fold copies and
won't need to specialize this hook at all.
llvm-svn: 107991
U utils/TableGen/FastISelEmitter.cpp
--- Reverse-merging r107943 into '.':
U test/CodeGen/X86/fast-isel.ll
U test/CodeGen/X86/fast-isel-loads.ll
U include/llvm/Target/TargetLowering.h
U include/llvm/Support/PassNameParser.h
U include/llvm/CodeGen/FunctionLoweringInfo.h
U include/llvm/CodeGen/CallingConvLower.h
U include/llvm/CodeGen/FastISel.h
U include/llvm/CodeGen/SelectionDAGISel.h
U lib/CodeGen/LLVMTargetMachine.cpp
U lib/CodeGen/CallingConvLower.cpp
U lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
U lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
U lib/CodeGen/SelectionDAG/FastISel.cpp
U lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
U lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
U lib/CodeGen/SelectionDAG/InstrEmitter.cpp
U lib/CodeGen/SelectionDAG/TargetLowering.cpp
U lib/Target/XCore/XCoreISelLowering.cpp
U lib/Target/XCore/XCoreISelLowering.h
U lib/Target/X86/X86ISelLowering.cpp
U lib/Target/X86/X86FastISel.cpp
U lib/Target/X86/X86ISelLowering.h
llvm-svn: 107987
disabled and then never turned back on again. Adjust some tests, one because
this change avoids an unnecessary instruction, and the other to make it
continue testing what it was intended to test.
llvm-svn: 107941
the simplification of frame index register scavenging to not have to check
for available registers directly and instead just let scavengeRegister()
handle it.
llvm-svn: 107880
EXTRACT_SUBREG no longer appears as a machine instruction. Use COPY instead.
Add isCopy() checks in many places using isMoveInstr() and isExtractSubreg().
The isMoveInstr hook will be removed later.
llvm-svn: 107879
This target hook is intended to replace copyRegToReg entirely, but for now it
calls copyRegToReg.
Any remaining calls to copyRegToReg will be replaced by COPY instructions.
llvm-svn: 107854
(if there are any) and use the one which remains available for the longest
rather than just using the first one. This should help enable better re-use
of the loaded frame index values. rdar://7318760
llvm-svn: 107847
around everywhere, and also give it an InsertPt member, to enable isel
to operate at an arbitrary position within a block, rather than just
appending to a block.
llvm-svn: 107791
INSERT_SUBREG will now only appear in SSA machine instructions.
Fix the handling of partial redefs in ProcessImplicitDefs. This is now relevant
since partial redef COPY instructions appear.
llvm-svn: 107726
It is OK for an alias live range to overlap if there is a copy to or from the
physical register. CoalescerPair can work out if the copy is coalescable
independently of the alias.
This means that we can join with the actual destination interval instead of
using the getOrigDstReg() hack. It is no longer necessary to merge clobber
ranges into subregisters.
llvm-svn: 107695
This way *only* debug sections can be discarded, but not the opposite. Seems
like a copy-and-pasto from the ELF code, since there it contains the reverse
flag ('alloc').
llvm-svn: 107658
The COPY instruction is intended to replace the target specific copy
instructions for virtual registers as well as the EXTRACT_SUBREG and
INSERT_SUBREG instructions in MachineFunctions. It won't be used in a selection
DAG.
COPY is lowered to native register copies by LowerSubregs.
llvm-svn: 107529
new basic blocks, and if used as a function argument, that can cause call frame
setup / destroy pairs to be split across a basic block boundary. That prevents
us from doing a simple assertion to check that the pairs match and alloc/
dealloc the same amount of space. Modify the assertion to only check the
amount allocated when there are matching pairs in the same basic block.
rdar://8022442
llvm-svn: 107517
- X86 unfolding should check whether the instruction being unfolded has
memoperands. If there are no memoperands, it must assume conservative
alignment. If this
would introduce an expensive sse unaligned load / store, then unfoldMemoryOperand
etc. should not unfold the instruction.
llvm-svn: 107509
PrologEpilog code, and use it to determine whether
the asm forces stack alignment or not. gcc consistently
does not do this for GCC-style asms; Apple gcc inconsistently
sometimes does it for asm blocks. There is no
convenient place to put a bit in either the SDNode or
the MachineInstr form, so I've added an extra operand
to each; unlovely, but it does allow for expansion for
more bits, should we need it. PR 5125. Some
existing testcases are affected.
The operand lists of the SDNode and MachineInstr forms
are indexed with awesome mnemonics, like "2"; I may
fix this someday, but not now. I'm not making it any
worse. If anyone is inspired I think you can find all
the right places from this patch.
llvm-svn: 107506
This allows us to recognize the common case where all uses could be
rematerialized, and no stack slot allocation is necessary.
If some values could be fully rematerialized, remove them from the live range
before allocating a stack slot for the rest.
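A minimal sketch of that logic, with hypothetical types standing in for the
spiller's: keep only the uses that still need memory, and allocate the stack
slot only when that set is non-empty:

#include <vector>

// Hypothetical sketch: partition the uses of a value into those that can be
// rematerialized in place and those that still need a reload from a slot.
struct UseSite { int Index; bool CanRemat; };

std::vector<UseSite> usesStillNeedingSlot(const std::vector<UseSite> &Uses) {
  std::vector<UseSite> Remaining;
  for (const UseSite &U : Uses)
    if (!U.CanRemat)
      Remaining.push_back(U);
  return Remaining;             // empty => no stack slot is needed at all
}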
llvm-svn: 107492
Objective-C metadata types which should be marked as "weak", but which the
linker will remove upon final linkage. However, this linkage isn't specific to
Objective-C.
For example, the "objc_msgSend_fixup_alloc" symbol is defined like this:
.globl l_objc_msgSend_fixup_alloc
.weak_definition l_objc_msgSend_fixup_alloc
.section __DATA, __objc_msgrefs, coalesced
.align 3
l_objc_msgSend_fixup_alloc:
.quad _objc_msgSend_fixup
.quad L_OBJC_METH_VAR_NAME_1
This is different from the "linker_private" linkage type, because it can't have
the metadata defined with ".weak_definition".
Currently only supported on Darwin platforms.
llvm-svn: 107433
LocalRewriter::runOnMachineFunction uses this information to mark dead spill
slots.
This means that InlineSpiller now also works for functions that spill.
llvm-svn: 107302
InlineSpiller inserts loads and spills immediately instead of deferring to
VirtRegMap. This is possible now because SlotIndexes allows instructions to be
inserted and renumbered.
This is work in progress, and is mostly a copy of TrivialSpiller so far. It
works very well for functions that don't require spilling.
llvm-svn: 107227
metadata types which should be marked as "weak", but which the linker will
remove upon final linkage. For example, the "objc_msgSend_fixup_alloc" symbol is
defined like this:
.globl l_objc_msgSend_fixup_alloc
.weak_definition l_objc_msgSend_fixup_alloc
.section __DATA, __objc_msgrefs, coalesced
.align 3
l_objc_msgSend_fixup_alloc:
.quad _objc_msgSend_fixup
.quad L_OBJC_METH_VAR_NAME_1
This is different from the "linker_private" linkage type, because it can't have
the metadata defined with ".weak_definition".
llvm-svn: 107205
A partial redefine needs to be treated like a tied operand, and the register
must be reloaded while processing use operands.
This fixes a bug where partially redefined registers were processed as normal
defs with a reload added. The reload could clobber another use operand if it was
a kill that allowed register reuse.
llvm-svn: 107193
The LowerSubregs pass needs to preserve implicit def operands attached to
EXTRACT_SUBREG instructions when it replaces those instructions with copies.
llvm-svn: 107189
of getPhysicalRegisterRegClass with it.
If we want to make a copy (or estimate its cost), it is better to use the
smallest class as more efficient operations might be possible.
llvm-svn: 107140
There are 2 changes relative to the previous version of the patch:
1) For the "simple" if-conversion case, there's no need to worry about
RemoveExtraEdges not handling an unanalyzable branch. Predicated terminators
are ignored in this context, so RemoveExtraEdges does the right thing.
This might break someday if we ever treat indirect branches (BRIND) as
predicable, but for now, I just removed this part of the patch, because
in the case where we do not add an unconditional branch, we rely on keeping
the fall-through edge to CvtBBI (which is empty after this transformation).
The change relative to the previous patch is:
@@ -1036,10 +1036,6 @@
IterIfcvt = false;
}
- // RemoveExtraEdges won't work if the block has an unanalyzable branch,
- // which is typically the case for IfConvertSimple, so explicitly remove
- // CvtBBI as a successor.
- BBI.BB->removeSuccessor(CvtBBI->BB);
RemoveExtraEdges(BBI);
// Update block info. BB can be iteratively if-converted.
2) My patch exposed a bug in the code for merging the tail of a "diamond",
which had previously never been exercised. The code was simply checking that
the tail had a single predecessor, but there was a case in
MultiSource/Benchmarks/VersaBench/dbms where that single predecessor was
neither edge of the diamond. I added the following change to check for
that:
@@ -1276,7 +1276,18 @@
// tail, add a unconditional branch to it.
if (TailBB) {
BBInfo TailBBI = BBAnalysis[TailBB->getNumber()];
- if (TailBB->pred_size() == 1 && !TailBBI.HasFallThrough) {
+ bool CanMergeTail = !TailBBI.HasFallThrough;
+ // There may still be a fall-through edge from BBI1 or BBI2 to TailBB;
+ // check if there are any other predecessors besides those.
+ unsigned NumPreds = TailBB->pred_size();
+ if (NumPreds > 1)
+ CanMergeTail = false;
+ else if (NumPreds == 1 && CanMergeTail) {
+ MachineBasicBlock::pred_iterator PI = TailBB->pred_begin();
+ if (*PI != BBI1->BB && *PI != BBI2->BB)
+ CanMergeTail = false;
+ }
+ if (CanMergeTail) {
MergeBlocks(BBI, TailBBI);
TailBBI.IsDone = true;
} else {
With these fixes, I was able to run all the SingleSource and MultiSource
tests successfully.
llvm-svn: 107110
have to be registers, per gcc documentation. This affects
the logic for determining what "g" should lower to. PR 7393.
A couple of existing testcases are affected.
llvm-svn: 107079
When an instruction has tied operands and physreg defines, we must take extra
care that the tied operands conflict with neither physreg defs nor uses.
The special treatment is given to inline asm and instructions with tied operands
/ early clobbers and physreg defines.
This fixes PR7509.
llvm-svn: 107043
if-conversion. The RemoveExtraEdges function doesn't work for blocks that
end with unanalyzable branches, so in those cases, the "extra" edges must
be explicitly removed. The CopyAndPredicateBlock and MergeBlocks methods
can also avoid copying successor edges due to branches that have already
been removed. The latter case is especially helpful when MergeBlocks is
called for handling "diamond" if-conversions, where otherwise you can end
up with some weird intermediate states in the CFG. Unfortunately I've
been unable to find cases where this cleanup actually makes a significant
difference in the code. There is one test where we manage to remove an
empty block at the end of a function. Radar 6911268.
llvm-svn: 106939
The VNInfo.kills vector was almost unused except for all the code keeping it
updated. The few places using it were easily rewritten to check for interval
ends instead.
The two new methods LiveInterval::killedAt and killedInRange are replacements.
This brings us down to 3 independent data structures tracking kills.
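Roughly (hypothetical types, simplified), the replacements just inspect where
the interval's segments end, which is exactly what the kills vector cached:

#include <vector>

// Hypothetical sketch: a value is killed where one of its live segments ends,
// so a cached kill list is redundant. Simplified; the real methods handle
// more cases than this sketch.
struct Segment { int Start, End; };          // half-open [Start, End)

bool killedAt(const std::vector<Segment> &Interval, int Index) {
  for (const Segment &S : Interval)
    if (S.End == Index)
      return true;
  return false;
}

bool killedInRange(const std::vector<Segment> &Interval, int Start, int End) {
  for (const Segment &S : Interval)
    if (S.End > Start && S.End <= End)
      return true;
  return false;
}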
llvm-svn: 106905
for an "i" constraint should get lowered; PR 6309. While
this argument was passed around a lot, this is the only
place it was used, so it goes away from a lot of other
places.
llvm-svn: 106893
are dead, not just the def of this register. I.e., a register could be dead, but
its subreg isn't.
Testcase to follow with a subsequent patch.
llvm-svn: 106878
and CallInst for getting hold of the intrinsic's arguments.
Simplify along the way (at least for me this is much more legible now).
Bill, Baldrick or Anton, please review!
llvm-svn: 106838
This fixes PR7479 and PR7485. The test cases from those PRs are big, so not
included. However, PR7485 comes from self hosting on FreeBSD, so we will surely
hear about any regression.
llvm-svn: 106811
which don't have a catch-all associated with them, not just clean-ups. This fixes
the SingleSource/Benchmarks/Shootout-C++/except.cpp testcase that broke because
of my change r105902.
llvm-svn: 106772
CoalescerPair can determine if a copy can be coalesced, and which register gets
merged away. The old logic in SimpleRegisterCoalescing had evolved into
something a bit too convoluted.
This second attempt fixes some crashes that only occurred on Linux.
llvm-svn: 106769
[L]oad, [u]se, [d]ef, or [S]tore slots.
This makes it easier to see if two indices refer to the same instruction,
avoiding mental mod 4 calculations.
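For example, under a hypothetical encoding with four slots per instruction
(this is only a sketch of the idea, not the real SlotIndexes printing code),
two indices that differ only in the slot print with the same number:

#include <cstdio>
#include <string>

// Hypothetical sketch: four slots per instruction, printed as a letter suffix
// so "18u" and "18d" are obviously the same instruction, with no mental
// mod-4 arithmetic required.
enum Slot { LoadSlot, UseSlot, DefSlot, StoreSlot };

std::string printIndex(int InstrNumber, Slot S) {
  static const char Suffix[] = {'L', 'u', 'd', 'S'};
  return std::to_string(InstrNumber) + Suffix[S];
}

int main() {
  std::printf("%s %s\n", printIndex(18, UseSlot).c_str(),
              printIndex(18, DefSlot).c_str());   // prints "18u 18d"
  return 0;
}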
llvm-svn: 106766
In this case it is essential that the kill is real because the spiller will
decide to omit a spill if it thinks there is a later kill.
llvm-svn: 106751
when the condition is constant. This optimization shouldn't be
necessary, because codegen shouldn't be able to find dead control
paths that the IR-level optimizer can't find. And it's undesirable,
because it encourages bugpoint to leave "br i1 false" branches
in its output. And it wasn't updating the CFG.
I updated all the tests I could, but some tests are too reduced
and I wasn't able to meaningfully preserve them.
llvm-svn: 106748
CoalescerPair can determine if a copy can be coalesced, and which register gets
merged away. The old logic in SimpleRegisterCoalescing had evolved into
something a bit too convoluted.
llvm-svn: 106701
atomic intrinsics, either because they use locking instructions for the
atomics, or because they perform the locking directly. Add support in the
DAG combiner to fold away the fences.
llvm-svn: 106630
instructions.
This does not affect codegen much because SUBREG_TO_REG is only used by X86 and
X86 does not use the register scavenger, but it prevents verifier errors.
llvm-svn: 106583
Measurements show that it does not speed up coalescing, so there is no reason
to keep the added complexity around.
Also clean out some unused methods and static functions.
llvm-svn: 106548
opportunities. For example, this lets it emit this:
movq (%rax), %rcx
addq %rdx, %rcx
instead of this:
movq %rdx, %rcx
addq (%rax), %rcx
in the case where %rdx has subsequent uses. It's the same number
of instructions, and usually the same encoding size on x86, but
it appears faster, and in general, it may allow better scheduling
for the load.
llvm-svn: 106493
Split the code for materializing a value out of
SelectionDAGBuilder::getValue into a helper function, so that it can
be used in other ways. Add a new getNonRegisterValue function which
uses it, for use in code which doesn't want a CopyFromReg even
when FuncMap.ValueMap already has an entry for it.
llvm-svn: 106422
- This fixed a number of bugs in if-converter, tail merging, and post-allocation
scheduler. If-converter now runs branch folding / tail merging first to
maximize if-conversion opportunities.
- Also changed the t2IT instruction slightly. It now defines the ITSTATE
register which is read by instructions in the IT block.
- Added Thumb2 specific hazard recognizer to ensure the scheduler doesn't
change the instruction ordering in the IT block (since IT mask has been
finalized). It also ensures no other instructions can be scheduled between
instructions in the IT block.
This is not yet enabled.
llvm-svn: 106344
instructions, but it doesn't really understand live ranges, so the first
INSERT_SUBREG uses an implicitly defined register.
Fix it in LiveVariableAnalysis by adding the <undef> flag.
llvm-svn: 106333
entries used by llvm-gcc. *_[U]MIN and such can be added later if needed.
This enables the front ends to simplify handling of the atomic intrinsics by
removing the target-specific decision about which targets can handle the
intrinsics.
llvm-svn: 106321
so when IfConverter::CopyAndPredicateBlock checks to see if it should ignore
an instruction because it is a branch, it should not check if the branch is
predicated.
This case (when IgnoreBr is true) is only relevant from IfConvertTriangle,
where new branches are inserted after the block has been copied and predicated.
If the original branch is not removed, we end up with multiple conditional
branches (possibly conflicting) at the end of the block. Aside from any
immediate errors resulting from that, this confuses the AnalyzeBranch functions
so that the branches are not analyzable. That in turn causes the IfConverter to
think that the "Simple" pattern can be applied, and things go downhill fast
because the "Simple" pattern does _not_ apply if the block can fall through.
This is pretty fragile. If there are other degenerate cases where AnalyzeBranch
fails, but where the block may still fall through, the IfConverter should not
perform its "Simple" if-conversion. But, I don't know how to do that with the
current AnalyzeBranch interface, so for now, the best thing seems to be to
avoid creating branches that AnalyzeBranch cannot handle.
Evan, please review!
llvm-svn: 106291
switch from this:
if (TimePassesIsEnabled) {
NamedRegionTimer T(Name, GroupName);
do_something();
} else {
do_something(); // duplicate the code, this time without a timer!
}
to this:
{
NamedRegionTimer T(Name, GroupName, TimePassesIsEnabled);
do_something();
}
llvm-svn: 106285
addresses a longstanding deficiency noted in many FIXMEs scattered
across all the targets.
This effectively moves the problem up one level, replacing eleven
FIXMEs in the targets with eight FIXMEs in CodeGen, plus one path
through FastISel where we actually supply a DebugLoc, fixing Radar
7421831.
llvm-svn: 106243
for the moment. The implementation of the libcall will follow.
Currently, the llvm-gcc knows when the intrinsics can be correctly handled by
the back end and only generates them in those cases, issuing libcalls directly
otherwise. That's too much coupling. The intrinsics should always be
generated and the back end decide how to handle them, be it with a libcall,
inline code, or whatever. This patch is a step in that direction.
rdar://8097623
llvm-svn: 106227
LiveVariableAnalysis was a bit picky about a register only being redefined once,
but that really isn't necessary.
Here is an example of chained INSERT_SUBREGs that we can handle now:
68 %reg1040<def> = INSERT_SUBREG %reg1040, %reg1028<kill>, 14
register: %reg1040 +[70,134:0)
76 %reg1040<def> = INSERT_SUBREG %reg1040, %reg1029<kill>, 13
register: %reg1040 replace range with [70,78:1) RESULT: %reg1040,0.000000e+00 = [70,78:1)[78,134:0) 0@78-(134) 1@70-(78)
84 %reg1040<def> = INSERT_SUBREG %reg1040, %reg1030<kill>, 12
register: %reg1040 replace range with [78,86:2) RESULT: %reg1040,0.000000e+00 = [70,78:1)[78,86:2)[86,134:0) 0@86-(134) 1@70-(78) 2@78-(86)
92 %reg1040<def> = INSERT_SUBREG %reg1040, %reg1031<kill>, 11
register: %reg1040 replace range with [86,94:3) RESULT: %reg1040,0.000000e+00 = [70,78:1)[78,86:2)[86,94:3)[94,134:0) 0@94-(134) 1@70-(78) 2@78-(86) 3@86-(94)
rdar://problem/8096390
llvm-svn: 106152
will conflict with another live range. The place which creates this scenario is
the code in X86 that lowers a select instruction by splitting the MBBs. This
eliminates the need to check from the bottom up in an MBB for live pregs.
llvm-svn: 106066
SimpleRegisterCoalescing::JoinIntervals() uses CoalescerPair to determine if a
copy is coalescable, and in very rare cases it can return true where LHS is not
live - the coalescable copy can come from an alias of the physreg in LHS.
llvm-svn: 106021
combined to an insert_subreg, i.e., where the destination register is larger
than the source. We need to check that the subregs can be composed for that
case in a symmetrical way to the case when the destination is smaller.
llvm-svn: 106004
Early clobbers defining a virtual register were first allocated to a physreg and
then processed as a physreg EC, spilling the virtreg.
This fixes PR7382.
llvm-svn: 105998
Given a copy instruction, CoalescerPair can determine which registers to
coalesce in order to eliminate the copy. It deals with all the subreg fun to
determine a tuple (DstReg, SrcReg, SubIdx) such that:
- SrcReg is a virtual register that will disappear after coalescing.
- DstReg is a virtual or physical register whose live range will be extended.
- SubIdx is 0 when DstReg is a physical register.
- SrcReg can be joined with DstReg:SubIdx.
CoalescerPair::isCoalescable() determines if another copy instruction is
compatible with the same tuple. This fixes some NEON miscompilations where
shuffles are getting coalesced as if they were copies.
The CoalescerPair class will replace a lot of the spaghetti logic in JoinCopy
later.
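A very rough sketch of computing that tuple (a hypothetical helper that
ignores sub-register composition and register class checks, which the real
class handles):

#include <optional>
#include <utility>

// Hypothetical sketch: decide which side of a copy survives. SrcReg must be
// the virtual register that disappears; SubIdx is forced to 0 when DstReg is
// physical, per the rules above.
struct Pair { int DstReg, SrcReg, SubIdx; };

std::optional<Pair> makePair(int Dst, bool DstIsPhys,
                             int Src, bool SrcIsPhys, int SubIdx) {
  if (DstIsPhys && SrcIsPhys)
    return std::nullopt;               // a phys-to-phys copy cannot go away
  if (SrcIsPhys) {
    std::swap(Dst, Src);               // keep the physical register as DstReg
    std::swap(DstIsPhys, SrcIsPhys);
  }
  if (DstIsPhys)
    SubIdx = 0;
  return Pair{Dst, Src, SubIdx};       // SrcReg joins with DstReg:SubIdx
}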
llvm-svn: 105997
replacing the overly conservative checks that I had introduced recently to
deal with correctness issues. This makes a pretty noticeable difference
in our testcases where reg_sequences are used. I've updated one test to
check that we no longer emit the unnecessary subreg moves.
llvm-svn: 105991