the alias of an InstAlias instead of the thing being aliased. Because we need to
know the features that are valid for an InstAlias.
This is part of a work-in-progress.
llvm-svn: 127986
not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.
llvm-svn: 127951
Proof-of-concept code that code-gens a module to an in-memory MachO object.
This will be hooked up to a run-time dynamic linker library (see: llvm-rtdyld
for similarly conceptual work for that part) which will take the compiled
object and link it together with the rest of the system, providing back to the
JIT a table of available symbols which will be used to respond to the
getPointerTo*() queries.
llvm-svn: 127916
The llvm.dbg.value intrinsic refers to SSA values, not virtual registers, so we
should be able to extend the range of a value by tracking that value through
register copies. This greatly improves the debug value tracking for function
arguments that for some reason are copied to a second virtual register at the
end of the entry block.
We only extend the debug value range where its register is killed. All original
llvm.dbg.value locations are still respected.
Copies from physical registers are ignored. That should not be a problem since
the entry block already adds DBG_VALUE instructions for the virtual registers
holding the function arguments.
llvm-svn: 127912
Stack slot real estate is virtually free compared to registers, so it is
advantageous to spill earlier even though the same value is now kept in both a
register and a stack slot.
Also eliminate redundant spills by extending the stack slot live range
underneath reloaded registers.
This can trigger a dead code elimination, removing copies and even reloads that
were only feeding spills.
llvm-svn: 127868
I have convinced myself that it can only happen when a phi value dies. When it
happens, allocate new virtual registers for the components.
llvm-svn: 127827
rather than an int. Thankfully, this only causes LLVM to miss optimizations, not
generate incorrect code.
This just fixes the zext at the return. We still insert an i32 ZextAssert when
reading a function's arguments, but it is followed by a truncate and another i8
ZextAssert so it is not optimized.
llvm-svn: 127766
After live range splitting, an original value may be available in multiple
registers. Tracing back through the registers containing the same value, find
the best place to insert a spill, determine if the value has already been
spilled, or discover a reaching def that may be rematerialized.
This is only the analysis part. The information is not used for anything yet.
llvm-svn: 127698
v2 = bitcast v1
...
v3 = bitcast v2
...
= v3
=>
v2 = bitcast v1
...
= v1
if v1 and v3 are of in the same register class.
bitcast between i32 and fp (and others) are often not nops since they
are in different register classes. These bitcast instructions are often
left because they are in different basic blocks and cannot be
eliminated by dag combine.
rdar://9104514
llvm-svn: 127668
and then go kablooie. The problem was that it was tracking the PHI nodes anew
each time into this function. But it didn't need to. And because the recursion
didn't know that a PHINode was visited before, it would go ahead and call
itself.
There is a testcase, but unfortunately it's too big to add. This problem will go
away with the EH rewrite.
<rdar://problem/8856298>
llvm-svn: 127640
Remove the unused reserved_ bit vector, no functional change intended.
This doesn't break 'svn blame', this file really is all my fault.
llvm-svn: 127607
llvm-gcc-i386-linux-selfhost and llvm-x86_64-linux-checks buildbots.
The original log entry:
Remove optimization emitting a reference insted of label difference, since
it can create more relocations. Removed isBaseAddressKnownZero method,
because it is no longer used.
llvm-svn: 127540
Live range splitting can create a number of small live ranges containing only a
single real use. Spill these small live ranges along with the large range they
are connected to with copies. This enables memory operand folding and maximizes
the spill to fill distance.
Work in progress with known bugs.
llvm-svn: 127529
There are too many compatibility problems with using mixed types in
std::upper_bound, and I don't want to spend 110 lines of boilerplate setting up
a call to a 10-line function. Binary search is not /that/ hard to implement
correctly.
I tried terminating the binary search with a linear search, but that actually
made the algorithm slower against my expectation. Most live intervals have less
than 4 segments. The early test against endIndex() does pay, and this version is
25% faster than plain std::upper_bound().
llvm-svn: 127522
protector insertion not working correctly with unreachable code. Since that
revision was rolled out, this test doesn't actual fail before this fix.
llvm-svn: 127497
The existing CompEnd predicate does not define a strict weak order as required
by the C++03 standard; therefore, its use as a predicate to std::upper_bound
is invalid. For a discussion of this issue, see
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#270
This patch replaces the asymmetrical comparison with an iterator adaptor that
achieves the same effect while being strictly standard-conforming by ensuring
an apples-to-apples comparison.
llvm-svn: 127462
flexible.
If it returns a register class that's different from the input, then that's the
register class used for cross-register class copies.
If it returns a register class that's the same as the input, then no cross-
register class copies are needed (normal copies would do).
If it returns null, then it's not at all possible to copy registers of the
specified register class.
llvm-svn: 127368
The damage done by physreg coalescing only depends on the number of instructions
the extended physreg live range covers. This fixes PR9438.
The heuristic is still luck-based, and physreg coalescing really should be
disabled completely. We need a register allocator with better hinting support
before that is possible.
Convert a test to FileCheck and force spilling by inserting an extra call. The
previous spilling behavior was dependent on misguided physreg coalescing
decisions.
llvm-svn: 127351
LiveRangeEdit::eliminateDeadDefs() will eventually be used by coalescing,
splitting, and spilling for dead code elimination. It can delete chains of dead
instructions as long as there are no dependency loops.
llvm-svn: 127287
with this before since none of the register tracking or nightly tests
had unschedulable nodes.
This should probably be refixed with a special default Node that just
returns some "don't touch me" values.
Fixes PR9427
llvm-svn: 127263
The coalescer can in very rare cases leave too large live intervals around after
rematerializing cheap-as-a-move instructions.
Linear scan doesn't really care, but live range splitting gets very confused
when a live range is killed by a ghost instruction.
I will fix this properly in the coalescer after 2.9 branches.
llvm-svn: 127096
regs. This is the only change in this checkin that may affects the
default scheduler. With better register tracking and heuristics, it
doesn't make sense to artificially lower the register limit so much.
Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to
give the scheduler a way to account for div and sqrt on targets that
don't have an itinerary. It is currently defaults to 10 (the actual
number doesn't matter much), but only takes effect on non-default
schedulers: list-hybrid and list-ilp.
Added several heuristics that can be individually disabled for the
non-default sched=list-ilp mode. This helps us determine how much
better we can do on a given benchmark than the default
scheduler. Certain compute intensive loops run much faster in this
mode with the right set of heuristics, and it doesn't seem to have
much negative impact elsewhere. Not all of the heuristics are needed,
but we still need to experiment to decide which should be disabled by
default for sched=list-ilp.
llvm-svn: 127067
This simplifies the code and makes it faster too.
The interference patterns are saved for each candidate register. It will be
reused for actually executing the split. Work in progress.
llvm-svn: 127054
Initially, slot indexes are quad-spaced. There is room for inserting up to 3
new instructions between the original instructions.
When we run out of indexes between two instructions, renumber locally using
double-spaced indexes. The original quad-spacing means that we catch up quickly,
and we only have to renumber a handful of instructions to get a monotonic
sequence. This is much faster than renumbering the whole function as we did
before.
llvm-svn: 127023
You can't really predict how many indexes will be needed from the number of
defs, so let's keep it simple.
Also remove an extra empty index that was inserted after each basic block. It
was intended for live-out ranges, but it was never used that way.
llvm-svn: 127014
type after type legalization has completed. Before then it may simply not be big
enough to hold the shift amount, particularly on x86 which uses a very small type
for shifts (this issue broke stuff in the past which is why LegalizeTypes carefully
uses a large type for shift amounts).
llvm-svn: 127000
Fix the PendingQueue, then disable it because it's not required for
the current schedulers' heuristics.
Fix the logic for the unused list-ilp scheduler.
llvm-svn: 126981
it. It's been assumed up til now that it would be in its immediate
successor. However, this isn't necessarily the case. It could be in one of its
successor's successors.
Modify the code to more thoroughly check for an 'eh.selector' call in
successors. It only looks at a successor if we get there as a result of an
unconditional branch.
Testcase ObjC/exceptions-4.m in r126968.
llvm-svn: 126969
There are probably much larger speedups to be had by renumbering locally instead
of looping over the whole function. For now, the greedy register allocator is
25% faster.
llvm-svn: 126926
This is much faster than using a pointer to a ManagedStatic object accessed with
a function call. The greedy register allocator is 5% faster overall just from
the SlotIndex default constructor savings.
llvm-svn: 126925
The SlotIndex created by the default construction does not represent a position
in the function, and it doesn't make sense to compare it to other indexes.
llvm-svn: 126924
We need to wait until we meet a PHIDef in its defining block before resurrecting
PHIKills in the predecessors.
This should unbreak the llvm-gcc-build-x86_64-darwin10-x-mingw32-x-armeabi bot.
llvm-svn: 126905
David Greene changed CannotYetSelect() to print the full DAG including multiple
copies of operands reached through different paths in the DAG. Unfortunately
this blows up exponentially in some cases. The depth limit of 100 is way too
high to prevent this -- I'm seeing a message string of 150MB with a depth of
only 40 in one particularly bad case, even though the DAG has less than 200
nodes. Part of the problem is that the printing code is following chain
operands, so if you fail to select an operation with a chain, the printer will
follow all the chained operations back to the entry node.
llvm-svn: 126899
Values that map to a single new value in a new interval after splitting don't
need new PHIDefs, and if the parent value was never rematerialized the live
range will be the same.
llvm-svn: 126894
Extract the updateSSA() method from the too long extendRange().
LiveOutCache can be shared among all the new intervals since there is at most
one of the new ranges live out from each basic block.
llvm-svn: 126818
This method could probably be used by LiveIntervalAnalysis::shrinkToUses, and
now it can use extendIntervalEndTo() which coalesces ranges.
llvm-svn: 126803
The value map is currently not used, all values are 'complex mapped' and
LiveIntervalMap::mapValue is used to dig them out.
This is the first step in a series changes leading to the removal of
LiveIntervalMap. Its data structures can be shared among all the live intervals
created by a split, so it is wasteful to create a copy for each.
llvm-svn: 126800
This effectively disables the 'turbo' functionality of the greedy register
allocator where all new live ranges created by splitting would be reconsidered
as if they were originals.
There are two reasons for doing this, 1. It guarantees that the algorithm
terminates. Early versions were prone to infinite looping in certain corner
cases. 2. It is a 2x speedup. We can skip a lot of unnecessary interference
checks that won't lead to good splitting anyway.
The problem is that region splitting only gets one shot, so it should probably
be changed to target multiple physical registers at once.
Local live range splitting is still 'turbo' enabled. It only accounts for a
small fraction of compile time, so it is probably not necessary to do anything
about that.
llvm-svn: 126781
1. Inform users of ADDEs with two 0 operands that it never sets carry
2. Fold other ADDs or ADDCs into the ADDE if possible
It would be neat if we could do the same thing for SETCC+ADD eventually, but we can't do that in target independent code.
llvm-svn: 126557
is possible to do better if the high bit is set in either KnownZero/KnownOne, but
in practice NumSignBits is always 1 when we are zero extending because nothing
is known about that register.
llvm-svn: 126465
New live ranges are assigned in long -> short order, but live ranges that have
been evicted at least once are deferred and assigned in short -> long order.
Also disable splitting and spilling for live ranges seen for the first time.
The intention is to create a realistic interference pattern from the heavy live
ranges before starting splitting and spilling around it.
llvm-svn: 126451
Limit the folding of any_ext and sext into the load operation to scalars.
Limit the active-bits trunc optimization to scalars.
Document vector trunc and vector sext in LangRef.
Similar to commit 126080 (for enabling zext).
llvm-svn: 126424
The problem was codegen guessing the wrong values and printing
.section .eh_frame,"aMS",@progbits,4
It is not clear at all if Codegen should try to guess, MC is the
one that should know the default flags.
llvm-svn: 126421
registers at phis. This enables us to eliminate a lot of pointless zexts during
the DAGCombine phase. This fixes <rdar://problem/8760114>.
llvm-svn: 126380
When a large live range is evicted, it will usually be split when it comes
around again. By deferring evicted live ranges, the splitting happens at a time
when the interference pattern is more realistic. This prevents repeated
splitting and evictions.
llvm-svn: 126282
Use interval sizes instead of spill weights to determine if it is legal to evict
interference. A smaller interval can evict interference if all interfering live
ranges are larger.
Allow multiple interferences to be evicted as along as they are all larger than
the live range being allocated.
Spill weights are still used to select the preferred eviction candidate.
llvm-svn: 126276
This is based on the observation that long live ranges are more difficult to
allocate, so there is a better chance of solving the puzzle by handling the big
pieces first. The allocator will evict and split long alive ranges when they get
in the way.
RABasic is still using spill weights for its priority queue, so the interface to
the queue has been virtualized.
llvm-svn: 126259
share entries. Add a DenseSet to MachineConstantPool for the MachineCPVs that
it owns.
This will hopefully fix the MC/ARM/elf-reloc-01.ll failure on the leaks bots.
llvm-svn: 126218
In other words, do not keep track of argument's location. The debugger (gdb) is not prepared to see line table entries for arguments. For the debugger, "second" line table entry marks beginning of function body.
This requires some coordination with debugger to get this working.
- The debugger needs to be aware of prolog_end attribute attached with line table entries.
- The compiler needs to accurately mark prolog_end in line table entries (at -O0 and at -O1+)
llvm-svn: 126155
An original endpoint is an instruction that killed or defined the original live
range before any live ranges were split.
When splitting global live ranges, avoid creating local live ranges without any
original endpoints. We may still create global live ranges without original
endpoints, but such a range won't be split again, and live range splitting still
terminates.
llvm-svn: 126151
The DAGCombiner folds the zext into complex load instructions. This patch
prevents this optimization on vectors since none of the supported targets
knows how to perform load+vector_zext in one instruction.
llvm-svn: 126080
The rewriter works almost identically to -rewriter=trivial, except it also
eliminates any identity copies.
This makes the new register allocators independent of VirtRegRewriter.cpp which
will be going away at the same time as RegAllocLinearScan.
llvm-svn: 125967
A local live range is live in a single basic block. If such a range fails to
allocate, try to find a sub-range that would get a larger spill weight than its
interference.
llvm-svn: 125764
the time but presumably my email got lost). Examples where the previous logic
got it wrong: (1) a signed i8 multiply of 64 by 2 overflows, but the high part is
zero; (2) a signed i8 multiple of -128 by 2 overflows, but the high part is all
ones.
llvm-svn: 125748
transformation if we can't legally create a build vector of the correct
type. Check that we can make the transformation first, and add a TODO to
refactor this code with similar cases.
Fixes: PR9223 and rdar://9000350
llvm-svn: 125631
Machine instruction range consisting of only DBG_VALUE MIs only contributes consecutive labels in assembly output, which is harmless, and empty scope entry in DebugInfo, which confuses debugger tools.
llvm-svn: 125577
Simplify the spill weight calculation a bit by bypassing
getApproximateInstructionCount() and using LiveInterval::getSize() directly.
This changes the computed spill weights, but only by a constant factor in each
function. It should not affect how spill weights compare against each other, and
so it shouldn't affect code generation.
llvm-svn: 125530
have their low bits set to zero. This allows us to optimize
out explicit stack alignment code like in stack-align.ll:test4 when
it is redundant.
Doing this causes the code generator to start turning FI+cst into
FI|cst all over the place, which is general goodness (that is the
canonical form) except that various pieces of the code generator
don't handle OR aggressively. Fix this by introducing a new
SelectionDAG::isBaseWithConstantOffset predicate, and using it
in places that are looking for ADD(X,CST). The ARM backend in
particular was missing a lot of addressing mode folding opportunities
around OR.
llvm-svn: 125470
generating i8 shift amounts for things like i1024 types. Add
an assert in getNode to prevent this from occuring in the future,
fix the buggy transformation, revert my previous patch, and
document this gotcha in ISDOpcodes.h
llvm-svn: 125465
The DAGCombiner created illegal BUILD_VECTOR operations.
The patch added a check that either illegal operations are
allowed or that the created operation is legal.
llvm-svn: 125435
The bug happens when the DAGCombiner attempts to optimize one of the patterns
of the SUB opcode. It tries to create a zero of type v2i64. This type is legal
on 32bit machines, but the initializer of this vector (i64) is target dependent.
Currently, the initializer attempts to create an i64 zero constant, which fails.
Added a flag to tell the DAGCombiner to create a legal zero, if we require that
the pass would generate legal types.
llvm-svn: 125391
Registers are not allocated strictly in spill weight order when live range
splitting and spilling has created new shorter intervals with higher spill
weights.
When one of the new heavy intervals conflicts with a single lighter interval,
simply evict the old interval instead of trying to split the heavy one.
The lighter interval is a better candidate for splitting, it has a smaller use
density.
llvm-svn: 125151
If a live range is used by a terminator instruction, and that live range needs
to leave the block on the stack or in a different register, it can be necessary
to have both sides of the split live at the terminator instruction.
Example:
%vreg2 = COPY %vreg1
JMP %vreg1
Becomes after spilling %vreg2:
SPILL %vreg1
JMP %vreg1
The spill doesn't kill the register as is normally the case.
llvm-svn: 125102
Avoid using the same register for two def operands or and earlyclobber
def and use operand. This fixes PR8986 and improves on the prior fix
for rdar://problem/8959122.
llvm-svn: 125089
After uses of a live range are removed, recompute the live range to only cover
the remaining uses. This is necessary after rematerializing the value before
some (but not all) uses.
llvm-svn: 125058
<rdar://problem/8959122> illegal register operands for UMULL instruction in cfrac nightly test
I'm stil working on a unit test, but the case is:
rx = movcc rx, r3
r2 = ldr
r2, r3 = umull r2, r2
The anti-dep breaker should not convert this into an illegal instruction:
r2, r2 = umull
llvm-svn: 124932
If interference reaches the last split point, it is effectively live out and
should be marked as 'MustSpill'.
This can make a difference when the terminator uses a register. There is no way
that register can be reused in the outgoing CFG bundle, even if it isn't live
out.
llvm-svn: 124900
A live range cannot be split everywhere in a basic block. A split must go before
the first terminator, and if the variable is live into a landing pad, the split
must happen before the call that can throw.
llvm-svn: 124894
We should not be attempting a region split if it won't lead to at least one
directly allocatable interval. That could cause infinite splitting loops.
llvm-svn: 124893
precisely track pressure on a selection DAG, but we can at least keep
it balanced. This design accounts for various interesting aspects of
selection DAGS: register and subregister copies, glued nodes, dead
nodes, unused registers, etc.
Added SUnit::NumRegDefsLeft and ScheduleDAGSDNodes::RegDefIter.
Note: I disabled PrescheduleNodesWithMultipleUses when register
pressure is enabled, based on no evidence other than I don't think it
makes sense to have both enabled.
llvm-svn: 124853
When the live range is live through a block that doesn't use the register, but
that has interference, region splitting wants to split at the top and bottom of
the basic block.
llvm-svn: 124839
Allow a live range to end with a kill flag, but don't allow a kill flag that
doesn't end the live range.
This makes the machine code verifier more useful during register allocation when
kill flag computation is deferred.
llvm-svn: 124838
If the found value is not live-through the block, we should only add liveness up
to the requested slot index. When the value is live-through, the whole block
should be colored.
Bug found by SSA verification in the machine code verifier.
llvm-svn: 124812
The greedy register allocator revealed some problems with the value mapping in
SplitKit. We would sometimes start mapping values before all defs were known,
and that could change a value from a simple 1-1 mapping to a multi-def mapping
that requires ssa update.
The new approach collects all defs and register assignments first without
filling in any live intervals. Only when finish() is called, do we compute
liveness and mapped values. At this time we know with certainty which values map
to multiple values in a split range.
This also has the advantage that we can compute live ranges based on the
remaining uses after rematerializing at split points.
The current implementation has many opportunities for compile time optimization.
llvm-svn: 124765
the load, then it may be legal to transform the load and store to integer
load and store of the same width.
This is done if the target specified the transformation as profitable. e.g.
On arm, this can transform:
vldr.32 s0, []
vstr.32 s0, []
to
ldr r12, []
str r12, []
rdar://8944252
llvm-svn: 124708
This is similar to the -unroll-threshold option. There should be no change in
behavior when -tail-dup-size is not explicit on the llc command line.
llvm-svn: 124564
This happens all the time when a smul is promoted to a larger type.
On x86-64 we now compile "int test(int x) { return x/10; }" into
movslq %edi, %rax
imulq $1717986919, %rax, %rax
movq %rax, %rcx
shrq $63, %rcx
sarq $34, %rax <- used to be "shrq $32, %rax; sarl $2, %eax"
addl %ecx, %eax
This fires 96 times in gcc.c on x86-64.
llvm-svn: 124559
This happens e.g. for code like "X - X%10" where we lower the modulo operation
to a series of multiplies and shifts that are then subtracted from X, leading to
this missed optimization.
llvm-svn: 124532
rdar://problem/8893967: JM/lencod miscompile at -arch armv7 -mthumb -O3
Added ResurrectKill to remove kill flags after we decide to reused a
physical register. And (hopefully) ensure that we call it in all the
right places.
Sorry, I'm not checking in a unit test given that it's a miscompile I
can't reproduce easily with a toy example. Failures in the rewriter
depend on a series of heuristic decisions maked during one of the many
upstream phases in codegen. This case would require coercing regalloc
to generate a couple of rematerialzations in a way that causes the
scavenger to reuse the same register at just the wrong point.
The general way to test this is to implement kill flags
verification. Then we could have a simple, robust compile-only unit
test. That would be worth doing if the whole pass was not about to
disappear. At this point we focus verification work on the next
generation of regalloc.
llvm-svn: 124442
Linear scan regalloc is currently assuming that any register aliased with
a member of a regclass must also be in at least one regclass. That is not
always true. For example, for X86, RIP is in a regclass but IP is not.
If you're unlucky, this can cause a crash by invalidating the iterator.
llvm-svn: 124365
default implementation for x86, going through the stack in a similr
fashion to how the codegen implements BUILD_VECTOR. Eventually this
will get matched to VINSERTF128 if AVX is available.
llvm-svn: 124307
implementation of EXTRACT_SUBVECTOR for x86, going through the stack
in a similr fashion to how the codegen implements BUILD_VECTOR.
Eventually this will get matched to VEXTRACTF128 if AVX is available.
llvm-svn: 124292
clang's -Wuninitialized-experimental warning.
While these don't look like real bugs, clang's
-Wuninitialized-experimental analysis is stricter
than GCC's, and these fixes have the benefit
of being general nice cleanups.
llvm-svn: 124073
DAG. Disable using "-disable-sched-cycles".
For ARM, this enables a framework for modeling the cpu pipeline and
counting stalls. It also activates several heuristics to drive
scheduling based on the model. Scheduling is inherently imprecise at
this stage, and until spilling is improved it may defeat attempts to
schedule. However, this framework provides greater control over
tuning codegen.
Although the flag is not target-specific, it should have very little
affect on the default scheduler used by x86. The only two changes that
affect x86 are:
- scheduling a high-latency operation bumps the current cycle so independent
operations can have their latency covered. i.e. two independent 4
cycle operations can produce results in 4 cycles, not 8 cycles.
- Two operations with equal register pressure impact and no
latency-based stalls on their uses will be prioritized by depth before height
(height is irrelevant if no stalls occur in the schedule below this point).
llvm-svn: 123971
flags. They are still not enable in this revision.
Added TargetInstrInfo::isZeroCost() to fix a fundamental problem with
the scheduler's model of operand latency in the selection DAG.
Generalized unit tests to work with sched-cycles.
llvm-svn: 123969
The value mapping gets confused about which original values have multiple new
definitions so they may need phi insertions.
This could probably be simplified by letting enterIntvBefore() take a live range
to be added following the instruction. As long as the range stays inside the
same basic block, value mapping shouldn't be a problem.
llvm-svn: 123926
to add/sub by doing the normal operation and then checking for overflow
afterwards. This generally relies on the DAG handling the later invalid
operations as well.
Fixes the 64-bit part of rdar://8622122 and rdar://8774702.
llvm-svn: 123908
TargetInstrInfo:
Change produceSameValue() to take MachineRegisterInfo as an optional argument.
When in SSA form, targets can use it to make more aggressive equality analysis.
Machine LICM:
1. Eliminate isLoadFromConstantMemory, use MI.isInvariantLoad instead.
2. Fix a bug which prevent CSE of instructions which are not re-materializable.
3. Use improved form of produceSameValue.
ARM:
1. Teach ARM produceSameValue to look pass some PIC labels.
2. Look for operands from different loads of different constant pool entries
which have same values.
3. Re-implement PIC GA materialization using movw + movt. Combine the pair with
a "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible
to re-materialize the instruction, allow machine LICM to hoist the set of
instructions out of the loop and make it possible to CSE them. It's a bit
hacky, but it significantly improve code quality.
4. Some minor bug fixes as well.
With the fixes, using movw + movt to materialize GAs significantly outperform the
load from constantpool method. 186.crafty and 255.vortex improved > 20%, 254.gap
and 176.gcc ~10%.
llvm-svn: 123905
Added a check for already live regs before claiming HighRegPressure.
Fixed a few cases of checking the wrong number of successors.
Added some tracing until these heuristics are better understood.
llvm-svn: 123892
with an invalid type then split the result and perform the overflow check
normally.
Fixes the 32-bit parts of rdar://8622122 and rdar://8774702.
llvm-svn: 123864
interval after an instruction. The leaveIntvAfter() method only adds liveness
from the instruction's boundary index to the inserted copy.
Ideally, SplitKit should be smarter about this, perhaps by combining useIntv()
and leaveIntvAfter() into one method that guarantees continuity.
llvm-svn: 123858
Region splitting includes loop splitting as a subset, and it is more generic.
The splitting heuristics for variables that are live in more than one block are
now:
1. Try to create a region that covers multiple basic blocks.
2. Try to create a new live range for each block with multiple uses.
3. Spill.
Steps 2 and 3 are similar to what the standard spiller is doing.
llvm-svn: 123853
Analyze the live range's behavior entering and leaving basic blocks. Compute an
interference pattern for each allocation candidate, and use SpillPlacement to
find an optimal region where that register can be live.
This code is still not enabled.
llvm-svn: 123774
This shaves off 4 popcounts from the hacked 186.crafty source.
This is enabled even when a native popcount instruction is available. The
combined code is one operation longer but it should be faster nevertheless.
llvm-svn: 123621
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
In a silly microbenchmark on a 65 nm core2 this is 1.5x faster than the old
code in 32 bit mode and about 2x faster in 64 bit mode. It's also a lot shorter,
especially when counting 64 bit population on a 32 bit target.
I hope this is fast enough to replace Kernighan-style counting loops even when
the input is rather sparse.
llvm-svn: 123547
disabled in this checkin. Sorry for the large diffs due to
refactoring. New functionality is all guarded by EnableSchedCycles.
Scheduling the isel DAG is inherently imprecise, but we give it a best
effort:
- Added MayReduceRegPressure to allow stalled nodes in the queue only
if there is a regpressure need.
- Added BUHasStall to allow checking for either dependence stalls due to
latency or resource stalls due to pipeline hazards.
- Added BUCompareLatency to encapsulate and standardize the heuristics
for minimizing stall cycles (vs. reducing register pressure).
- Modified the bottom-up heuristic (now in BUCompareLatency) to
prioritize nodes by their depth rather than height. As long as it
doesn't stall, height is irrelevant. Depth represents the critical
path to the DAG root.
- Added hybrid_ls_rr_sort::isReady to filter stalled nodes before
adding them to the available queue.
Related Cleanup: most of the register reduction routines do not need
to be templates.
llvm-svn: 123468
It will still return an iterator that points to the first terminator or end(),
but there may be DBG_VALUE instructions following the first terminator.
llvm-svn: 123384
For one, MachineBasicBlock::getFirstTerminator() doesn't understand what is
happening, and it also makes sense to have all control flow run through the
DBG_VALUE.
llvm-svn: 123277
There's an inherent tension in DAGCombine between assuming
that things will be put in canonical form, and the Depth
mechanism that disables transformations when recursion gets
too deep. It would not surprise me if there's a lot of little
bugs like this one waiting to be discovered. The mechanism
seems fragile and I'd suggest looking at it from a design viewpoint.
llvm-svn: 123191
when no virtual registers have been allocated.
It was only used to resize IndexedMaps, so provide an IndexedMap::resize()
method such that
Map.grow(MRI.getLastVirtReg());
can be replaced with the simpler
Map.resize(MRI.getNumVirtRegs());
This works correctly when no virtuals are allocated, and it bypasses the to/from
index conversions.
llvm-svn: 123130
physical register numbers.
This makes the hack used in LiveInterval official, and lets LiveInterval be
oblivious of stack slots.
The isPhysicalRegister() and isVirtualRegister() predicates don't know about
this, so when a variable may contain a stack slot, isStackSlot() should always
be tested first.
llvm-svn: 123128
of using a Location class with the same information.
When making a copy of a MachineOperand that was already stored in a
MachineInstr, it is necessary to clear the parent pointer on the copy. Otherwise
the register use-def lists become inconsistent.
Add MachineOperand::clearParent() to do that. An alternative would be a custom
MachineOperand copy constructor that cleared ParentMI. I didn't want to do that
because of the performance impact.
llvm-svn: 123109
Print virtual registers numbered from 0 instead of the arbitrary
FirstVirtualRegister. The first virtual register is printed as %vreg0.
TRI::NoRegister is printed as %noreg.
llvm-svn: 123107
Provide MRI::getNumVirtRegs() and TRI::index2VirtReg() functions to allow
iteration over virtual registers without depending on the representation of
virtual register numbers.
llvm-svn: 123098
they all ready do). This removes two dominator recomputations prior to isel,
which is a 1% improvement in total llc time for 403.gcc.
The only potentially suspect thing is making GCStrategy recompute dominators if
it used a custom lowering strategy.
llvm-svn: 123064
Instead encode llvm IR level property "HasSideEffects" in an operand (shared
with IsAlignStack). Added MachineInstrs::hasUnmodeledSideEffects() to check
the operand when the instruction is an INLINEASM.
This allows memory instructions to be moved around INLINEASM instructions.
llvm-svn: 123044
Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.
The test changes are needed to keep those spill-q tests from testing aligned
spills and restores. If the only aligned stack objects are spill slots, we
no longer realign the stack frame. Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.
llvm-svn: 122995
We were never generating any of these nodes with variable indices, and there
was one legalizer function asserting on a non-constant index. If we ever have
a need to support variable indices, we can add this back again.
llvm-svn: 122993
This pass precomputes CFG block frequency information that can be used by the
register allocator to find optimal spill code placement.
Given an interference pattern, placeSpills() will compute which basic blocks
should have the current variable enter or exit in a register, and which blocks
prefer the stack.
The algorithm is ready to consume block frequencies from profiling data, but for
now it gets by with the static estimates used for spill weights.
This is a work in progress and still not hooked up to RegAllocGreedy.
llvm-svn: 122938
up freebsd bootloader. However, this doesn't make much sense for Darwin, whose
-Os is meant to optimize for size only if it doesn't hurt performance.
rdar://8821501
llvm-svn: 122936
The analysis will be needed by both the greedy register allocator and the
X86FloatingPoint pass. It only needs to be computed once when the CFG doesn't
change.
This pass is very fast, usually showing up as 0.0% wall time.
llvm-svn: 122832
This allows us to compile:
void test(char *s, int a) {
__builtin_memset(s, a, 15);
}
into 1 mul + 3 stores instead of 3 muls + 3 stores.
llvm-svn: 122710
We could implement a DAGCombine to turn x * 0x0101 back into logic operations
on targets that doesn't support the multiply or it is slow (p4) if someone cares
enough.
Example code:
void test(char *s, int a) {
__builtin_memset(s, a, 4);
}
before:
_test: ## @test
movzbl 8(%esp), %eax
movl %eax, %ecx
shll $8, %ecx
orl %eax, %ecx
movl %ecx, %eax
shll $16, %eax
orl %ecx, %eax
movl 4(%esp), %ecx
movl %eax, 4(%ecx)
movl %eax, (%ecx)
ret
after:
_test: ## @test
movzbl 8(%esp), %eax
imull $16843009, %eax, %eax ## imm = 0x1010101
movl 4(%esp), %ecx
movl %eax, 4(%ecx)
movl %eax, (%ecx)
ret
llvm-svn: 122707
when running without the verifier, and I have not yet checked them to see if
the new results are still correct. There are more verifier failures, but they
all seem to be additional occurrences of verifier failures that occur with the
existing PHIElimination pass. There are a few obvious issues with the code:
1) It doesn't properly update the register equivalence classes during copy
insertion, and instead recomputes them before merging live intervals and
renaming registers. I wanted to keep this first patch simple for debugging
purposes, but it shouldn't be very hard to do this.
2) It doesn't mix the renaming and live interval merging with the copy insertion
process, which leads to a lot of virtual register churn. Virtual registers and
live intervals are created, only to later be merged into others. The code should
be smarter and only create a new virtual register if there is no existing
register in the same congruence class.
3) In one place the code uses a DenseMap per basic block, which is unnecessary
heap allocation. There should be an inline storage version of DenseMap.
I did a quick compile-time test of running llc on 403.gcc with and without
StrongPHIElimination. It is slightly slower with StrongPHIElimination, because
the small decrease in the coalescer runtime can't beat the increase in phi
elimination runtime. Perhaps fixing the above performance issues will narrow
the gap.
I also haven't yet run any tests of the quality of the generated code.
llvm-svn: 122582
valno verification. The "Different value live out of predecessor" check is
incorrect in the case of phi-def valnos, so just skip that check for phi-def
valnos and instead check that all of the valnos for predecessors have phi-kill.
Fixes PR8863.
llvm-svn: 122581
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.
Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.
Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.
Added SchedulingPriorityQueue::HasReadyFilter to allowing gating entry
into the scheduler's available queue.
ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about it's SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.
ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.
llvm-svn: 122541
In the bottom-up selection DAG scheduling, handle two-address
instructions that read/write unspillable registers. Treat
the entire chain of two-address nodes as a single live range.
llvm-svn: 122472
loads properly. We miscompiled the testcase into:
_test: ## @test
movl $128, (%rdi)
movzbl 1(%rdi), %eax
ret
Now we get a proper:
_test: ## @test
movl $128, (%rdi)
movsbl (%rdi), %eax
movzbl %ah, %eax
ret
This fixes PR8757.
llvm-svn: 122392
count operand. These should be the same but apparently are
not always, and this is cleaner anyway. This improves the
code in an existing test.
llvm-svn: 122354
of the problems with my last attempt were in the updating of LiveIntervals
rather than the coalescing itself. Therefore, I decided to get that right first
by essentially reimplementing the existing PHIElimination using LiveIntervals.
It works correctly, with only a few tests failing (which may not be legitimate
failures) and no new verifier failures (at least as far as I can tell, I didn't
count the number per file).
llvm-svn: 122321
Edge bundles is an annotation on the CFG that turns it into a bipartite directed
graph where each basic block is connected to an outgoing and an ingoing bundle.
These bundles are useful for identifying regions of the CFG for live range
splitting.
llvm-svn: 122301
ARM (and other 32-bit-only) targets support for i8 and i16 overflow
multiplies. The generated code isn't great, but this at least fixes
CodeGen/Generic/overflow.ll when running on ARM hosts.
llvm-svn: 122221
Imagine we see:
EFLAGS = inst1
EFLAGS = inst2 FLAGS
gpr = inst3 EFLAGS
Previously, we would refuse to schedule inst2 because it clobbers
the EFLAGS of the predecessor. However, it also uses the EFLAGS
of the predecessor, so it is safe to emit. SDep edges ensure that
the right order happens already anyway.
This fixes 2 testsuite crashes with the X86 patch I'm going to
commit next.
llvm-svn: 122211
alternative register allocator that does not require LiveIntervals by specifying
it on the command-line for a target that has StrongPHIElimination enabled by
default.
These checks are pretty meaningless anyways, since StrongPHIElimination and
PHIElimination are never used at the same time.
llvm-svn: 122176
use before rematerializing the load.
This allows us to produce:
addps LCPI0_1(%rip), %xmm2
Instead of:
movaps LCPI0_1(%rip), %xmm3
addps %xmm3, %xmm2
Saving a register and an instruction. The standard spiller already knows how to
do this.
llvm-svn: 122133
the loop predecessors.
The register can be live-out from a predecessor without being live-in to the
loop header if there is a critical edge from the predecessor.
llvm-svn: 122123
createMachineVerifierPass and MachineFunction::verify.
The banner is printed before the machine code dump, just like the printer pass.
llvm-svn: 122113
may be called. If the entry block is empty, the insertion point iterator will be
the "end()" value. Calling ->getParent() on it (among others) causes problems.
Modify materializeFrameBaseRegister to take the machine basic block and insert
the frame base register at the beginning of that block. (It's very similar to
what the code does all ready. The only difference is that it will always insert
at the beginning of the entry block instead of after a previous materialization
of the frame base register. I doubt that that matters here.)
<rdar://problem/8782198>
llvm-svn: 122104
BUILD_VECTOR operands where the element type is not legal. I had previously
changed this code to insert TRUNCATE operations, but that was just wrong.
llvm-svn: 122102
the operand uses the same register as a tied operand:
%r1 = add %r1, %r1
If add were a three-address instruction, kill flags would be required on at
least one of the uses. Since it is a two-address instruction, the tied use
operand must not have a kill flag.
This change makes the kill flag on the untied use operand optional.
llvm-svn: 122082
This is a three-way interval list intersection between a virtual register, a
live interval union, and a loop. It will be used to identify interference-free
loops for live range splitting.
llvm-svn: 122034
A MachineLoopRange contains the intervals of slot indexes covered by the blocks
in a loop. This representation of the loop blocks is more efficient to compare
against interfering registers during register coalescing.
llvm-svn: 121917
Bypass loops have the current live range live through, but contain no uses or
defs. Splitting around a bypass loop can free registers for other uses inside
the loop by spilling the split range.
llvm-svn: 121871
regB = move RCX
regA = op regB, regC
RAX = move regA
where both regB and regC are killed. If regB is constrainted to non-compatible
physical registers but regC is not constrainted at all, then it's better to
commute the instruction.
movl %edi, %eax
shlq $32, %rcx
leaq (%rcx,%rax), %rax
=>
movl %edi, %eax
shlq $32, %rcx
orq %rcx, %rax
rdar://8762995
llvm-svn: 121793
when the wider type is legal. This allows us to compile:
define zeroext i16 @test1(i16 zeroext %x) nounwind {
entry:
%div = udiv i16 %x, 33
ret i16 %div
}
into:
test1: # @test1
movzwl 4(%esp), %eax
imull $63551, %eax, %eax # imm = 0xF83F
shrl $21, %eax
ret
instead of:
test1: # @test1
movw $-1985, %ax # imm = 0xFFFFFFFFFFFFF83F
mulw 4(%esp)
andl $65504, %edx # imm = 0xFFE0
movl %edx, %eax
shrl $5, %eax
ret
Implementing rdar://8760399 and example #4 from:
http://blog.regehr.org/archives/320
We should implement the same thing for [su]mul_hilo, but I don't
have immediate plans to do this.
llvm-svn: 121696
for each constant pool entry. Using WriteTypeSymbolic here takes time
proportional to the size of the module, for each constant pool entry.
This speeds up -verbose-asm llc on 252.eon (a random testcase at my disposal)
from 4.4s to 2.137s. llc takes 2.11s with asm-verbose off, so this is now a
pretty reasonable cost for verbose comments.
llvm-svn: 121691
The spiller should only spill. The register allocator will drive live range
splitting, it has the needed information about register pressure and
interferences.
llvm-svn: 121590
registers for a given virtual register.
Reserved registers are filtered from the allocation order, and any valid hint is
returned as the first suggestion.
For target dependent hints, a number of arcane target hooks are invoked.
llvm-svn: 121497
references instead.
Similarly, IntervalMap::begin() is almost as expensive as find(), so use find(x)
instead of begin().advanceTo(x);
This makes RegAllocBasic run another 5% faster.
llvm-svn: 121344
abstract priority queue interface in subclasses that want to override the
priority calculations.
Subclasses must provide a getPriority() implementation instead.
This approach requires less code as long as priorities are expressable as simple
floats, and it avoids the dangers of defining potentially expensive priority
comparison functions.
It also should speed up priority_queue operations since they no longer have to
chase pointers when comparing registers. This is not measurable, though.
Preferably, we shouldn't use floats to guide code generation. The use of floats
here is derived from the use of floats for spill weights. Spill weights have a
dynamic range that doesn't lend itself easily to a fixpoint implementation.
When someone invents a stable spill weight representation, it can be reused for
allocation priorities.
llvm-svn: 121294
both forward and backward scheduling. Rename it to
ScoreboardHazardRecognizer (Scoreboard is one word). Remove integer
division from the scoreboard's critical path.
llvm-svn: 121274
This new register allocator is initially identical to RegAllocBasic, but it will
receive all of the tricks that RegAllocBasic won't get.
RegAllocGreedy will eventually replace linear scan.
llvm-svn: 121234
zextOrTrunc(), and APSInt methods extend(), extOrTrunc() and new method
trunc(), to be const and to return a new value instead of modifying the
object in place.
llvm-svn: 121120
The StrongPHIElimination pass did not work, and nobody has worked on it for two
years.
A rewrite is underway, so I am leaving this shell pass instead of deleting it
completely.
llvm-svn: 120830
Scan the MachineFunction for DBG_VALUE instructions, and replace them with a
data structure similar to LiveIntervals. The live range of a DBG_VALUE is
determined by propagating it down the dominator tree until a new DBG_VALUE is
found. When a DBG_VALUE lives in a register, its live range is confined to the
live range of the register's value.
LiveDebugVariables runs before coalescing, so DBG_VALUEs are not artificially
extended when registers are joined.
The missing half will recreate DBG_VALUE instructions from the intervals when
register allocation is complete.
The pass is disabled by default. It can be enabled with the temporary command
line option -live-debug-variables.
llvm-svn: 120636
legalization time. Since at legalization time there is no mapping from
SDNode back to the corresponding LLVM instruction and the return
SDNode is target specific, this requires a target hook to check for
eligibility. Only x86 and ARM support this form of sibcall optimization
right now.
rdar://8707777
llvm-svn: 120501
in favor of the widespread llvm style. Capitalize variables and add
newlines for visual parsing. Rename variables for readability.
And other cleanup.
llvm-svn: 120490
This analysis is going to run immediately after LiveIntervals. It will stay
alive during register allocation and keep track of user variables mentioned in
DBG_VALUE instructions.
When the register allocator is moving values between registers and the stack, it
is very hard to keep track of DBG_VALUE instructions. We usually get it wrong.
This analysis maintains a data structure that makes it easy to update DBG_VALUE
instructions.
llvm-svn: 120385
so don't claim they are. They are allocated using DAG.getNode, so attempts
to access MemSDNode fields results in reading off the end of the allocated
memory. This fixes crashes with "llc -debug" due to debug code trying to
print MemSDNode fields for these barrier nodes (since the crashes are not
deterministic, use valgrind to see this). Add some nasty checking to try
to catch this kind of thing in the future.
llvm-svn: 119901
MCStreamer instead of just MCObjectStreamer. Address changes cannot
be as efficient as we have to use DW_LNE_set_addres, but at least
most of the logic is shared.
This will be used so that, with CodeGen still using EmitDwarfLocDirective,
llvm-gcc is able to produce debug_line sections without needing an
assembler that supports .loc.
llvm-svn: 119777
if the extension types were not the same. The result was that if you
fed a select with sext and zext loads, as in the testcase, then it
would get turned into a zext (or sext) of the select, which is wrong
in the cases when it should have been an sext (resp. zext). Reported
and diagnosed by Sebastien Deldon.
llvm-svn: 119728
and testing is easier. A good example is the unknown-location.ll test that
now can just look for ".loc 1 0 0". We also don't use a DW_LNE_set_address for
every address change anymore.
llvm-svn: 119613
and xor. The 32-bit move immediates can be hoisted out of loops by machine
LICM but the isel hacks were preventing them.
Instead, let peephole optimization pass recognize registers that are defined by
immediates and the ARM target hook will fold the immediates in.
Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ
instructions if there are multiple uses. This happens when the 'and' is live
out, machine sink would have sinked the computation and that ends up pessimizing
code. The peephole pass would recognize situations where the 'and' can be
toggled to define CPSR and eliminate the comparison anyway.
2) Move peephole pass to after machine LICM, sink, and CSE to avoid blocking
important optimizations.
rdar://8663787, rdar://8241368
llvm-svn: 119548
SrcMgrDiagHandler, we can improve clang diagnostics for inline asm:
instead of reporting them on a source line of the original line,
we can report it on the correct line wherever the string literal came
from. For something like this:
void foo() {
asm("push %rax\n"
".code32\n");
}
we used to get this: (note that the line in t.c isn't helpful)
t.c:4:7: error: warning: ignoring directive for now
asm("push %rax\n"
^
<inline asm>:2:1: note: instantiated into assembly here
.code32
^
now we get:
t.c:5:8: error: warning: ignoring directive for now
".code32\n"
^
<inline asm>:2:1: note: instantiated into assembly here
.code32
^
Note that we're pointing to line 5 properly now.
llvm-svn: 119488
cookie argument to the SourceMgr diagnostic stuff. This cleanly separates
LLVMContext's inlineasm handler from the sourcemgr error handling
definition, increasing type safety and cleaning things up.
llvm-svn: 119486
Always spill the full representative register at any point where any subregister
is live.
This fixes PR8620 which caused the old logic to get confused and not spill
anything at all.
The fundamental problem here is that the coalescer is too aggressive about
physical register coalescing. It sometimes makes it impossible to allocate
registers without these emergency spills.
llvm-svn: 119375
The live range of a register defined by an early clobber starts at the use slot,
not the def slot.
Except when it is an early clobber tied to a use operand. Then it starts at the
def slot like a standard def.
llvm-svn: 119305
live ranges for the spill register are also defined at the use slot instead of
the normal def slot.
This fixes PR8612 for the inline spiller. A use was being allocated to the same
register as a spilled early clobber def.
This problem exists in all the spillers. A fix for the standard spiller is
forthcoming.
llvm-svn: 119182
catastrophic compilation time in the event of unreasonable LLVM
IR. Code quality is a separate issue--someone upstream needs to do a
better job of reducing to llvm.memcpy. If the situation can be reproduced with
any supported frontend, then it will be a separate bug.
llvm-svn: 118904
it makes no sense for allocation_order iterators to visit reserved regs.
The inline spiller depends on AliasAnalysis.
Manage the Query state to avoid uninitialized or stale results.
llvm-svn: 118800
This is the first small step towards using closed intervals for liveness instead
of the half-open intervals we're using now.
We want to be able to distinguish between a SlotIndex that represents a variable
being live-out of a basic block, and an index representing a variable live-in to
its successor.
That requires two separate indexes between blocks. One for live-outs and one for
live-ins.
With this change, getMBBEndIdx(MBB).getPrevSlot() becomes stable so it stays
greater than any instructions inserted at the end of MBB.
llvm-svn: 118747
Whenever splitting wants to insert a copy, it checks if the value can be
rematerialized cheaply instead.
Missing features:
- Delete instructions when all uses have been rematerialized.
- Truncate live ranges to the remaining uses after rematerialization.
llvm-svn: 118702
benchmarks hitting an assertion.
Adds LiveIntervalUnion::collectInterferingVRegs.
Fixes "late spilling" by checking for any unspillable live vregs among
all physReg aliases.
llvm-svn: 118701
handle cases in which a register is unavailable for spill code.
Adds LiveIntervalUnion::extract. While processing interferences on a
live virtual register, reuses the same Query object for each
physcial reg.
llvm-svn: 118423
to perform the copy, which may be of lots of memory [*]. It would be good if the
fall-back code generated something reasonable, i.e. did the copy in a loop, rather
than vast numbers of loads and stores. Add a note about this. Currently target
specific code seems to always kick in so this is more of a theoretical issue rather
than a practical one now that X86 has been fixed.
[*] It's amazing how often people pass mega-byte long arrays by copy...
llvm-svn: 118275
This way, InlineSpiller does the same amount of splitting as the standard
spiller. Splitting should really be guided by the register allocator, and
doesn't belong in the spiller at all.
llvm-svn: 118216
value type, so there is no point in passing it around using
an EVT. Use the simpler MVT everywhere. Rather than trying
to propagate this information maximally in all the code that
using the calling convention stuff, I chose to do a mainly
low impact change instead.
llvm-svn: 118167
1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
"optimize for latency". Call instructions don't have the right latency and
this is more likely to use introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
not # of micro-ops since multi-latency instructions is completely executed
even when the predicate is false. Also, some instruction will be "slower"
when they are predicated due to the register def becoming implicit input.
rdar://8598427
llvm-svn: 118135
BB#1: derived from LLVM BB %bb.nph28
Live Ins: %AL
Predecessors according to CFG: BB#0
TEST8rr %reg16384<kill>, %reg16384, %EFLAGS<imp-def>; GR8:%reg16384
JNE_4 <BB#2>, %EFLAGS<imp-use,kill>
JMP_4 <BB#2>
Successors according to CFG: BB#2 BB#2
These double CFG edges only ever occur in bugpoint-generated code, so there is
no need to attempt something clever.
llvm-svn: 117992
source, and let rewrite() clean it up.
This way, kill flags on the inserted copies are fixed as well during rewrite().
We can't just assume that all the copies we insert are going to be kills since
critical edges into loop headers sometimes require both source and dest to be
live out of a block.
llvm-svn: 117980
At least X86FloatingPoint requires correct kill flags after register allocation,
and targets using register scavenging benefit. Conservative kill flags are not
enough.
llvm-svn: 117960
at more than those which define CPSR. You can have this situation:
(1) subs ...
(2) sub r6, r5, r4
(3) movge ...
(4) cmp r6, 0
(5) movge ...
We cannot convert (2) to "subs" because (3) is using the CPSR set by
(1). There's an analogous situation here:
(1) sub r1, r2, r3
(2) sub r4, r5, r6
(3) cmp r4, ...
(5) movge ...
(6) cmp r1, ...
(7) movge ...
We cannot convert (1) to "subs" because of the intervening use of CPSR.
llvm-svn: 117950
looks like is happening:
Without the peephole optimizer:
(1) sub r6, r6, #32
orr r12, r12, lr, lsl r9
orr r2, r2, r3, lsl r10
(x) cmp r6, #0
ldr r9, LCPI2_10
ldr r10, LCPI2_11
(2) sub r8, r8, #32
(a) movge r12, lr, lsr r6
(y) cmp r8, #0
LPC2_10:
ldr lr, [pc, r10]
(b) movge r2, r3, lsr r8
With the peephole optimizer:
ldr r9, LCPI2_10
ldr r10, LCPI2_11
(1*) subs r6, r6, #32
(2*) subs r8, r8, #32
(a*) movge r12, lr, lsr r6
(b*) movge r2, r3, lsr r8
(1) is used by (x) for the conditional move at (a). (2) is used by (y) for the
conditional move at (b). After the peephole optimizer, these the flags resulting
from (1*) are ignored and only the flags from (2*) are considered for both
conditional moves.
llvm-svn: 117876
operand and one of them has a single use that is a live out copy, favor the
one that is live out. Otherwise it will be difficult to eliminate the copy
if the instruction is a loop induction variable update. e.g.
BB:
sub r1, r3, #1
str r0, [r2, r3]
mov r3, r1
cmp
bne BB
=>
BB:
str r0, [r2, r3]
sub r3, r3, #1
cmp
bne BB
This fixed the recent 256.bzip2 regression.
llvm-svn: 117675
We don't want unused values forming their own equivalence classes, so we lump
them all together in one class, and then merge them with the class of the last
used value.
llvm-svn: 117670
in SSAUpdaterImpl.h
Verifying live intervals revealed that the old method was completely wrong, and
we need an iterative approach to calculating PHI placemant. Fortunately, we have
MachineDominators available, so we don't have to compute that over and over
like SSAUpdaterImpl.h must.
Live-out values are cached between calls to mapValue() and computed in a greedy
way, so most calls will be working with very small block sets.
Thanks to Bob for explaining how this should work.
llvm-svn: 117599
proper SSA updating.
This doesn't cause MachineDominators to be recomputed since we are already
requiring MachineLoopInfo which uses dominators as well.
llvm-svn: 117598
There are currently 100 references to COFF::IMAGE_SCN in 6 files
and 11 different functions. Section to attribute mapping really
needs to happen in one place to avoid problems like this.
llvm-svn: 117473
Critical edges going into a loop are not as bad as critical exits. We can handle
them by splitting the critical edge, or by having both inside and outside
registers live out of the predecessor.
llvm-svn: 117423
the remainder register.
Example:
bb0:
x = 1
bb1:
use(x)
...
x = 2
jump bb1
When x is isolated in bb1, the inner part breaks into two components, x1 and x2:
bb0:
x0 = 1
bb1:
x1 = x0
use(x1)
...
x2 = 2
x0 = x2
jump bb1
llvm-svn: 117408
do not double-count the duplicate instructions by counting once from the
beginning and again from the end. Keep track of where the duplicates from
the beginning ended and don't go past that point when counting duplicates
at the end. Radar 8589805.
This change causes one of the MC/ARM/simple-fp-encoding tests to produce
different (better!) code without the vmovne instruction being tested.
I changed the test to produce vmovne and vmoveq instructions but moving
between register files in the opposite direction. That's not quite the same
but predicated versions of those instructions weren't being tested before,
so at least the test coverage is not any worse, just different.
llvm-svn: 117333
instructions separately from the count of non-predicated instructions. The
instruction count is used in places to determine how many instructions to
copy, predicate, etc. and things get confused if that count includes the
extra cost for microcoded ops.
llvm-svn: 117332