Stop depending on the LiveIntervalUnions in RegAllocBase, they are about
to be removed.
The changes are mostly replacing register alias iterators with regunit
iterators, and querying LiveRegMatrix instrad of RegAllocBase.
InterferenceCache is converted to work with per-regunit
LiveIntervalUnions, and it checks fixed regunit interference separately,
using the fixed live intervals provided by LiveIntervalAnalysis.
The local splitting helper calcGapWeights() is also considering fixed
regunit interference which is kept on the side now.
llvm-svn: 158867
That is a DenseMap iterator keyed by pointers, so the iteration order is
nondeterministic.
I would like to replace the DenseMap with an IndexedMap which doesn't
allow iteration.
llvm-svn: 158856
Regunit live ranges are computed on demand, so when mi-sched calls
handleMove, some regunits may not have live ranges yet.
That makes updating them easier: Just skip the non-existing ranges. They
will be computed correctly from the rescheduled machine code when they
are needed.
llvm-svn: 158831
I'll admit I'm not entirely satisfied with this change, but it seemed
the cleanest option. Other suggestions quite welcome
The issue is that the traits specializations have static methods which
return the typedef'ed PHI_iterator type. In both the IR and MI layers
this is typedef'ed to a custom iterator class defined in an anonymous
namespace giving the types and the functions returning them internal
linkage. However, because the traits specialization is defined in the
'llvm' namespace (where it has to be, specialized template lives there),
and is in turn used in the templated implementation of the SSAUpdater.
This led to the linkage conflict that Clang now warns about.
The simplest solution to me was just to define the PHI_iterator as
a nested class inside the trait specialization. That way it still
doesn't get scoped widely, it can't be accidentally reused somewhere,
etc. This is a little gross just because nested class definitions are
a little gross, but the alternatives seem more ad-hoc.
llvm-svn: 158799
-stable-loops enables a new algorithm for generating the Loop
forest. It differs from the original algorithm in a few respects:
- Not determined by use-list order.
- Initially guarantees RPO order of block and subloops.
- Linear in the number of CFG edges.
- Nonrecursive.
I didn't want to change the LoopInfo API yet, so the block lists are
still inclusive. This seems strange to me, and it means that building
LoopInfo is not strictly linear, but it may not be a problem in
practice. At least the block lists start out in RPO order now. In the
future we may add an attribute or wrapper analysis that allows other
passes to assume RPO order.
The primary motivation of this work was not to optimize LoopInfo, but
to allow reproducing performance issues by decomposing the compilation
stages. I'm often unable to do this with the current LoopInfo, because
the loop tree order determines Loop pass order. Serializing the IR
tends to invert the order, which reverses the optimization order. This
makes it nearly impossible to debug interdependent loop optimizations
such as LSR.
I also believe this will provide more stable performance results across time.
llvm-svn: 158790
The implementation only needs inclusion from LoopInfo.cpp and
MachineLoopInfo.cpp. Clients of the interface should only include the
interface. This makes the interface readable and speeds up rebuilds
after modifying the implementation.
llvm-svn: 158787
When LiveIntervals is tracking fixed interference in regunits, make sure
to update those intervals as well. Currently guarded by -live-regunits.
llvm-svn: 158766
ensureAlignment() in MachineFunction). Also, drop setMaxAlignment() in
favor of this new function. This creates a main entry point to setting
MaxAlignment, which will be helpful for future work. No functionality
change intended.
llvm-svn: 158758
This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or
FSUB + FMUL. The combines are performed when:
(a) Either
AllowExcessFPPrecision option (-enable-excess-fp-precision for llc)
OR
UnsafeFPMath option (-enable-unsafe-fp-math)
are set, and
(b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of
the FADD/FSUB, and
(c) The FMUL only has one user (the FADD/FSUB).
If your target has fast FMA instructions you can make use of these combines by
overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for
types supported by your FMA instruction, and adding patterns to match ISD::FMA
to your FMA instructions.
llvm-svn: 158757
The PPC::EXTSW instruction preserves the low 32 bits of its input, just
like some of the x86 instructions. Use it to reduce register pressure
when the low 32 bits have multiple uses.
This requires a small change to PeepholeOptimizer since EXTSW takes a
64-bit input register.
This is related to PR5997.
llvm-svn: 158743
TargetLoweringObjectFileELF. Use this to support it on X86. Unlike ARM,
on X86 it is not easy to find out if .init_array should be used or not, so
the decision is made via TargetOptions and defaults to off.
Add a command line option to llc that enables it.
llvm-svn: 158692
This patch changes the type used to hold the FU bitset from unsigned to uint64_t.
This will be needed for some upcoming PowerPC itineraries.
llvm-svn: 158679
For store->load dependencies that may alias, we should always use
TrueMemOrderLatency, which may eventually become a subtarget hook. In
effect, we should guarantee at least TrueMemOrderLatency on at least
one DAG path from a store to a may-alias load.
This should fix the standard mode as well as -enable-aa-sched-mi".
llvm-svn: 158380
The LiveRegMatrix represents the live range of assigned virtual
registers in a Live interval union per register unit. This is not
fundamentally different from the interference tracking in RegAllocBase
that both RABasic and RAGreedy use.
The important differences are:
- LiveRegMatrix tracks interference per register unit instead of per
physical register. This makes interference checks cheaper and
assignments slightly more expensive. For example, the ARM D7 reigster
has 24 aliases, so we would check 24 physregs before assigning to one.
With unit-based interference, we check 2 units before assigning to 2
units.
- LiveRegMatrix caches regmask interference checks. That is currently
duplicated functionality in RABasic and RAGreedy.
- LiveRegMatrix is a pass which makes it possible to insert
target-dependent passes between register allocation and rewriting.
Such passes could tweak the register assignments with interference
checking support from LiveRegMatrix.
Eventually, RABasic and RAGreedy will be switched to LiveRegMatrix.
llvm-svn: 158255
This deduplicates some code from the optimizing register allocators, and
it means that it is now possible to change the register allocators'
solutions simply by editing the VirtRegMap between the register
allocator pass and the rewriter.
llvm-svn: 158249
OK, not really. We don't want to reintroduce the old rewriter hacks.
This patch extracts virtual register rewriting as a separate pass that
runs after the register allocator. This is possible now that
CodeGen/Passes.cpp can configure the full optimizing register allocator
pipeline.
The rewriter pass uses register assignments in VirtRegMap to rewrite
virtual registers to physical registers, and it inserts kill flags based
on live intervals.
These finalization steps are the same for the optimizing register
allocators: RABasic, RAGreedy, and PBQP.
llvm-svn: 158244
Bulk move of TargetInstrInfo implementation into
TargetInstrInfoImpl. This is dirty because the code isn't part of
TargetInstrInfoImpl class, nor should it be, because the methods are
not target hooks. However, it's the current mechanism for keeping
libTarget useful outside the backend. You'll get a not-so-nice link
error if you invoke a TargetInstrInfo method that depends on CodeGen.
The TargetInstrInfoImpl class should probably be removed since it
doesn't really solve this problem.
To really fix this, we probably need separate interfaces for the
CodeGen/nonCodeGen sides of TargetInstrInfo.
llvm-svn: 158212
The commit is intended to fix rdar://11540023.
It is implemented as part of peephole optimization. We can actually implement
this in the SelectionDAG lowering phase.
llvm-svn: 158122
Bundles should be treated as one atomic transaction when checking
liveness. That is how the register allocator (and VLIW targets) treats
bundles.
llvm-svn: 158116
LLVM is now -Wunused-private-field clean except for
- lib/MC/MCDisassembler/Disassembler.h. Not sure why it keeps all those unaccessible fields.
- gtest.
llvm-svn: 158096
There are some that I didn't remove this round because they looked like
obvious stubs. There are dead variables in gtest too, they should be
fixed upstream.
llvm-svn: 158090
Soon we'll be making LiveIntervalUnions for register units as well.
This was the only place using the RepReg member, so just remove it.
llvm-svn: 158038
Don't print out the register number and spill weight, making the TRI
argument unnecessary.
This allows callers to interpret the reg field. It can currently be a
virtual register, a physical register, a spill slot, or a register unit.
llvm-svn: 158031
Instead of computing a live interval per physreg, LiveIntervals can
compute live intervals per register unit. This makes impossible the
confusing situation where aliasing registers could have overlapping live
intervals. It should also make fixed interferernce checking cheaper
since registers have fewer register units than aliases.
Live intervals for regunits are computed on demand, using MRI use-def
chains and the new LiveRangeCalc class. Only regunits live in to ABI
blocks are precomputed during LiveIntervals::runOnMachineFunction().
The regunit liveness computations don't depend on LiveVariables.
llvm-svn: 158029
expression (a * b + c) that can be implemented as a fused multiply-add (fma)
if the target determines that this will be more efficient. This intrinsic
will be used to implement FP_CONTRACT support and an aggressive FMA formation
mode.
If your target has a fast FMA instruction you should override the
isFMAFasterThanMulAndAdd method in TargetLowering to return true.
llvm-svn: 158014
This allows a subtarget to explicitly specify the issue width and
other properties without providing pipeline stage details for every
instruction.
llvm-svn: 157979
valid itinerary but no pipeline stages.
An itinerary can contain useful scheduling information without specifying pipeline stages for each instruction.
llvm-svn: 157977
It is an old function that does a lot more than required by
CalcSpillWeights, which was the only remaining caller.
The isRematerializable() function never actually sets the isLoad
argument, so don't try to compute that.
llvm-svn: 157973
IntegersSubsetGeneric, IntegersSubsetMapping: added IntTy template parameter, that allows use either APInt or IntItem. This change allows to write unittest for these classes.
llvm-svn: 157880
No functional change intended.
Sorry for the churn. The iterator classes are supposed to help avoid
giant commits like this one in the future. The TableGen-produced
register lists are getting quite large, and it may be necessary to
change the table representation.
This makes it possible to do so without changing all clients (again).
llvm-svn: 157854
IntegersSubset devided into IntegersSubsetGeneric and into IntegersSubset itself. The first has no references to ConstantInt and works with IntItem only.
IntegersSubsetMapping also made generic. Here added second template parameter "IntegersSubsetTy" that allows to use on of two IntegersSubset types described below.
llvm-svn: 157815
types, as well as int<->ptr casts. This allows us to tailcall functions
with some trivial casts between the call and return (i.e. because the
return types disagree).
llvm-svn: 157798
This patch will optimize the following
movq %rdi, %rax
subq %rsi, %rax
cmovsq %rsi, %rdi
movq %rdi, %rax
to
cmpq %rsi, %rdi
cmovsq %rsi, %rdi
movq %rdi, %rax
Perform this optimization if the actual result of SUB is not used.
rdar: 11540023
llvm-svn: 157755
It helps compile exotic inline asm. In the test case, normal GR32
virtual registers use up eax-edx so the final GR32_ABCD live range has
no registers left. Since all the live ranges were tiny, we had no way of
prioritizing the smaller register class.
This patch allows tiny unspillable live ranges to be evicted by tiny
unspillable live ranges from a smaller register class.
<rdar://problem/11542429>
llvm-svn: 157715
Besides adding the new insertPass function, this patch uses it to
enhance the existing -print-machineinstrs so that the MachineInstrs
after a specific pass can be printed.
Patch by Bin Zeng!
llvm-svn: 157655
ranges for the instruction about to be bundled. This fixes a bug in an external
project where an assertion was triggered due to spurious 'multiple defs' within
the bundle.
Patch by Ivan Llopard. Thanks Ivan!
llvm-svn: 157632
Implemented IntItem - the wrapper around APInt. Why not to use APInt item directly right now?
1. It will very difficult to implement case ranges as series of small patches. We got several large and heavy patches. Each patch will about 90-120 kb. If you replace ConstantInt with APInt in SwitchInst you will need to changes at the same time all Readers,Writers and absolutely all passes that uses SwitchInst.
2. We can implement APInt pool inside and save memory space. E.g. we use several switches that works with 256 bit items (switch on signatures, or strings). We can avoid value duplicates in this case.
3. IntItem can be easyly easily replaced with APInt.
4. Currenly we can interpret IntItem both as ConstantInt and as APInt. It allows to provide SwitchInst methods that works with ConstantInt for non-updated passes.
Why I need it right now? Currently I need to update SimplifyCFG pass (EqualityComparisons). I need to work with APInts directly a lot, so peaces of code
ConstantInt *V = ...;
if (V->getValue().ugt(AnotherV->getValue()) {
...
}
will look awful. Much more better this way:
IntItem V = ConstantIntVal->getValue();
if (AnotherV < V) {
}
Of course any reviews are welcome.
P.S.: I'm also going to rename ConstantRangesSet to IntegersSubset, and CRSBuilder to IntegersSubsetMapping (allows to map individual subsets of integers to the BasicBlocks).
Since in future these classes will founded on APInt, it will possible to use them in more generic ways.
llvm-svn: 157576
definition in the map before calling itself to retrieve the
DIE for the declaration. Without this change, if this causes
getOrCreateSubprogramDIE to be recursively called on the definition,
it will create multiple DIEs for that definition. Fixes PR12831.
llvm-svn: 157541
SimplifyCFG tends to form a lot of 2-3 case switches when merging branches. Move
the most likely condition to the front so it is checked first and the others can
be skipped. This is currently not as effective as it could be because SimplifyCFG
destroys profiling metadata when merging branches and switches. Merging branch
weight metadata is tricky though.
This code touches at most 3 cases so I didn't use a proper sorting algorithm.
llvm-svn: 157521
to pass around a struct instead of a large set of individual values. This
cleans up the interface and allows more information to be added to the struct
for future targets without requiring changes to each and every target.
NV_CONTRIB
llvm-svn: 157479
The Hazard checker implements in-order contraints, or interlocked
resources. Ready instructions with hazards do not enter the available
queue and are not visible to other heuristics.
The major code change is the addition of SchedBoundary to encapsulate
the state at the top or bottom of the schedule, including both a
pending and available queue.
The scheduler now counts cycles in sync with the hazard checker. These
are minimum cycle counts based on known hazards.
Targets with no itinerary (x86_64) currently remain at cycle 0. To fix
this, we need to provide some maximum issue width for all targets. We
also need to add the concept of expected latency vs. minimum latency.
llvm-svn: 157427
Live ranges with a constrained register class may benefit from splitting
around individual uses. It allows the remaining live range to use a
larger register class where it may allocate. This is like spilling to a
different register class.
This is only attempted on constrained register classes.
<rdar://problem/11438902>
llvm-svn: 157354
Now that the coalescer keeps live intervals and machine code in sync at
all times, it needs to deal with identity copies differently.
When merging two virtual registers, all identity copies are removed
right away. This means that other identity copies must come from
somewhere else, and they are going to have a value number.
Deal with such copies by merging the value numbers before erasing the
copy instruction. Otherwise, we leave dangling value numbers in the live
interval.
This fixes PR12927.
llvm-svn: 157340
This helps compile time when the greedy register allocator splits live
ranges in giant functions. Without the bias, we would try to grow
regions through the giant edge bundles, usually to find out that the
region became too big and expensive.
If a live range has many uses in blocks near the giant bundle, the small
negative bias doesn't make a big difference, and we still consider
regions including the giant edge bundle.
Giant edge bundles are usually connected to landing pads or indirect
branches.
llvm-svn: 157174
With physreg joining out of the way, it is easy to recognize the
instructions that need their kill flags cleared while testing for
interference.
This allows us to skip the final scan of all instructions for an 11%
speedup of the coalescer pass.
llvm-svn: 157169
may be RAUW'd by the recursive call to LegalizeOps; instead, retrieve
the other operands when calling UpdateNodeOperands. Fixes PR12889.
llvm-svn: 157162
Dead code elimination during coalescing could cause a virtual register
to be split into connected components. The following rewriting would be
confused about the already joined copies present in the code, but
without a corresponding value number in the live range.
Erase all joined copies instantly when joining intervals such that the
MI and LiveInterval representations are always in sync.
llvm-svn: 157135
Dead code and joined copies are now eliminated on the fly, and there is
no need for a post pass.
This makes the coalescer work like other modern register allocator
passes: Code is changed on the fly, there is no pending list of changes
to be committed.
llvm-svn: 157132
The late dead code elimination is no longer necessary.
The test changes are cause by a register hint that can be either %rdi or
%rax. The choice depends on the use list order, which this patch changes.
llvm-svn: 157131
Before rewriting uses of one value in A to register B, check that there
are no tied uses. That would require multiple A values to be rewritten.
This bug can't bite in the current version of the code for a fairly
subtle reason: A tied use would have caused 2-addr to insert a copy
before the use. If the copy has been coalesced, it will be found by the
same loop changed by this patch, and the optimization is aborted.
This was exposed by 400.perlbench and lua after applying a patch that
deletes joined copies aggressively.
llvm-svn: 157130
Remaining virtreg->physreg copies were rematerialized during
updateRegDefsUses(), but we already do the same thing in joinCopy() when
visiting the physreg copy instruction.
Eliminate the preserveSrcInt argument to reMaterializeTrivialDef(). It
is now always true.
llvm-svn: 157103
Dead copies cause problems because they are trivial to coalesce, but
removing them gived the live range a dangling end point. This patch
enables full dead code elimination which trims live ranges to their uses
so end points don't dangle.
DCE may erase multiple instructions. Put the pointers in an ErasedInstrs
set so we never risk visiting erased instructions in the work list.
There isn't supposed to be any dead copies entering RegisterCoalescer,
but they do slip by as evidenced by test/CodeGen/X86/coalescer-dce.ll.
llvm-svn: 157101
Use a dedicated MachO load command to annotate data-in-code regions.
This is the same format the linker produces for final executable images,
allowing consistency of representation and use of introspection tools
for both object and executable files.
Data-in-code regions are annotated via ".data_region"/".end_data_region"
directive pairs, with an optional region type.
data_region_directive := ".data_region" { region_type }
region_type := "jt8" | "jt16" | "jt32" | "jta32"
end_data_region_directive := ".end_data_region"
The previous handling of ARM-style "$d.*" labels was broken and has
been removed. Specifically, it didn't handle ARM vs. Thumb mode when
marking the end of the section.
rdar://11459456
llvm-svn: 157062
It is no longer necessary to separate VirtCopies, PhysCopies, and
ImpDefCopies. Implicitly defined copies are extremely rare after we
added the ProcessImplicitDefs pass, and physical register copies are not
joined any longer.
llvm-svn: 157059
This has been disabled for a while, and it is not a feature we want to
support. Copies between physical and virtual registers are eliminated by
good hinting support in the register allocator. Joining virtual and
physical registers is really a form of register allocation, and the
coalescer is not properly equipped to do that. In particular, it cannot
backtrack coalescing decisions, and sometimes that would cause it to
create programs that were impossible to register allocate, by exhausting
a small register class.
It was also very difficult to keep track of the live ranges of aliasing
registers when extending the live range of a physreg. By disabling
physreg joining, we can let fixed physreg live ranges remain constant
throughout the register allocator super-pass.
One type of physreg joining remains: A virtual register that has a
single value which is a copy of a reserved register can be merged into
the reserved physreg. This always lowers register pressure, and since we
don't compute live ranges for reserved registers, there are no problems
with aliases.
llvm-svn: 157055
SelectionDAGBuilder::Clusterify : main functinality was replaced with CRSBuilder::optimize, so big part of Clusterify's code was reduced.
llvm-svn: 157046
non-profitable commute using outdated info. The test case would still fail
because of poor pre-RA schedule. That will be fixed by MI scheduler.
rdar://11472010
llvm-svn: 157038
Introduce the basic strategy for register pressure scheduling.
1) Respect target limits at all times.
2) Indentify critical register classes (pressure sets).
Track pressure within the scheduled region.
Avoid increasing scheduled pressure for critical registers.
3) Avoid exceeding the max pressure of the region prior to scheduling.
Added logic for picking between the top and bottom ready Q's based on
regpressure heuristics.
Status: functional but needs to be asjusted to achieve good results.
llvm-svn: 157006
RegisterCoalescer set <undef> flags on all operands of copy instructions
that are scheduled to be removed. This is so they won't affect
shrinkToUses() by introducing false register reads.
Make sure those <undef> flags are never cleared, or shrinkToUses() could
cause live intervals to end at instructions about to be deleted.
This would be a lot simpler if RegisterCoalescer could just erase joined
copies immediately instead of keeping all the to-be-deleted instructions
around.
This fixes PR12862. Unfortunately, bugpoint can't create a sane test
case for this. Like many other coalescer problems, this failure depends
of a very fragile series of events.
<rdar://problem/11474428>
llvm-svn: 157001
When widening an existing <def,reads-undef> operand to a super-register,
it may be necessary to clear the <undef> flag because the wider register
is now read-modify-write through the instruction.
Conversely, it may be necessary to add an <undef> flag when the
coalescer turns a full-register def into a sub-register def, but the
larger register wasn't live before the instruction.
This happens in test/CodeGen/ARM/coalesce-subregs.ll, but the test
is too small for the <undef> flags to affect the generated code.
llvm-svn: 156951
It is now possible to coalesce weird skewed sub-register copies by
picking a super-register class larger than both original registers. The
included test case produces code like this:
vld2.32 {d16, d17, d18, d19}, [r0]!
vst2.32 {d18, d19, d20, d21}, [r0]
We still perform interference checking as if it were a normal full copy
join, so this is still quite conservative. In particular, the f1 and f2
functions in the included test case still have remaining copies because
of false interference.
llvm-svn: 156878
It is possible to coalesce two overlapping registers to a common
super-register that it larger than both of the original registers.
The important difference is that it may be necessary to rewrite DstReg
operands as well as SrcReg operands because the sub-register index has
changed.
This behavior is still disabled by CoalescerPair.
llvm-svn: 156869
Now both SrcReg and DstReg can be sub-registers of the final coalesced
register.
CoalescerPair::setRegisters still rejects such copies because
RegisterCoalescer doesn't yet handle them.
llvm-svn: 156848
This feature avoids creating edges in the scheduler's dependence graph
for non-aliasing memory operations according to whichever alias
analysis is available. It has been fully tested in Hexagon. Before
making this default, it needs to be extended to handle multiple
MachineMemOperands, compile time needs more evaluation, and
benchmarking on X86 and ARM is needed.
Patch by Sergei Larin!
llvm-svn: 156842
This patch will optimize the following cases:
sub r1, r3 | sub r1, imm
cmp r3, r1 or cmp r1, r3 | cmp r1, imm
bge L1
TO
subs r1, r3
bge L1 or ble L1
If the branch instruction can use flag from "sub", then we can replace
"sub" with "subs" and eliminate the "cmp" instruction.
rdar: 10734411
llvm-svn: 156599
Prioritize the instruction that comes closest to keeping pressure
under the target's limit. Then prioritize instructions that avoid
increasing the max pressure in the scheduled region. The max pressure
heuristic is a tad aggressive. Later I'll fix it to consider the
unscheduled pressure as well.
WIP: This is mostly functional but untested and not likely to do much good yet.
llvm-svn: 156574
Added getMaxExcessUpward/DownwardPressure. They somewhat abuse the
tracker by speculatively handling an instruction out of order. But it
is convenient for now. In the future, we will cache each instruction's
pressure contribution to make this efficient.
llvm-svn: 156561
This patch will optimize the following cases:
sub r1, r3 | sub r1, imm
cmp r3, r1 or cmp r1, r3 | cmp r1, imm
bge L1
TO
subs r1, r3
bge L1 or ble L1
If the branch instruction can use flag from "sub", then we can replace
"sub" with "subs" and eliminate the "cmp" instruction.
rdar: 10734411
llvm-svn: 156550
When a combine twiddles an extract_vector, care should be take to preserve
the type of the index operand. No luck extracting a reasonable testcase,
unfortunately.
rdar://11391009
llvm-svn: 156419
At least some of them:
%vreg1:sub_16bit = COPY %vreg2:sub_16bit; GR64:%vreg1, GR32: %vreg2
Previously, we couldn't figure out that the above copy could be
eliminated by coalescing %vreg2 with %vreg1:sub_32bit.
The new getCommonSuperRegClass() hook makes it possible.
This is not very useful yet since the unmodified part of the destination
register usually interferes with the source register. The coalescer
needs to understand sub-register interference checking first.
llvm-svn: 156334
The getPointerRegClass() hook can return register classes that depend on
the calling convention of the current function (ptr_rc_tailcall).
So far, we have been able to infer the calling convention from the
subtarget alone, but as we add support for multiple calling conventions
per target, that no longer works.
Patch by Yiannis Tsiouris!
llvm-svn: 156328
This will be used to determine whether it's profitable to turn a select into a
branch when the branch is likely to be predicted.
Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM.
I'm not entirely happy with the name of this flag, suggestions welcome ;)
llvm-svn: 156233
We want the representative register class to contain the largest
super-registers available. This makes the function less sensitive to the
register class numbering.
llvm-svn: 156220
The masks returned by SuperRegClassIterator are computed automatically
by TableGen. This is better than depending on the manually specified
SuperRegClasses.
llvm-svn: 156147
to catch cases like:
%reg1024<def> = MOV r1
%reg1025<def> = MOV r0
%reg1026<def> = ADD %reg1024, %reg1025
r0 = MOV %reg1026
By commuting ADD, it let coalescer eliminate all of the copies. However, there
was a bug in the heuristics where it ended up commuting the ADD in:
%reg1024<def> = MOV r0
%reg1025<def> = MOV 0
%reg1026<def> = ADD %reg1024, %reg1025
r0 = MOV %reg1026
That did no benefit but rather ensure the last MOV would not be coalesced.
rdar://11355268
llvm-svn: 156048
The ensures that virtual registers always belong to an allocatable class.
If your target attempts to create a vreg for an operand that has no
allocatable register subclass, you will crash quickly.
This ensures that targets define register classes as intended.
llvm-svn: 156046
The TargetPassManager's default constructor wants to initialize the PassManager
to 'null'. But it's illegal to bind a null reference to a null l-value. Make the
ivar a pointer instead.
PR12468
llvm-svn: 155902
This time, also fix the caller of AddGlue to properly handle
incomplete chains. AddGlue had failure modes, but shamefully hid them
from its caller. It's luck ran out.
Fixes rdar://11314175: BuildSchedUnits assert.
llvm-svn: 155749
DAGCombine strangeness may result in multiple loads from the same
offset. They both may try to glue themselves to another load. We could
insist that the redundant loads glue themselves to each other, but the
beter fix is to bail out from bad gluing at the time we detect it.
Fixes rdar://11314175: BuildSchedUnits assert.
llvm-svn: 155668
Cross-class joins have been normal and fully supported for a while now.
With TableGen generating the getMatchingSuperRegClass() hook, they are
unlikely to cause problems again.
llvm-svn: 155552
Remove the heuristic for disabling cross-class joins. The greedy
register allocator can handle the narrow register classes, and when it
splits a live range, it can pick a larger register class.
Benchmarks were unaffected by this change.
<rdar://problem/11302212>
llvm-svn: 155551
MachineInstr sequence.
This uses the new target interface for tracking register pressure
using pressure sets to model overlapping register classes and
subregisters.
RegisterPressure results can be tracked incrementally or stored at
region boundaries. Global register pressure can be deduced from local
RegisterPressure results if desired.
This is an early, somewhat untested implementation. I'm working on
testing it within the context of a register pressure reducing
MachineScheduler.
llvm-svn: 155454
on X86 Atom. Some of our tests failed because the tail merging part of
the BranchFolding pass was creating new basic blocks which did not
contain live-in information. When the anti-dependency code in the Post-RA
scheduler ran, it would sometimes rename the register containing
the function return value because the fact that the return value was
live-in to the subsequent block had been lost. To fix this, it is necessary
to run the RegisterScavenging code in the BranchFolding pass.
This patch makes sure that the register scavenging code is invoked
in the X86 subtarget only when post-RA scheduling is being done.
Post RA scheduling in the X86 subtarget is only done for Atom.
This patch adds a new function to the TargetRegisterClass to control
whether or not live-ins should be preserved during branch folding.
This is necessary in order for the anti-dependency optimizations done
during the PostRASchedulerList pass to work properly when doing
Post-RA scheduling for the X86 in general and for the Intel Atom in particular.
The patch adds and invokes the new function trackLivenessAfterRegAlloc()
instead of using the existing requiresRegisterScavenging().
It changes BranchFolding.cpp to call trackLivenessAfterRegAlloc() instead of
requiresRegisterScavenging(). It changes the all the targets that
implemented requiresRegisterScavenging() to also implement
trackLivenessAfterRegAlloc().
It adds an assertion in the Post RA scheduler to make sure that post RA
liveness information is available when it is needed.
It changes the X86 break-anti-dependencies test to use –mcpu=atom, in order
to avoid running into the added assertion.
Finally, this patch restores the use of anti-dependency checking
(which was turned off temporarily for the 3.1 release) for
Intel Atom in the Post RA scheduler.
Patch by Andy Zhang!
Thanks to Jakob and Anton for their reviews.
llvm-svn: 155395
The X86 target is editing the selection DAG while isel is selecting
nodes following a topological ordering. When the DAG hacking triggers
CSE, nodes can be deleted and bad things happen.
llvm-svn: 155257
Now that multiple DAGUpdateListeners can be active at the same time,
ISelPosition can become a local variable in DoInstructionSelection.
We simply register an ISelUpdater with CurDAG while ISelPosition exists.
llvm-svn: 155249
Instead of passing listener pointers to RAUW, let SelectionDAG itself
keep a linked list of interested listeners.
This makes it possible to have multiple listeners active at once, like
RAUWUpdateListener was already doing. It also makes it possible to
register listeners up the call stack without controlling all RAUW calls
below.
DAGUpdateListener uses an RAII pattern to add itself to the SelectionDAG
list of active listeners.
llvm-svn: 155248
The <undef> flag on a def operand only applies to partial register
redefinitions. Only print the flag when relevant, and print it as
<def,read-undef> to make it clearer what it means.
llvm-svn: 155239
This nicely handles the most common case of virtual register sets, but
also handles anticipated cases where we will map pointers to IDs.
The goal is not to develop a completely generic SparseSet
template. Instead we want to handle the expected uses within llvm
without any template antics in the client code. I'm adding a bit of
template nastiness here, and some assumption about expected usage in
order to make the client code very clean.
The expected common uses cases I'm designing for:
- integer keys that need to be reindexed, and may map to additional
data
- densely numbered objects where we want pointer keys because no
number->object map exists.
llvm-svn: 155227
commits have had several major issues pointed out in review, and those
issues are not being addressed in a timely fashion. Furthermore, this
was all committed leading up to the v3.1 branch, and we don't need piles
of code with outstanding issues in the branch.
It is possible that not all of these commits were necessary to revert to
get us back to a green state, but I'm going to let the Hexagon
maintainer sort that out. They can recommit, in order, after addressing
the feedback.
Reverted commits, with some notes:
Primary commit r154616: HexagonPacketizer
- There are lots of review comments here. This is the primary reason
for reverting. In particular, it introduced large amount of warnings
due to a bad construct in tablegen.
- Follow-up commits that should be folded back into this when
reposting:
- r154622: CMake fixes
- r154660: Fix numerous build warnings in release builds.
- Please don't resubmit this until the three commits above are
included, and the issues in review addressed.
Primary commit r154695: Pass to replace transfer/copy ...
- Reverted to minimize merge conflicts. I'm not aware of specific
issues with this patch.
Primary commit r154703: New Value Jump.
- Primarily reverted due to merge conflicts.
- Follow-up commits that should be folded back into this when
reposting:
- r154703: Remove iostream usage
- r154758: Fix CMake builds
- r154759: Fix build warnings in release builds
- Please incorporate these fixes and and review feedback before
resubmitting.
Primary commit r154829: Hexagon V5 (floating point) support.
- Primarily reverted due to merge conflicts.
- Follow-up commits that should be folded back into this when
reposting:
- r154841: Remove unused variable (fixing build warnings)
There are also accompanying Clang commits that will be reverted for
consistency.
llvm-svn: 155047
transformation:
(X op C1) ^ C2 --> (X op C1) & ~C2 iff (C1&C2) == C2
should be done.
This change has been tested:
Using a debug+asserts build:
on the specific test case that brought this bug to light
make check-all
lnt nt
using this clang to build a release version of clang
Using the release+asserts clang-with-clang build:
on the specific test case that brought this bug to light
make check-all
lnt nt
Checking in because Evan wants it checked in. Test case forthcoming after
scrubbing.
llvm-svn: 154955
for the life of me remember why I wrote it this way, but I can't see any good
reason for it now. This patch replaces the custom linked list with an ilist.
This change should preserve the existing numberings exactly, so no generated code
should change (if it does, file a bug!).
llvm-svn: 154904
both fallthrough and a conditional branch target the same successor.
Gracefully delete the conditional branch and introduce any unconditional
branch needed to reach the actual successor. This fixes memory
corruption in 2009-06-15-RegScavengerAssert.ll and possibly other tests.
Also, while I'm here fix a latent bug I spotted by inspection. I never
applied the same fundamental fix to this fallthrough successor finding
logic that I did to the logic used when there are no conditional
branches. As a consequence it would have selected landing pads had they
be aligned in just the right way here. I don't have a test case as
I spotted this by inspection, and the previous time I found this
required have of TableGen's source code to produce it. =/ I hate backend
bugs. ;]
Thanks to Jim Grosbach for helping me reason through this and reviewing
the fix.
llvm-svn: 154867
This is mostly to test the waters. I'd like to get results from FNT
build bots and other bots running on non-x86 platforms.
This feature has been pretty heavily tested over the last few months by
me, and it fixes several of the execution time regressions caused by the
inlining work by preventing inlining decisions from radically impacting
block layout.
I've seen very large improvements in yacr2 and ackermann benchmarks,
along with the expected noise across all of the benchmark suite whenever
code layout changes. I've analyzed all of the regressions and fixed
them, or found them to be impossible to fix. See my email to llvmdev for
more details.
I'd like for this to be in 3.1 as it complements the inliner changes,
but if any failures are showing up or anyone has concerns, it is just
a flag flip and so can be easily turned off.
I'm switching it on tonight to try and get at least one run through
various folks' performance suites in case SPEC or something else has
serious issues with it. I'll watch bots and revert if anything shows up.
llvm-svn: 154816
rotation. When there is a loop backedge which is an unconditional
branch, we will end up with a branch somewhere no matter what. Try
placing this backedge in a fallthrough position above the loop header as
that will definitely remove at least one branch from the loop iteration,
where whole loop rotation may not.
I haven't seen any benchmarks where this is important but loop-blocks.ll
tests for it, and so this will be covered when I flip the default.
llvm-svn: 154812
laid out in a form with a fallthrough into the header and a fallthrough
out of the bottom. In that case, leave the loop alone because any
rotation will introduce unnecessary branches. If either side looks like
it will require an explicit branch, then the rotation won't add any, do
it to ensure the branch occurs outside of the loop (if possible) and
maximize the benefit of the fallthrough in the bottom.
llvm-svn: 154806
This is a complex change that resulted from a great deal of
experimentation with several different benchmarks. The one which proved
the most useful is included as a test case, but I don't know that it
captures all of the relevant changes, as I didn't have specific
regression tests for each, they were more the result of reasoning about
what the old algorithm would possibly do wrong. I'm also failing at the
moment to craft more targeted regression tests for these changes, if
anyone has ideas, it would be welcome.
The first big thing broken with the old algorithm is the idea that we
can take a basic block which has a loop-exiting successor and a looping
successor and use the looping successor as the layout top in order to
get that particular block to be the bottom of the loop after layout.
This happens to work in many cases, but not in all.
The second big thing broken was that we didn't try to select the exit
which fell into the nearest enclosing loop (to which we exit at all). As
a consequence, even if the rotation worked perfectly, it would result in
one of two bad layouts. Either the bottom of the loop would get
fallthrough, skipping across a nearer enclosing loop and thereby making
it discontiguous, or it would be forced to take an explicit jump over
the nearest enclosing loop to earch its successor. The point of the
rotation is to get fallthrough, so we need it to fallthrough to the
nearest loop it can.
The fix to the first issue is to actually layout the loop from the loop
header, and then rotate the loop such that the correct exiting edge can
be a fallthrough edge. This is actually much easier than I anticipated
because we can handle all the hard parts of finding a viable rotation
before we do the layout. We just store that, and then rotate after
layout is finished. No inner loops get split across the post-rotation
backedge because we check for them when selecting the rotation.
That fix exposed a latent problem with our exitting block selection --
we should allow the backedge to point into the middle of some inner-loop
chain as there is no real penalty to it, the whole point is that it
*won't* be a fallthrough edge. This may have blocked the rotation at all
in some cases, I have no idea and no test case as I've never seen it in
practice, it was just noticed by inspection.
Finally, all of these fixes, and studying the loops they produce,
highlighted another problem: in rotating loops like this, we sometimes
fail to align the destination of these backwards jumping edges. Fix this
by actually walking the backwards edges rather than relying on loopinfo.
This fixes regressions on heapsort if block placement is enabled as well
as lots of other cases where the previous logic would introduce an
abundance of unnecessary branches into the execution.
llvm-svn: 154783
This is a special flag for targets that really want their block
terminators in the DAG. The default scheduler cannot handle this
correctly, so it becomes the specialized scheduler's responsibility to
schedule terminators.
llvm-svn: 154712
- Don't copy offsets into HashData, the underlying vector won't change once the table is finalized.
- Allocate HashData and HashDataContents in a BumpPtrAllocator.
- Allocate string map entries in the same allocator.
- Random cleanups.
llvm-svn: 154694
Fix a dagcombine optimization which assumes that the vsetcc result type is always
of the same size as the compared values. This is ture for SSE/AVX/NEON but not
for all targets.
llvm-svn: 154490
Allow cheap instructions to be hoisted if they are register pressure
neutral or better. This happens if the instruction is the last loop use
of another virtual register.
Only expensive instructions are allowed to increase loop register
pressure.
llvm-svn: 154455
Hoisting a value that is used by a PHI in the loop will introduce a
copy because the live range is extended to cross the PHI.
The same applies to PHIs in exit blocks.
Also use this opportunity to make HasLoopPHIUse() non-recursive.
llvm-svn: 154454
the loop header has a non-loop predecessor which has been pre-fused into
its chain due to unanalyzable branches. In this case, rotating the
header into the body of the loop in order to place a loop exit at the
bottom of the loop is a Very Bad Idea as it makes the loop
non-contiguous.
I'm working on a good test case for this, but it's a bit annoynig to
craft. I should get one shortly, but I'm submitting this now so I can
begin the (lengthy) performance analysis process. An initial run of LNT
looks really, really good, but there is too much noise there for me to
trust it much.
llvm-svn: 154395
legalizer always use the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.
PR12419
rdar://9770785
rdar://11195178
llvm-svn: 154370
This patch restores TwoAddressInstructionPass's pre-r153892 behaviour when
rescheduling instructions in TryInstructionTransform. Hopefully this will fix
PR12493. To refix PR11861, lowering of INSERT_SUBREGS is deferred until after
the copy that unties the operands is emitted (this seems to be a more
appropriate fix for that issue anyway).
llvm-svn: 154338
when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly. There is already an IR level transform that does
that, and it does it more carefully.
llvm-svn: 154296
in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.
llvm-svn: 154292
where a chain outside of the loop block-set ended up in the worklist for
scheduling as part of the contiguous loop. However, asserting the first
block in the chain is in the loop-set isn't a valid check -- we may be
forced to drag a chain into the worklist due to one block in the chain
being part of the loop even though the first block is *not* in the loop.
This occurs when we have been forced to form a chain early due to
un-analyzable branches.
No test case here as I have no idea how to even begin reducing one, and
it will be hopelessly fragile. We have to somehow end up with a loop
header of an inner loop which is a successor of a basic block with an
unanalyzable pair of branch instructions. Ow. Self-host triggers it so
it is unlikely it will regress.
This at least gets block placement back to passing selfhost and the test
suite. There are still a lot of slowdown that I don't like coming out of
block placement, although there are now also a lot of speedups. =[ I'm
seeing swings in both directions up to 10%. I'm going to try to find
time to dig into this and see if we can turn this on for 3.1 as it does
a really good job of cleaning up after some loops that degraded with the
inliner changes.
llvm-svn: 154287
shuffle node because it could introduce new shuffle nodes that were not
supported efficiently by the target.
2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
second shuffle reverses the transformation of the first shuffle.
llvm-svn: 154266
reciprocal if converting to the reciprocal is exact. Do it even if inexact
if -ffast-math. This substantially speeds up ac.f90 from the polyhedron
benchmarks.
llvm-svn: 154265
LSR always tries to make the ICmp in the loop latch use the incremented
induction variable. This allows the induction variable to be kept in a
single register.
When the induction variable limit is equal to the stride,
SimplifySetCC() would break LSR's hard work by transforming:
(icmp (add iv, stride), stride) --> (cmp iv, 0)
This forced us to use lea for the IC update, preventing the simpler
incl+cmp.
<rdar://problem/7643606>
<rdar://problem/11184260>
llvm-svn: 154119
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
llvm-svn: 154011
When folding X == X we need to check getBooleanContents() to determine if the
result is a vector of ones or a vector of negative ones.
I tried creating a test case, but the problem seems to only be exposed on a
much older version of clang (around r144500).
rdar://10923049
llvm-svn: 153966
brace) so that we get more accurate line number information about the
declaration of a given function and the line where the function
first starts.
Part of rdar://11026482
llvm-svn: 153916
Do not try to optimize swizzles of shuffles if the source shuffle has more than
a single user, except when the source shuffle is also a swizzle.
llvm-svn: 153864
This is the CodeGen equivalent of r153747. I tested that there is not noticeable
performance difference with any combination of -O0/-O2 /-g when compiling
gcc as a single compilation unit.
llvm-svn: 153817
here but it has no other uses, then we have a problem. E.g.,
int foo (const int *x) {
char a[*x];
return 0;
}
If we assign 'a' a vreg and fast isel later on has to use the selection
DAG isel, it will want to copy the value to the vreg. However, there are
no uses, which goes counter to what selection DAG isel expects.
<rdar://problem/11134152>
llvm-svn: 153705
Some targets still mess up the liveness information, but that isn't
verified after MRI->invalidateLiveness().
The verifier can still check other useful things like register classes
and CFG, so it should be enabled after all passes.
llvm-svn: 153615
The late scheduler depends on accurate liveness information if it is
breaking anti-dependencies, so we should be able to verify it.
Relax the terminator checking in the machine code verifier so it can
handle the basic blocks created by if conversion.
llvm-svn: 153614
Extract the liveness verification into its own method.
This makes it possible to run the machine code verifier after liveness
information is no longer required to be valid.
llvm-svn: 153596
Branch folding can use a register scavenger to update liveness
information when required. Don't do that if liveness information is
already invalid.
llvm-svn: 153517
Late optimization passes like branch folding and tail duplication can
transform the machine code in a way that makes it expensive to keep the
register liveness information up to date. There is a fuzzy line between
register allocation and late scheduling where the liveness information
degrades.
The MRI::tracksLiveness() flag makes the line clear: While true,
liveness information is accurate, and can be used for register
scavenging. Once the flag is false, liveness information is not
accurate, and can only be used as a hint.
Late passes generally don't need the liveness information, but they will
sometimes use the register scavenger to help update it. The scavenger
enforces strict correctness, and we have to spend a lot of code to
update register liveness that may never be used.
llvm-svn: 153511
copies being considered for removal. Make sure to track all of the copies,
rather than just the most recent encountered, by holding a DenseSet instead of
an unsigned in SrcMap.
No test case - couldn't reduce something with a sane size.
llvm-svn: 153487
execution-time regression for nsieve-bits on the ARMv7 -O0 -g nightly tester.
This may also improve compile-time on architectures that would otherwise
generate a libcall for urem (e.g., ARM) or fall back to the DAG selector.
rdar://10810716
llvm-svn: 153230
Type legalization can zero-extend the elements of the build_vector node, so,
for example, we may have an <8 x i8> with i32 elements of value 255. That
should return 'true' for the vector being all ones.
llvm-svn: 153203
i128). In that case, we may not be able to print out the MCExpr as an
expression. For instance, we could have an MCExpr like this:
0xBEEF0000BEEF0000 | (0xBEEF0000BEEF0000 << 64)
The MCExpr printer handles sizes up to 64-bits, but this expression would
require 128-bits. In this situation, try to evaluate the constant expression and
emit that as the value into 64-bit chunks.
<rdar://problem/11070338>
llvm-svn: 153081
a variable. The previous code would break the debug info changing
code invariant. This will regress debug info for arguments where
we elide the alloca created.
Fixes rdar://11066468
llvm-svn: 153074
These edges are not really necessary, but it is consistent with the
way we currently create physreg edges. Scheduler heuristics that
expect a DAG edge to the block terminator could benefit from this
change. Although in the future I hope we have a better mechanism for
modeling latency across scheduling regions.
llvm-svn: 152895
on our internal nightly testers. So, basically revert r152486 again.
Abbreviated original commit message:
Implement a more intelligent way of spilling uses across an invoke boundary.
It looks as if Chander's inlining work, r152737, exposed an issue.
llvm-svn: 152887
It caused MSP430DAGToDAGISel::SelectIndexedBinOp() to be miscompiled.
When two ReplaceUses()'s are expanded as inline, vtable in base class is stored to latter (ISelUpdater)ISU.
llvm-svn: 152877
out the DW_AT_name. Older gdbs unfortunately still use it to
disambiguate member functions in templated classes (gdb.cp/templates.exp).
rdar://11043421 (which is now deferred for a bit)
llvm-svn: 152782
There were cases where a value could be used and it's both crossing an invoke
and NOT crossing an invoke. This could happen in the landing pads. In that case,
we will demote the value to the stack like we did before.
<rdar://problem/10609139>
llvm-svn: 152705
New flags: -misched-topdown, -misched-bottomup. They can be used with
the default scheduler or with -misched=shuffle. Without either
topdown/bottomup flag -misched=shuffle now alternates scheduling
direction.
LiveIntervals update is unimplemented with bottom-up scheduling, so
only -misched-topdown currently works.
Capped the ScheduleDAG hierarchy with a concrete ScheduleDAGMI class.
ScheduleDAGMI is aware of the top and bottom of the unscheduled zone
within the current region. Scheduling policy can be plugged into
the ScheduleDAGMI driver by implementing MachineSchedStrategy.
ConvergingScheduler is now the default scheduling algorithm.
It exercises the new driver but still does no reordering.
llvm-svn: 152700
output (we're emitting a specification already and the information
isn't changing).
Saves 1% on the debug information for a build of llvm.
Fixes rdar://11043421
llvm-svn: 152697
(i16 load $addr+c*sizeof(i16)) and replace uses of (i32 vextract) with the
i16 load. It should issue an extload instead: (i32 extload $addr+c*sizeof(i16)).
rdar://11035895
llvm-svn: 152675
Renamed methods caseBegin, caseEnd and caseDefault with case_begin, case_end, and case_default.
Added some notes relative to case iterators.
llvm-svn: 152532
The old way of determine when and where to spill a value that was used inside of
a landing pad resulted in spilling that value everywhere and not just at the
invoke edge.
This algorithm determines which values are used within a landing pad. It then
spills those values before the invoke and reloads them before the uses. This
should prevent excessive spilling in many cases, e.g. inside of loops.
<rdar://problem/10609139>
llvm-svn: 152486
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120130/136146.html
Implemented CaseIterator and it solves almost all described issues: we don't need to mix operand/case/successor indexing anymore. Base iterator class is implemented as a template since it may be initialized either from "const SwitchInst*" or from "SwitchInst*".
ConstCaseIt is just a read-only iterator.
CaseIt is read-write iterator; it allows to change case successor and case value.
Usage of iterator allows totally remove resolveXXXX methods. All indexing convertions done automatically inside the iterator's getters.
Main way of iterator usage looks like this:
SwitchInst *SI = ... // intialize it somehow
for (SwitchInst::CaseIt i = SI->caseBegin(), e = SI->caseEnd(); i != e; ++i) {
BasicBlock *BB = i.getCaseSuccessor();
ConstantInt *V = i.getCaseValue();
// Do something.
}
If you want to convert case number to TerminatorInst successor index, just use getSuccessorIndex iterator's method.
If you want initialize iterator from TerminatorInst successor index, use CaseIt::fromSuccessorIndex(...) method.
There are also related changes in llvm-clients: klee and clang.
llvm-svn: 152297
Allow targets to provide their own schedulers (subclass of
ScheduleDAGInstrs) to the misched pass. Select schedulers using
-misched=...
llvm-svn: 152278
ScheduleDAGInstrs will be the main interface for MI-level
schedulers. Make sure it's readable: one page of protected fields, one
page of public methids.
llvm-svn: 152258
This one is particularly annoying because the hashing algorithm is
highly specialized, with a strange "equivalence" definition that subsets
the fields involved.
Still, this looks at the exact same set of data as the old code, but
without bitwise or-ing over parts of it and other mixing badness. No
functionality changed here. I've left a substantial fixme about the fact
that there is a cleaner and more principled way to do this, but it
requires making the equality definition actual stable for particular
types...
llvm-svn: 152218
the DebugLoc information can be maintained throughout by grabbing the DebugLoc
before the RemoveBranch and then passing the result to the InsertBranch.
Patch by Andrew Stanford-Jason!
llvm-svn: 152212
ScheduleDAG is responsible for the DAG: SUnits and SDeps. It provides target hooks for latency computation.
ScheduleDAGInstrs extends ScheduleDAG and defines the current scheduling region in terms of MachineInstr iterators. It has access to the target's scheduling itinerary data. ScheduleDAGInstrs provides the logic for building the ScheduleDAG for the sequence of MachineInstrs in the current region. Target's can implement highly custom schedulers by extending this class.
ScheduleDAGPostRATDList provides the driver and diagnostics for current postRA scheduling. It maintains a current Sequence of scheduled machine instructions and logic for splicing them into the block. During scheduling, it uses the ScheduleHazardRecognizer provided by the target.
Specific changes:
- Removed driver code from ScheduleDAG. clearDAG is the only interface needed.
- Added enterRegion/exitRegion hooks to ScheduleDAGInstrs to delimit the scope of each scheduling region and associated DAG. They should be used to setup and cleanup any region-specific state in addition to the DAG itself. This is necessary because we reuse the same ScheduleDAG object for the entire function. The target may extend these hooks to do things at regions boundaries, like bundle terminators. The hooks are called even if we decide not to schedule the region. So all instructions in a block are "covered" by these calls.
- Added ScheduleDAGInstrs::begin()/end() public API.
- Moved Sequence into the driver layer, which is specific to the scheduling algorithm.
llvm-svn: 152208
The first def of a virtual register cannot also read the register.
Assert on such bad machine code instead of trying to fix it.
TwoAddressInstructionPass should never create code like that.
llvm-svn: 152010
The inline table needs to be constructed ahead of time so that it doesn't try to
create new strings while we're emitting everything.
This reverts commit a8ff9bccb399183cdd5f1c3cec2bda763664b4b0.
llvm-svn: 151864
fixups that are being used to determine section offsets. Reduces
the total number of fixups by 50% for a non-trivial testcase.
Part of rdar://10413936
llvm-svn: 151852
Simply treat bundles as instructions. Spill code is inserted between
bundles, never inside a bundle. Rewrite all operands in a bundle at
once.
Don't attempt and memory operand folding inside bundles.
llvm-svn: 151787
This allows the function to be inlined, and makes it suitable for use in
getInstructionIndex().
Also provide a const version. C++ is great for touch typing practice.
llvm-svn: 151782
This function does more or less the same as
MI::readsWritesVirtualRegister(), but it supports bundles as well.
It also determines if any constraint requires reading and writing
operands to use the same register. Most clients want to know.
Use the more modern MO.readsReg() instead of trying to sort out undefs
and partial redefines. Stop supporting the extra full <imp-def> operand
as an alternative to <def,undef> sub-register defines.
llvm-svn: 151690
Extract a base class and provide four specific sub-classes for iterating
over const/non-const bundles/instructions.
This eliminates the mystery bool constructor argument.
llvm-svn: 151684
methods are no longer needed now that LinearScan has gone away.
(Contains tweaks trivialSpillEverywhere to enable the removal of getNewVRegs).
llvm-svn: 151658
To avoid problems with zero shifts when getting the bits that move between words
we use a trick: first shift the by amount-1, then do another shift by one. When
amount is 0 (and size 32) we first shift by 31, then by one, instead of by 32.
Also fix a latent bug that emitted the low and high words in the wrong order
when shifting right.
Fixes PR12113.
llvm-svn: 151637
When the GEP index is a vector of pointers, the code that calculated the size
of the element started from the vector type, and not the contained pointer type.
As a result, instead of looking at the data element pointed by the vector, this
code used the size of the vector. This works for 32bit members (on 32bit
systems), but not for other types. Added code to peel the vector type and
added a test.
llvm-svn: 151626
the processor keeps a return addresses stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.
Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt RAS and causes 100% return misprediction so LLVM should use a
unconditional branch instead. i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.
rdar://8979299
llvm-svn: 151623
After the SlotIndex slot names were updated, it is possible to apply
stricter checks to live intervals.
Also treat bundles as bags of operands when checking live intervals.
llvm-svn: 151531
uses of the vreg, since the old kills may no longer be valid. This was causing
-verify-machineinstrs to complain about uses after kills, and could potentially
have been causing subtle register allocation issues, but I haven't come across a
test case yet.
llvm-svn: 151425
variable declaration as an argument because we want that address
anyhow for our debug information.
This seems to fix rdar://9965111, at least we have more debug
information than before and from reading the assembly it appears
to be the correct location.
llvm-svn: 151335
Assuming that a single std::set node adds 3 control words, a bitvector
can store (3*8+4)*8=224 registers in the allocated memory of a single
element in the std::set (x86_64). Also we don't have to call malloc
for every register added.
llvm-svn: 151269
Before register allocation, instructions can be moved across calls in
order to reduce register pressure. After register allocation, we don't
gain a lot by moving callee-saved defs across calls. In fact, since the
scheduler doesn't have a good idea how registers are used in the callee,
it can't really make good scheduling decisions.
This changes the schedule in two ways: 1. Latencies to call uses and
defs are no longer accounted for, causing some random shuffling around
calls. This isn't really a problem since those uses and defs are
inaccurate proxies for what happens inside the callee. They don't
represent registers used by the call instruction itself.
2. Instructions are no longer moved across calls. This didn't happen
very often, and the scheduling decision was made on dubious information
anyway.
As with any scheduling change, benchmark numbers shift around a bit,
but there is no positive or negative trend from this change.
This makes the post-ra scheduler 5% faster for ARM targets.
The secret motivation for this patch is the introduction of register
mask operands representing call clobbers. The most efficient way of
handling regmasks in ScheduleDAGInstrs is to model them as barriers for
physreg live ranges, but not for virtreg live ranges. That's fine
pre-ra, but post-ra it would have the same effect as this patch.
llvm-svn: 151265
The standard function epilog includes a .size directive, but ppc64 uses
an alternate local symbol to tag the actual start of each function.
Until recently, binutils accepted the .size directive as:
.size test1, .Ltmp0-test1
however, using this directive with recent binutils will result in the error:
.size expression for XXX does not evaluate to a constant
so we must use the label which actually tags the start of the function.
llvm-svn: 151200
Affect on SD scheduling and postRA scheduling:
Printing the DAG will display the nodes in top-down topological order.
This matches the order within the MBB and makes my life much easier in general.
Affect on misched:
We don't need to track virtual register uses at all. This is awesome.
I also intend to rely on the SUnit ID as a topo-sort index. So if A < B then we cannot have an edge B -> A.
llvm-svn: 151135
This makes RAFast 4% faster, and it gets rid of the dodgy DenseMap
iteration.
This also revealed that RAFast would sometimes dereference DenseMap
iterators after erasing other elements from the map. That does seem to
work in the current DenseMap implementation, but SparseSet doesn't allow
it.
llvm-svn: 151111
bundles. This method takes a bundle start and an MI being bundled, and makes
the intervals for the MI's operands appear to start/end on the bundle start.
Also fixes some minor cosmetic issues (whitespace, naming convention) in the
HMEditor code.
llvm-svn: 151099
Passes after RegAlloc should be able to rely on MRI->getNumVirtRegs() == 0.
This makes sharing code for pre/postRA passes more robust.
Now, to check if a pass is running before the RA pipeline begins, use MRI->isSSA().
To check if a pass is running after the RA pipeline ends, use !MRI->getNumVirtRegs().
PEI resets virtual regs when it's done scavenging.
PTX will either have to provide its own PEI pass or assign physregs.
llvm-svn: 151032
ecx = mov eax
al = mov ch
The second copy is not a nop because the sub-indices of ecx,ch is not the
same of that of eax/al.
Re-enabled machine-cp.
PR11940
llvm-svn: 151002
MRI keeps track of which physregs have been used. Make sure it gets
updated with all the regmask-clobbered registers.
Delete the closePhysRegsUsed() function which isn't necessary.
llvm-svn: 150830
any changes.
Internally this adds a private inner class HMEditor, to LiveIntervals. HMEditor provides
an API for updating live intervals when code is moved or bundled.
llvm-svn: 150826
This caused miscompilations on out-of-tree targets, and possibly i386 as
well.
I'll find some other way of hoisting %rip-relative loads from loops
containing calls.
llvm-svn: 150816
The existing framework for postra scheduling is library local. We want to keep it that way. Soon we will have a more general MachineScheduler interface. At that time, various bits will be exposed to targets. In the meantime, the VLIWPacketizer wants to use ScheduleDAGInstrs directly, so it needs to wrapped in a PIMPL to avoid exposing it to the target interface.
llvm-svn: 150633
method. This allows the target lowering code to not have to deal with MDNodes.
Also, avoid leaking memory like a sieve by not creating a global variable for
the image info section, but just emitting the code directly.
llvm-svn: 150624
The llc command line options for enabling/disabling passes are local to CodeGen/Passes.cpp. This patch associates those options with standard pass IDs so they work regardless of how the target configures the passes.
A target has two ways of overriding standard passes:
1) Redefine the pass pipeline (override TargetPassConfig::add%Stage)
2) Replace or suppress individiual passes with TargetPassConfig::substitutePass.
In both cases, the command line options associated with the pass override the target default.
For example, say a target wants to disable machine instruction scheduling by default:
- The target calls disablePass(MachineSchedulerID) but otherwise does not override any TargetPassConfig methods.
- Without any llc options, no scheduler is run.
- With -enable-misched, the standard machine scheduler is run and honors the -misched=... flag to select the scheduler variant, which may be used for performance evaluation or testing.
Sorry overridePass is ugly. I haven't thought of a better way without replacing the cl::opt framework. I hope to do that one day...
I haven't figured out why CodeGen uses char& for pass IDs. AnalysisID is much easier to use and less bug prone. I'm using it wherever I can for internal implementation. Maybe later we can change the global pass ID definitions as well.
llvm-svn: 150563
Pretend that regmask interference ends at the 'dead' slot, even when
there is other interference ending at the 'reg' slot of the same
instruction.
llvm-svn: 150531
Only accept register masks when looking for an 'overlapping' def. When
Overlap is not set, the function searches for a proper definition of
Reg.
This means MI->modifiesRegister() considers register masks, but
MI->definesRegister() doesn't.
llvm-svn: 150529
When a physreg is live in to a basic block, look for any instruction in
the block that clobbers the physreg.
The instruction doesn't have to properly redefine the register, any
overlapping clobber is OK.
This slightly changes live ranges when compiling with register masks.
llvm-svn: 150528
The MachO back-end needs to emit the garbage collection flags specified in the
module flags. This is a WIP, so the front-end hasn't been modified to emit these
flags just yet. Documentation and front-end switching to occur soon.
llvm-svn: 150507
that are greater than the vector element type. For example BUILD_VECTOR
of type <1 x i1> with a constant i8 operand.
This patch fixes the assertion.
llvm-svn: 150477
The scheduler will sometimes check the implicit-def list on instructions
to properly handle pre-colored DAG edges.
Also check any register mask operands for physreg clobbers.
llvm-svn: 150428
v8i8 -> v8i32 on AVX machines. The codegen often scalarizes ANY_EXTEND nodes.
The DAGCombiner has two optimizations that can mitigate the problem. First,
if all of the operands of a BUILD_VECTOR node are extracted from an ZEXT/ANYEXT
nodes, then it is possible to create a new simplified BUILD_VECTOR which uses
UNDEFS/ZERO values to eliminate the scalar ZEXT/ANYEXT nodes.
Second, another dag combine optimization lowers BUILD_VECTOR into a shuffle
vector instruction.
In the case of zext v8i8->v8i32 on AVX, a value in an XMM register is to be
shuffled into a wide YMM register.
This patch modifes the second optimization and allows the creation of
shuffle vectors even when the newly generated vector and the original vector
from which we extract the values are of different types.
llvm-svn: 150340
In case the MachineScheduling pass I'm working on doesn't work well
for another target, they can completely override it. This also adds a
hook immediately after the RegAlloc pass to cleanup immediately after
vregs go away. We may want to fold it into the postRA hook later.
llvm-svn: 150298
When using register masks, registers like %rip are clobbered by the
register mask. LICM should still be able to hoist instructions reading
%rip from a loop containing calls.
llvm-svn: 150288
It can be necessary to detach a register mask pointer from its
MachineOperand. This method is convenient for checking clobbered
physregs on a detached bitmask pointer.
llvm-svn: 150261
This makes global live range splitting behave identically with and
without register mask operands.
This is not necessarily the best way of using register masks for live
range splitting. It would be more efficient to first split global live
ranges around calls (i.e., register masks), and reserve the fine grained
per-physreg interference guidance for global live ranges that do not
cross calls.
For now the goal is to produce identical assembly when enabling register
masks.
llvm-svn: 150259
Make them accessible through MCInstrInfo. They are only used for debugging purposes so this doesn't
have an impact on performance. X86MCTargetDesc.o goes from 630K to 461K on x86_64.
llvm-svn: 150245
Creates a configurable regalloc pipeline.
Ensure specific llc options do what they say and nothing more: -reglloc=... has no effect other than selecting the allocator pass itself. This patch introduces a new umbrella flag, "-optimize-regalloc", to enable/disable the optimizing regalloc "superpass". This allows for example testing coalscing and scheduling under -O0 or vice-versa.
When a CodeGen pass requires the MachineFunction to have a particular property, we need to explicitly define that property so it can be directly queried rather than naming a specific Pass. For example, to check for SSA, use MRI->isSSA, not addRequired<PHIElimination>.
CodeGen transformation passes are never "required" as an analysis
ProcessImplicitDefs does not require LiveVariables.
We have a plan to massively simplify some of the early passes within the regalloc superpass.
llvm-svn: 150226
This only adds the interference checks required for correctness.
We still need to take advantage of register masks for the
interference driven live range splitting.
llvm-svn: 150191
Failure to preserve kills was causing LiveIntervals to miss some EFLAGS live
ranges. Unfortunately I've been unable to reduce a good test case yet.
llvm-svn: 150152
I think this was already the intention, but DeadMachineInstructionElim
was accidentally tracking the liveness of reserved registers. Now,
instructions with reserved defs are never deleted.
This prevents the call stack adjustment instructions from getting
deleted when enabling register masks.
llvm-svn: 150116
For simplicity, treat calls with register masks as basic block
boundaries. This means we can't copy propagate callee-saved registers
across calls, but I don't think that is a big deal.
llvm-svn: 150108
Moving toward a uniform style of pass definition to allow easier target configuration.
Globally declare Pass ID.
Globally declare pass initializer.
Use INITIALIZE_PASS consistently.
Add a call to the initializer from CodeGen.cpp.
Remove redundant "createPass" functions and "getPassName" methods.
While cleaning up declarations, cleaned up comments (sorry for large diff).
llvm-svn: 150100
Build an ordered vector of register mask operands (i.e., calls) when
computing live intervals. Provide a checkRegMaskInterference() function
that computes a bit mask of usable registers for a live range.
This is a quick way of determining of a live range crosses any calls,
and restricting it to the callee saved registers if it does.
Previously, we had to discover call clobbers for each candidate register
independently.
llvm-svn: 150077
but with a critical fix to the SelectionDAG code that optimizes copies
from strings into immediate stores: the previous code was stopping reading
string data at the first nul. Address this by adding a new argument to
llvm::getConstantStringInfo, preserving the behavior before the patch.
llvm-svn: 149800
A live range that has an early clobber tied redef now looks like a
normal tied redef, except the early clobber def uses the early clobber
slot.
This is enough to handle any strange interference problems.
llvm-svn: 149769
I don't have a test that fails because of this, but a test case like
CodeGen/X86/2009-12-01-EarlyClobberBug.ll exposes the problem. EAX is
redefined by a tied early clobber operand on inline asm, and the live
range should look like this:
%EAX,inf = [48r,64e:0)[64e,80r:1) 0@48r 1@64e
Previously, the two values got merged:
%EAX,inf = [48r,80r:0) 0@48r
With this bug fixed, the REDEF_BY_EC VNInfo flag is no longer needed.
llvm-svn: 149768
Andy, in a previous commit you made this into an ImmutablePass so that you could
add it to the PassManager, then in the next commit you left it a Pass but
removed the code that added it to the PM. If you do add it to the PM then the PM
should take care of deleting it, but it's also true that nothing in codegen
needs this object to exist after it's done its work here. It's not clear to me
which design you want; this should likely either cease to be a Pass or be added
to the PM where other parts of CodeGen will request it.
llvm-svn: 149765
If a value is defined by a COPY, that instuction can easily and cheaply
be found by getInstructionFromIndex(VNI->def).
This reduces the size of VNInfo from 24 to 16 bytes, and improves
llc compile time by 3%.
llvm-svn: 149763
Passes prior to instructon selection are now split into separate configurable stages.
Header dependencies are simplified.
The bulk of this diff is simply removal of the silly DisableVerify flags.
Sorry for the target header churn. Attempting to stabilize them.
llvm-svn: 149754
Calls that use register mask operands don't have implicit defs for
returned values. The register mask operand handles the call clobber,
but it always behaves like a set of dead defs.
Add live implicit defs for any implicitly defined physregs that are
actually used.
llvm-svn: 149715
SelectionDAG has 4 different ways of passing physreg defs to users.
Collect all of the uses at the same time, and pass all of them to
MI->setPhysRegsDeadExcept() to mark the remaining defs dead.
The setPhysRegsDeadExcept() function will soon add the required
implicit-defs to instructions with register mask operands.
llvm-svn: 149708
In this patch we optimize this pattern and convert the sequence into extract op of a narrow type.
This allows the BUILD_VECTOR dag optimizations to construct efficient shuffle operations in many cases.
llvm-svn: 149692
Allows command line overrides to be centralized in LLVMTargetMachine.cpp.
LLVMTargetMachine can intercept common passes and give precedence to command line overrides.
Allows adding "internal" target configuration options without touching TargetOptions.
Encapsulates the PassManager.
Provides a good point to initialize all CodeGen passes so that Pass ID's can be used in APIs.
Allows modifying the target configuration hooks without rebuilding the world.
llvm-svn: 149672
needed to emit a 64-bit gp-relative relocation entry. Make changes necessary
for emitting jump tables which have entries with directive .gpdword. This patch
does not implement the parts needed for direct object emission or JIT.
llvm-svn: 149668
more than two adjacent ranges needed to be merged. The new version should be
able to handle an arbitrary sequence of adjancent ranges.
llvm-svn: 149588
This new scheduler plugs into the existing selection DAG scheduling framework. It is a top-down critical path scheduler that tracks register pressure and uses a DFA for pipeline modeling.
Patch by Sergei Larin!
llvm-svn: 149547
The purpose of refactoring is to hide operand roles from SwitchInst user (programmer). If you want to play with operands directly, probably you will need lower level methods than SwitchInst ones (TerminatorInst or may be User). After this patch we can reorganize SwitchInst operands and successors as we want.
What was done:
1. Changed semantics of index inside the getCaseValue method:
getCaseValue(0) means "get first case", not a condition. Use getCondition() if you want to resolve the condition. I propose don't mix SwitchInst case indexing with low level indexing (TI successors indexing, User's operands indexing), since it may be dangerous.
2. By the same reason findCaseValue(ConstantInt*) returns actual number of case value. 0 means first case, not default. If there is no case with given value, ErrorIndex will returned.
3. Added getCaseSuccessor method. I propose to avoid usage of TerminatorInst::getSuccessor if you want to resolve case successor BB. Use getCaseSuccessor instead, since internal SwitchInst organization of operands/successors is hidden and may be changed in any moment.
4. Added resolveSuccessorIndex and resolveCaseIndex. The main purpose of these methods is to see how case successors are really mapped in TerminatorInst.
4.1 "resolveSuccessorIndex" was created if you need to level down from SwitchInst to TerminatorInst. It returns TerminatorInst's successor index for given case successor.
4.2 "resolveCaseIndex" converts low level successors index to case index that curresponds to the given successor.
Note: There are also related compatability fix patches for dragonegg, klee, llvm-gcc-4.0, llvm-gcc-4.2, safecode, clang.
llvm-svn: 149481
This removes implicit assumption about the form of MI coming into regalloc. In particular, it should be independent of ProcessImplicitDefs which will eventually become a standard part of coming out of SSA--unless we simply can eliminate IMPLICIT_DEF completely. Current unit tests expose this once I remove incidental pass ordering restrictions.
This is not a final fix. Just a temporary workaround until I figure out the right way.
llvm-svn: 149360
vectors of all one bits to be printed more cleverly in the AsmPrinter.
Unfortunately, the byte value for all one bits is the same with
-fsigned-char as the error return of '-1'. Force this to be the unsigned
byte value when returning it to avoid this problem, and update the test
case for the shiny new behavior.
Yay for building LLVM and Clang with -funsigned-char.
Chris, please review, and let me know if there is any reason to not
desire this change. It seems good on the surface, and certainly intended
based on the code written.
llvm-svn: 149299
- Don't call malloc+free in the very hot forward().
- Don't call isTiedToDefOperand().
- Don't create BitVector temporaries.
- Merge DeadRegs into KillRegs.
- Eliminate the early clobber checks, they were irrelevant to scavenging.
- Remove unnecessary code from -Asserts builds.
This speeds up ARM PEI by 3.4x and overall llc -O0 codegen time by 11%.
llvm-svn: 149189
Sometimes there is only one 'resume' instruction per function. In those
situations, we don't need a separate block for the call to _Unwind_Resume. In
fact, it adds a lot of overhead to code-gen if we do that -- especially at -O0.
If we have a single 'resume' instruction, just generate the call within that
block.
<rdar://problem/10694814>
llvm-svn: 149159
around within a basic block while maintaining live-intervals.
Updated ScheduleTopDownLive in MachineScheduler.cpp to use the moveInstr API
when reordering MIs.
llvm-svn: 149147
GEP instructions are there for the compiler and shouldn't really output much
code (if any at all). When a GEP is stored in the entry block, Fast ISel (for
one) will not know that it could fold it into further uses. For instance, inside
of the EH handling code. This results in a lot of unnecessary spills and loads
which bloat code and slows down pretty much everything.
<rdar://problem/10694814>
llvm-svn: 149114
mid-level constant folding APIs instead of doing its own analysis.
This makes it more general (e.g. can now share a <2 x i64> with a
<4 x i32>) and avoid duplicating a bunch of logic.
llvm-svn: 149111
we're at it, allow PatternMatch's "neg" pattern to match integer
vector negations, and enhance ComputeNumSigned bits to handle
shl of vectors.
llvm-svn: 149082
MachineBasicBlock::canFallThrough(). We're interested in the state of the
instruction (i.e., is this a barrier or not?), not if the instruction is
predicable or not.
rdar://10501092
llvm-svn: 149070
The live range of the source register may be extended when a redundant
copy is eliminated. Make sure any kill flags between the two copies are
cleared.
This fixes PR11765.
llvm-svn: 149069
This enables the linker to match concrete relocation types (absolute or relative) with whatever library or C++ support code is being linked against.
llvm-svn: 149057
This boils down to using MachineOperand::readsReg() more.
This fixes PR11829 where a use ended up after the first def when
lowering REG_SEQUENCE instructions involving IMPLICIT_DEFs.
llvm-svn: 148996
function. They don't appear to be used, and are inconsistent with handling of
other physreg intervals (i.e. intervals that are not live-in) where ranges are
not inserted for aliases.
llvm-svn: 148986
A REG_SEQUENCE instruction is lowered into a sequence of partial defs:
%vreg7:ssub_0<def,undef> = COPY %vreg20:ssub_0
%vreg7:ssub_1<def> = COPY %vreg2
%vreg7:ssub_2<def> = COPY %vreg2
%vreg7:ssub_3<def> = COPY %vreg2
The first def needs an <undef> flag to indicate it is the beginning of
the live range, while the other defs are read-modify-write. Previously,
we depended on LiveIntervalAnalysis to notice and fix the missing
<def,undef>, but that solution was never robust, it was causing problems
with ProcessImplicitDefs and the lowering of chained REG_SEQUENCE
instructions.
This fixes PR11841.
llvm-svn: 148879
This change adds an new option --arm-enable-ehabi-descriptors that
enables emitting unwinding descriptors. This provides a mode with a
working backtrace() without the (currently broken) exception support.
llvm-svn: 148800
and clean up some other misc stuff. Unlike ConstantArray, we will
prefer to emit .fill directives for "String" arrays that all have
the same value, since they are denser than emitting a .ascii
llvm-svn: 148793
violation -- MC cannot depend on CodeGen.
Specifically, the MCTargetDesc component of each target is actually
a subcomponent of the MC library. As such, it cannot depend on the
target-independent code generator, because MC itself cannot depend on
the target-independent code generator. This change moved a flag from the
ARM MCTargetDesc file ARMMCAsmInfo.cpp to the CodeGen layer in
ARMException.cpp, leaving behind an 'extern' to refer back to it. That
layering order isn't viable givin the constraints outlined above.
Commandline flags are designed to be static specifically to avoid these
types of bugs.
Fixing this is likely going to require some non-trivial refactoring.
llvm-svn: 148759
This change adds an new value to the --arm-enable-ehabi option that
disables emitting unwinding descriptors. This mode gives a working
backtrace() without the (currently broken) exception support.
llvm-svn: 148686
A register mask operand kills any live physreg that isn't preserved.
Unlike an implicit-def operand, the clobbered physregs are never live
afterwards.
This means LiveVariables has to track a much smaller number of live
physregs, and it should spend much less time in addRegisterDead().
llvm-svn: 148609
Problem: LLVM needs more function attributes than currently available (32 bits).
One such proposed attribute is "address_safety", which shows that a function is being checked for address safety (by AddressSanitizer, SAFECode, etc).
Solution:
- extend the Attributes from 32 bits to 64-bits
- wrap the object into a class so that unsigned is never erroneously used instead
- change "unsigned" to "Attributes" throughout the code, including one place in clang.
- the class has no "operator uint64 ()", but it has "uint64_t Raw() " to support packing/unpacking.
- the class has "safe operator bool()" to support the common idiom: if (Attributes attr = getAttrs()) useAttrs(attr);
- The CTOR from uint64_t is marked explicit, so I had to add a few explicit CTOR calls
- Add the new attribute "address_safety". Doing it in the same commit to check that attributes beyond first 32 bits actually work.
- Some of the functions from the Attribute namespace are worth moving inside the class, but I'd prefer to have it as a separate commit.
Tested:
"make check" on Linux (32-bit and 64-bit) and Mac (10.6)
built/run spec CPU 2006 on Linux with clang -O2.
This change will break clang build in lib/CodeGen/CGCall.cpp.
The following patch will fix it.
llvm-svn: 148553
'insertvalue' instructions that recreate the structure returned by the
'landingpad' instruction. Because the 'insertvalue' instruction isn't supported
by FastISel, this can save a bit of time during -O0 compilation.
llvm-svn: 148520
to instruction right after the last instruction in the bundle.
- Add a finalizeBundle() variant that doesn't specify LastMI. Instead, the code
will find the last instruction in the bundle by following the 'InsideBundle'
marker. This is useful in case bundles are formed early (i.e. during MI
scheduling) but finalized later (i.e. after register allocator has finished
rewriting virtual registers with physical registers).
llvm-svn: 148444
This SelectionDAG node will be attached to call nodes by LowerCall(),
and eventually becomes a MO_RegisterMask MachineOperand on the
MachineInstr representing the call instruction.
LowerCall() will attach a register mask that depends on the calling
convention.
llvm-svn: 148436
(This time I believe I've checked all the -Wreturn-type warnings from GCC & added the couple of llvm_unreachables necessary to silence them. If I've missed any, I'll happily fix them as soon as I know about them)
llvm-svn: 148262
Register masks will be used as a compact representation of large clobber
lists. Currently, an x86 call instruction has some 40 operands
representing call-clobbered registers. That's more than 1kB of useless
operands per call site.
A register mask operand references a bit mask of call-preserved
registers, everything else is clobbered. The bit mask will typically
come from TargetRegisterInfo::getCallPreservedMask().
By abandoning ImplicitDefs for call-clobbered registers, it also becomes
possible to share call instruction descriptions between calling
conventions, and we can get rid of the WINCALL* instructions.
This patch introduces the new operand kind. Future patches will add
RegMask support to target-independent passes before finally the fixed
clobber lists can be removed from call instruction descriptions.
llvm-svn: 148250
We know that the blend instructions only use the MSB, so if the mask is
sign-extended then we can convert it into a SHL instruction. This is a
common pattern because the type-legalizer sign-extends the i1 type which
is used by the LLVM-IR for the condition.
Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL.
llvm-svn: 148225
live across BBs before register allocation. This miscompiled 197.parser
when a cmp + b are optimized to a cbnz instruction even though the CPSR def
is live-in a successor.
cbnz r6, LBB89_12
...
LBB89_12:
ble LBB89_1
The fix consists of two parts. 1) Teach LiveVariables that some unallocatable
registers might be liveouts so don't mark their last use as kill if they are.
2) ARM constantpool island pass shouldn't form cbz / cbnz if the conditional
branch does not kill CPSR.
rdar://10676853
llvm-svn: 148168
overly conservative. It was concerned about cases where it would prohibit
folding simple [r, c] addressing modes. e.g.
ldr r0, [r2]
ldr r1, [r2, #4]
=>
ldr r0, [r2], #4
ldr r1, [r2]
Change the logic to look for such cases which allows it to form indexed memory
ops more aggressively.
rdar://10674430
llvm-svn: 148086
The registers are placed into the saved registers list in the reverse order,
which is why the original loop was written to loop backwards.
llvm-svn: 148064
killed registers are needed below the insertion point, then unset the kill
marker.
Sorry I'm not able to find a reduced test case.
rdar://10660944
llvm-svn: 148043
When we load the v12i32 type, the GenWidenVectorLoads method generates two loads: v8i32 and v4i32
and attempts to use CONCAT_VECTORS to join them. In this fix I concat undef values to widen
the smaller value. The test "widen_load-2.ll" also exposes this bug on AVX.
llvm-svn: 147964
detect a pattern which can be implemented with a small 'shl' embedded in
the addressing mode scale. This happens in real code as follows:
unsigned x = my_accelerator_table[input >> 11];
Here we have some lookup table that we look into using the high bits of
'input'. Each entity in the table is 4-bytes, which means this
implicitly gets turned into (once lowered out of a GEP):
*(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));
The shift right followed by a shift left is canonicalized to a smaller
shift right and masking off the low bits. That hides the shift right
which x86 has an addressing mode designed to support. We now detect
masks of this form, and produce the longer shift right followed by the
proper addressing mode. In addition to saving a (rather large)
instruction, this also reduces stalls in Intel chips on benchmarks I've
measured.
In order for all of this to work, one part of the DAG needs to be
canonicalized *still further* than it currently is. This involves
removing pointless 'trunc' nodes between a zextload and a zext. Without
that, we end up generating spurious masks and hiding the pattern.
llvm-svn: 147936
Consider this code:
int h() {
int x;
try {
x = f();
g();
} catch (...) {
return x+1;
}
return x;
}
The variable x is undefined on the first edge to the landing pad, but it
has the f() return value on the second edge to the landing pad.
SplitAnalysis::getLastSplitPoint() would assume that the return value
from f() was live into the landing pad when f() throws, which is of
course impossible.
Detect these cases, and treat them as if the landing pad wasn't there.
This allows spill code to be inserted after the function call to f().
<rdar://problem/10664933>
llvm-svn: 147912
Delete the alternative implementation in LiveIntervalAnalysis.
These functions computed the same thing, but SplitAnalysis caches the
result.
llvm-svn: 147911
of several newly un-defaulted switches. This also helps optimizers
(including LLVM's) recognize that every case is covered, and we should
assume as much.
llvm-svn: 147861
define physical registers. It's currently very restrictive, only catching
cases where the CE is in an immediate (and only) predecessor. But it catches
a surprising large number of cases.
rdar://10660865
llvm-svn: 147827
Reserved registers don't have proper live ranges, their LiveInterval
simply has a snippet of liveness for each def. Virtual registers with a
single value that is a copy of a reserved register (typically %esp) can
be coalesced with the reserved register if the live range doesn't
overlap any reserved register defs.
When coalescing with a reserved register, don't modify the reserved
register live range. Just leave it as a bunch of dead defs. This
eliminates quadratic coalescer behavior in i386 functions with many
function calls.
PR11699
llvm-svn: 147726
up so branch folding pass can't use the scavenger. :-( This doesn't breaks
anything currently. It just means targets which do not carefully update kill
markers cannot run post-ra scheduler (not new, it has always been the case).
We should fix this at some point since it's really hacky.
llvm-svn: 147719
opportunities that only present themselves after late optimizations
such as tail duplication .e.g.
## BB#1:
movl %eax, %ecx
movl %ecx, %eax
ret
The register allocator also leaves some of them around (due to false
dep between copies from phi-elimination, etc.)
This required some changes in codegen passes. Post-ra scheduler and the
pseudo-instruction expansion passes have been moved after branch folding
and tail merging. They were before branch folding before because it did
not always update block livein's. That's fixed now. The pass change makes
independently since we want to properly schedule instructions after
branch folding / tail duplication.
rdar://10428165
rdar://10640363
llvm-svn: 147716
the debug type accelerator tables to contain the tag and a flag
stating whether or not a compound type is a complete type.
rdar://10652330
llvm-svn: 147651
a combined-away node and the result of the combine isn't substantially
smaller than the input, it's just canonicalized. This is the first part
of a significant (7%) performance gain for Snappy's hot decompression
loop.
llvm-svn: 147604
The register allocators don't currently support adding reserved
registers while they are running. Extend the MRI API to keep track of
the set of reserved registers when register allocation started.
Target hooks like hasFP() and needsStackRealignment() can look at this
set to avoid reserving more registers during register allocation.
llvm-svn: 147577
Before we'd get:
$ clang t.c
fatal error: error in backend: Invalid operand for inline asm constraint 'i'!
Now we get:
$ clang t.c
t.c:16:5: error: invalid operand for inline asm constraint 'i'!
"movq (%4), %%mm0\n"
^
Which at least gets us the inline asm that is the problem.
llvm-svn: 147502
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.
The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.
I added a special test that checks vector shuffle on win32.
llvm-svn: 147445
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.
The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.
I added a special test that checks vector shuffle on win32.
llvm-svn: 147399
unpredicated. That is, turn
subeq r0, r1, #1
addne r0, r1, #1
into
sub r0, r1, #1
addne r0, r1, #1
For targets where conditional instructions are always executed, this may be
beneficial. It may remove pseudo anti-dependency in out-of-order execution
CPUs. e.g.
op r1, ...
str r1, [r10] ; end-of-life of r1 as div result
cmp r0, #65
movne r1, #44 ; raw dependency on previous r1
moveq r1, #12
If movne is unpredicated, then
op r1, ...
str r1, [r10]
cmp r0, #65
mov r1, #44 ; r1 written unconditionally
moveq r1, #12
Both mov and moveq are no longer depdendent on the first instruction. This gives
the out-of-order execution engine more freedom to reorder them.
This has passed entire LLVM test suite. But it has not been enabled for any ARM
variant pending more performance evaluation.
rdar://8951196
llvm-svn: 146914
Now that getMatchingSuperRegClass() returns accurate results, it can be
used to compute constraints imposed by instructions using a sub-register
of a virtual register.
This means we can recompute the register class of any virtual register
by combining the constraints from all its uses.
llvm-svn: 146874
into Analysis as a standalone function, since there's no need for
it to be in VMCore. Also, update it to use isKnownNonZero and
other goodies available in Analysis, making it more precise,
enabling more aggressive optimization.
llvm-svn: 146610
On ARM, peephole optimization for ABS creates a trivial cfg triangle which tempts machine sink to sink instructions in code which is really straight line code. Sometimes this sinking may alter register allocator input such that use and def of a reg is divided by a branch in between, which may result in extra spills. Now mahine sink avoids sinking if final sink destination is post dominator.
Radar 10266272.
llvm-svn: 146604
r0 = mov #0
r0 = moveq #1
Then the second instruction has an implicit data dependency on the first
instruction. Sadly I have yet to come up with a small test case that
demonstrate the post-ra scheduler taking advantage of this.
llvm-svn: 146583
to finalize MI bundles (i.e. add BUNDLE instruction and computing register def
and use lists of the BUNDLE instruction) and a pass to unpack bundles.
- Teach more of MachineBasic and MachineInstr methods to be bundle aware.
- Switch Thumb2 IT block to MI bundles and delete the hazard recognizer hack to
prevent IT blocks from being broken apart.
llvm-svn: 146542
Fast ISel isn't able to handle 'insertvalue' and it causes a large slowdown
during -O0 compilation. We don't necessarily need to generate an aggregate of
the values here if they're just going to be extracted directly afterwards.
<rdar://problem/10530851>
llvm-svn: 146481
undefined result. This adds new ISD nodes for the new semantics,
selecting them when the LLVM intrinsic indicates that the undef behavior
is desired. The new nodes expand trivially to the old nodes, so targets
don't actually need to do anything to support these new nodes besides
indicating that they should be expanded. I've done this for all the
operand types that I could figure out for all the targets. Owners of
various targets, please review and let me know if any of these are
incorrect.
Note that the expand behavior is *conservatively correct*, and exactly
matches LLVM's current behavior with these operations. Ideally this
patch will not change behavior in any way. For example the regtest suite
finds the exact same instruction sequences coming out of the code
generator. That's why there are no new tests here -- all of this is
being exercised by the existing test suite.
Thanks to Duncan Sands for reviewing the various bits of this patch and
helping me get the wrinkles ironed out with expanding for each target.
Also thanks to Chris for clarifying through all the discussions that
this is indeed the approach he was looking for. That said, there are
likely still rough spots. Further review much appreciated.
llvm-svn: 146466
subdirectories to traverse into.
- Originally I wanted to avoid this and just autoscan, but this has one key
flaw in that new subdirectories can not automatically trigger a rerun of the
llvm-build tool. This is particularly a pain when switching back and forth
between trees where one has added a subdirectory, as the dependencies will
tend to be wrong. This will also eliminates FIXME implicitly.
llvm-svn: 146436
If we create new intervals for a variable that is being spilled, then those new intervals are not guaranteed to also spill. This means that anything reading from the original spilling value might not get the correct value if spills were missed.
Fixes <rdar://problem/10546864>
llvm-svn: 146428
clients to decide whether to look inside bundled instructions and whether
the query should return true if any / all bundled instructions have the
queried property.
llvm-svn: 146168
We must not issue a bitcast operation for integer-promotion of vector types, because the
location of the values in the vector may be different.
llvm-svn: 146150
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.
For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.
llvm-svn: 146026
This flag is used when bundling machine instructions. It indicates
whether the operand reads a value defined inside or outside its bundle.
llvm-svn: 145997
1. Added opcode BUNDLE
2. Taught MachineInstr class to deal with bundled MIs
3. Changed MachineBasicBlock iterator to skip over bundled MIs; added an iterator to walk all the MIs
4. Taught MachineBasicBlock methods about bundled MIs
llvm-svn: 145975
The new register allocator is much more able to split back up ranges too constrained by register classes.
Fixes <rdar://problem/10466609>
llvm-svn: 145899
This was actually a bit of a mess. TLI.setPrefLoopAlignment was clearly
documented as taking log2(bytes) units, but the x86 target would still
set a preferred loop alignment of '16'.
CodePlacementOpt passed this number on to the basic block, and
AsmPrinter interpreted it as bytes.
Now both MachineFunction and MachineBasicBlock use logarithmic
alignments.
Obviously, MachineConstantPool still measures alignments in bytes, so we
can emulate the thrill of using as.
llvm-svn: 145889
change, now you need a TargetOptions object to create a TargetMachine. Clang
patch to follow.
One small functionality change in PTX. PTX had commented out the machine
verifier parts in their copy of printAndVerify. That now calls the version in
LLVMTargetMachine. Users of PTX who need verification disabled should rely on
not passing the command-line flag to enable it.
llvm-svn: 145714
non_lazy_symbol_pointers section (__IMPORT,__pointers). Ignore the 'hidden' part
since that will place it in the wrong section.
<rdar://problem/10443720>
llvm-svn: 145356
Conservatively returns zero when the GV does not specify an alignment nor is it
initialized. Previously it returns ABI alignment for type of the GV. However, if
the type is a "packed" type, then the under-specified alignments is attached to
the load / store instructions. In that case, the alignment of the type cannot be
trusted.
rdar://10464621
llvm-svn: 145300
than ABI alignment. These are loads / stores from / to "packed" data structures.
Their alignments are intentionally under-specified.
rdar://10301431
llvm-svn: 145273
fallthrough) in cases where we might fail to rotate an exit to an outer
loop onto the end of the loop chain.
Having *some* rotation, but not performing this rotation, is the primary
fix of thep performance regression with -enable-block-placement for
Olden/em3d (a whopping 30% regression). Still working on reducing the
test case that actually exercises this and the new rotation strategy out
of this code, but I want to check if this regresses other test cases
first as that may indicate it isn't the correct fix.
llvm-svn: 145195
was centered around the premise of laying out a loop in a chain, and
then rotating that chain. This is good for preserving contiguous layout,
but bad for actually making sane rotations. In order to keep it safe,
I had to essentially make it impossible to rotate deeply nested loops.
The information needed to correctly reason about a deeply nested loop is
actually available -- *before* we layout the loop. We know the inner
loops are already fused into chains, etc. We lose information the moment
we actually lay out the loop.
The solution was the other alternative for this algorithm I discussed
with Benjamin and some others: rather than rotating the loop
after-the-fact, try to pick a profitable starting block for the loop's
layout, and then use our existing layout logic. I was worried about the
complexity of this "pick" step, but it turns out such complexity is
needed to handle all the important cases I keep teasing out of benchmarks.
This is, I'm afraid, a bit of a work-in-progress. It is still
misbehaving on some likely important cases I'm investigating in Olden.
It also isn't really tested. I'm going to try to craft some interesting
nested-loop test cases, but it's likely to be extremely time consuming
and I don't want to go there until I'm sure I'm testing the correct
behavior. Sadly I can't come up with a way of getting simple, fine
grained test cases for this logic. We need complex loop structures to
even trigger much of it.
llvm-svn: 145183
heavily on AnalyzeBranch. That routine doesn't behave as we want given
that rotation occurs mid-way through re-ordering the function. Instead
merely check that there are not unanalyzable branching constructs
present, and then reason about the CFG via successor lists. This
actually simplifies my mental model for all of this as well.
The concrete result is that we now will rotate more loop chains. I've
added a test case from Olden highlighting the effect. There is still
a bit more to do here though in order to regain all of the performance
in Olden.
llvm-svn: 145179
pass. This is designed to achieve one of the important optimizations
that the old code placement pass did, but more simply.
This is a somewhat rough and *very* conservative version of the
transform. We could get a lot fancier here if there are profitable cases
to do so. In particular, this only looks for a single pattern, it
insists that the loop backedge being rotated away is the last backedge
in the chain, and it doesn't provide any means of doing better in-loop
placement due to the rotation. However, it appears that it will handle
the important loops I am finding in the LLVM test suite.
llvm-svn: 145158
need lots of fanciness around retaining a reference to a Chain's slot in
the BlockToChain map, but that's all gone now. We can just go directly
to allocating the new chain (which will update the mapping for us) and
using it.
Somewhat gross mechanically generated test case replicates the issue
Duncan spotted when actually testing this out.
llvm-svn: 145120
conflicts, we should only be adding the first block of the chain to the
list, lest we try to merge into the middle of that chain. Most of the
places we were doing this we already happened to be looking at the first
block, but there is no reason to assume that, and in some cases it was
clearly wrong.
I've added a couple of tests here. One already worked, but I like having
an explicit test for it. The other is reduced from a test case Duncan
reduced for me and used to crash. Now it is handled correctly.
llvm-svn: 145119
further. This invariant just wasn't going to work in the face of
unanalyzable branches; we need to be resillient to the phenomenon of
chains poking into a loop and poking out of a loop. In fact, we already
were, we just needed to not assert on it.
This was found during a bootstrap with block placement turned on.
llvm-svn: 145100
successors, they just are all landing pad successors. We handle this the
same way as no successors. Comments attached for the next person to wade
through here and another lovely test case courtesy of Benjamin Kramer's
bugpoint reduction.
llvm-svn: 145098
This was a bug in keeping track of the available domains when merging
domain values.
The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr
to the integer domain which is only available in AVX2.
Also add an assertion to catch future attempts at emitting AVX2
instructions.
llvm-svn: 145096
reversed in the function's original ordering, and we happened to
encounter it while handling an outer unnatural CFG structure.
Thanks to the test case reduced from GCC's source by Benjamin Kramer.
This may also fix a crasher in gzip that Duncan reduced for me, but
I haven't yet gotten to testing that one.
llvm-svn: 145094
updateTerminator code didn't correctly handle EH terminators in one very
specific case. AnalyzeBranch would find no terminator instruction, and
so the fallback in updateTerminator is to assume fallthrough. This is
correct, but the destination of the fallthrough was assumed to be the
first successor.
This is *almost always* true, but in certain cases the loop
transformations will cause the landing pad to be the first successor!
Instead of this brittle logic, actually look through the successors for
a non-landing-pad accessor, and to assert if more than one is found.
This will hopefully fix some (if not all) of the self host miscompiles
with block placement. Thanks to Benjamin Kramer for reporting, Nick
Lewycky for an initial stab at a reduction, and Duncan for endless
advice on EH (which I know nothing about) as well as reviewing the
actual fix.
llvm-svn: 145062
dropping weights on the floor for invokes. This was impeding my writing
further test cases for invoke when interacting with probabilities and
block placement.
No test case as there doesn't appear to be a way to test this stuff. =/
Suggestions for a test case of course welcome. I hope to be able to add
test cases that indirectly cover this eventually by adding probabilities
to the exceptional edge and reordering blocks as a result.
llvm-svn: 145060
properly account for the *global* probability of the edge being taken.
This manifested as a very large number of unconditional branches to
blocks being merged against the CFG even though they weren't
particularly hot within the CFG.
The fix is to check whether the edge being merged is both locally hot
relative to other successors for the source block, and globally hot
compared to other (unmerged) predecessors of the destination block.
This introduces a new crasher on GCC single-source, but it's currently
behind a flag, and Ben has offered to work on the reduction. =]
llvm-svn: 145010
formation phase and into the initial walk of the basic blocks. We
essentially pre-merge all blocks where unanalyzable fallthrough exists,
as we won't be able to update the terminators effectively after any
reorderings. This is quite a bit more principled as there may be CFGs
where the second half of the unanalyzable pair has some analyzable
predecessor that gets placed first. Then it may get placed next,
implicitly breaking the unanalyzable branch even though we never even
looked at the part that isn't analyzable. I've included a test case that
triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
some more general ones as I dig into related issues.
Also, to make this new scheme work we have to be able to handle branches
into the middle of a chain, so add this check. We always fallback on the
incoming ordering.
Finally, this starts to really underscore a known limitation of the
current implementation -- we don't consider broken predecessors when
merging successors. This can caused major missed opportunities, and is
something I'm planning on looking at next (modulo more bug reports).
llvm-svn: 144994
ADDs. MaxOffs is used as a threshold to limit the size of the offset. Tradeoffs
being: (1) If we can't materialize the large constant then we'll cause fast-isel
to bail. (2) Too large of an offset can't be directly encoded in the ADD
resulting in a MOV+ADD. Generally not a bad thing because otherwise we would
have had ADD+ADD, but on Thumb this turns into a MOVS+MOVT+ADD. Working on a fix
for that. (3) Conversely, too low of a threshold we'll miss opportunities to
coalesce ADDs.
rdar://10412592
llvm-svn: 144886
for a single miss and not all predecessor instructions that get selected by
the selection DAG instruction selector. This is still not exact (e.g., over
states misses when folded/dead instructions are present), but it is a step in
the right direction.
llvm-svn: 144832
and code model. This eliminates the need to pass OptLevel flag all over the
place and makes it possible for any codegen pass to use this information.
llvm-svn: 144788
There may be many invokes that share one landing pad, and the previous code
would record the landing pad once for each invoke. Besides the wasted
effort, a pair of volatile loads gets inserted every time the landing pad is
processed. The rest of the code can get optimized away when a landing pad
is processed repeatedly, but the volatile loads remain, resulting in code like:
LBB35_18:
Ltmp483:
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r4, [r7, #-72]
ldr r2, [r7, #-68]
llvm-svn: 144787
This same basic code was in the older version of the SjLj exception handling,
but it was removed in the recent revisions to that code. It needs to be there.
llvm-svn: 144782
%arrayidx135 = getelementptr inbounds [4 x [4 x [4 x [4 x i32]]]]* %M0, i32 0, i64 0
%arrayidx136 = getelementptr inbounds [4 x [4 x [4 x i32]]]* %arrayidx135, i32 0, i64 %idxprom134
Prior to this commit, the GEP instruction that defines %arrayidx136 thought that
%arrayidx135 was a trivial kill. The GEP that defines %arrayidx135 doesn't
generate any code and thus %M0 gets folded into the second GEP. Thus, we need
to look through GEPs with all zero indices.
rdar://10443319
llvm-svn: 144730
has a reference to it. Unfortunately, that doesn't work for codegen passes
since we don't get notified of MBB's being deleted (the original BB stays).
Use that fact to our advantage and after printing a function, check if
any of the IL BBs corresponds to a symbol that was not printed. This fixes
pr11202.
llvm-svn: 144674