violation -- MC cannot depend on CodeGen.
Specifically, the MCTargetDesc component of each target is actually
a subcomponent of the MC library. As such, it cannot depend on the
target-independent code generator, because MC itself cannot depend on
the target-independent code generator. This change moved a flag from the
ARM MCTargetDesc file ARMMCAsmInfo.cpp to the CodeGen layer in
ARMException.cpp, leaving behind an 'extern' to refer back to it. That
layering order isn't viable givin the constraints outlined above.
Commandline flags are designed to be static specifically to avoid these
types of bugs.
Fixing this is likely going to require some non-trivial refactoring.
llvm-svn: 148759
This change adds an new value to the --arm-enable-ehabi option that
disables emitting unwinding descriptors. This mode gives a working
backtrace() without the (currently broken) exception support.
llvm-svn: 148686
A register mask operand kills any live physreg that isn't preserved.
Unlike an implicit-def operand, the clobbered physregs are never live
afterwards.
This means LiveVariables has to track a much smaller number of live
physregs, and it should spend much less time in addRegisterDead().
llvm-svn: 148609
Problem: LLVM needs more function attributes than currently available (32 bits).
One such proposed attribute is "address_safety", which shows that a function is being checked for address safety (by AddressSanitizer, SAFECode, etc).
Solution:
- extend the Attributes from 32 bits to 64-bits
- wrap the object into a class so that unsigned is never erroneously used instead
- change "unsigned" to "Attributes" throughout the code, including one place in clang.
- the class has no "operator uint64 ()", but it has "uint64_t Raw() " to support packing/unpacking.
- the class has "safe operator bool()" to support the common idiom: if (Attributes attr = getAttrs()) useAttrs(attr);
- The CTOR from uint64_t is marked explicit, so I had to add a few explicit CTOR calls
- Add the new attribute "address_safety". Doing it in the same commit to check that attributes beyond first 32 bits actually work.
- Some of the functions from the Attribute namespace are worth moving inside the class, but I'd prefer to have it as a separate commit.
Tested:
"make check" on Linux (32-bit and 64-bit) and Mac (10.6)
built/run spec CPU 2006 on Linux with clang -O2.
This change will break clang build in lib/CodeGen/CGCall.cpp.
The following patch will fix it.
llvm-svn: 148553
'insertvalue' instructions that recreate the structure returned by the
'landingpad' instruction. Because the 'insertvalue' instruction isn't supported
by FastISel, this can save a bit of time during -O0 compilation.
llvm-svn: 148520
to instruction right after the last instruction in the bundle.
- Add a finalizeBundle() variant that doesn't specify LastMI. Instead, the code
will find the last instruction in the bundle by following the 'InsideBundle'
marker. This is useful in case bundles are formed early (i.e. during MI
scheduling) but finalized later (i.e. after register allocator has finished
rewriting virtual registers with physical registers).
llvm-svn: 148444
This SelectionDAG node will be attached to call nodes by LowerCall(),
and eventually becomes a MO_RegisterMask MachineOperand on the
MachineInstr representing the call instruction.
LowerCall() will attach a register mask that depends on the calling
convention.
llvm-svn: 148436
(This time I believe I've checked all the -Wreturn-type warnings from GCC & added the couple of llvm_unreachables necessary to silence them. If I've missed any, I'll happily fix them as soon as I know about them)
llvm-svn: 148262
Register masks will be used as a compact representation of large clobber
lists. Currently, an x86 call instruction has some 40 operands
representing call-clobbered registers. That's more than 1kB of useless
operands per call site.
A register mask operand references a bit mask of call-preserved
registers, everything else is clobbered. The bit mask will typically
come from TargetRegisterInfo::getCallPreservedMask().
By abandoning ImplicitDefs for call-clobbered registers, it also becomes
possible to share call instruction descriptions between calling
conventions, and we can get rid of the WINCALL* instructions.
This patch introduces the new operand kind. Future patches will add
RegMask support to target-independent passes before finally the fixed
clobber lists can be removed from call instruction descriptions.
llvm-svn: 148250
We know that the blend instructions only use the MSB, so if the mask is
sign-extended then we can convert it into a SHL instruction. This is a
common pattern because the type-legalizer sign-extends the i1 type which
is used by the LLVM-IR for the condition.
Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL.
llvm-svn: 148225
live across BBs before register allocation. This miscompiled 197.parser
when a cmp + b are optimized to a cbnz instruction even though the CPSR def
is live-in a successor.
cbnz r6, LBB89_12
...
LBB89_12:
ble LBB89_1
The fix consists of two parts. 1) Teach LiveVariables that some unallocatable
registers might be liveouts so don't mark their last use as kill if they are.
2) ARM constantpool island pass shouldn't form cbz / cbnz if the conditional
branch does not kill CPSR.
rdar://10676853
llvm-svn: 148168
overly conservative. It was concerned about cases where it would prohibit
folding simple [r, c] addressing modes. e.g.
ldr r0, [r2]
ldr r1, [r2, #4]
=>
ldr r0, [r2], #4
ldr r1, [r2]
Change the logic to look for such cases which allows it to form indexed memory
ops more aggressively.
rdar://10674430
llvm-svn: 148086
The registers are placed into the saved registers list in the reverse order,
which is why the original loop was written to loop backwards.
llvm-svn: 148064
killed registers are needed below the insertion point, then unset the kill
marker.
Sorry I'm not able to find a reduced test case.
rdar://10660944
llvm-svn: 148043
When we load the v12i32 type, the GenWidenVectorLoads method generates two loads: v8i32 and v4i32
and attempts to use CONCAT_VECTORS to join them. In this fix I concat undef values to widen
the smaller value. The test "widen_load-2.ll" also exposes this bug on AVX.
llvm-svn: 147964
detect a pattern which can be implemented with a small 'shl' embedded in
the addressing mode scale. This happens in real code as follows:
unsigned x = my_accelerator_table[input >> 11];
Here we have some lookup table that we look into using the high bits of
'input'. Each entity in the table is 4-bytes, which means this
implicitly gets turned into (once lowered out of a GEP):
*(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));
The shift right followed by a shift left is canonicalized to a smaller
shift right and masking off the low bits. That hides the shift right
which x86 has an addressing mode designed to support. We now detect
masks of this form, and produce the longer shift right followed by the
proper addressing mode. In addition to saving a (rather large)
instruction, this also reduces stalls in Intel chips on benchmarks I've
measured.
In order for all of this to work, one part of the DAG needs to be
canonicalized *still further* than it currently is. This involves
removing pointless 'trunc' nodes between a zextload and a zext. Without
that, we end up generating spurious masks and hiding the pattern.
llvm-svn: 147936
Consider this code:
int h() {
int x;
try {
x = f();
g();
} catch (...) {
return x+1;
}
return x;
}
The variable x is undefined on the first edge to the landing pad, but it
has the f() return value on the second edge to the landing pad.
SplitAnalysis::getLastSplitPoint() would assume that the return value
from f() was live into the landing pad when f() throws, which is of
course impossible.
Detect these cases, and treat them as if the landing pad wasn't there.
This allows spill code to be inserted after the function call to f().
<rdar://problem/10664933>
llvm-svn: 147912
Delete the alternative implementation in LiveIntervalAnalysis.
These functions computed the same thing, but SplitAnalysis caches the
result.
llvm-svn: 147911
of several newly un-defaulted switches. This also helps optimizers
(including LLVM's) recognize that every case is covered, and we should
assume as much.
llvm-svn: 147861
define physical registers. It's currently very restrictive, only catching
cases where the CE is in an immediate (and only) predecessor. But it catches
a surprising large number of cases.
rdar://10660865
llvm-svn: 147827
Reserved registers don't have proper live ranges, their LiveInterval
simply has a snippet of liveness for each def. Virtual registers with a
single value that is a copy of a reserved register (typically %esp) can
be coalesced with the reserved register if the live range doesn't
overlap any reserved register defs.
When coalescing with a reserved register, don't modify the reserved
register live range. Just leave it as a bunch of dead defs. This
eliminates quadratic coalescer behavior in i386 functions with many
function calls.
PR11699
llvm-svn: 147726
up so branch folding pass can't use the scavenger. :-( This doesn't breaks
anything currently. It just means targets which do not carefully update kill
markers cannot run post-ra scheduler (not new, it has always been the case).
We should fix this at some point since it's really hacky.
llvm-svn: 147719
opportunities that only present themselves after late optimizations
such as tail duplication .e.g.
## BB#1:
movl %eax, %ecx
movl %ecx, %eax
ret
The register allocator also leaves some of them around (due to false
dep between copies from phi-elimination, etc.)
This required some changes in codegen passes. Post-ra scheduler and the
pseudo-instruction expansion passes have been moved after branch folding
and tail merging. They were before branch folding before because it did
not always update block livein's. That's fixed now. The pass change makes
independently since we want to properly schedule instructions after
branch folding / tail duplication.
rdar://10428165
rdar://10640363
llvm-svn: 147716
the debug type accelerator tables to contain the tag and a flag
stating whether or not a compound type is a complete type.
rdar://10652330
llvm-svn: 147651
a combined-away node and the result of the combine isn't substantially
smaller than the input, it's just canonicalized. This is the first part
of a significant (7%) performance gain for Snappy's hot decompression
loop.
llvm-svn: 147604
The register allocators don't currently support adding reserved
registers while they are running. Extend the MRI API to keep track of
the set of reserved registers when register allocation started.
Target hooks like hasFP() and needsStackRealignment() can look at this
set to avoid reserving more registers during register allocation.
llvm-svn: 147577
Before we'd get:
$ clang t.c
fatal error: error in backend: Invalid operand for inline asm constraint 'i'!
Now we get:
$ clang t.c
t.c:16:5: error: invalid operand for inline asm constraint 'i'!
"movq (%4), %%mm0\n"
^
Which at least gets us the inline asm that is the problem.
llvm-svn: 147502
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.
The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.
I added a special test that checks vector shuffle on win32.
llvm-svn: 147445
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.
The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.
I added a special test that checks vector shuffle on win32.
llvm-svn: 147399
unpredicated. That is, turn
subeq r0, r1, #1
addne r0, r1, #1
into
sub r0, r1, #1
addne r0, r1, #1
For targets where conditional instructions are always executed, this may be
beneficial. It may remove pseudo anti-dependency in out-of-order execution
CPUs. e.g.
op r1, ...
str r1, [r10] ; end-of-life of r1 as div result
cmp r0, #65
movne r1, #44 ; raw dependency on previous r1
moveq r1, #12
If movne is unpredicated, then
op r1, ...
str r1, [r10]
cmp r0, #65
mov r1, #44 ; r1 written unconditionally
moveq r1, #12
Both mov and moveq are no longer depdendent on the first instruction. This gives
the out-of-order execution engine more freedom to reorder them.
This has passed entire LLVM test suite. But it has not been enabled for any ARM
variant pending more performance evaluation.
rdar://8951196
llvm-svn: 146914
Now that getMatchingSuperRegClass() returns accurate results, it can be
used to compute constraints imposed by instructions using a sub-register
of a virtual register.
This means we can recompute the register class of any virtual register
by combining the constraints from all its uses.
llvm-svn: 146874
into Analysis as a standalone function, since there's no need for
it to be in VMCore. Also, update it to use isKnownNonZero and
other goodies available in Analysis, making it more precise,
enabling more aggressive optimization.
llvm-svn: 146610
On ARM, peephole optimization for ABS creates a trivial cfg triangle which tempts machine sink to sink instructions in code which is really straight line code. Sometimes this sinking may alter register allocator input such that use and def of a reg is divided by a branch in between, which may result in extra spills. Now mahine sink avoids sinking if final sink destination is post dominator.
Radar 10266272.
llvm-svn: 146604
r0 = mov #0
r0 = moveq #1
Then the second instruction has an implicit data dependency on the first
instruction. Sadly I have yet to come up with a small test case that
demonstrate the post-ra scheduler taking advantage of this.
llvm-svn: 146583
to finalize MI bundles (i.e. add BUNDLE instruction and computing register def
and use lists of the BUNDLE instruction) and a pass to unpack bundles.
- Teach more of MachineBasic and MachineInstr methods to be bundle aware.
- Switch Thumb2 IT block to MI bundles and delete the hazard recognizer hack to
prevent IT blocks from being broken apart.
llvm-svn: 146542
Fast ISel isn't able to handle 'insertvalue' and it causes a large slowdown
during -O0 compilation. We don't necessarily need to generate an aggregate of
the values here if they're just going to be extracted directly afterwards.
<rdar://problem/10530851>
llvm-svn: 146481
undefined result. This adds new ISD nodes for the new semantics,
selecting them when the LLVM intrinsic indicates that the undef behavior
is desired. The new nodes expand trivially to the old nodes, so targets
don't actually need to do anything to support these new nodes besides
indicating that they should be expanded. I've done this for all the
operand types that I could figure out for all the targets. Owners of
various targets, please review and let me know if any of these are
incorrect.
Note that the expand behavior is *conservatively correct*, and exactly
matches LLVM's current behavior with these operations. Ideally this
patch will not change behavior in any way. For example the regtest suite
finds the exact same instruction sequences coming out of the code
generator. That's why there are no new tests here -- all of this is
being exercised by the existing test suite.
Thanks to Duncan Sands for reviewing the various bits of this patch and
helping me get the wrinkles ironed out with expanding for each target.
Also thanks to Chris for clarifying through all the discussions that
this is indeed the approach he was looking for. That said, there are
likely still rough spots. Further review much appreciated.
llvm-svn: 146466
subdirectories to traverse into.
- Originally I wanted to avoid this and just autoscan, but this has one key
flaw in that new subdirectories can not automatically trigger a rerun of the
llvm-build tool. This is particularly a pain when switching back and forth
between trees where one has added a subdirectory, as the dependencies will
tend to be wrong. This will also eliminates FIXME implicitly.
llvm-svn: 146436
If we create new intervals for a variable that is being spilled, then those new intervals are not guaranteed to also spill. This means that anything reading from the original spilling value might not get the correct value if spills were missed.
Fixes <rdar://problem/10546864>
llvm-svn: 146428
clients to decide whether to look inside bundled instructions and whether
the query should return true if any / all bundled instructions have the
queried property.
llvm-svn: 146168
We must not issue a bitcast operation for integer-promotion of vector types, because the
location of the values in the vector may be different.
llvm-svn: 146150
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.
For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.
llvm-svn: 146026
This flag is used when bundling machine instructions. It indicates
whether the operand reads a value defined inside or outside its bundle.
llvm-svn: 145997
1. Added opcode BUNDLE
2. Taught MachineInstr class to deal with bundled MIs
3. Changed MachineBasicBlock iterator to skip over bundled MIs; added an iterator to walk all the MIs
4. Taught MachineBasicBlock methods about bundled MIs
llvm-svn: 145975
The new register allocator is much more able to split back up ranges too constrained by register classes.
Fixes <rdar://problem/10466609>
llvm-svn: 145899
This was actually a bit of a mess. TLI.setPrefLoopAlignment was clearly
documented as taking log2(bytes) units, but the x86 target would still
set a preferred loop alignment of '16'.
CodePlacementOpt passed this number on to the basic block, and
AsmPrinter interpreted it as bytes.
Now both MachineFunction and MachineBasicBlock use logarithmic
alignments.
Obviously, MachineConstantPool still measures alignments in bytes, so we
can emulate the thrill of using as.
llvm-svn: 145889
change, now you need a TargetOptions object to create a TargetMachine. Clang
patch to follow.
One small functionality change in PTX. PTX had commented out the machine
verifier parts in their copy of printAndVerify. That now calls the version in
LLVMTargetMachine. Users of PTX who need verification disabled should rely on
not passing the command-line flag to enable it.
llvm-svn: 145714
non_lazy_symbol_pointers section (__IMPORT,__pointers). Ignore the 'hidden' part
since that will place it in the wrong section.
<rdar://problem/10443720>
llvm-svn: 145356
Conservatively returns zero when the GV does not specify an alignment nor is it
initialized. Previously it returns ABI alignment for type of the GV. However, if
the type is a "packed" type, then the under-specified alignments is attached to
the load / store instructions. In that case, the alignment of the type cannot be
trusted.
rdar://10464621
llvm-svn: 145300
than ABI alignment. These are loads / stores from / to "packed" data structures.
Their alignments are intentionally under-specified.
rdar://10301431
llvm-svn: 145273
fallthrough) in cases where we might fail to rotate an exit to an outer
loop onto the end of the loop chain.
Having *some* rotation, but not performing this rotation, is the primary
fix of thep performance regression with -enable-block-placement for
Olden/em3d (a whopping 30% regression). Still working on reducing the
test case that actually exercises this and the new rotation strategy out
of this code, but I want to check if this regresses other test cases
first as that may indicate it isn't the correct fix.
llvm-svn: 145195
was centered around the premise of laying out a loop in a chain, and
then rotating that chain. This is good for preserving contiguous layout,
but bad for actually making sane rotations. In order to keep it safe,
I had to essentially make it impossible to rotate deeply nested loops.
The information needed to correctly reason about a deeply nested loop is
actually available -- *before* we layout the loop. We know the inner
loops are already fused into chains, etc. We lose information the moment
we actually lay out the loop.
The solution was the other alternative for this algorithm I discussed
with Benjamin and some others: rather than rotating the loop
after-the-fact, try to pick a profitable starting block for the loop's
layout, and then use our existing layout logic. I was worried about the
complexity of this "pick" step, but it turns out such complexity is
needed to handle all the important cases I keep teasing out of benchmarks.
This is, I'm afraid, a bit of a work-in-progress. It is still
misbehaving on some likely important cases I'm investigating in Olden.
It also isn't really tested. I'm going to try to craft some interesting
nested-loop test cases, but it's likely to be extremely time consuming
and I don't want to go there until I'm sure I'm testing the correct
behavior. Sadly I can't come up with a way of getting simple, fine
grained test cases for this logic. We need complex loop structures to
even trigger much of it.
llvm-svn: 145183
heavily on AnalyzeBranch. That routine doesn't behave as we want given
that rotation occurs mid-way through re-ordering the function. Instead
merely check that there are not unanalyzable branching constructs
present, and then reason about the CFG via successor lists. This
actually simplifies my mental model for all of this as well.
The concrete result is that we now will rotate more loop chains. I've
added a test case from Olden highlighting the effect. There is still
a bit more to do here though in order to regain all of the performance
in Olden.
llvm-svn: 145179
pass. This is designed to achieve one of the important optimizations
that the old code placement pass did, but more simply.
This is a somewhat rough and *very* conservative version of the
transform. We could get a lot fancier here if there are profitable cases
to do so. In particular, this only looks for a single pattern, it
insists that the loop backedge being rotated away is the last backedge
in the chain, and it doesn't provide any means of doing better in-loop
placement due to the rotation. However, it appears that it will handle
the important loops I am finding in the LLVM test suite.
llvm-svn: 145158
need lots of fanciness around retaining a reference to a Chain's slot in
the BlockToChain map, but that's all gone now. We can just go directly
to allocating the new chain (which will update the mapping for us) and
using it.
Somewhat gross mechanically generated test case replicates the issue
Duncan spotted when actually testing this out.
llvm-svn: 145120
conflicts, we should only be adding the first block of the chain to the
list, lest we try to merge into the middle of that chain. Most of the
places we were doing this we already happened to be looking at the first
block, but there is no reason to assume that, and in some cases it was
clearly wrong.
I've added a couple of tests here. One already worked, but I like having
an explicit test for it. The other is reduced from a test case Duncan
reduced for me and used to crash. Now it is handled correctly.
llvm-svn: 145119
further. This invariant just wasn't going to work in the face of
unanalyzable branches; we need to be resillient to the phenomenon of
chains poking into a loop and poking out of a loop. In fact, we already
were, we just needed to not assert on it.
This was found during a bootstrap with block placement turned on.
llvm-svn: 145100
successors, they just are all landing pad successors. We handle this the
same way as no successors. Comments attached for the next person to wade
through here and another lovely test case courtesy of Benjamin Kramer's
bugpoint reduction.
llvm-svn: 145098
This was a bug in keeping track of the available domains when merging
domain values.
The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr
to the integer domain which is only available in AVX2.
Also add an assertion to catch future attempts at emitting AVX2
instructions.
llvm-svn: 145096
reversed in the function's original ordering, and we happened to
encounter it while handling an outer unnatural CFG structure.
Thanks to the test case reduced from GCC's source by Benjamin Kramer.
This may also fix a crasher in gzip that Duncan reduced for me, but
I haven't yet gotten to testing that one.
llvm-svn: 145094
updateTerminator code didn't correctly handle EH terminators in one very
specific case. AnalyzeBranch would find no terminator instruction, and
so the fallback in updateTerminator is to assume fallthrough. This is
correct, but the destination of the fallthrough was assumed to be the
first successor.
This is *almost always* true, but in certain cases the loop
transformations will cause the landing pad to be the first successor!
Instead of this brittle logic, actually look through the successors for
a non-landing-pad accessor, and to assert if more than one is found.
This will hopefully fix some (if not all) of the self host miscompiles
with block placement. Thanks to Benjamin Kramer for reporting, Nick
Lewycky for an initial stab at a reduction, and Duncan for endless
advice on EH (which I know nothing about) as well as reviewing the
actual fix.
llvm-svn: 145062
dropping weights on the floor for invokes. This was impeding my writing
further test cases for invoke when interacting with probabilities and
block placement.
No test case as there doesn't appear to be a way to test this stuff. =/
Suggestions for a test case of course welcome. I hope to be able to add
test cases that indirectly cover this eventually by adding probabilities
to the exceptional edge and reordering blocks as a result.
llvm-svn: 145060
properly account for the *global* probability of the edge being taken.
This manifested as a very large number of unconditional branches to
blocks being merged against the CFG even though they weren't
particularly hot within the CFG.
The fix is to check whether the edge being merged is both locally hot
relative to other successors for the source block, and globally hot
compared to other (unmerged) predecessors of the destination block.
This introduces a new crasher on GCC single-source, but it's currently
behind a flag, and Ben has offered to work on the reduction. =]
llvm-svn: 145010
formation phase and into the initial walk of the basic blocks. We
essentially pre-merge all blocks where unanalyzable fallthrough exists,
as we won't be able to update the terminators effectively after any
reorderings. This is quite a bit more principled as there may be CFGs
where the second half of the unanalyzable pair has some analyzable
predecessor that gets placed first. Then it may get placed next,
implicitly breaking the unanalyzable branch even though we never even
looked at the part that isn't analyzable. I've included a test case that
triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
some more general ones as I dig into related issues.
Also, to make this new scheme work we have to be able to handle branches
into the middle of a chain, so add this check. We always fallback on the
incoming ordering.
Finally, this starts to really underscore a known limitation of the
current implementation -- we don't consider broken predecessors when
merging successors. This can caused major missed opportunities, and is
something I'm planning on looking at next (modulo more bug reports).
llvm-svn: 144994
ADDs. MaxOffs is used as a threshold to limit the size of the offset. Tradeoffs
being: (1) If we can't materialize the large constant then we'll cause fast-isel
to bail. (2) Too large of an offset can't be directly encoded in the ADD
resulting in a MOV+ADD. Generally not a bad thing because otherwise we would
have had ADD+ADD, but on Thumb this turns into a MOVS+MOVT+ADD. Working on a fix
for that. (3) Conversely, too low of a threshold we'll miss opportunities to
coalesce ADDs.
rdar://10412592
llvm-svn: 144886
for a single miss and not all predecessor instructions that get selected by
the selection DAG instruction selector. This is still not exact (e.g., over
states misses when folded/dead instructions are present), but it is a step in
the right direction.
llvm-svn: 144832
and code model. This eliminates the need to pass OptLevel flag all over the
place and makes it possible for any codegen pass to use this information.
llvm-svn: 144788
There may be many invokes that share one landing pad, and the previous code
would record the landing pad once for each invoke. Besides the wasted
effort, a pair of volatile loads gets inserted every time the landing pad is
processed. The rest of the code can get optimized away when a landing pad
is processed repeatedly, but the volatile loads remain, resulting in code like:
LBB35_18:
Ltmp483:
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r2, [r7, #-72]
ldr r2, [r7, #-68]
ldr r4, [r7, #-72]
ldr r2, [r7, #-68]
llvm-svn: 144787
This same basic code was in the older version of the SjLj exception handling,
but it was removed in the recent revisions to that code. It needs to be there.
llvm-svn: 144782
%arrayidx135 = getelementptr inbounds [4 x [4 x [4 x [4 x i32]]]]* %M0, i32 0, i64 0
%arrayidx136 = getelementptr inbounds [4 x [4 x [4 x i32]]]* %arrayidx135, i32 0, i64 %idxprom134
Prior to this commit, the GEP instruction that defines %arrayidx136 thought that
%arrayidx135 was a trivial kill. The GEP that defines %arrayidx135 doesn't
generate any code and thus %M0 gets folded into the second GEP. Thus, we need
to look through GEPs with all zero indices.
rdar://10443319
llvm-svn: 144730
has a reference to it. Unfortunately, that doesn't work for codegen passes
since we don't get notified of MBB's being deleted (the original BB stays).
Use that fact to our advantage and after printing a function, check if
any of the IL BBs corresponds to a symbol that was not printed. This fixes
pr11202.
llvm-svn: 144674
block sequence when recovering from unanalyzable control flow
constructs, *always* use the function sequence. I'm not sure why I ever
went down the path of trying to use the loop sequence, it is
fundamentally not the correct sequence to use. We're trying to preserve
the incoming layout in the cases of unreasonable control flow, and that
is only encoded at the function level. We already have a filter to
select *exactly* the sub-set of blocks within the function that we're
trying to form into a chain.
The resulting code layout is also significantly better because of this.
In several places we were ending up with completely unreasonable control
flow constructs due to the ordering chosen by the loop structure for its
internal storage. This change removes a completely wasteful vector of
basic blocks, saving memory allocation in the common case even though it
costs us CPU in the fairly rare case of unnatural loops. Finally, it
fixes the latest crasher reduced out of GCC's single source. Thanks
again to Benjamin Kramer for the reduction, my bugpoint skills failed at
it.
llvm-svn: 144627
Two new TargetInstrInfo hooks lets the target tell ExecutionDepsFix
about instructions with partial register updates causing false unwanted
dependencies.
The ExecutionDepsFix pass will break the false dependencies if the
updated register was written in the previoius N instructions.
The small loop added to sse-domains.ll runs twice as fast with
dependency-breaking instructions inserted.
llvm-svn: 144602
Keep track of the last instruction to define each register individually
instead of per DomainValue. This lets us track more accurately when a
register was last written.
Also track register ages across basic blocks. When entering a new
basic block, use the least stale predecessor def as a worst case
estimate for register age.
The register age is used to arbitrate between conflicting domains. The
most recently defined register wins.
llvm-svn: 144601
"kill". This looks like a bug upstream. Since that's going to take some time
to understand, loosen the assertion and disable the optimization when
multiple kills are seen.
llvm-svn: 144568
instructions of the two-address operands) in order to avoid inserting copies.
This fixes the few regressions introduced when the two-address hack was
disabled (without regressing the improvements).
rdar://10422688
llvm-svn: 144559
cleans up all the chains allocated during the processing of each
function so that for very large inputs we don't just grow memory usage
without bound.
llvm-svn: 144533
tests when I forcibly enabled block placement.
It is apparantly possible for an unanalyzable block to fallthrough to
a non-loop block. I don't actually beleive this is correct, I believe
that 'canFallThrough' is returning true needlessly for the code
construct, and I've left a bit of a FIXME on the verification code to
try to track down why this is coming up.
Anyways, removing the assert doesn't degrade the correctness of the algorithm.
llvm-svn: 144532
this pass. We're leaving already merged blocks on the worklist, and
scanning them again and again only to determine each time through that
indeed they aren't viable. We can instead remove them once we're going
to have to scan the worklist. This is the easy way to implement removing
them. If this remains on the profile (as I somewhat suspect it will), we
can get a lot more clever here, as the worklist's order is essentially
irrelevant. We can use swapping and fold the two loops to reduce
overhead even when there are many blocks on the worklist but only a few
of them are removed.
llvm-svn: 144531
time it is queried to compute the probability of a single successor.
This makes computing the probability of every successor of a block in
sequence... really really slow. ;] This switches to a linear walk of the
successors rather than a quadratic one. One of several quadratic
behaviors slowing this pass down.
I'm not really thrilled with moving the sum code into the public
interface of MBPI, but I don't (at the moment) have ideas for a better
interface. My direction I'm thinking in for a better interface is to
have MBPI actually retain much more state and make *all* of these
queries cheap. That's a lot of work, and would require invasive changes.
Until then, this seems like the least bad (ie, least quadratic)
solution. Suggestions welcome.
llvm-svn: 144530
correctly handle blocks whose successor weights sum to more than
UINT32_MAX. This is slightly less efficient, but the entire thing is
already linear on the number of successors. Calling it within any hot
routine is a mistake, and indeed no one is calling it. It also
simplifies the code.
llvm-svn: 144527
the sum of the edge weights not overflowing uint32, and crashed when
they did. This is generally safe as BranchProbabilityInfo tries to
provide this guarantee. However, the CFG can get modified during codegen
in a way that grows the *sum* of the edge weights. This doesn't seem
unreasonable (imagine just adding more blocks all with the default
weight of 16), but it is hard to come up with a case that actually
triggers 32-bit overflow. Fortuately, the single-source GCC build is
good at this. The solution isn't very pretty, but its no worse than the
previous code. We're already summing all of the edge weights on each
query, we can sum them, check for an overflow, compute a scale, and sum
them again.
I've included a *greatly* reduced test case out of the GCC source that
triggers it. It's a pretty lame test, as it clearly is just barely
triggering the overflow. I'd like to have something that is much more
definitive, but I don't understand the fundamental pattern that triggers
an explosion in the edge weight sums.
The buggy code is duplicated within this file. I'll colapse them into
a single implementation in a subsequent commit.
llvm-svn: 144526
get loop info structures associated with them, and so we need some way
to make forward progress selecting and placing basic blocks. The
technique used here is pretty brutal -- it just scans the list of blocks
looking for the first unplaced candidate. It keeps placing blocks like
this until the CFG becomes tractable.
The cost is somewhat unfortunate, it requires allocating a vector of all
basic block pointers eagerly. I have some ideas about how to simplify
and optimize this, but I'm trying to get the logic correct first.
Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly
there are other bugs that GCC is tickling that I'm reducing and working
on now.
llvm-svn: 144516
This makes no difference for normal defs, but early clobber dead defs
now look like:
[Slot_EarlyClobber; Slot_Dead)
instead of:
[Slot_EarlyClobber; Slot_Register).
Live ranges for normal dead defs look like:
[Slot_Register; Slot_Dead)
as before.
llvm-svn: 144512
when we fail to place all the blocks of a loop. Currently this is
happening for unnatural loops, and this logic helps more immediately
point to the problem.
llvm-svn: 144504
The old naming scheme (load/use/def/store) can be traced back to an old
linear scan article, but the names don't match how slots are actually
used.
The load and store slots are not needed after the deferred spill code
insertion framework was deleted.
The use and def slots don't make any sense because we are using
half-open intervals as is customary in C code, but the names suggest
closed intervals. In reality, these slots were used to distinguish
early-clobber defs from normal defs.
The new naming scheme also has 4 slots, but the names match how the
slots are really used. This is a purely mechanical renaming, but some
of the code makes a lot more sense now.
llvm-svn: 144503
branches that also may involve fallthrough. In the case of blocks with
no fallthrough, we can still re-order the blocks profitably. For example
instruction decoding will in some cases continue past an indirect jump,
making laying out its most likely successor there profitable.
Note, no test case. I don't know how to write a test case that exercises
this logic, but it matches the described desired semantics in
discussions with Jakob and others. If anyone has a nice example of IR
that will trigger this, that would be lovely.
Also note, there are still assertion failures in real world code with
this. I'm digging into those next, now that I know this isn't the cause.
llvm-svn: 144499
second algorithm, but only loosely. It is more heavily based on the last
discussion I had with Andy. It continues to walk from the inner-most
loop outward, but there is a key difference. With this algorithm we
ensure that as we visit each loop, the entire loop is merged into
a single chain. At the end, the entire function is treated as a "loop",
and merged into a single chain. This chain forms the desired sequence of
blocks within the function. Switching to a single algorithm removes my
biggest problem with the previous approaches -- they had different
behavior depending on which system triggered the layout. Now there is
exactly one algorithm and one basis for the decision making.
The other key difference is how the chain is formed. This is based
heavily on the idea Andy mentioned of keeping a worklist of blocks that
are viable layout successors based on the CFG. Having this set allows us
to consistently select the best layout successor for each block. It is
expensive though.
The code here remains very rough. There is a lot that needs to be done
to clean up the code, and to make the runtime cost of this pass much
lower. Very much WIP, but this was a giant chunk of code and I'd rather
folks see it sooner than later. Everything remains behind a flag of
course.
I've added a couple of tests to exercise the issues that this iteration
was motivated by: loop structure preservation. I've also fixed one test
that was exhibiting the broken behavior of the previous version.
llvm-svn: 144495
It was off by default.
The new register allocators don't have the problems that made it
necessary to reallocate registers during stack slot coloring.
llvm-svn: 144481
It is worth noting that the old spiller would split live ranges around
basic blocks. The new spiller doesn't do that.
PBQP should do its own live range splitting with
SplitEditor::splitSingleBlock() if desired. See
RAGreedy::tryBlockSplit().
llvm-svn: 144476
RegAllocGreedy has been the default for six months now.
Deleting RegAllocLinearScan makes it possible to also delete
VirtRegRewriter and clean up the spiller code.
llvm-svn: 144475
instance and a concrete inlined instance are the use of DW_TAG_subprogram
instead of DW_TAG_inlined_subroutine and the who owns the tree.
We were also omitting DW_AT_inline from the abstract roots. To fix this,
make sure we mark abstract instance roots with DW_AT_inline even when
we have only out-of-line instances referring to them with DW_AT_abstract_origin.
FileCheck is not a very good tool for tests like this, maybe we should add
a -verify mode to llvm-dwarfdump.
llvm-svn: 144441
instruction lower optimization" in the pre-RA scheduler.
The optimization, rather the hack, was done before MI use-list was available.
Now we should be able to implement it in a better way, perhaps in the
two-address pass until a MI scheduler is available.
Now that the scheduler has to backtrack to handle call sequences. Adding
artificial scheduling constraints is just not safe. Furthermore, the hack
is not taking all the other scheduling decisions into consideration so it's just
as likely to pessimize code. So I view disabling this optimization goodness
regardless of PR11314.
llvm-svn: 144267
The TII.foldMemoryOperand hook preserves implicit operands from the
original instruction. This is not what we want when those implicit
operands refer to the register being spilled.
Implicit operands referring to other registers are preserved.
This fixes PR11347.
llvm-svn: 144247
dragonegg self-host buildbot will recover (it is complaining about object
files differing between different build stages). Original commit message:
Add a hack to the scheduler to disable pseudo-two-address dependencies in
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling seqeunce
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.
llvm-svn: 144188
During the initial RPO traversal of the basic blocks, remember the ones
that are incomplete because of back-edges from predecessors that haven't
been visited yet.
After the initial RPO, revisit all those loop headers so the incoming
DomainValues on the back-edges can be properly collapsed.
This will properly fix execution domains on software pipelined code,
like the included test case.
llvm-svn: 144151
When merging two uncollapsed DomainValues, place a link to the active
DomainValue from the passive DomainValue. This allows old stale
references to the passive DomainValue to be updated to point to the
active DomainValue.
The new resolve() function finds the active DomainValue and updates the
pointer.
This change makes old live-out lists more useful since they may contain
uncollapsed DomainValues that have since been merged into other
DomainValues.
llvm-svn: 144149
This new function will decrement the reference count, and collapse a
domain value when the last reference is gone.
This simplifies DomainValue reference counting, and decouples it from
the LiveRegs array.
llvm-svn: 144131
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling seqeunce
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.
llvm-svn: 144124
The old value may still be referenced by some live-out list, and we
don't wan't to collapse those instructions twice.
This fixes the "Can only swizzle VMOVD" assertion in some armv7 SPEC
builds.
<rdar://problem/10413292>
llvm-svn: 144117
Add support for trimming constants to GetDemandedBits. This fixes some funky
constant generation that occurs when stores are expanded for targets that don't
support unaligned stores natively.
llvm-svn: 144102
When this field is true it means that the load is from constant (runt-time or compile-time) and so can be hoisted from loops or moved around other memory accesses
llvm-svn: 144100
DomainValues that are only used by "don't care" instructions are now
collapsed to the first possible execution domain after all basic blocks
have been processed. This typically means the PS domain on x86.
For example, the vsel_i64 and vsel_double functions in sse2-blend.ll are
completely collapsed to the PS domain instead of containing a mix of
execution domains created by isel.
llvm-svn: 144037
The enterBasicBlock() function is combining live-out values from
predecessor blocks. The RPO traversal means that more predecessors
have been visited when that happens, only back-edges are missing.
llvm-svn: 144025
the pubnames and pubtypes tables. LLDB can currently use this format
and a full spec is forthcoming and submission for standardization is planned.
A basic summary:
The dwarf accelerator tables are an indirect hash table optimized
for null lookup rather than access to known data. They are output into
an on-disk format that looks like this:
.-------------.
| HEADER |
|-------------|
| BUCKETS |
|-------------|
| HASHES |
|-------------|
| OFFSETS |
|-------------|
| DATA |
`-------------'
where the header contains a magic number, version, type of hash function,
the number of buckets, total number of hashes, and room for a special
struct of data and the length of that struct.
The buckets contain an index (e.g. 6) into the hashes array. The hashes
section contains all of the 32-bit hash values in contiguous memory, and
the offsets contain the offset into the data area for the particular
hash.
For a lookup example, we could hash a function name and take it modulo the
number of buckets giving us our bucket. From there we take the bucket value
as an index into the hashes table and look at each successive hash as long
as the hash value is still the same modulo result (bucket value) as earlier.
If we have a match we look at that same entry in the offsets table and
grab the offset in the data for our final match.
llvm-svn: 143921
the mailing list. Suggestions for other statistics to collect would be
awesome. =]
Currently these are implemented as a separate pass guarded by a separate
flag. I'm not thrilled by that, but I wanted to be able to collect the
statistics for the old code placement as well as the new in order to
have a point of comparison. I'm planning on folding them into the single
pass if / when there is only one pass of interest.
llvm-svn: 143537
fixes: Use a separate register, instead of SP, as the
calling-convention resource, to avoid spurious conflicts with
actual uses of SP. Also, fix unscheduling of calling sequences,
which can be triggered by pseudo-two-address dependencies.
llvm-svn: 143206
Don't assume APInt::getRawData() would hold target-aware endianness nor host-compliant endianness. rawdata[0] holds most lower i64, even on big endian host.
FIXME: Add a testcase for big endian target.
FIXME: Ditto on CompileUnit::addConstantFPValue() ?
llvm-svn: 143194
it fixes the dragonegg self-host (it looks like gcc is miscompiled).
Original commit messages:
Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.
Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.
Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.
Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.
This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.
Delete #if 0 code accidentally left in.
llvm-svn: 143188
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.
Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.
Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.
Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.
This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.
llvm-svn: 143177
trying to legalize the operand types when only the result type
is required to be legalized - the type legalization machinery
will get round to the operands later if they need legalizing.
There can be a point to legalizing operands in parallel with
the result: when this saves compile time or results in better
code. There was only one case in which this was true: when
the operand is also split, so keep the logic for that bit.
As a result of this change, additional operand legalization
methods may need to be introduced to handle nodes where the
result and operand types can differ, like SIGN_EXTEND, but
the testsuite doesn't contain any tests where this is the case.
In any case, it seems better to require such methods (and die
with an assert if they doesn't exist) than to quietly produce
wrong code if we forgot to special case the node in
SplitVecRes_UnaryOp.
llvm-svn: 143026
This code makes different decisions when compiled into x87 instructions
because of different rounding behavior. That caused phase 2/3
miscompares on 32-bit Linux when the phase 1 compiler was built with gcc
(using x87), and the phase 2 compiler was built with clang (using SSE).
This fixes PR11200.
llvm-svn: 143006
An MBB which branches to an EH landing pad shouldn't be considered for tail merging.
In SjLj EH, the jump to the landing pad is not done explicitly through a branch
statement. The EH landing pad is added as a successor to the throwing
BB. Because of that however, the branch folding pass could mistakenly think that
it could merge the throwing BB with another BB. This isn't safe to do.
<rdar://problem/10334833>
llvm-svn: 143001
down to this commit. Original commit message:
An MBB which branches to an EH landing pad shouldn't be considered for tail merging.
In SjLj EH, the jump to the landing pad is not done explicitly through a branch
statement. The EH landing pad is added as a successor to the throwing
BB. Because of that however, the branch folding pass could mistakenly think that
it could merge the throwing BB with another BB. This isn't safe to do.
<rdar://problem/10334833>
llvm-svn: 142920
In SjLj EH, the jump to the landing pad is not done explicitly through a branch
statement. The EH landing pad is added as a successor to the throwing
BB. Because of that however, the branch folding pass could mistakenly think that
it could merge the throwing BB with another BB. This isn't safe to do.
<rdar://problem/10334833>
llvm-svn: 142891
discussions with Andy. Fundamentally, the previous algorithm is both
counter productive on several fronts and prioritizing things which
aren't necessarily the most important: static branch prediction.
The new algorithm uses the existing loop CFG structure information to
walk through the CFG itself to layout blocks. It coalesces adjacent
blocks within the loop where the CFG allows based on the most likely
path taken. Finally, it topologically orders the block chains that have
been formed. This allows it to choose a (mostly) topologically valid
ordering which still priorizes fallthrough within the structural
constraints.
As a final twist in the algorithm, it does violate the CFG when it
discovers a "hot" edge, that is an edge that is more than 4x hotter than
the competing edges in the CFG. These are forcibly merged into
a fallthrough chain.
Future transformations that need te be added are rotation of loop exit
conditions to be fallthrough, and better isolation of cold block chains.
I'm also planning on adding statistics to model how well the algorithm
does at laying out blocks based on the probabilities it receives.
The old tests mostly still pass, and I have some new tests to add, but
the nested loops are still behaving very strangely. This almost seems
like working-as-intended as it rotated the exit branch to be
fallthrough, but I'm not convinced this is actually the best layout. It
is well supported by the probabilities for loops we currently get, but
those are pretty broken for nested loops, so this may change later.
llvm-svn: 142743
The assumption in the back-end is that PHIs are not allowed at the start of the
landing pad block for SjLj exceptions.
<rdar://problem/10313708>
llvm-svn: 142689
ZExtPromotedInteger and SExtPromotedInteger based on the operation we legalize.
SetCC return type needs to be legalized via PromoteTargetBoolean.
llvm-svn: 142660
it's a bit more plausible to use this instead of CodePlacementOpt. The
code for this was shamelessly stolen from CodePlacementOpt, and then
trimmed down a bit. There doesn't seem to be much utility in returning
true/false from this pass as we may or may not have rewritten all of the
blocks. Also, the statistic of counting how many loops were aligned
doesn't seem terribly important so I removed it. If folks would like it
to be included, I'm happy to add it back.
This was probably the most egregious of the missing features, and now
I'm going to start gathering some performance numbers and looking at
specific loop structures that have different layout between the two.
Test is updated to include both basic loop alignment and nested loop
alignment.
llvm-svn: 142645
block frequency analyses. This differs substantially from the existing
block-placement pass in LLVM:
1) It operates on the Machine-IR in the CodeGen layer. This exposes much
more (and more precise) information and opportunities. Also, the
results are more stable due to fewer transforms ocurring after the
pass runs.
2) It uses the generalized probability and frequency analyses. These can
model static heuristics, code annotation derived heuristics as well
as eventual profile loading. By basing the optimization on the
analysis interface it can work from any (or a combination) of these
inputs.
3) It uses a more aggressive algorithm, both building chains from tho
bottom up to maximize benefit, and using an SCC-based walk to layout
chains of blocks in a profitable ordering without O(N^2) iterations
which the old pass involves.
The pass is currently gated behind a flag, and not enabled by default
because it still needs to grow some important features. Most notably, it
needs to support loop aligning and careful layout of loop structures
much as done by hand currently in CodePlacementOpt. Once it supports
these, and has sufficient testing and quality tuning, it should replace
both of these passes.
Thanks to Nick Lewycky and Richard Smith for help authoring & debugging
this, and to Jakob, Andy, Eric, Jim, and probably a few others I'm
forgetting for reviewing and answering all my questions. Writing
a backend pass is *sooo* much better now than it used to be. =D
llvm-svn: 142641
When checking the availability of instructions using the TLI, a 'promoted'
instruction IS available. It means that the value is bitcasted to another type
for which there is an operation. The correct check for the availablity of an
instruction is to check if it should be expanded.
llvm-svn: 142542
svn r139159 caused SelectionDAG::getConstant() to promote BUILD_VECTOR operands
with illegal types, even before type legalization. For this testcase, that led
to one BUILD_VECTOR with i16 operands and another with promoted i32 operands,
which triggered the assertion.
llvm-svn: 142370
.file filenumber "directory" "filename"
This removes one join+split of the directory+filename in MC internals. Because
bitcode files have independent fields for directory and filenames in debug info,
this patch may change the .o files written by existing .bc files.
llvm-svn: 142300
Use the custom inserter for the ARM setjmp intrinsics. Instead of creating the
SjLj dispatch table in IR, where it frequently violates serveral assumptions --
in particular assumptions made by the landingpad instruction about what can
branch to a landing pad and what cannot. Performing this in the back-end allows
us to violate these assumptions without the IR getting angry at us.
It also allows us to perform a small optimization. We can shove the address of
the dispatch's basic block into the function context and not have to add code
around the setjmp to check for the return value and jump to the dispatch.
Neat, huh?
<rdar://problem/10116753>
llvm-svn: 142294
Some code want to check that *any* call within a function has the 'returns
twice' attribute, not just that the current function has one.
llvm-svn: 142221
This isn't put into the 'clear()' method because the information needs to stick
around (at least for a little bit) after the selection DAG is built.
llvm-svn: 142032
When spilling around an instruction with a dead def, remember to add a
value number for the def.
The missing value number wouldn't normally create problems since there
would be an incoming live range as well. However, due to another bug
we could spill a dead V_SET0 instruction which doesn't read any values.
The missing value number caused an empty live range to be created which
is dangerous since it doesn't interfere with anything.
This fixes part of PR11125.
llvm-svn: 141923
Now that MI->getRegClassConstraint() can also handle inline assembly,
don't bail when recomputing the register class of a virtual register
used by inline asm.
This fixes PR11078.
llvm-svn: 141836
Most instructions have some requirements for their register operands.
Usually, this is expressed as register class constraints in the
MCInstrDesc, but for inline assembly the constraints are encoded in the
flag words.
llvm-svn: 141835
The inline asm operand constraint is initially encoded in the virtual
register for the operand, but that register class may change during
coalescing, and the original constraint is lost.
Encode the original register class as part of the flag word for each
inline asm operand. This makes it possible to recover the actual
constraint required by inline asm, just like we can for normal
instructions.
llvm-svn: 141833
our current machine instruction defines a register with the same register class
as what's being replaced. This showed up in the SPEC 403.gcc benchmark, where it
would ICE because a tail call was expecting one register class but was given
another. (The machine instruction verifier catches this situation.)
<rdar://problem/10270968>
llvm-svn: 141830
rather than the previous index. If a block has a single instruction, the
previous index may be in a different basic block.
I have no clue how this used to work on all of test-suite, because now this
failure is seen quite often when trying to compile code with -strong-phi-elim.
This fixes PR10252.
llvm-svn: 141812
containing loop's header to see if that's a landing pad. If it is, then we don't
want to hoist instructions out of the loop and above the header.
llvm-svn: 141767
1. The speculation check may not have been performed if the BB hasn't had a load
LICM candidate.
2. If the candidate would be CSE'ed, then go ahead and speculatively LICM the
instruction even if it's in high register pressure situation.
llvm-svn: 141747
file. Since it should only be used when necessary propagate it through
the backend code generation and tweak testcases accordingly.
This helps with code like in clang's test/CodeGen/debug-info-line.c where
we have multiple #line directives within a single lexical block and want
to generate only a single block that contains each file change.
Part of rdar://10246360
llvm-svn: 141729
The blocks with invokes have branches to the dispatch block, because that more
correctly models the behavior of the CFG. The dispatch of course has edges to
the landing pads. Those landing pads could contain invokes, which then have
branches back to the dispatch. This creates a loop. The machine LICM pass looks
at this loop and thinks it can hoist elements out of it. But because the
dispatch is an alternate entry point into the program, the hoisted instructions
won't be executed.
I wasn't able to get a testcase which was small and could reproduce all of the
time. The function_try_block.cpp in llvm-test was where this showed up.
llvm-svn: 141726
Allow targets to expand COPY and other standard pseudo-instructions
before they are expanded with copyPhysReg().
This allows the target to examine the COPY instruction for extra
operands indicating it can be widened to a preferable super-register
copy. See the ARM -widen-vmovs option.
llvm-svn: 141578
PhysReg operands are not allowed to have sub-register indices at all.
For virtual registers with sub-reg indices, check that all registers in
the register class support the sub-reg index.
llvm-svn: 141220
EXTRACT_SUBREG is emitted as %dst = COPY %src:sub, so there is no need to
constrain the %dst register class. RegisterCoalescer will apply the
necessary constraints if it decides to eliminate the COPY.
The %src register class does need to be constrained to something with
the right sub-registers, though. This is currently done manually with
COPY_TO_REGCLASS nodes. They can possibly be removed after this patch.
llvm-svn: 141207
The register class created by INSERT_SUBREG and SUBREG_TO_REG must be
legal and support the SubIdx sub-registers.
The new getSubClassWithSubReg() hook can compute that.
This may create INSERT_SUBREG instructions defining a larger register
class than the sub-register being inserted. That is OK,
RegisterCoalescer will constrain the register class as needed when it
eliminates the INSERT_SUBREG instructions.
llvm-svn: 141198
TwoAddressInstructionPass should annotate instructions with <undef>
flags when it lower REG_SEQUENCE instructions. LiveIntervals should not
be in the business of modifying code (except for kill flags, perhaps).
llvm-svn: 141187
For example:
%vreg10:dsub_0<def,undef> = COPY %vreg1
%vreg10:dsub_1<def> = COPY %vreg2
is rewritten as:
%D2<def> = COPY %D0, %Q1<imp-def>
%D3<def> = COPY %D1, %Q1<imp-use,kill>, %Q1<imp-def>
The first COPY doesn't care about the previous value of %Q1, so it
doesn't read that register.
The second COPY is a partial redefinition of %Q1, so it implicitly kills
and redefines that register.
This makes it possible to recognize instructions that can harmlessly
clobber the full super-register. The write and don't read the
super-register.
llvm-svn: 141139
RegisterCoalescer can create sub-register defs when it is joining a
register with a sub-register. Add <undef> flags to these new
sub-register defs where appropriate.
llvm-svn: 141138
The <undef> flag says that a MachineOperand doesn't read its register,
or doesn't depend on the previous value of its register.
A full register def never depends on the previous register value. A
partial register def may depend on the previous value if it is intended
to update part of a register.
For example:
%vreg10:dsub_0<def,undef> = COPY %vreg1
%vreg10:dsub_1<def> = COPY %vreg2
The first copy instruction defines the full %vreg10 register with the
bits not covered by dsub_0 defined as <undef>. It is not considered a
read of %vreg10.
The second copy modifies part of %vreg10 while preserving the rest. It
has an implicit read of %vreg10.
This patch adds a MachineOperand::readsReg() method to determine if an
operand reads its register.
Previously, this was modelled by adding a full-register <imp-def>
operand to the instruction. This approach makes it possible to
determine directly from a MachineOperand if it reads its register. No
scanning of MI operands is required.
llvm-svn: 141124
and the alignment is 0 (i.e., it's defined globally in one file and declared in
another file) it could get an alignment which is larger than the ABI allows for
that type, resulting in aligned moves being used for unaligned loads.
For instance, in file A.c:
struct S s;
In file B.c:
struct {
// something long
};
extern S s;
void foo() {
struct S p = s;
// ...
}
this copy is a 'memcpy' which is turned into a series of 'movaps' instructions
on X86. But this is wrong, because 'struct S' has alignment of 4, not 16.
llvm-svn: 140902
This helps with porting code from 2.9 to 3.0 as TargetSelect.h changed location,
and if you include the old one by accident you will trigger this assert.
llvm-svn: 140848
The function needs to scan the implicit operands anyway, so no
performance is won by caching the number of implicit operands added to
an instruction.
This also fixes a bug when adding operands after an implicit operand has
been added manually. The NumImplicitOps count wasn't kept up to date.
MachineInstr::addOperand() will now consistently place all explicit
operands before all the implicit operands, regardless of the order they
are added. It is possible to change an MI opcode and add additional
explicit operands. They will be inserted before any existing implicit
operands.
The only exception is inline asm instructions where operands are never
reordered. This is because of a hack that marks explicit clobber regs
on inline asm as <implicit-def> to please the fast register allocator.
This hack can go away when InstrEmitter and FastIsel can add exact
<dead> flags to physreg defs.
llvm-svn: 140744
Upon further review, most of the EH code should remain written at the IR
level. The part which breaks SSA form is the dispatch table, so that part will
be moved to the back-end.
llvm-svn: 140730
This intrinsic is used to pass the index of the function context to the back-end
for further processing. The back-end is in charge of filling in the rest of the
entries.
llvm-svn: 140676
The DWARF exception pass uses the call site information, which is set up here. A
pre-RA pass is too late for it to use this information. So create and setup the
function context here, and then insert the call site values here (and map the
call sites for the DWARF EH pass). This is simpler than the original pass, and
doesn't make the CFG lose its SSA-ness.
It's a win-win-win-win-lose-win-win situation.
llvm-svn: 140675
current IR-level pass.
The old SjLj EH pass has some problems, especially with the new EH model. Most
significantly, it violates some of the new restrictions the new model has. For
instance, the 'dispatch' table wants to jump to the landing pad, but we cannot
allow that because only an invoke's unwind edge can jump to a landing pad. This
requires us to mangle the code something awful. In addition, we need to keep the
now dead landingpad instructions around instead of CSE'ing them because the
DWARF emitter uses that information (they are dead because no control flow edge
will execute them - the control flow edge from an invoke's unwind is superceded
by the edge coming from the dispatch).
Basically, this pass belongs not at the IR level where SSA is king, but at the
code-gen level, where we have more flexibility.
llvm-svn: 140646
Many targets use pseudo instructions to help register allocation. Like
the COPY instruction, these pseudos can be expanded after register
allocation. The early expansion can make life easier for PEI and the
post-ra scheduler.
This patch adds a hook that is called for all remaining pseudo
instructions from the ExpandPostRAPseudos pass.
llvm-svn: 140472
SDNodes may return values which are wider than the incoming element types. In
this patch we fix the integer promotion of these nodes.
Fixes spill-q.ll when running -promote-elements.
llvm-svn: 140471
I'll fix the file contents in the next commit.
This pass is currently expanding the COPY and SUBREG_TO_REG pseudos. I
am going to add a hook so targets can expand more pseudo-instructions
after register allocation.
Many targets have pseudo-instructions that assist the register
allocator. They can be expanded after register allocation, before PEI
and PostRA scheduling.
llvm-svn: 140469
(this is always the case for scalars), otherwise use the promoted result type.
Fix test/CodeGen/X86/vsplit-and.ll when promote-elements is enabled.
llvm-svn: 140464
When generating the trunc-store of i1's, we need to use the vector type and not
the scalar type.
This patch fixes the assertion in CodeGen/Generic/bool-vector.ll when
running with -promote-elements.
llvm-svn: 140463
DecomposeMERGE_VALUES to "know" that results are legalized in
a particular order, by passing it the number of the result
being legalized (the type legalization core provides this, it
just needs to be passed on).
llvm-svn: 140373
integer-promotion of CONCAT_VECTORS.
Test: test/CodeGen/X86/widen_shuffle-1.ll
This patch fixes the above tests (when running in with -promote-elements).
llvm-svn: 140372
Sometimes register class constraints are trivial, like GR32->GR32_NOSP,
or GPR->rGPR. Teach InstrEmitter to simply constrain the virtual
register instead of emitting a copy in these cases.
Normally, these copies are handled by the coalescer. This saves some
coalescer work.
llvm-svn: 140340
The function will refuse to use a register class with fewer registers
than MinNumRegs. This can be used by clients to avoid accidentally
increase register pressure too much.
The default value of MinNumRegs=0 doesn't affect how constrainRegClass()
works.
llvm-svn: 140339
Few weeks ago, llvm completely inverted the debug info graph. Earlier each debug info node used to keep track of its compile unit, now compile unit keeps track of important nodes. One impact of this change is that the global variable's do not have any context, which should be checked before deciding to use AT_specification DIE.
llvm-svn: 140282
This is still a hack until we can teach tblgen to generate the
optional CPSR operand rather than an implicit CPSR def. But the
strangeness is now limited to the selection DAG. ADD/SUB MI's no
longer have implicit CPSR defs, nor do we allow flag setting variants
of these opcodes in machine code. There are several corner cases to
consider, and getting one wrong would previously lead to nasty
miscompilation. It's not the first time I've debugged one, so this
time I added enough verification to ensure it won't happen again.
llvm-svn: 140228
No functionality change. The hook makes it explicit which patterns
require "special" handling. i.e. it self-documents tblgen
deficiencies. I plan to add verification in ExpandISelPseudos and
Thumb2SizeReduce to catch any missing hasPostISelHooks. Otherwise it's
too fragile.
llvm-svn: 140160
Modified ARMISelLowering::AdjustInstrPostInstrSelection to handle the
full gamut of CPSR defs/uses including instructins whose "optional"
cc_out operand is not really optional. This allowed removal of the
hasPostISelHook to simplify the .td files and make the implementation
more robust.
Fixes rdar://10137436: sqlite3 miscompile
llvm-svn: 140134
The leaveIntvAfter() function normally inserts a back-copy after the
requested instruction, making the back-copy kill the live range.
In spill mode, try to insert the back-copy before the last use instead.
That means the last use becomes the kill instead of the back-copy. This
lowers the register pressure because the last use can now redefine the
same register it was reading.
This will also improve compile time: The back-copy isn't a kill, so
hoisting it in hoistCopiesForSize() won't force a recomputation of the
source live range. Similarly, if the back-copy isn't hoisted by the
splitter, the spiller will not attempt hoisting it locally.
llvm-svn: 139883
If the source register is live after the copy being spilled, there is no
point to hoisting it. Hoisting inside a basic block only serves to
resolve interferences by shortening the live range of the source.
llvm-svn: 139882
When -split-spill-mode is enabled, spill hoisting is performed by
SplitKit instead of by InlineSpiller. This hidden command line option
is for testing the splitter spill mode.
llvm-svn: 139845
When traceSiblingValue() encounters a PHI-def value created by live
range splitting, don't look at all the predecessor blocks. That can be
very expensive in a complicated CFG.
Instead, consider that all the non-PHI defs jointly dominate all the
PHI-defs. Tracing directly to all the non-PHI defs is much faster that
zipping around in the CFG when there are many PHIs with many
predecessors.
This significantly improves compile time for indirectbr interpreters.
llvm-svn: 139797
Blocks with multiple PHI successors only need to go on the worklist
once. Use a SmallPtrSet to track the live-out blocks that have already
been handled. This is a lot faster than the two live range check we
would otherwise do.
Also stop recomputing hasPHIKill flags. Like RenumberValues(), it is
conservatively correct to leave them in, and they are not used for
anything important.
llvm-svn: 139792
It does, after all.
RemoveCopyByCommutingDef rewrites the uses of one particular value
number in A. It doesn't know how to rewrite phi uses, so there can't be
any.
llvm-svn: 139787
There is only one legitimate use remaining, in addIntervalsForSpills().
All other calls to hasPHIKill() are only used to update PHIKill flags.
The addIntervalsForSpills() function is part of the old spilling
framework, only used by linearscan.
llvm-svn: 139783
Instead, let HasOtherReachingDefs() test for defs in B that overlap any
phi-defs in A as well. This test is slightly different, but almost
identical.
A perfectly precise test would only check those phi-defs in A that are
reachable from AValNo.
llvm-svn: 139782
The source live range is recomputed using shrinkToUses() which does
handle phis correctly. The hasPHIKill() condition was relevant in the
old days when ReMaterializeTrivialDef() tried to recompute the live
range itself.
The shrinkToUses() function will mark the original def as dead when no
more uses and phi kills remain. It is then removed by
runOnMachineFunction().
llvm-svn: 139781
It is conservatively correct to keep the hasPHIKill flags, even after
deleting PHI-defs.
The calculation can be very expensive after taildup has created a
quadratic number of indirectbr edges in the CFG, and the hasPHIKill flag
isn't used for anything after RenumberValues().
llvm-svn: 139780
THe LRE_DidCloneVirtReg callback may be called with vitual registers
that RAGreedy doesn't even know about yet. In that case, there are no
data structures to update.
llvm-svn: 139702
When a back-copy is hoisted to the nearest common dominator, keep
looking up the dominator tree for a less loopy dominator, and place the
back-copy there instead.
Don't do this when a single existing back-copy dominates all the others.
Assume the client knows what he is doing, and keep the dominating
back-copy.
This prevents us from hoisting back-copies into loops in most cases. If
a value is defined in a loop with multiple exits, we may still hoist
back-copies into that loop. That is the speed/size tradeoff.
llvm-svn: 139698
When a ParentVNI maps to multiple defs in a new interval, its live range
may still be derived directly from RegAssign by transferValues().
On the other hand, when instructions have been rematerialized or
hoisted, it may be necessary to completely recompute live ranges using
LiveRangeCalc::extend() to all uses.
Use a bit in the value map to indicate that a live range must be
recomputed. Rename markComplexMapped() to forceRecompute().
This fixes some live range verification errors when
-split-spill-mode=size hoists back-copies by recomputing source ranges
when RegAssign kills can't be moved.
llvm-svn: 139660
Whenever the complement interval is defined by multiple copies of the
same value, hoist those back-copies to the nearest common dominator.
This ensures that at most one copy is inserted per value in the
complement inteval, and no phi-defs are needed.
llvm-svn: 139651
This function is used to flag values where the complement interval may
overlap other intervals. Call it from overlapIntv, and use the flag to
fully recompute those live ranges in transferValues().
llvm-svn: 139612
The complement interval may overlap the other intervals created, so use
a separate LiveRangeCalc instance to compute its live range.
A LiveRangeCalc instance can only be shared among non-overlapping
intervals.
llvm-svn: 139603
SplitKit will soon need two copies of these data structures, and the
algorithms will also be useful when LiveIntervalAnalysis becomes
independent of LiveVariables.
llvm-svn: 139572
Splitting a landing pad takes considerable care because of PHIs and other
nasties. The problem is that the jump table needs to jump to the landing pad
block. However, the landing pad block can be jumped to only by an invoke
instruction. So we clone the landingpad instruction into its own basic block,
have the invoke jump to there. The landingpad instruction's basic block's
successor is now the target for the jump table.
But because of PHI nodes, we need to create another basic block for the jump
table to jump to. This is definitely a hack, because the values for the PHI
nodes may not be defined on the edge from the jump table. But that's okay,
because the jump table is simply a construct to mimic what is happening in the
CFG. So the values are mysteriously there, even though there is no value for the
PHI from the jump table's edge (hence calling this a hack).
llvm-svn: 139545
SplitKit always computes a complement live range to cover the places
where the original live range was live, but no explicit region has been
allocated.
Currently, the complement live range is created to be as small as
possible - it never overlaps any of the regions. This minimizes
register pressure, but if the complement is going to be spilled anyway,
that is not very important. The spiller will eliminate redundant
spills, and hoist others by making the spill slot live range overlap
some of the regions created by splitting. Stack slots are cheap.
This patch adds the interface to enable spill modes in SplitKit. In
spill mode, SplitKit will assume that the complement is going to spill,
so it will allow it to overlap regions in order to avoid back-copies.
By doing some of the spiller's work early, the complement live range
becomes simpler. In some cases, it can become much simpler because no
extra PHI-defs are required. This will speed up both splitting and
spilling.
This is only the interface to enable spill modes, no implementation yet.
llvm-svn: 139500
In some cases such as interpreters using indirectbr, the CFG can be very
complicated, and live range splitting may be forced to insert a large
number of phi-defs. When that happens, traceSiblingValue can spend a
lot of time zipping around in the CFG looking for defs and reloads.
This patch causes more information to be cached in SibValues, and the
cached values are used to terminate searches early. This speeds up
spilling by 20x in one interpreter test case. For more typical code,
this is just a 10% speedup of spilling.
The previous version had bugs that caused miscompilations. They have
been fixed.
llvm-svn: 139378
In some cases such as interpreters using indirectbr, the CFG can be very
complicated, and live range splitting may be forced to insert a large
number of phi-defs. When that happens, traceSiblingValue can spend a
lot of time zipping around in the CFG looking for defs and reloads.
This patch causes more information to be cached in SibValues, and the
cached values are used to terminate searches early. This speeds up
spilling by 20x in one interpreter test case. For more typical code,
this is just a 10% speedup of spilling.
llvm-svn: 139247
(The fix for the related failures on x86 is going to be nastier because we actually need Acquire memoperands attached to the atomic load instrs, etc.)
llvm-svn: 139221
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons. Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all"). Patch mostly by
Nadav Rotem.
llvm-svn: 139159
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC. While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust trampoline is needed
in a different (nested) function. To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function. Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!). Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC. Patch mostly by Sanjoy Das.
llvm-svn: 139140
If we have a chain of zext -> assert_zext -> zext -> use, the first zext would get simplified away because of the later zext, and then the later zext would get simplified away because of the assert. The solution is to teach SimplifyDemandedBits that assert_zext demands all of the high bits of its input, rather than only those demanded by its users. No testcase because the only example I have manifests as llvm-gcc miscompiling LLVM, and I haven't found a smaller case that reproduces this problem.
Fixes <rdar://problem/10063365>.
llvm-svn: 139059
to be unreliable on platforms which require memcpy calls, and it is
complicating broader legalize cleanups. It is hoped that these cleanups
will make memcpy byval easier to implement in the future.
llvm-svn: 138977
- On COFF the .lcomm directive has an alignment argument.
- On ELF we fall back to .local + .comm
Based on a patch by NAKAMURA Takumi.
Fixes PR9337, PR9483 and PR10128.
llvm-svn: 138976
An instruction may define part of a register where the other bits are
undefined. In that case, it is safe to rematerialize the instruction.
For example:
%vreg2:ssub_0<def> = VLDRS <cp#0>, 0, pred:14, pred:%noreg, %vreg2<imp-def>
The extra <imp-def> operand indicates that the instruction does not read
the other parts of the virtual register, so a remat is safe.
This patch simply allows multiple def operands for the virtual register.
It is MI->readsVirtualRegister() that determines if we depend on a
previous value so remat is impossible.
llvm-svn: 138953
An instruction that redefines only part of a larger register can never
be rematerialized since the virtual register value depends on the old
value in other parts of the register.
This was fixed for the inline spiller in r138794. This patch fixes the
problem for all register allocators, and includes a small test case.
<rdar://problem/10032939>
llvm-svn: 138944
Added canClobberReachingPhysRegUse() to handle a particular pattern in
which a two-address instruction could be forced to interfere with
EFLAGS, causing a compare to be unnecessarilly cloned.
Fixes rdar://problem/5875261
llvm-svn: 138924
X86. Modify the pass added in the previous patch to call this new
code.
This new prologues generated will call a libgcc routine (__morestack)
to allocate more stack space from the heap when required
Patch by Sanjoy Das.
llvm-svn: 138812
Add a instruction flag: hasPostISelHook which tells the pre-RA scheduler to
call a target hook to adjust the instruction. For ARM, this is used to
adjust instructions which may be setting the 's' flag. ADC, SBC, RSB, and RSC
instructions have implicit def of CPSR (required since it now uses CPSR physical
register dependency rather than "glue"). If the carry flag is used, then the
target hook will *fill in* the optional operand with CPSR. Otherwise, the hook
will remove the CPSR implicit def from the MachineInstr.
llvm-svn: 138810
I don't really like the patterns, but I'm having trouble coming up with a
better way to handle them.
I plan on making other targets use the same legalization
ARM-without-memory-barriers is using... it's not especially efficient, but
if anyone cares, it's not that hard to fix for a given target if there's
some better lowering.
llvm-svn: 138621
A value of -1 at a call site tells the personality function that this call isn't
handled by the current function. Since the ResumeInsts are converted to calls to
_Unwind_SjLj_Resume, add a (volatile) store of -1 to its 'call site'.
llvm-svn: 138416
This is not necessarily the first or dominating use of the EH values. The IR
breaks if it's not. So replace the specific value in the instruction with the
new value.
llvm-svn: 138406
The invoke could be at the end of the entry block. If it's the only one, then we
won't process all of the landingpad instructions correctly. This code is
currently ugly, but should be made much nicer once the new EH switch is thrown.
llvm-svn: 138397
value, we insert a load of the exception object and selector object from memory,
which is where it actually resides. If it's used by a PHI node, we follow that
to where it is being used. Eventually, all landingpad instructions should have
no uses. Any PHI nodes that were associated with those landingpads should be
removed.
llvm-svn: 138302
the intent seems to be to terminate even in Release builds, just use abort()
directly.
If program flow ever reaches a __builtin_unreachable (which llvm_unreachable is
#define'd to on newer GCCs) then the program is undefined.
llvm-svn: 138068
Normally, a partial register def is treated as reading the
super-register unless it also defines the full register like this:
%vreg110:sub_32bit<def> = COPY %vreg77:sub_32bit, %vreg110<imp-def>
This patch also uses the <undef> flag on partial defs to recognize
non-reading operands:
%vreg110:sub_32bit<def,undef> = COPY %vreg77:sub_32bit
This fixes a subtle bug in RegisterCoalescer where LIS->shrinkToUses
would treat a coalesced copy as still reading the register, extending
the live range artificially.
My test case only works when I disable DCE so a dead copy is left for
RegisterCoalescer, so I am not including it.
<rdar://problem/9967101>
llvm-svn: 138018
The landingpad instruction is lowered into the EXCEPTIONADDR and EHSELECTION
SDNodes. The information from the landingpad instruction is harvested by the
'AddLandingPadInfo' function. The new EH uses the current EH scheme in the
back-end. This will change once we switch over to the new scheme. (Reviewed by
Jakob!)
llvm-svn: 137880
This generates the SDNodes for the new exception handling scheme. It takes the
two values coming from the landingpad instruction and assigns them to the
EXCEPTIONADDR and EHSELECTION nodes.
llvm-svn: 137873
Things are much saner now. We no longer need to modify the laning pads, because
of the invariants we impose upon them. The only thing DwarfEHPrepare needs to do
is convert the 'resume' instruction into a call to '_Unwind_Resume'.
llvm-svn: 137855
MDNodes graph structure such that compiler unit keeps track of important MDNodes and update dwarf writer to process mdnodes top-down instead of bottom up.
llvm-svn: 137778
When a variable is inlined multiple places, abstract variable keeps name, location, type etc.. info and all other concreate instances of the variable directly refers to abstract variable.
llvm-svn: 137637
This implements the 'landingpad' instruction. It's used to indicate that a basic
block is a landing pad. There are several restrictions on its use (see
LangRef.html for more detail). These restrictions allow the exception handling
code to gather the information it needs in a much more sane way.
This patch has the definition, implementation, C interface, parsing, and bitcode
support in it.
llvm-svn: 137501
The Query class now holds two iterators instead of an InterferenceResult
instance. The iterators are used as bookmarks for repeated
collectInterferingVRegs calls.
llvm-svn: 137380
The InterferenceResult iterator turned out to be less important than we
thought it would be. LiveIntervalUnion clients want higher level
information, like the list of interfering virtual registers.
llvm-svn: 137346
Coalescing can remove copy-like instructions with sub-register operands
that constrained the register class. Examples are:
x86: GR32_ABCD:sub_8bit_hi -> GR32
arm: DPR_VFP2:ssub0 -> DPR
Recompute the register class of any virtual registers that are used by
less instructions after coalescing.
This affects code generation for the Cortex-A8 where we use NEON
instructions for f32 operations, c.f. fp_convert.ll:
vadd.f32 d16, d1, d0
vcvt.s32.f32 d0, d16
The register allocator is now free to use d16 for the temporary, and
that comes first in the allocation order because it doesn't interfere
with any s-registers.
llvm-svn: 137133
This function doesn't have anything to do with spill weights, and MRI
already has functions for manipulating the register class of a virtual
register.
llvm-svn: 137123
The local ranges created get to stay in the RS_New stage, just like for
local and region splitting.
This gives tryLocalSplit a bit more freedom the first time it sees one
of these new local ranges.
llvm-svn: 137001
Normally, we don't create a live range for a single instruction in a
basic block, the spiller does that anyway. However, when splitting a
live range that belongs to a proper register sub-class, inserting these
extra COPY instructions completely remove the constraints from the
remainder interval, and it may be allocated from the larger super-class.
The spiller will mop up these small live ranges if we end up spilling
anyway. It calls them snippets.
llvm-svn: 136989
Some instructions require restricted register classes, but most of the
time that doesn't affect register allocation. For example, some
instructions don't work with the stack pointer, but that is a reserved
register anyway.
Sometimes it matters, GR32_ABCD only has 4 allocatable registers. For
such a proper sub-class, the register allocator should try to enable
register class inflation since that makes more registers available for
allocation.
Make sure only legal super-classes are considered. For example, tGPR is
not a proper sub-class in Thumb mode, but in ARM mode it is.
llvm-svn: 136981
The old code would look at kills and defs in one pass over the
instruction operands, causing problems with this code:
%R0<def>, %CPSR<def,dead> = tLSLri %R5<kill>, 2, pred:14, pred:%noreg
%R0<def>, %CPSR<def,dead> = tADDrr %R4<kill>, %R0<kill>, pred:14, %pred:%noreg
The last instruction kills and redefines %R0, so it is still live after
the instruction.
This caused a register scavenger crash when compiling 483.xalancbmk for
armv6. I am not including a test case because it requires too much bad
luck to expose this old bug.
First you need to convince the register allocator to use %R0 twice on
the tADDrr instruction, then you have to convince BranchFolding to do
something that causes it to run the register scavenger on he bad block.
<rdar://problem/9898200>
llvm-svn: 136973
inlined variable, based on the discussion in PR10542.
This explodes the runtime of several passes down the pipeline due to
a large number of "copies" remaining live across a large function. This
only shows up with both debug and opt, but when it does it creates
a many-minute compile when self-hosting LLVM+Clang. There are several
other cases that show these types of regressions.
All of this is tracked in PR10542, and progress is being made on fixing
the issue. Once its addressed, the re-instated, but until then this
restores the performance for self-hosting and other opt+debug builds.
Devang, let me know if this causes any trouble, or impedes fixing it in
any way, and thanks for working on this!
llvm-svn: 136953
It is possible to have multiple DBG_VALUEs for the same variable:
32L TEST32rr %vreg0<kill>, %vreg0, %EFLAGS<imp-def>; GR32:%vreg0
DBG_VALUE 2, 0, !"i"
DBG_VALUE %noreg, %0, !"i"
When that happens, keep the last one instead of the first.
llvm-svn: 136842
This helps generate better code in functions with high register
pressure.
The previous version of compact region splitting caused regressions
because the regions were a bit too large. A stronger negative bias
applied in r136832 fixed this problem.
llvm-svn: 136836
Apply twice the negative bias on transparent blocks when computing the
compact regions. This excludes loop backedges from the region when only
one of the loop blocks uses the register.
Previously, we would include the backedge in the region if the loop
preheader and the loop latch both used the register, but the loop header
didn't.
When both the header and latch blocks use the register, we still keep it
live on the backedge.
llvm-svn: 136832
This is either an invalid SlotIndex, or valno->def for the first value
defined inside the block. PHI values are not counted as defined inside
the block.
The FirstDef field will be used when estimating the cost of spilling
around a block.
llvm-svn: 136736
The PrefBoth constraint is used for blocks that ideally want a live-in
value both on the stack and in a register. This would be used by a block
that has a use before interference forces a spill.
Secondly, add the ChangesValue flag to BlockConstraint. This tells
SpillPlacement if a live-in value on the stack can be reused as a
live-out stack value for free. If the block redefines the virtual
register, a spill would be required for that.
This extra information will be used by SpillPlacement to more accurately
calculate spill costs when a value can exist both on the stack and in a
register.
The simplest example is a basic block that reads the virtual register,
but doesn't change its value. Spilling around such a block requires a
reload, but no spill in the block.
The spiller already knows this, but the spill placer doesn't. That can
sometimes lead to suboptimal regions.
llvm-svn: 136731
This adds the 'resume' instruction class, IR parsing, and bitcode reading and
writing. The 'resume' instruction resumes propagation of an existing (in-flight)
exception whose unwinding was interrupted with a 'landingpad' instruction (to be
added later).
llvm-svn: 136589
This includes registers like EFLAGS and ST0-ST7. We don't check for
liveness issues in the verifier and scavenger because registers will
never be allocated from these classes.
While in SSA form, we do care about the liveness of unallocatable
unreserved registers. Liveness of EFLAGS and ST0 neds to be correct for
MachineDCE and MachineSinking.
llvm-svn: 136541
This flag is true from isel to register allocation when the machine
function is required to be in SSA form. The TwoAddressInstructionPass
and PHIElimination passes clear the flag.
The SSA flag wil be used by the machine code verifier to check for SSA
form, and eventually an assertion can enforce it in +Asserts builds.
This will catch the common target error of creating machine code with
multiple defs of a virtual register.
llvm-svn: 136532
working on x86 (at least for trivial testcases); other architectures will
need more work so that they actually emit the appropriate instructions for
orderings stricter than 'monotonic'. (As far as I can tell, the ARM, PPC,
Mips, and Alpha backends need such changes.)
llvm-svn: 136457
specified in the same file that the library itself is created. This is
more idiomatic for CMake builds, and also allows us to correctly specify
dependencies that are missed due to bugs in the GenLibDeps perl script,
or change from compiler to compiler. On Linux, this returns CMake to
a place where it can relably rebuild several targets of LLVM.
I have tried not to change the dependencies from the ones in the current
auto-generated file. The only places I've really diverged are in places
where I was seeing link failures, and added a dependency. The goal of
this patch is not to start changing the dependencies, merely to move
them into the correct location, and an explicit form that we can control
and change when necessary.
This also removes a serialization point in the build because we don't
have to scan all the libraries before we begin building various tools.
We no longer have a step of the build that regenerates a file inside the
source tree. A few other associated cleanups fall out of this.
This isn't really finished yet though. After talking to dgregor he urged
switching to a single CMake macro to construct libraries with both
sources and dependencies in the arguments. Migrating from the two macros
to that style will be a follow-up patch.
Also, llvm-config is still generated with GenLibDeps.pl, which means it
still has slightly buggy dependencies. The internal CMake
'llvm-config-like' macro uses the correct explicitly specified
dependencies however. A future patch will switch llvm-config generation
(when using CMake) to be based on these deps as well.
This may well break Windows. I'm getting a machine set up now to dig
into any failures there. If anyone can chime in with problems they see
or ideas of how to solve them for Windows, much appreciated.
llvm-svn: 136433
This generates the correct SDNodes for the landingpad instruction. It makes an
assumption that the result of the landingpad instruction has at least two
values. And that the first value is a pointer to the exception object and the
second value is the "selector."
llvm-svn: 136430
'atomicrmw' instructions, which allow representing all the current atomic
rmw intrinsics.
The allowed operands for these instructions are heavily restricted at the
moment; we can probably loosen it a bit, but supporting general
first-class types (where it makes sense) might get a bit complicated,
given how SelectionDAG works.
As an initial cut, these operations do not support specifying an alignment,
but it would be possible to add if we think it's useful. Specifying an
alignment lower than the natural alignment would be essentially
impossible to support on anything other than x86, but specifying a greater
alignment would be possible. I can't think of any useful optimizations which
would use that information, but maybe someone else has ideas.
Optimizer/codegen support coming soon.
llvm-svn: 136404
Code like that would only be produced by bugpoint, but we should still
handle it correctly.
When a register is defined by a REG_SEQUENCE of undefs, the register
itself is undef. Previously, we would create a register with uses but no
defs.
Fixes part of PR10520.
llvm-svn: 136401
There are two conflicting strategies in play:
- Under high register pressure, we want to assign large live ranges
first. Smaller live ranges are easier to place afterwards.
- Live range splitting is guided by interference, so splitting should be
deferred until interference is as realistic as possible.
With the recent changes to the live range stages, and with compact
regions enabled, it is less traumatic to split a live range too early.
If some of the split products were too big, they can often be split
again.
By reversing the RS_Split order, we get this queue order:
1. Normal live ranges, large to small.
2. RS_Split live ranges, large to small.
The large-to-small order improves RAGreedy's puzzle solving skills under
high register pressure. It may cause a bit more iterated splitting, but
we handle that better now.
With this change, -compact-regions is mostly an improvement on SPEC.
llvm-svn: 136388
When splitting global live ranges, it is now possible to split for
multiple destination intervals at once. Previously, we only had the main
and stack intervals.
Each edge bundle is assigned to a split candidate, and splitAroundRegion
will insert copies between the candidate intervals and the stack
interval as needed.
The multi-way splitting is used to split around compact regions when
enabled with -compact-regions. The best candidate register still gets
all the bundles it wants, but everything outside the main interval is
first split around compact regions before we create single-block
intervals.
Compact region splitting still causes some regressions, so it is not
enabled by default.
llvm-svn: 136186
These copies would coalesce easily, but the resulting value would be
defined by a deleted instruction. Now we also remove the undefined value
number from the destination register.
This fixes PR10503.
llvm-svn: 136174
When dead code elimination deletes a PHI value, the virtual register may
split into multiple connected components. In that case, revert each
component to the RS_Assign stage.
The new components are guaranteed to be smaller (the original value
numbers are distributed among the components), so this will always be
making progress. The components are now allowed to evict other live
ranges or be split again.
llvm-svn: 136034
This mechanism already exists, but the RS_Split2 stage makes it clearer.
When live range splitting creates ranges that may not be making
progress, they are marked RS_Split2 instead of RS_New. These ranges may
be split again, but only in a way that can be proven to make progress.
For local ranges, that means they must be split into ranges used by
strictly fewer instructions.
For global ranges, region splitting is bypassed and the RS_Split2
ranges go straight to per-block splitting.
llvm-svn: 135912
The stage is used to control where a live range is going, not where it
is coming from. Live ranges created by splitting will usually be marked
RS_New, but some are marked RS_Spill to avoid wasting time trying to
split them again.
The old RS_Global and RS_Local stages are merged - they are really the
same thing for local and global live ranges.
llvm-svn: 135911
This fixes PR10463. A two-address instruction with an <undef> use
operand was incorrectly rewritten so the def and use no longer used the
same register, violating the tie constraint.
Fix this by always rewriting <undef> operands with the register a def
operand would use.
llvm-svn: 135885
This method computes the edge bundles that should be live when splitting
around a compact region. This is independent of interference.
The function returns false if the live range was already a compact
region, or the compact region doesn't have any live bundles - it would
be the same as splitting around basic blocks.
Compact regions are computed using the normal spill placement code. We
pretend there is interference in all live-through blocks that don't use
the live range. This removes all edges from the Hopfield network used
for spill placement, so it converges instantly.
llvm-svn: 135847
If there is no interference and no last split point, we cannot
enterIntvBefore(Stop) - that function needs a real instruction.
Use enterIntvAtEnd instead for that very easy case.
This code doesn't currently run, it is needed by multi-way splitting.
llvm-svn: 135846
A split candidate can have a null PhysReg which means that it doesn't
map to a real interference pattern. Instead, pretend that all through
blocks have interference.
This makes it possible to generate compact regions where the live range
doesn't go through blocks that don't use it. The live range will still
be live between directly connected blocks with uses.
Splitting around a compact region tends to produce a live range with a
high spill weight, so it may evict a less dense live range.
llvm-svn: 135845
This method matches addLinks - All the listed blocks are considered to
have interference, so they add a negative bias to their bundles.
This could also be done by addConstraints, but that requires building a
separate BlockConstraint array.
llvm-svn: 135844
- Introduce JITDefault code model. This tells targets to set different default
code model for JIT. This eliminates the ugly hack in TargetMachine where
code model is changed after construction.
llvm-svn: 135580
(including compilation, assembly). Move relocation model Reloc::Model from
TargetMachine to MCCodeGenInfo so it's accessible even without TargetMachine.
llvm-svn: 135468
to MCRegisterInfo. Also initialize the mapping at construction time.
This patch eliminate TargetRegisterInfo from TargetAsmInfo. It's another step
towards fixing the layering violation.
llvm-svn: 135424
When splitting a live range immediately before an LDR_POST instruction
that redefines the address register, make sure to use the correct value
number in leaveIntvBefore.
We need the value number entering the instruction.
<rdar://problem/9793765>
llvm-svn: 135413
When trying to rematerialize a value before an instruction that has an
early-clobber redefine of the virtual register, make sure to look up the
correct value number.
Early-clobber defs are moved one slot back, so getBaseIndex is needed to
find the used value number.
Bugpoint was unable to reduce the test case for this, see PR10388.
llvm-svn: 135378
This gets rid of some of the gory splitting details in RAGreedy and
makes them available to future SplitKit clients.
Slightly generalize the functionality to support multi-way splitting.
Specifically, SplitEditor::splitLiveThroughBlock() supports switching
between different register intervals in a block.
llvm-svn: 135307
when determining validity of matching constraint. Allow i1
types access to the GR8 reg class for x86.
Fixes PR10352 and rdar://9777108
llvm-svn: 135180
During type legalization we often use the SIGN_EXTEND_INREG SDNode.
When this SDNode is legalized during the LegalizeVector phase, it is
scalarized because non-simple types are automatically marked to be expanded.
In this patch we add support for lowering SIGN_EXTEND_INREG manually.
This fixes CodeGen/X86/vec_sext.ll when running with the '-promote-elements'
flag.
llvm-svn: 135144
Original commit message:
Count references to interference cache entries.
Each InterferenceCache::Cursor instance references a cache entry. A
non-zero reference count guarantees that the entry won't be reused for a
new register.
This makes it possible to have multiple live cursors examining
interference for different physregs.
The total number of live cursors into a cache must be kept below
InterferenceCache::getMaxCursors().
Code generation should be unaffected by this change, and it doesn't seem
to affect the cache replacement strategy either.
llvm-svn: 135130
Each InterferenceCache::Cursor instance references a cache entry. A
non-zero reference count guarantees that the entry won't be reused for a
new register.
This makes it possible to have multiple live cursors examining
interference for different physregs.
The total number of live cursors into a cache must be kept below
InterferenceCache::getMaxCursors().
Code generation should be unaffected by this change, and it doesn't seem
to affect the cache replacement strategy either.
llvm-svn: 135121
The cache entry referenced by the best split candidate could become
clobbered by an unsuccessful candidate.
The correct fix here is to use reference counts on the cache entries.
Coming up.
llvm-svn: 135113
Some pysical registers create split solutions that would spill anywhere.
They should not even be considered in future multi-way global splits.
This does not affect code generation (yet).
llvm-svn: 135080
an assert on Darwin llvm-gcc builds.
Assertion failed: (castIsValid(op, S, Ty) && "Invalid cast!"), function Create, file /Users/buildslave/zorg/buildbot/smooshlab/slave-0.8/build.llvm-gcc-i386-darwin9-RA/llvm.src/lib/VMCore/Instructions.cpp, li\
ne 2067.
etc.
http://smooshlab.apple.com:8013/builders/llvm-gcc-i386-darwin9-RA/builds/2354
--- Reverse-merging r134893 into '.':
U include/llvm/Target/TargetData.h
U include/llvm/DerivedTypes.h
U tools/bugpoint/ExtractFunction.cpp
U unittests/Support/TypeBuilderTest.cpp
U lib/Target/ARM/ARMGlobalMerge.cpp
U lib/Target/TargetData.cpp
U lib/VMCore/Constants.cpp
U lib/VMCore/Type.cpp
U lib/VMCore/Core.cpp
U lib/Transforms/Utils/CodeExtractor.cpp
U lib/Transforms/Instrumentation/ProfilingUtils.cpp
U lib/Transforms/IPO/DeadArgumentElimination.cpp
U lib/CodeGen/SjLjEHPrepare.cpp
--- Reverse-merging r134888 into '.':
G include/llvm/DerivedTypes.h
U include/llvm/Support/TypeBuilder.h
U include/llvm/Intrinsics.h
U unittests/Analysis/ScalarEvolutionTest.cpp
U unittests/ExecutionEngine/JIT/JITTest.cpp
U unittests/ExecutionEngine/JIT/JITMemoryManagerTest.cpp
U unittests/VMCore/PassManagerTest.cpp
G unittests/Support/TypeBuilderTest.cpp
U lib/Target/MBlaze/MBlazeIntrinsicInfo.cpp
U lib/Target/Blackfin/BlackfinIntrinsicInfo.cpp
U lib/VMCore/IRBuilder.cpp
G lib/VMCore/Type.cpp
U lib/VMCore/Function.cpp
G lib/VMCore/Core.cpp
U lib/VMCore/Module.cpp
U lib/AsmParser/LLParser.cpp
U lib/Transforms/Utils/CloneFunction.cpp
G lib/Transforms/Utils/CodeExtractor.cpp
U lib/Transforms/Utils/InlineFunction.cpp
U lib/Transforms/Instrumentation/GCOVProfiling.cpp
U lib/Transforms/Scalar/ObjCARC.cpp
U lib/Transforms/Scalar/SimplifyLibCalls.cpp
U lib/Transforms/Scalar/MemCpyOptimizer.cpp
G lib/Transforms/IPO/DeadArgumentElimination.cpp
U lib/Transforms/IPO/ArgumentPromotion.cpp
U lib/Transforms/InstCombine/InstCombineCompares.cpp
U lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
U lib/Transforms/InstCombine/InstCombineCalls.cpp
U lib/CodeGen/DwarfEHPrepare.cpp
U lib/CodeGen/IntrinsicLowering.cpp
U lib/Bitcode/Reader/BitcodeReader.cpp
llvm-svn: 134949
and MCSubtargetInfo.
- Added methods to update subtarget features (used when targets automatically
detect subtarget features or switch modes).
- Teach X86Subtarget to update MCSubtargetInfo features bits since the
MCSubtargetInfo layer can be shared with other modules.
- These fixes .code 16 / .code 32 support since mode switch is updated in
MCSubtargetInfo so MC code emitter can do the right thing.
llvm-svn: 134884
patch brings numerous advantages to LLVM. One way to look at it
is through diffstat:
109 files changed, 3005 insertions(+), 5906 deletions(-)
Removing almost 3K lines of code is a good thing. Other advantages
include:
1. Value::getType() is a simple load that can be CSE'd, not a mutating
union-find operation.
2. Types a uniqued and never move once created, defining away PATypeHolder.
3. Structs can be "named" now, and their name is part of the identity that
uniques them. This means that the compiler doesn't merge them structurally
which makes the IR much less confusing.
4. Now that there is no way to get a cycle in a type graph without a named
struct type, "upreferences" go away.
5. Type refinement is completely gone, which should make LTO much MUCH faster
in some common cases with C++ code.
6. Types are now generally immutable, so we can use "Type *" instead
"const Type *" everywhere.
Downsides of this patch are that it removes some functions from the C API,
so people using those will have to upgrade to (not yet added) new API.
"LLVM 3.0" is the right time to do this.
There are still some cleanups pending after this, this patch is large enough
as-is.
llvm-svn: 134829
CPU, and feature string. Parsing some asm directives can change
subtarget state (e.g. .code 16) and it must be reflected in other
modules (e.g. MCCodeEmitter). That is, the MCSubtargetInfo instance
must be shared.
llvm-svn: 134795
Spills should be hoisted out of loops, but we don't want to hoist them
to dominating blocks at the same loop depth. That could cause the spills
to be executed more often.
llvm-svn: 134782
Try to move spills as early as possible in their basic block. This can
help eliminate interferences by shortening the live range being
spilled.
This fixes PR10221.
llvm-svn: 134776
RAGreedy::tryAssign will now evict interference from the preferred
register even when another register is free.
To support this, add the EvictionCost struct that counts how many hints
are broken by an eviction. We don't want to break one hint just to
satisfy another.
Rename canEvict to shouldEvict, and add the first bit of eviction policy
that doesn't depend on spill weights: Always make room in the preferred
register as long as the evictees can be split and aren't already
assigned to their preferred register.
Also make the CSR avoidance more accurate. When looking for a cheaper
register it is OK to use a new volatile register. Only CSR aliases that
have never been used before should be avoided.
llvm-svn: 134735
We have to do this in DAGBuilder instead of DAGCombiner, because the exact bit is lost after building.
struct foo { char x[24]; };
long bar(struct foo *a, struct foo *b) { return a-b; }
is now compiled into
movl 4(%esp), %eax
subl 8(%esp), %eax
sarl $3, %eax
imull $-1431655765, %eax, %eax
instead of
movl 4(%esp), %eax
subl 8(%esp), %eax
movl $715827883, %ecx
imull %ecx
movl %edx, %eax
shrl $31, %eax
sarl $2, %edx
addl %eax, %edx
movl %edx, %eax
llvm-svn: 134695
- Each target asm parser now creates its own MCSubtatgetInfo (if needed).
- Changed AssemblerPredicate to take subtarget features which tablegen uses
to generate asm matcher subtarget feature queries. e.g.
"ModeThumb,FeatureThumb2" is translated to
"(Bits & ModeThumb) != 0 && (Bits & FeatureThumb2) != 0".
llvm-svn: 134678
DBG_VALUE 3.310000e+02, 0, !"ds"; dbg:sse.stepfft.c:138:18 @[ sse.stepfft.c:32:10 ]
DBG_VALUE 3.310000e+02, 0, !"ds"; dbg:sse.stepfft.c:138:18 @[ sse.stepfft.c:31:10 ]
These two MIs represent identical value, 3.31..., for one variable, ds, but they are not identical because the represent two separate instances of inlined variable "ds".
llvm-svn: 134620
hasPredecessorHelper function allows predecessors to be cached to speed up
repeated invocations. This fixes PR10186.
X.isPredecessorOf(Y) now just calls Y.hasPredecessor(X)
Y.hasPredecessor(X) calls Y.hasPredecessorHelper(X, Visited, Worklist) with
empty Visited and Worklist sets (i.e. no caching over invocations).
Y.hasPredecessorHelper(X, Visited, Worklist) caches search state in Visited
and Worklist to speed up repeated calls. The Visited set is searched for X
before going to the worklist to further search the DAG if necessary.
llvm-svn: 134592
Unfortunately, the testcase I have is large and confidential, so I don't have a test to commit at the moment; I'll see if I can come up with something smaller where this issue reproduces.
<rdar://problem/9716278>
llvm-svn: 134565
This is impossible in theory, I can prove it. In practice, our near-zero
threshold can cause the network to oscillate between equally good
solutions.
<rdar://problem/9720596>
llvm-svn: 134428
Remat during spilling triggers dead code elimination. If a phi-def
becomes unused, that may also cause live ranges to split into separate
connected components.
This type of splitting is different from normal live range splitting. In
particular, there may not be a common original interval.
When the split range is its own original, make sure that the new
siblings are also their own originals. The range being split cannot be
used as an original since it doesn't cover the new siblings.
llvm-svn: 134413
This fixes the issue noted in PR10251 where early tail dup of bbs with
indirectbr would cause a bb to be duplicated into a loop preheader
and then into its predecessors, creating phi nodes with identical
operands just before register allocation.
This helps with jsinterp.o size (__TEXT goes from 163568 to 126656)
and a bit with performance 1.005x faster on sunspider (jits still enabled).
The result on webkit with the jit disabled is more significant: 1.021x faster.
llvm-svn: 134372
A split point inserted in a block with a landing pad successor may be
hoisted above the call to ensure that it dominates all successors. The
code that handles the rest of the basic block must take this into
account.
I am not including a test case, it would be very fragile. PR10244 comes
from building clang with exceptions enabled.
llvm-svn: 134369
Add a MI->emitError() method that the backend can use to report errors
related to inline assembly. Call it from X86FloatingPoint.cpp when the
constraints are wrong.
This enables proper clang diagnostics from the backend:
$ clang -c pr30848.c
pr30848.c:5:12: error: Inline asm output regs must be last on the x87 stack
__asm__ ("" : "=u" (d)); /* { dg-error "output regs" } */
^
1 error generated.
llvm-svn: 134307
Every live range is assigned a cascade number the first time it is
involved in an eviction. As the evictor, it gets a new cascade number.
Every evictee is assigned the same cascade number as the evictor.
Eviction is prohibited if the evictor has a lower assigned cascade
number than the evictee.
This means that assigned cascade numbers are monotonically increasing
with every eviction, yet they are bounded by NextCascade which can only
be incremented by new live ranges. Thus, infinite loops cannot happen,
but eviction cascades can still be triggered by new live ranges as we
want.
Thanks to Andy for explaining this to me.
llvm-svn: 134303
copy is a kill") to see if it fixes the i386 dragonegg buildbot, which is timing out
because gcc built with dragonegg is going into an infinite loop.
llvm-svn: 134237
The constraints are represented by the register class of the original
virtual register created for the inline asm. If the register class were
included in the operand descriptor, we might be able to do this.
For now, just give up on regclass inflation when inline asm is involved.
No test case, this bug hasn't happened yet.
llvm-svn: 134226
This patch will sometimes choose live range split points next to
interference instead of always splitting next to a register point. That
means spill code can now appear almost anywhere, and it was necessary
to fix code that didn't expect that.
The difficult places were:
- Between a CALL returning a value on the x87 stack and the
corresponding FpPOP_RETVAL (was FpGET_ST0). Probably also near x87
inline assembly, but that didn't actually show up in testing.
- Between a CALL popping arguments off the stack and the corresponding
ADJCALLSTACKUP.
Both are fixed now. The only place spill code can't appear is after
terminators, see SplitAnalysis::getLastSplitPoint.
Original commit message:
Rewrite RAGreedy::splitAroundRegion, now with cool ASCII art.
This function has to deal with a lot of special cases, and the old
version got it wrong sometimes. In particular, it would sometimes leave
multiple uses in the stack interval in a single block. That causes bad
code with multiple reloads in the same basic block.
The new version handles block entry and exit in a single pass. It first
eliminates all the easy cases, and then goes on to create a local
interval for the blocks with difficult interference. Previously, we
would only create the local interval for completely isolated blocks.
It can happen that the stack interval becomes completely empty because
we could allocate a register in all edge bundles, and the new local
intervals deal with the interference. The empty stack interval is
harmless, but we need to remove a SplitKit assertion that checks for
empty intervals.
llvm-svn: 134125
This function has to deal with a lot of special cases, and the old
version got it wrong sometimes. In particular, it would sometimes leave
multiple uses in the stack interval in a single block. That causes bad
code with multiple reloads in the same basic block.
The new version handles block entry and exit in a single pass. It first
eliminates all the easy cases, and then goes on to create a local
interval for the blocks with difficult interference. Previously, we
would only create the local interval for completely isolated blocks.
It can happen that the stack interval becomes completely empty because
we could allocate a register in all edge bundles, and the new local
intervals deal with the interference. The empty stack interval is
harmless, but we need to remove a SplitKit assertion that checks for
empty intervals.
llvm-svn: 134047
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Chang
TargetInstrInfo so it's based off MCInstrInfo.
llvm-svn: 134021
Removed the check that peeks past EXTRA_SUBREG, which I don't think
makes sense any more. Intead treat it as a normal register def. No
significant affect on x86 or ARM benchmarks.
llvm-svn: 133917
Both become <earlyclobber> defs on the INLINEASM MachineInstr, but we
now use two different asm operand kinds.
The new Kind_Clobber is treated identically to the old
Kind_RegDefEarlyClobber for now, but x87 floating point stack inline
assembly does care about the difference.
This will pop a register off the stack:
asm("fstp %st" : : "t"(x) : "st");
While this will pop the input and push an output:
asm("fst %st" : "=&t"(r) : "t"(x));
We need to know if ST0 was a clobber or an output operand, and we can't
depend on <dead> flags for that.
llvm-svn: 133902
The INLINEASM MachineInstrs have an immediate operand describing each
original inline asm operand. Decode the bits in MachineInstr::print() so
it is easier to read:
INLINEASM <es:rorq $1,$0>, $0:[regdef], %vreg0<def>, %vreg1<def>, $1:[imm], 1, $2:[reguse] [tiedto:$0], %vreg2, %vreg3, $3:[regdef-ec], %EFLAGS<earlyclobber,imp-def>
llvm-svn: 133901
target machine from those that are only needed by codegen. The goal is to
sink the essential target description into MC layer so we can start building
MC based tools without needing to link in the entire codegen.
First step is to refactor TargetRegisterInfo. This patch added a base class
MCRegisterInfo which TargetRegisterInfo is derived from. Changed TableGen to
separate register description from the rest of the stuff.
llvm-svn: 133782
register allocation if it has a indirectbr or if we can duplicate it to
every predecessor.
This fixes the SingleSource/Benchmarks/Shootout-C++/matrix.cpp regression but
keeps the previous improvements to sunspider.
llvm-svn: 133682
If the linker supports it, this will hold the CIE and FDE information in a
compact format. The implementation of the compact unwinding emission is coming
soon.
llvm-svn: 133658
be one with only one unconditional branch and no phis. Duplicating the phis in this case
is possible, but requeres liveness analysis or breaking edges.
llvm-svn: 133607
1. (((x) & 0xFF00) >> 8) | (((x) & 0x00FF) << 8)
=> (bswap x) >> 16
2. ((x&0xff)<<8)|((x&0xff00)>>8)|((x&0xff000000)>>8)|((x&0x00ff0000)<<8))
=> (rotl (bswap x) 16)
This allows us to eliminate most of the def : Pat patterns for ARM rev16
revsh instructions. It catches many more cases for ARM and x86.
rdar://9609108
llvm-svn: 133503
* Don't introduce a duplicated bb in the CFG
* When making a branch unconditional, clear the PredCond array so that it
is really unconditional.
llvm-svn: 133432
dragonegg buildbots back to life. Original commit message:
Teach early dup how to duplicate basic blocks with one successor and only phi instructions
into more complex blocks.
llvm-svn: 133430
all over the place in different styles and variants. Standardize on two
preferred entrypoints: one that takes a StructType and ArrayRef, and one that
takes StructType and varargs.
In cases where there isn't a struct type convenient, we now add a
ConstantStruct::getAnon method (whose name will make more sense after a few
more patches land).
It would be "really really nice" if the ConstantStruct::get and
ConstantVector::get methods didn't make temporary std::vectors.
llvm-svn: 133412
The LSDA is a bit difficult for the non-initiated to read. Even with comments,
it's not always clear what's going on. This wraps the ASM streamer in a class
that retains the LSDA and then emits a human-readable description of what's
going on in it.
So instead of having to make sense of:
Lexception1:
.byte 255
.byte 155
.byte 168
.space 1
.byte 3
.byte 26
Lset0 = Ltmp7-Leh_func_begin1
.long Lset0
Lset1 = Ltmp812-Ltmp7
.long Lset1
Lset2 = Ltmp913-Leh_func_begin1
.long Lset2
.byte 3
Lset3 = Ltmp812-Leh_func_begin1
.long Lset3
Lset4 = Leh_func_end1-Ltmp812
.long Lset4
.long 0
.byte 0
.byte 1
.byte 0
.byte 2
.byte 125
.long __ZTIi@GOTPCREL+4
.long __ZTIPKc@GOTPCREL+4
you can read this instead:
## Exception Handling Table: Lexception1
## @LPStart Encoding: omit
## @TType Encoding: indirect pcrel sdata4
## @TType Base: 40 bytes
## @CallSite Encoding: udata4
## @Action Table Size: 26 bytes
## Action 1:
## A throw between Ltmp7 and Ltmp812 jumps to Ltmp913 on an exception.
## For type(s): __ZTIi@GOTPCREL+4 __ZTIPKc@GOTPCREL+4
## Action 2:
## A throw between Ltmp812 and Leh_func_end1 does not have a landing pad.
llvm-svn: 133286
* We should change the generated code because of a debug use.
* Avoid creating debug uses of undef, as they become a kill.
Test to follow.
llvm-svn: 133255
Also switch the return type to ArrayRef<unsigned> which works out nicely
for ARM's implementation of this function because of the clever ArrayRef
constructors.
The name change indicates that the returned allocation order may contain
reserved registers as has been the case for a while.
llvm-svn: 133216
In Thumb mode we cannot handle GPR virtual registers, even though some
instructions can. When isel is lowering a CopyFromReg, it should limit
itself to subclasses of getRegClassFor(VT).
<rdar://problem/9624323>
llvm-svn: 133210
I think PBQP could use RegisterClassInfo, but it didn't fit neatly with
the external interfaces that PBQP uses, so I'll leave that to Lang.
llvm-svn: 133186
BranchProbabilityInfo (expect setEdgeWeight which is not available here).
Branch Weights are kept in MachineBasicBlocks. To turn off this analysis
set -use-mbpi=false.
llvm-svn: 133184
This is intended to support using REG_SEQUENCE SDNode's with type MVT::untyped, and is part of the long road to eliminating some of the hacks we currently use to support register pairs and other strange constraints, particularly on ARM NEON.
llvm-svn: 133178
This virtual function will replace allocation_order_begin/end as the one
to override when implementing custom allocation orders. It is simpler to
have one function return an ArrayRef than having two virtual functions
computing different ends of the same array.
Use getRawAllocationOrder() in place of allocation_order_begin() where
it makes sense, but leave some clients that look like they really want
the filtered allocation orders from RegisterClassInfo.
llvm-svn: 133170
GetDemandBits (which must operate on the vector element type).
Fix the a usage of getZeroExtendInReg which must also be done on scalar types.
llvm-svn: 133052
converted to add x,x if x is a undef. add undef, undef does not guarantee
that the resulting low order bit is zero.
Fixes <rdar://problem/9453156> and <rdar://problem/9487392>.
llvm-svn: 133022
Dan noted that this would work on the case shown on the commit message. I think
the case that was failing was a bb ending with a redundant conditional jump:
...
jne foo
foo:
...
I was unable to find any such case in the tests or in a debug build of clang,
so I will revert this part of the patch and watch the bots.
llvm-svn: 133004
types (with power of two types such as 8,16,32 .. 512).
Fix a bug in the integer promotion of bitcast nodes. Enable integer expanding
only if the target of the conversion is an integer (when the type action is
scalarize).
Add handling to the legalization of vector load/store in cases where the saved
vector is integer-promoted.
llvm-svn: 132985
In particular, don't spill dirty registers only to satisfy a hint. It is
not worth it.
The attached test case provides an example where the fast allocator
would spill a register when other registers are available.
llvm-svn: 132900
Instead of scalarizing, and doing an element-by-element truncat, use vector
truncate.
Add support for scalarization of vectors: i8 -> <1 x i1> (from Duncan's
testcase).
llvm-svn: 132892
we try to branch to them.
Before we were creating successor lists with duplicated entries. Fixing that
found a bug in isBlockOnlyReachableByFallthrough that would causes it to
return the wrong answer for
-----------
...
jne foo
jmp bar
foo:
----------
llvm-svn: 132882
and definitions when emitting global variables. This was causing global
declarations to be emitted as if they were definitions.
Fixes <rdar://problem/9429892>.
llvm-svn: 132825
With this I am able to bootstrap clang with early tail duplication enabled
for any small bb and setting tail-dup-size to a relatively large value(8) to
stress this code.
llvm-svn: 132816
The potential DAGCombine which enforces this more generally messes up some other very fragile patterns, so I'm leaving that alone, at least for now.
llvm-svn: 132809
I've been sitting on this long enough trying to find a test case. I
think the fix should go in now, but I'll keep working on the test case.
llvm-svn: 132701
When local live range splitting creates a live range with the same
number of instructions as the old range, mark it as RS_Local. When such
a range is seen again, require that it be split in a way that reduces
the number of instructions. That guarantees we are making progress while
still being able to perform 3 -> 2+3 splits as required by PR10070.
This also means that the PrevSlot map is no longer needed. This was also
used to estimate new spill weights, but that is no longer necessary
after slotIndexes::insertMachineInstrInMaps() got the extra Late
insertion argument.
llvm-svn: 132697
Only target-dependent hints require callbacks. The RCI allocation order
has CSR aliases last according to their order of appearance in the
getCalleeSavedRegs list. This can depend on the calling convention.
This way, AllocationOrder::next doesn't have to check for reserved
registers, and CSRs are always allocated last, even with weird calling
conventions.
llvm-svn: 132690
The order of registers returned by getCalleeSavedRegs is used to lay out
the fixed stack slots for CSRs. Some targets like their CSRs used from
one end, and some targets want them used from the other end.
When computing an allocation order, simply preserve the relative
ordering of CSRs that the target specifies in its allocation order.
Reordering CSRs would break some targets, ARM in particular.
We still place volatiles before the CSRs, providing slightly better
results with different calling conventions.
llvm-svn: 132680
(only happens when using the -promote-elements option).
The correct legalization order is to first try to promote element. Next, we try
to widen vectors.
llvm-svn: 132648
of reserved registers.
Use RegisterClassInfo in RABasic as well. This slightly changes som
allocation orders because RegisterClassInfo puts CSR aliases last.
llvm-svn: 132581
When compiling a program with lots of small functions like
483.xalancbmk, this makes RAFast 11% faster.
Add some comments to clarify the difference between unallocatable and
reserved registers. It's quite subtle.
The fast register allocator depends on EFLAGS' not being allocatable on
x86. That way it can completely avoid tracking liveness, and it won't
mind when there are multiple uses of a single def.
llvm-svn: 132514
Some register classes are only used for instruction operand constraints.
They should never be used for virtual registers. Previously, those
register classes were given an empty allocation order, but now you can
say 'let isAllocatable=0' in the register class definition.
TableGen calculates if a register is part of any allocatable register
class, and makes that information available in TargetRegisterDesc::inAllocatableClass.
The goal here is to eliminate use cases for overriding allocation_order_*
methods.
llvm-svn: 132508
I was confused whether new uint8_t[] would zero-initialize the returned
array, and it seems that so is gcc-4.0.
This should fix the test failures on darwin 9.
llvm-svn: 132500
Instead, use simpler approach and let DBG_VALUE follow its predecessor instruction. After live debug value analysis pass, all DBG_VALUE instruction are placed at the right place. Thanks Jakob for the hint!
llvm-svn: 132483
register classes.
It provides information for each register class that cannot be
determined statically, like:
- The number of allocatable registers in a class after filtering out the
reserved and invalid registers.
- The preferred allocation order with registers that overlap callee-saved
registers last.
- The last callee-saved register that overlaps a given physical register.
This information usually doesn't change between functions, so it is
reused for compiling multiple functions when possible. The many
possible combinations of reserved and callee saves registers makes it
unfeasible to compute this information statically in TableGen.
Use RegisterClassInfo to count available registers in various heuristics
in SimpleRegisterCoalescing, making the pass run 4% faster.
llvm-svn: 132450
patch we add a flag to enable a new type legalization decision - to promote
integer elements in vectors. Currently, the rest of the codegen does not support
this kind of legalization. This flag will be removed when the transition is
complete.
llvm-svn: 132394
For targets with no itinerary (x86) it is a nop by default. For
targets with issue width already expressed in the itinerary (ARM) it
bypasses a scoreboard check but otherwise does not affect the
schedule. It does make the code more consistent and complete and
allows new targets to specify their issue width in an arbitrary way.
llvm-svn: 132385
turns out that it could cause an infinite loop in some situations. If this code
is triggered and it converts a cleanup into a catchall, but that cleanup was in
already in a cleanup, then the _Unwind_SjLj_Resume could infinite loop. I.e.,
the code doesn't consume the exception object and passes it on to
_Unwind_SjLj_Resume. But _USjLjR expects it to be consumed (since it's landing
at a catchall instead of a cleanup). So it uses the values that are presently
there, which are the values that tell it to jump to the fake landing pad.
<rdar://problem/9508402>
llvm-svn: 132381
When assigned ranges are evicted, they are put in the RS_Evicted stage and are
not allowed to evict anything else. That prevents looping automatically.
When evicting ranges just to get a cheaper register, use only spill weights to
find the possible candidates. Avoid breaking hints for this purpose, it is not
worth it.
Start implementing more complex eviction heuristics, guarded by the temporary
-complex-eviction flag. The initial version permits a heavier range to be
evicted if it doesn't have any uses where the evicting range is live. This makes
it a good candidate for live ranfge splitting.
llvm-svn: 132358
This only affects targets like Mips where branch instructions may kill virtual
registers. Most other targets branch on flag values, so virtual registers are
not involved.
The problem is that MachineBasicBlock::updateTerminator deletes branches and
inserts new ones while LiveVariables keeps a list of pointers to instructions
that kill virtual registers. That list wasn't properly updated in
MBB::SplitCriticalEdge.
llvm-svn: 132298
handler.
At this moment, only GCC-style exceptions are supported. Other kinds
of exceptions, including "traditional" SEH and Microsoft Visual C++ exceptions,
need more work--and an compiler exception model that isn't specific to
GCC-style exceptions!
In particular, I imagine that it would be possible to mix "traditional" SEH
with GCC-style EH or Microsoft C++ EH. Currently LLVM has no way (beyond some
target-specific defaults and whole-module compiler switches) of knowing which
scheme to use when.
llvm-svn: 132283
This patch does not change the behavior of the type legalizer. The codegen
produces the same code.
This infrastructural change is needed in order to enable complex decisions
for vector types (needed by the vector-select patch).
llvm-svn: 132263
transformed by the inliner into a branch to the enclosing landing pad
(when inlined through an invoke). If not so optimized, it is lowered
DWARF EH preparation into a call to _Unwind_Resume (or _Unwind_SjLj_Resume
as appropriate). Its chief advantage is that it takes both the
exception value and the selector value as arguments, meaning that there
is zero effort in recovering these; however, the frontend is required
to pass these down, which is not actually particularly difficult.
Also document the behavior of landing pads a bit better, and make it
clearer that it's okay that personality functions don't always land at
landing pads. This is just a fact of life. Don't write optimizations that
rely on pushing things over an unwind edge.
llvm-svn: 132253
Delete the Kill and Def markers in BlockInfo. They are no longer
necessary when BlockInfo describes a continuous live range.
This only affects the relatively rare kind of basic block where a live
range looks like this:
|---x o---|
Now live range splitting can pretend that it is looking at two blocks:
|---x
o---|
This allows the code to be simplified a bit.
llvm-svn: 132245
It is important that this function returns the same number of live blocks as
countLiveBlocks(CurLI) because live range splitting uses the number of live
blocks to ensure it is making progress.
This is in preparation of supporting duplicate UseBlock entries for basic blocks
that have a virtual register live-in and live-out, but not live-though.
llvm-svn: 132244
There was no way to check if a given register/mode pair was valid. We now return
an error code (-2) instead of asserting. If anyone thinks that an assert
at this point is really needed, we can autogen a hasValidDwarfRegNum instead.
llvm-svn: 132236
subregisters:
When a value is in a subregister, at least report the location as being
the superregister. We should extend the .td files to encode the bit
range so that we can produce a DW_OP_bit_piece.
llvm-svn: 132224
This doesn't change functionality (much), but it allows for a more fine-grained
eviction policy. The current policy only compares spill weights, and that is not
always the best thing to do. Spill weights are designed to serve linear scan,
and they don't consider live range splitting.
Add a mechanism so canEvict() can request that a live range be evicted and
split/spilled. This is to avoid infinite eviction loops.
llvm-svn: 132101
The practical effects here are that x86-64 fast-isel can now handle trunc from i8 to i1, and ARM fast-isel can handle many more constructs involving integers narrower than 32 bits (including loads, stores, and many integer casts).
rdar://9437928 .
llvm-svn: 132099
non-zero.
- Teach X86 cmov optimization to eliminate the cmov from ctlz, cttz extension
when the source of X86ISD::BSR / X86ISD::BSF is proven to be non-zero.
rdar://9490949
llvm-svn: 131948
on CodeGen/X86/2007-05-07-InvokeSRet.ll. There is probably a bug here that was
fixed by r128961, but since there is no test or reference to a source file I have
to revert it.
llvm-svn: 131618
LiveInterval::shrinkToUses recomputes the live range from scratch instead of
removing snippets. This should avoid the problem with dangling live ranges.
Leave physreg identity copies alone. They can be created when joining a virtreg
with a physreg. They don't affect register allocation, and they will be removed
by the rewriter.
llvm-svn: 131521
The greedy register allocator has live range splitting and register class
inflation, so it can actually fully undo this join, including restoring the
original register classes.
We still don't want to do this for long live ranges, mostly because of the high
register pressure of there are many constrained live ranges overlapping.
llvm-svn: 131466
When instructions are deleted, they leave tombstone SlotIndex entries.
The isZeroLength method should ignore these null indexes.
This causes RABasic to sometimes spill a callee-saved register in the
abi-isel.ll test, so don't run that test with -regalloc=basic. Prioritizing
register allocation according to spill weight can cause more registers to be
used.
llvm-svn: 131436