Commit Graph

12962 Commits

Author SHA1 Message Date
Chandler Carruth 4a87aa0c31 Fix a crash in block placement due to an inner loop that happened to be
reversed in the function's original ordering, and we happened to
encounter it while handling an outer unnatural CFG structure.

Thanks to the test case reduced from GCC's source by Benjamin Kramer.
This may also fix a crasher in gzip that Duncan reduced for me, but
I haven't yet gotten to testing that one.

llvm-svn: 145094
2011-11-23 03:03:21 +00:00
Chandler Carruth ee54feb6f6 Fix a devilish miscompile exposed by block placement. The
updateTerminator code didn't correctly handle EH terminators in one very
specific case. AnalyzeBranch would find no terminator instruction, and
so the fallback in updateTerminator is to assume fallthrough. This is
correct, but the destination of the fallthrough was assumed to be the
first successor.

This is *almost always* true, but in certain cases the loop
transformations will cause the landing pad to be the first successor!
Instead of this brittle logic, actually look through the successors for
a non-landing-pad accessor, and to assert if more than one is found.

This will hopefully fix some (if not all) of the self host miscompiles
with block placement. Thanks to Benjamin Kramer for reporting, Nick
Lewycky for an initial stab at a reduction, and Duncan for endless
advice on EH (which I know nothing about) as well as reviewing the
actual fix.

llvm-svn: 145062
2011-11-22 13:13:16 +00:00
Chandler Carruth e2530dc889 Fix an obvious omission in the SelectionDAGBuilder where we were
dropping weights on the floor for invokes. This was impeding my writing
further test cases for invoke when interacting with probabilities and
block placement.

No test case as there doesn't appear to be a way to test this stuff. =/
Suggestions for a test case of course welcome. I hope to be able to add
test cases that indirectly cover this eventually by adding probabilities
to the exceptional edge and reordering blocks as a result.

llvm-svn: 145060
2011-11-22 11:37:46 +00:00
Rafael Espindola 2021f38281 If a register is both an early clobber and part of a tied use, handle the use
before the clobber so that we copy the value if needed.

Fixes pr11415.

llvm-svn: 145056
2011-11-22 06:27:18 +00:00
Chandler Carruth 18dfac385b The logic for breaking the CFG in the presence of hot successors didn't
properly account for the *global* probability of the edge being taken.
This manifested as a very large number of unconditional branches to
blocks being merged against the CFG even though they weren't
particularly hot within the CFG.

The fix is to check whether the edge being merged is both locally hot
relative to other successors for the source block, and globally hot
compared to other (unmerged) predecessors of the destination block.

This introduces a new crasher on GCC single-source, but it's currently
behind a flag, and Ben has offered to work on the reduction. =]

llvm-svn: 145010
2011-11-20 11:22:06 +00:00
Chandler Carruth f3dc9eff16 Move the handling of unanalyzable branches out of the loop-driven chain
formation phase and into the initial walk of the basic blocks. We
essentially pre-merge all blocks where unanalyzable fallthrough exists,
as we won't be able to update the terminators effectively after any
reorderings. This is quite a bit more principled as there may be CFGs
where the second half of the unanalyzable pair has some analyzable
predecessor that gets placed first. Then it may get placed next,
implicitly breaking the unanalyzable branch even though we never even
looked at the part that isn't analyzable. I've included a test case that
triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
some more general ones as I dig into related issues.

Also, to make this new scheme work we have to be able to handle branches
into the middle of a chain, so add this check. We always fallback on the
incoming ordering.

Finally, this starts to really underscore a known limitation of the
current implementation -- we don't consider broken predecessors when
merging successors. This can caused major missed opportunities, and is
something I'm planning on looking at next (modulo more bug reports).

llvm-svn: 144994
2011-11-19 10:26:02 +00:00
Devang Patel 107e8ec30d DISubrange supports unsigned lower/upper array bounds, so let's not fake it in the end while emitting DWARF. If a FE needs to encode signed lower/upper array bounds then we need to extend DISubrange or ad DISignedSubrange.
llvm-svn: 144937
2011-11-17 23:43:15 +00:00
Chad Rosier f83ab704e4 When fast iseling a GEP, accumulate the offset rather than emitting a series of
ADDs.  MaxOffs is used as a threshold to limit the size of the offset. Tradeoffs
being: (1) If we can't materialize the large constant then we'll cause fast-isel
to bail. (2) Too large of an offset can't be directly encoded in the ADD
resulting in a MOV+ADD.  Generally not a bad thing because otherwise we would
have had ADD+ADD, but on Thumb this turns into a MOVS+MOVT+ADD. Working on a fix
for that. (3) Conversely, too low of a threshold we'll miss opportunities to 
coalesce ADDs.
rdar://10412592

llvm-svn: 144886
2011-11-17 07:15:58 +00:00
Eli Friedman ff1eaa7578 Make sure to replace the chain properly when DAGCombining a LOAD+EXTRACT_VECTOR_ELT into a single LOAD. Fixes PR10747/PR11393.
llvm-svn: 144863
2011-11-16 23:50:22 +00:00
Chad Rosier ff40b1e164 Add fast-isel stats to determine who's doing all the work, the
target-independent selector or the target-specific selector.

llvm-svn: 144833
2011-11-16 21:05:28 +00:00
Chad Rosier cfd0d10e72 Fix the stats collection for fast-isel. The failed count was only accounting
for a single miss and not all predecessor instructions that get selected by
the selection DAG instruction selector.  This is still not exact (e.g., over
states misses when folded/dead instructions are present), but it is a step in
the right direction.

llvm-svn: 144832
2011-11-16 21:02:08 +00:00
Evan Cheng 822ddde50d Disable expensive two-address optimizations at -O0. rdar://10453055
llvm-svn: 144806
2011-11-16 18:44:48 +00:00
Evan Cheng 624eb2af6f Disable the assertion again. Looks like fastisel is still generating bad kill markers.
llvm-svn: 144804
2011-11-16 18:32:14 +00:00
Evan Cheng ecb2908bf9 Sink codegen optimization level into MCCodeGenInfo along side relocation model
and code model. This eliminates the need to pass OptLevel flag all over the
place and makes it possible for any codegen pass to use this information.

llvm-svn: 144788
2011-11-16 08:38:26 +00:00
Bob Wilson cca9aa58ca Record landing pads with a SmallSetVector to avoid multiple entries.
There may be many invokes that share one landing pad, and the previous code
would record the landing pad once for each invoke.  Besides the wasted
effort, a pair of volatile loads gets inserted every time the landing pad is
processed.  The rest of the code can get optimized away when a landing pad
is processed repeatedly, but the volatile loads remain, resulting in code like:

LBB35_18:
Ltmp483:
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r2, [r7, #-72]
        ldr     r2, [r7, #-68]
        ldr     r4, [r7, #-72]
        ldr     r2, [r7, #-68]

llvm-svn: 144787
2011-11-16 07:57:21 +00:00
Bob Wilson 643e63c40c Update the SP in the SjLj jmpbuf whenever it changes. <rdar://problem/10444602>
This same basic code was in the older version of the SjLj exception handling,
but it was removed in the recent revisions to that code.  It needs to be there.

llvm-svn: 144782
2011-11-16 07:12:00 +00:00
Evan Cheng 4ac36c8e26 Revert r144568 now that r144730 has fixed the fast-isel kill marker bug.
llvm-svn: 144776
2011-11-16 04:55:01 +00:00
Evan Cheng b8c55a5339 If the 2addr instruction has other kills, don't move it below any other uses since we don't want to extend other live ranges.
llvm-svn: 144772
2011-11-16 03:47:42 +00:00
Evan Cheng 59f8156ea0 RescheduleKillAboveMI() must backtrack to before the rescheduled DBG_VALUE instructions. rdar://10451185
llvm-svn: 144771
2011-11-16 03:33:08 +00:00
Evan Cheng 9ddd69a8bc Process all uses first before defs to accurately capture register liveness. rdar://10449480
llvm-svn: 144770
2011-11-16 03:05:12 +00:00
Eli Friedman 87f92512c3 CONCAT_VECTORS can have more than two operands. PR11389.
llvm-svn: 144768
2011-11-16 02:52:39 +00:00
Eli Friedman d257a464d1 Add a couple asserts so it will be easier to debug if we accidentally pass indexed loads/stores to the legalizer.
llvm-svn: 144767
2011-11-16 02:43:15 +00:00
Owen Anderson ca2f78a95b Rename MVT::untyped to MVT::Untyped to match similar nomenclature.
llvm-svn: 144747
2011-11-16 01:02:57 +00:00
Eric Christopher 0abbd0ef5a Stabilize the output of the dwarf accelerator tables. Fixes a comparison
failure during bootstrap with it turned on.

llvm-svn: 144731
2011-11-15 23:37:17 +00:00
Chad Rosier 291ce47db7 GEPs with all zero indices are trivially coalesced by fast-isel. For example,
%arrayidx135 = getelementptr inbounds [4 x [4 x [4 x [4 x i32]]]]* %M0, i32 0, i64 0
%arrayidx136 = getelementptr inbounds [4 x [4 x [4 x i32]]]* %arrayidx135, i32 0, i64 %idxprom134

Prior to this commit, the GEP instruction that defines %arrayidx136 thought that 
%arrayidx135 was a trivial kill.  The GEP that defines %arrayidx135 doesn't 
generate any code and thus %M0 gets folded into the second GEP.  Thus, we need
to look through GEPs with all zero indices.
rdar://10443319

llvm-svn: 144730
2011-11-15 23:34:05 +00:00
Pete Cooper 7c7ba1baa1 Added custom lowering for load->dec->store sequence in x86 when the EFLAGS registers is used
by later instructions.

Only done for DEC64m right now.

Fixes <rdar://problem/6172640>

llvm-svn: 144705
2011-11-15 21:57:53 +00:00
Devang Patel 43bde96a4c Insert modified DBG_VALUE into LiveDbgValueMap.
llvm-svn: 144696
2011-11-15 21:03:58 +00:00
Rafael Espindola f11e7f1305 We currently use a callback to handle an IL pass deleting a BB that still
has a reference to it. Unfortunately, that doesn't work for codegen passes
since we don't get notified of MBB's being deleted (the original BB stays).

Use that fact to our advantage and after printing a function, check if
any of the IL BBs corresponds to a symbol that was not printed. This fixes
pr11202.

llvm-svn: 144674
2011-11-15 19:08:46 +00:00
Benjamin Kramer 1f97a5a671 Remove all remaining uses of Value::getNameStr().
llvm-svn: 144648
2011-11-15 16:27:03 +00:00
Benjamin Kramer 4c93d15f09 Twinify GraphWriter a little bit.
llvm-svn: 144647
2011-11-15 16:26:38 +00:00
Jakob Stoklund Olesen e14ef7e6f8 Check all overlaps when looking for used registers.
A function using any RC alias is enough to enable the ExeDepsFix pass.

llvm-svn: 144636
2011-11-15 08:20:43 +00:00
Jay Foad ab9ebd3521 Make use of MachinePointerInfo::getFixedStack.
llvm-svn: 144635
2011-11-15 07:51:13 +00:00
Jay Foad 70679df664 Remove some unnecessary includes of PseudoSourceValue.h.
llvm-svn: 144634
2011-11-15 07:50:46 +00:00
Evan Cheng 7098c4e5f4 Set SeenStore to true to prevent loads from being moved; also eliminates a non-deterministic behavior.
llvm-svn: 144628
2011-11-15 06:26:51 +00:00
Chandler Carruth 9b548a7fcf Rather than trying to use the loop block sequence *or* the function
block sequence when recovering from unanalyzable control flow
constructs, *always* use the function sequence. I'm not sure why I ever
went down the path of trying to use the loop sequence, it is
fundamentally not the correct sequence to use. We're trying to preserve
the incoming layout in the cases of unreasonable control flow, and that
is only encoded at the function level. We already have a filter to
select *exactly* the sub-set of blocks within the function that we're
trying to form into a chain.

The resulting code layout is also significantly better because of this.
In several places we were ending up with completely unreasonable control
flow constructs due to the ordering chosen by the loop structure for its
internal storage. This change removes a completely wasteful vector of
basic blocks, saving memory allocation in the common case even though it
costs us CPU in the fairly rare case of unnatural loops. Finally, it
fixes the latest crasher reduced out of GCC's single source. Thanks
again to Benjamin Kramer for the reduction, my bugpoint skills failed at
it.

llvm-svn: 144627
2011-11-15 06:26:43 +00:00
Jakob Stoklund Olesen f8ad336bc4 Break false dependencies before partial register updates.
Two new TargetInstrInfo hooks lets the target tell ExecutionDepsFix
about instructions with partial register updates causing false unwanted
dependencies.

The ExecutionDepsFix pass will break the false dependencies if the
updated register was written in the previoius N instructions.

The small loop added to sse-domains.ll runs twice as fast with
dependency-breaking instructions inserted.

llvm-svn: 144602
2011-11-15 01:15:30 +00:00
Jakob Stoklund Olesen 543bef6ead Track register ages more accurately.
Keep track of the last instruction to define each register individually
instead of per DomainValue.  This lets us track more accurately when a
register was last written.

Also track register ages across basic blocks.  When entering a new
basic block, use the least stale predecessor def as a worst case
estimate for register age.

The register age is used to arbitrate between conflicting domains. The
most recently defined register wins.

llvm-svn: 144601
2011-11-15 01:15:25 +00:00
Evan Cheng f2fc508d4d Avoid dereferencing off the beginning of lists.
llvm-svn: 144569
2011-11-14 21:11:15 +00:00
Evan Cheng 28ffb7e444 At -O0, multiple uses of a virtual registers in the same BB are being marked
"kill". This looks like a bug upstream. Since that's going to take some time
to understand, loosen the assertion and disable the optimization when
multiple kills are seen.

llvm-svn: 144568
2011-11-14 21:02:09 +00:00
Evan Cheng 30f44ad785 Teach two-address pass to re-schedule two-address instructions (or the kill
instructions of the two-address operands) in order to avoid inserting copies.
This fixes the few regressions introduced when the two-address hack was
disabled (without regressing the improvements).
rdar://10422688

llvm-svn: 144559
2011-11-14 19:48:55 +00:00
Jakob Stoklund Olesen 7e6004a3c1 Fix early-clobber handling in shrinkToUses.
I broke this in r144515, it affected most ARM testers.

<rdar://problem/10441389>

llvm-svn: 144547
2011-11-14 18:45:38 +00:00
Chandler Carruth fd9b4d9813 It helps to deallocate memory as well as allocate it. =] This actually
cleans up all the chains allocated during the processing of each
function so that for very large inputs we don't just grow memory usage
without bound.

llvm-svn: 144533
2011-11-14 10:57:23 +00:00
Chandler Carruth 0a31d149ea Remove an over-eager assert that was firing on one of the ARM regression
tests when I forcibly enabled block placement.

It is apparantly possible for an unanalyzable block to fallthrough to
a non-loop block. I don't actually beleive this is correct, I believe
that 'canFallThrough' is returning true needlessly for the code
construct, and I've left a bit of a FIXME on the verification code to
try to track down why this is coming up.

Anyways, removing the assert doesn't degrade the correctness of the algorithm.

llvm-svn: 144532
2011-11-14 10:55:53 +00:00
Chandler Carruth 0af6a0bb69 Begin chipping away at one of the biggest quadratic-ish behaviors in
this pass. We're leaving already merged blocks on the worklist, and
scanning them again and again only to determine each time through that
indeed they aren't viable. We can instead remove them once we're going
to have to scan the worklist. This is the easy way to implement removing
them. If this remains on the profile (as I somewhat suspect it will), we
can get a lot more clever here, as the worklist's order is essentially
irrelevant. We can use swapping and fold the two loops to reduce
overhead even when there are many blocks on the worklist but only a few
of them are removed.

llvm-svn: 144531
2011-11-14 09:46:33 +00:00
Chandler Carruth 84cd44c750 Under the hood, MBPI is doing a linear scan of every successor every
time it is queried to compute the probability of a single successor.
This makes computing the probability of every successor of a block in
sequence... really really slow. ;] This switches to a linear walk of the
successors rather than a quadratic one. One of several quadratic
behaviors slowing this pass down.

I'm not really thrilled with moving the sum code into the public
interface of MBPI, but I don't (at the moment) have ideas for a better
interface. My direction I'm thinking in for a better interface is to
have MBPI actually retain much more state and make *all* of these
queries cheap. That's a lot of work, and would require invasive changes.
Until then, this seems like the least bad (ie, least quadratic)
solution. Suggestions welcome.

llvm-svn: 144530
2011-11-14 09:12:57 +00:00
Chandler Carruth a9e71faa0f Reuse the logic in getEdgeProbability within getHotSucc in order to
correctly handle blocks whose successor weights sum to more than
UINT32_MAX. This is slightly less efficient, but the entire thing is
already linear on the number of successors. Calling it within any hot
routine is a mistake, and indeed no one is calling it. It also
simplifies the code.

llvm-svn: 144527
2011-11-14 08:55:59 +00:00
Chandler Carruth ed5aa547bc Fix an overflow bug in MachineBranchProbabilityInfo. This pass relied on
the sum of the edge weights not overflowing uint32, and crashed when
they did. This is generally safe as BranchProbabilityInfo tries to
provide this guarantee. However, the CFG can get modified during codegen
in a way that grows the *sum* of the edge weights. This doesn't seem
unreasonable (imagine just adding more blocks all with the default
weight of 16), but it is hard to come up with a case that actually
triggers 32-bit overflow. Fortuately, the single-source GCC build is
good at this. The solution isn't very pretty, but its no worse than the
previous code. We're already summing all of the edge weights on each
query, we can sum them, check for an overflow, compute a scale, and sum
them again.

I've included a *greatly* reduced test case out of the GCC source that
triggers it. It's a pretty lame test, as it clearly is just barely
triggering the overflow. I'd like to have something that is much more
definitive, but I don't understand the fundamental pattern that triggers
an explosion in the edge weight sums.

The buggy code is duplicated within this file. I'll colapse them into
a single implementation in a subsequent commit.

llvm-svn: 144526
2011-11-14 08:50:16 +00:00
Jakob Stoklund Olesen d7bcf43dc2 Use getVNInfoBefore() when it makes sense.
llvm-svn: 144517
2011-11-14 01:39:36 +00:00
Chandler Carruth 1071cfa4ae Teach machine block placement to cope with unnatural loops. These don't
get loop info structures associated with them, and so we need some way
to make forward progress selecting and placing basic blocks. The
technique used here is pretty brutal -- it just scans the list of blocks
looking for the first unplaced candidate. It keeps placing blocks like
this until the CFG becomes tractable.

The cost is somewhat unfortunate, it requires allocating a vector of all
basic block pointers eagerly. I have some ideas about how to simplify
and optimize this, but I'm trying to get the logic correct first.

Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly
there are other bugs that GCC is tickling that I'm reducing and working
on now.

llvm-svn: 144516
2011-11-14 00:00:35 +00:00
Jakob Stoklund Olesen 697979028f Use kill slots instead of the previous slot in shrinkToUses.
It's more natural to use the actual end points.

llvm-svn: 144515
2011-11-13 23:53:25 +00:00
Chandler Carruth c4a2cb34bb Cleanup some 80-columns violations and poor formatting. These snuck by
when I was reading through the code for style.

llvm-svn: 144513
2011-11-13 22:50:09 +00:00
Jakob Stoklund Olesen d8f2405e73 Terminate all dead defs at the dead slot instead of the 'next' slot.
This makes no difference for normal defs, but early clobber dead defs
now look like:

  [Slot_EarlyClobber; Slot_Dead)

instead of:

  [Slot_EarlyClobber; Slot_Register).

Live ranges for normal dead defs look like:

  [Slot_Register; Slot_Dead)

as before.

llvm-svn: 144512
2011-11-13 22:42:13 +00:00
Jakob Stoklund Olesen ce7cc08f3a Simplify early clobber slots a bit.
llvm-svn: 144507
2011-11-13 22:05:42 +00:00
Chandler Carruth 8e1d906734 Enhance the assertion mechanisms in place to make it easier to catch
when we fail to place all the blocks of a loop. Currently this is
happening for unnatural loops, and this logic helps more immediately
point to the problem.

llvm-svn: 144504
2011-11-13 21:39:51 +00:00
Jakob Stoklund Olesen 90b5e565b6 Rename SlotIndexes to match how they are used.
The old naming scheme (load/use/def/store) can be traced back to an old
linear scan article, but the names don't match how slots are actually
used.

The load and store slots are not needed after the deferred spill code
insertion framework was deleted.

The use and def slots don't make any sense because we are using
half-open intervals as is customary in C code, but the names suggest
closed intervals.  In reality, these slots were used to distinguish
early-clobber defs from normal defs.

The new naming scheme also has 4 slots, but the names match how the
slots are really used.  This is a purely mechanical renaming, but some
of the code makes a lot more sense now.

llvm-svn: 144503
2011-11-13 20:45:27 +00:00
Chandler Carruth 0bb42c0f86 Teach MBP to force-merge layout successors for blocks with unanalyzable
branches that also may involve fallthrough. In the case of blocks with
no fallthrough, we can still re-order the blocks profitably. For example
instruction decoding will in some cases continue past an indirect jump,
making laying out its most likely successor there profitable.

Note, no test case. I don't know how to write a test case that exercises
this logic, but it matches the described desired semantics in
discussions with Jakob and others. If anyone has a nice example of IR
that will trigger this, that would be lovely.

Also note, there are still assertion failures in real world code with
this. I'm digging into those next, now that I know this isn't the cause.

llvm-svn: 144499
2011-11-13 12:17:28 +00:00
Chandler Carruth f9213fe721 Hoist another gross nested loop into a helper method.
llvm-svn: 144498
2011-11-13 11:42:26 +00:00
Chandler Carruth eb4ec3aea5 Add a missing doxygen comment for a helper method.
llvm-svn: 144497
2011-11-13 11:34:55 +00:00
Chandler Carruth b336172f90 Hoist a nested loop into its own method.
llvm-svn: 144496
2011-11-13 11:34:53 +00:00
Chandler Carruth 8d15078927 Rewrite #3 of machine block placement. This is based somewhat on the
second algorithm, but only loosely. It is more heavily based on the last
discussion I had with Andy. It continues to walk from the inner-most
loop outward, but there is a key difference. With this algorithm we
ensure that as we visit each loop, the entire loop is merged into
a single chain. At the end, the entire function is treated as a "loop",
and merged into a single chain. This chain forms the desired sequence of
blocks within the function. Switching to a single algorithm removes my
biggest problem with the previous approaches -- they had different
behavior depending on which system triggered the layout. Now there is
exactly one algorithm and one basis for the decision making.

The other key difference is how the chain is formed. This is based
heavily on the idea Andy mentioned of keeping a worklist of blocks that
are viable layout successors based on the CFG. Having this set allows us
to consistently select the best layout successor for each block. It is
expensive though.

The code here remains very rough. There is a lot that needs to be done
to clean up the code, and to make the runtime cost of this pass much
lower. Very much WIP, but this was a giant chunk of code and I'd rather
folks see it sooner than later. Everything remains behind a flag of
course.

I've added a couple of tests to exercise the issues that this iteration
was motivated by: loop structure preservation. I've also fixed one test
that was exhibiting the broken behavior of the previous version.

llvm-svn: 144495
2011-11-13 11:20:44 +00:00
NAKAMURA Takumi 4784df7161 Prune more RALinScan. RALinScan was also here!
llvm-svn: 144487
2011-11-13 01:33:10 +00:00
Jakob Stoklund Olesen c601d8c762 More dead code elimination in VirtRegMap.
This thing is looking a lot like a virtual register map now.

llvm-svn: 144486
2011-11-13 01:23:34 +00:00
Jakob Stoklund Olesen 28df7ef8c9 Stop tracking spill slot uses in VirtRegMap.
Nobody cared, StackSlotColoring scans the instructions to find used stack
slots.

llvm-svn: 144485
2011-11-13 01:23:30 +00:00
Jakob Stoklund Olesen 92255f27f1 Remove dead code and data from VirtRegMap.
Most of this stuff was supporting the old deferred spill code insertion
mechanism.  Modern spillers just edit machine code in place.

llvm-svn: 144484
2011-11-13 01:02:04 +00:00
Jakob Stoklund Olesen 38b3f312ca Stop tracking unused registers in VirtRegMap.
The information was only used by the register allocator in
StackSlotColoring.

llvm-svn: 144482
2011-11-13 00:39:45 +00:00
Jakob Stoklund Olesen 6ddb767fb5 Remove the -color-ss-with-regs option.
It was off by default.

The new register allocators don't have the problems that made it
necessary to reallocate registers during stack slot coloring.

llvm-svn: 144481
2011-11-13 00:31:23 +00:00
Jakob Stoklund Olesen 5343da6497 Delete VirtRegRewriter.
And there was much rejoicing.

llvm-svn: 144480
2011-11-13 00:16:01 +00:00
Jakob Stoklund Olesen 03f73ab76f Switch PBQP to VRM's trivial rewriter.
The very complicated VirtRegRewriter is going away.

llvm-svn: 144479
2011-11-13 00:02:24 +00:00
Jakob Stoklund Olesen f61a6fe221 Delete the old spilling framework from LiveIntervalAnalysis.
This is dead code, all register allocators use InlineSpiller.

llvm-svn: 144478
2011-11-12 23:57:05 +00:00
Jakob Stoklund Olesen 7ef502f6d1 Delete the 'standard' spiller with used the old spilling framework.
The current register allocators all use the inline spiller.

llvm-svn: 144477
2011-11-12 23:29:02 +00:00
Jakob Stoklund Olesen 11bb63a756 Switch PBQP to the modern InlineSpiller framework.
It is worth noting that the old spiller would split live ranges around
basic blocks. The new spiller doesn't do that.

PBQP should do its own live range splitting with
SplitEditor::splitSingleBlock() if desired.  See
RAGreedy::tryBlockSplit().

llvm-svn: 144476
2011-11-12 23:17:52 +00:00
Jakob Stoklund Olesen e7e50e6f45 Delete the linear scan register allocator.
RegAllocGreedy has been the default for six months now.

Deleting RegAllocLinearScan makes it possible to also delete
VirtRegRewriter and clean up the spiller code.

llvm-svn: 144475
2011-11-12 22:39:45 +00:00
Rafael Espindola e7cc8bff82 The dwarf standard says that the only differences between a out-of-line
instance and a concrete inlined instance are the use of DW_TAG_subprogram
instead of DW_TAG_inlined_subroutine and the who owns the tree.

We were also omitting DW_AT_inline from the abstract roots. To fix this,
make sure we mark abstract instance roots with DW_AT_inline even when
we have only out-of-line instances referring to them with DW_AT_abstract_origin.

FileCheck is not a very good tool for tests like this, maybe we should add
a -verify mode to llvm-dwarfdump.

llvm-svn: 144441
2011-11-12 01:57:54 +00:00
Eli Friedman 9d448e4a42 Don't try to form pre/post-indexed loads/stores until after LegalizeDAG runs. Fixes PR11029.
llvm-svn: 144438
2011-11-12 00:35:34 +00:00
Eli Friedman 1347715644 Some cleanup and bulletproofing for node replacement in LegalizeDAG. To maintain LegalizeDAG invariants, whenever we a node is replaced, we must attempt to delete it, and if it still
has uses after it is replaced (which can happen in rare cases due to CSE), we must revisit it.

llvm-svn: 144432
2011-11-11 23:58:27 +00:00
Nicolas Geoffray 26c328d734 Add a custom safepoint method, in order for language implementers to decide which machine instruction gets to be a safepoint.
llvm-svn: 144399
2011-11-11 18:32:52 +00:00
Eric Christopher 0a917b7ad4 Initialize variable.
llvm-svn: 144360
2011-11-11 03:16:32 +00:00
Eric Christopher c12c211c44 If we have a DIE with an AT_specification use that instead of the normal
addr DIE when adding to the dwarf accelerator tables.

llvm-svn: 144354
2011-11-11 01:55:22 +00:00
Rafael Espindola 79278365d3 Check in getOrCreateSubprogramDIE if a declaration exists and if so output
it first.

This is a more general fix to pr11300.

llvm-svn: 144324
2011-11-10 22:34:29 +00:00
Eric Christopher 66b37db641 Make types and namespaces take multiple DIEs for the accelerator tables
as well.

llvm-svn: 144319
2011-11-10 21:47:55 +00:00
Eric Christopher e288793e44 Move type handling to make sure we get all created types that aren't
forward decls and have names into the dwarf accelerator types table.

llvm-svn: 144306
2011-11-10 19:52:58 +00:00
Eric Christopher d9843b34e6 Rework adding function names to the dwarf accelerator tables, allow
multiple dies per function and support C++ basenames.

llvm-svn: 144304
2011-11-10 19:25:34 +00:00
Evan Cheng d33b2d6b7a Use a bigger hammer to fix PR11314 by disabling the "forcing two-address
instruction lower optimization" in the pre-RA scheduler.

The optimization, rather the hack, was done before MI use-list was available.
Now we should be able to implement it in a better way, perhaps in the
two-address pass until a MI scheduler is available.

Now that the scheduler has to backtrack to handle call sequences. Adding
artificial scheduling constraints is just not safe. Furthermore, the hack
is not taking all the other scheduling decisions into consideration so it's just
as likely to pessimize code. So I view disabling this optimization goodness
regardless of PR11314.

llvm-svn: 144267
2011-11-10 07:43:16 +00:00
Jakob Stoklund Olesen eef48b6938 Strip old implicit operands after foldMemoryOperand.
The TII.foldMemoryOperand hook preserves implicit operands from the
original instruction.  This is not what we want when those implicit
operands refer to the register being spilled.

Implicit operands referring to other registers are preserved.

This fixes PR11347.

llvm-svn: 144247
2011-11-10 00:17:03 +00:00
Eli Friedman 53218b6fcc Add check so we don't try to perform an impossible transformation. Fixes issue from PR11319.
llvm-svn: 144216
2011-11-09 22:25:12 +00:00
Benjamin Kramer 966ed1b698 Add comments.
llvm-svn: 144194
2011-11-09 18:16:11 +00:00
Duncan Sands 635e4efca0 Speculatively revert commit 144124 (djg) in the hope that the 32 bit
dragonegg self-host buildbot will recover (it is complaining about object
files differing between different build stages).  Original commit message:

Add a hack to the scheduler to disable pseudo-two-address dependencies in
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling seqeunce
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.

llvm-svn: 144188
2011-11-09 14:20:48 +00:00
Benjamin Kramer 148db36263 Take advantage of the zero byte in StringMap when emitting dwarf stringpool entries.
llvm-svn: 144184
2011-11-09 12:12:04 +00:00
Devang Patel fa4520968b Remove extra ';'
llvm-svn: 144172
2011-11-09 06:20:49 +00:00
Eric Christopher 5223a57533 Remove the pubnames section, no one consumes it.
llvm-svn: 144169
2011-11-09 05:24:07 +00:00
Jakob Stoklund Olesen 3dc89c9768 Collapse DomainValues across loop back-edges.
During the initial RPO traversal of the basic blocks, remember the ones
that are incomplete because of back-edges from predecessors that haven't
been visited yet.

After the initial RPO, revisit all those loop headers so the incoming
DomainValues on the back-edges can be properly collapsed.

This will properly fix execution domains on software pipelined code,
like the included test case.

llvm-svn: 144151
2011-11-09 01:06:56 +00:00
Jakob Stoklund Olesen 53ec977cd2 Link to the live DomainValue after merging.
When merging two uncollapsed DomainValues, place a link to the active
DomainValue from the passive DomainValue.  This allows old stale
references to the passive DomainValue to be updated to point to the
active DomainValue.

The new resolve() function finds the active DomainValue and updates the
pointer.

This change makes old live-out lists more useful since they may contain
uncollapsed DomainValues that have since been merged into other
DomainValues.

llvm-svn: 144149
2011-11-09 00:06:18 +00:00
Jakob Stoklund Olesen b7e44a3f5f Track reference count independently from clear().
This allows clear() to be called on a DomainValue with references.

llvm-svn: 144147
2011-11-08 23:26:00 +00:00
Jakob Stoklund Olesen 5d08293999 Call release() directly when cleaning up the remaining DomainValues.
There is no need to involve the LiveRegs array and kill() any longer.

llvm-svn: 144133
2011-11-08 22:05:17 +00:00
Jakob Stoklund Olesen 9e338bb0f3 Rename all methods to follow style guide.
No functional change.

llvm-svn: 144132
2011-11-08 21:57:47 +00:00
Jakob Stoklund Olesen 1438e191bd Handle reference counts in one function: release().
This new function will decrement the reference count, and collapse a
domain value when the last reference is gone.

This simplifies DomainValue reference counting, and decouples it from
the LiveRegs array.

llvm-svn: 144131
2011-11-08 21:57:44 +00:00
Eric Christopher 08a558eeef Also add the linkage name to the name accelerator tables if it exists
and is different than the normal name.

llvm-svn: 144130
2011-11-08 21:56:23 +00:00
Dan Gohman a4bc6171a5 Add a hack to the scheduler to disable pseudo-two-address dependencies in
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling seqeunce
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.

llvm-svn: 144124
2011-11-08 21:29:06 +00:00
Jakob Stoklund Olesen 1205881820 Clear old DomainValue after merging.
The old value may still be referenced by some live-out list, and we
don't wan't to collapse those instructions twice.

This fixes the "Can only swizzle VMOVD" assertion in some armv7 SPEC
builds.

<rdar://problem/10413292>

llvm-svn: 144117
2011-11-08 20:57:04 +00:00
Eric Christopher 970771c0e8 Add the base ObjC method name to the names lookup table as well.
llvm-svn: 144105
2011-11-08 19:16:01 +00:00
Lang Hames b85fcd07df Lower mem-ops to unaligned i32/i16 load/stores on ARM where supported.
Add support for trimming constants to GetDemandedBits. This fixes some funky
constant generation that occurs when stores are expanded for targets that don't
support unaligned stores natively.

llvm-svn: 144102
2011-11-08 18:56:23 +00:00
Pete Cooper 82cd9e81fc Added invariant field to the DAG.getLoad method and changed all calls.
When this field is true it means that the load is from constant (runt-time or compile-time) and so can be hoisted from loops or moved around other memory accesses

llvm-svn: 144100
2011-11-08 18:42:53 +00:00
Eric Christopher 54ce295d37 A few more places where we can avoid multiple size queries.
llvm-svn: 144099
2011-11-08 18:38:40 +00:00
Eric Christopher f1932270c0 Don't evaluate Data.size() on every iteration.
llvm-svn: 144095
2011-11-08 18:22:25 +00:00
Eli Friedman f2a9bd4b1e Add a bunch of calls to RemoveDeadNode in LegalizeDAG, so legalization doesn't get confused by CSE later on. Fixes PR11318.
Re-commit of r144034, with an extra fix so that RemoveDeadNode doesn't blow up.

llvm-svn: 144055
2011-11-08 01:25:24 +00:00
Eli Friedman a35a5295e0 Revert r144034 while I try to track down a crash.
llvm-svn: 144044
2011-11-07 23:53:20 +00:00
Bill Wendling 478f58cad4 This code is dead, what with the new EH model and the auto-upgraders in place.
Delete!

llvm-svn: 144043
2011-11-07 23:36:48 +00:00
Jakob Stoklund Olesen a70e9417fb Kill and collapse outstanding DomainValues.
DomainValues that are only used by "don't care" instructions are now
collapsed to the first possible execution domain after all basic blocks
have been processed.  This typically means the PS domain on x86.

For example, the vsel_i64 and vsel_double functions in sse2-blend.ll are
completely collapsed to the PS domain instead of containing a mix of
execution domains created by isel.

llvm-svn: 144037
2011-11-07 23:08:21 +00:00
Eli Friedman 55a86d32d3 Add a bunch of calls to RemoveDeadNode in LegalizeDAG, so legalization doesn't get confused by CSE later on. Fixes PR11318.
llvm-svn: 144034
2011-11-07 22:51:10 +00:00
Eric Christopher 5139dadd50 Add all completed and named types to the dwarf type accelerator tables.
llvm-svn: 144027
2011-11-07 22:11:16 +00:00
Jakob Stoklund Olesen 68e197e151 Use a reverse post order instead of a DFS order.
The enterBasicBlock() function is combining live-out values from
predecessor blocks.  The RPO traversal means that more predecessors
have been visited when that happens, only back-edges are missing.

llvm-svn: 144025
2011-11-07 21:59:29 +00:00
Eric Christopher ff2edf1499 Move the hash function to using and taking a StringRef.
llvm-svn: 144024
2011-11-07 21:49:35 +00:00
Eric Christopher b6205d8b49 Simple destructor to delete the hash data we created earlier.
llvm-svn: 144023
2011-11-07 21:49:28 +00:00
Jakob Stoklund Olesen 736cf46c3e Extract two methods. No functional change.
llvm-svn: 144020
2011-11-07 21:40:27 +00:00
Jakob Stoklund Olesen 44dcc589b3 MBB doesn't need to be a class member.
llvm-svn: 144015
2011-11-07 21:23:42 +00:00
Jakob Stoklund Olesen baffa7d35d Fix pass name after the source was moved.
llvm-svn: 144014
2011-11-07 21:23:39 +00:00
Eric Christopher 988ac00abd Use StringRef::startswith to do some string comparisons.
llvm-svn: 143982
2011-11-07 18:53:23 +00:00
Eric Christopher 2c5dab541d Avoid the use of a local temporary for comment twines.
llvm-svn: 143974
2011-11-07 18:34:47 +00:00
Eric Christopher cc979f9ae6 Allow for the case where the name of the subprogram is "".
Fixes a self-host error.

llvm-svn: 143970
2011-11-07 18:10:17 +00:00
Richard Osborne 561fac4d4e Don't introduce custom nodes after legalization in TargetLowering::BuildSDIV()
and TargetLowering::BuildUDIV(). Fixes PR11283

llvm-svn: 143964
2011-11-07 17:09:05 +00:00
Eric Christopher 8c7505f258 Remove unnecessary addition to API. Replace with something much simpler.
llvm-svn: 143925
2011-11-07 09:38:42 +00:00
Eric Christopher e1c874aa70 Add new files to cmake.
llvm-svn: 143924
2011-11-07 09:37:06 +00:00
Eric Christopher 4996c70034 Add the support code to enable the dwarf accelerator tables. Upcoming patches
to fix the types section (all types, not just global types), and testcases.

The code to do the final emission is disabled by default.

llvm-svn: 143923
2011-11-07 09:24:32 +00:00
Eric Christopher 6e47204b0c Add a new dwarf accelerator table prototype with the goal of replacing
the pubnames and pubtypes tables. LLDB can currently use this format
and a full spec is forthcoming and submission for standardization is planned.

A basic summary:

The dwarf accelerator tables are an indirect hash table optimized
for null lookup rather than access to known data. They are output into
an on-disk format that looks like this:

.-------------.
|  HEADER     |
|-------------|
|  BUCKETS    |
|-------------|
|  HASHES     |
|-------------|
|  OFFSETS    |
|-------------|
|  DATA       |
`-------------'

where the header contains a magic number, version, type of hash function,
the number of buckets, total number of hashes, and room for a special
struct of data and the length of that struct.

The buckets contain an index (e.g. 6) into the hashes array. The hashes
section contains all of the 32-bit hash values in contiguous memory, and
the offsets contain the offset into the data area for the particular
hash.

For a lookup example, we could hash a function name and take it modulo the
number of buckets giving us our bucket. From there we take the bucket value
as an index into the hashes table and look at each successive hash as long
as the hash value is still the same modulo result (bucket value) as earlier.
If we have a match we look at that same entry in the offsets table and
grab the offset in the data for our final match.

llvm-svn: 143921
2011-11-07 09:18:42 +00:00
Eric Christopher a7b6189071 Expose a way to get the beginning of the dwarf string section.
llvm-svn: 143920
2011-11-07 09:18:38 +00:00
Eric Christopher 6abc9c5aaa Fix up comment.
llvm-svn: 143919
2011-11-07 09:18:35 +00:00
Eric Christopher 2b4f77350d Typo.
llvm-svn: 143918
2011-11-07 09:18:32 +00:00
Benjamin Kramer c74798d5cf Add an option to pad an uleb128 to MCObjectWriter and remove the uleb128 encoding from the DWARF asm printer.
As a side effect we now print dwarf ulebs with .ascii directives.

llvm-svn: 143809
2011-11-05 11:52:44 +00:00
Benjamin Kramer f3da529028 Add more PRI.64 macros for MSVC and use them throughout the codebase.
llvm-svn: 143799
2011-11-05 08:57:40 +00:00
Pete Cooper 77c703f11c Added missing &. Fixes <rdar://problem/10393723>
llvm-svn: 143753
2011-11-04 23:49:14 +00:00
Rafael Espindola 6cf4e830ce Emit declarations before definitions if they are available. This causes DW_AT_specification to
point back in the file in the included testcase. Fixes PR11300.

llvm-svn: 143726
2011-11-04 19:00:29 +00:00
Dan Gohman 198b7ffc11 Reapply r143206, with fixes. Disallow physical register lifetimes
across calls, and only check for nested dependences on the special
call-sequence-resource register.

llvm-svn: 143660
2011-11-03 21:49:52 +00:00
Pete Cooper 65ba66c660 Reverted r143600 - selector reference change
llvm-svn: 143646
2011-11-03 20:47:50 +00:00
Daniel Dunbar bf9bba47a1 build: Add initial cut at LLVMBuild.txt files.
llvm-svn: 143634
2011-11-03 18:53:17 +00:00
Pete Cooper e6173d81ae Treat objc selector reference globals as invariant so that MachineLICM can hoist them out of loops. Fixes <rdar://problem/6027699>
llvm-svn: 143600
2011-11-03 00:56:36 +00:00
Bill Wendling 645eadac67 An array of chars of length 8 will also cause the stack protector to be inserted
into the function. Reflect that here so that the array will be placed next to
the SP.
<rdar://problem/10128329>

llvm-svn: 143590
2011-11-02 23:20:58 +00:00
Nick Lewycky d1ee7f8cf1 Don't emit a directory entry for the value in DW_AT_comp_dir, that is always
implied by directory index zero.

llvm-svn: 143570
2011-11-02 20:55:33 +00:00
Chandler Carruth ae4e800c5b Begin collecting some of the statistics for block placement discussed on
the mailing list. Suggestions for other statistics to collect would be
awesome. =]

Currently these are implemented as a separate pass guarded by a separate
flag. I'm not thrilled by that, but I wanted to be able to collect the
statistics for the old code placement as well as the new in order to
have a point of comparison. I'm planning on folding them into the single
pass if / when there is only one pass of interest.

llvm-svn: 143537
2011-11-02 07:17:12 +00:00
Jakob Stoklund Olesen 559d4dcc16 Update split candidate correctly when interference cache is full.
No test case, spotted by inspection.

llvm-svn: 143407
2011-11-01 00:02:31 +00:00
Nadav Rotem f310361a7d Cleanup. Document. Make sure that this build_vector optimization only runs before the op legalizer and that the used type is legal.
llvm-svn: 143358
2011-10-31 20:08:25 +00:00
Benjamin Kramer a4eba41b7a Silence compiler warning.
llvm-svn: 143308
2011-10-30 08:39:55 +00:00
Nadav Rotem bf6568b5d6 Add a new DAGCombine optimization for BUILD_VECTOR.
If all of the inputs are zero/any_extended, create a new simple BV
which can be further optimized by other BV optimizations.

llvm-svn: 143297
2011-10-29 21:23:04 +00:00
Dan Gohman 9b9c970148 Revert r143206, as there are still some failing tests.
llvm-svn: 143262
2011-10-29 00:41:52 +00:00
Dan Gohman 73057ad24f Reapply r143177 and r143179 (reverting r143188), with scheduler
fixes: Use a separate register, instead of SP, as the
calling-convention resource, to avoid spurious conflicts with
actual uses of SP. Also, fix unscheduling of calling sequences,
which can be triggered by pseudo-two-address dependencies.

llvm-svn: 143206
2011-10-28 17:55:38 +00:00
NAKAMURA Takumi 29ccdd8207 Dwarf: [PR11022] Fix emitting DW_AT_const_value(>i64), to be host-endian-neutral.
Don't assume APInt::getRawData() would hold target-aware endianness nor host-compliant endianness. rawdata[0] holds most lower i64, even on big endian host.

FIXME: Add a testcase for big endian target.

FIXME: Ditto on CompileUnit::addConstantFPValue() ?
llvm-svn: 143194
2011-10-28 14:12:22 +00:00
Benjamin Kramer 47c3f2d625 Use BranchProbability compare operators.
llvm-svn: 143190
2011-10-28 11:14:31 +00:00
Duncan Sands 225a7037d6 Speculatively disable Dan's commits 143177 and 143179 to see if
it fixes the dragonegg self-host (it looks like gcc is miscompiled).
Original commit messages:
Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.

Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.

Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.

Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.

This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.

Delete #if 0 code accidentally left in.

llvm-svn: 143188
2011-10-28 09:55:57 +00:00
Nick Lewycky cc64ae140d Always use the string pool, even when it makes the .o larger. This may help
tools that read the debug info in the .o files by making the DIE sizes more
consistent.

llvm-svn: 143186
2011-10-28 05:29:47 +00:00
Dan Gohman 0e8d1454b1 Delete #if 0 code accidentally left in.
llvm-svn: 143179
2011-10-28 01:41:21 +00:00
Dan Gohman 4db3f7dd83 Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.

Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.

Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.

Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.

This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.

llvm-svn: 143177
2011-10-28 01:29:32 +00:00
Nick Lewycky d59c0cac6c Teach our Dwarf emission to use the string pool.
llvm-svn: 143097
2011-10-27 06:44:11 +00:00
Eli Friedman e9e356ad6b Don't crash on 128-bit sdiv by constant. Found by inspection.
llvm-svn: 143095
2011-10-27 02:06:39 +00:00
Lang Hames 58dba012b6 Rename NonScalarIntSafe to something more appropriate.
llvm-svn: 143080
2011-10-26 23:50:43 +00:00
Nick Lewycky 654f5ce812 Reflow lines, fix comments for doxygen style, fix whitespace. No functionality
change.

llvm-svn: 143074
2011-10-26 22:55:33 +00:00
Duncan Sands dce448c642 Simplify SplitVecRes_UnaryOp by removing all the code that is
trying to legalize the operand types when only the result type
is required to be legalized - the type legalization machinery
will get round to the operands later if they need legalizing.
There can be a point to legalizing operands in parallel with
the result: when this saves compile time or results in better
code.  There was only one case in which this was true: when
the operand is also split, so keep the logic for that bit.
As a result of this change, additional operand legalization
methods may need to be introduced to handle nodes where the
result and operand types can differ, like SIGN_EXTEND, but
the testsuite doesn't contain any tests where this is the case.
In any case, it seems better to require such methods (and die
with an assert if they doesn't exist) than to quietly produce
wrong code if we forgot to special case the node in
SplitVecRes_UnaryOp.

llvm-svn: 143026
2011-10-26 14:11:18 +00:00
Jakob Stoklund Olesen e8261a22f1 Don't use floating point to do an integer's job.
This code makes different decisions when compiled into x87 instructions
because of different rounding behavior.  That caused phase 2/3
miscompares on 32-bit Linux when the phase 1 compiler was built with gcc
(using x87), and the phase 2 compiler was built with clang (using SSE).

This fixes PR11200.

llvm-svn: 143006
2011-10-26 01:47:48 +00:00
Evan Cheng 7313337c85 Disable LICM speculation in high register pressure situation again now that Devang has fixed other issues.
llvm-svn: 143003
2011-10-26 01:26:57 +00:00
Bill Wendling 9b9932229d Reapply r142920 with fix:
An MBB which branches to an EH landing pad shouldn't be considered for tail merging.

In SjLj EH, the jump to the landing pad is not done explicitly through a branch
statement. The EH landing pad is added as a successor to the throwing
BB. Because of that however, the branch folding pass could mistakenly think that
it could merge the throwing BB with another BB. This isn't safe to do.
<rdar://problem/10334833>

llvm-svn: 143001
2011-10-26 01:10:25 +00:00
Eli Friedman 3e9ef907e0 Remove a couple redundant checks.
llvm-svn: 142959
2011-10-25 20:34:22 +00:00
Jim Grosbach a40f8c432a Make assert() message more informative.
PR11217.

llvm-svn: 142956
2011-10-25 20:30:48 +00:00
Duncan Sands 6ca458e49a Revert commit 142891. Takumi bisected the tablegen miscompiles
down to this commit.  Original commit message:

An MBB which branches to an EH landing pad shouldn't be considered for tail merging.

In SjLj EH, the jump to the landing pad is not done explicitly through a branch
statement. The EH landing pad is added as a successor to the throwing
BB. Because of that however, the branch folding pass could mistakenly think that
it could merge the throwing BB with another BB. This isn't safe to do.
<rdar://problem/10334833>

llvm-svn: 142920
2011-10-25 12:30:22 +00:00
Nick Lewycky d75675ab3b Remove dead enum value. There is no DIESectionOffset.
llvm-svn: 142912
2011-10-25 07:05:26 +00:00
Eric Christopher fe15841044 Remove unused forward decl.
llvm-svn: 142892
2011-10-25 00:55:35 +00:00
Bill Wendling 38ced995f6 An MBB which branches to an EH landing pad shouldn't be considered for tail merging.
In SjLj EH, the jump to the landing pad is not done explicitly through a branch
statement. The EH landing pad is added as a successor to the throwing
BB. Because of that however, the branch folding pass could mistakenly think that
it could merge the throwing BB with another BB. This isn't safe to do.
<rdar://problem/10334833>

llvm-svn: 142891
2011-10-25 00:54:05 +00:00
Bill Wendling 57e3aaad89 Check the visibility of the global variable before placing it into the stubs
table. A hidden variable could potentially end up in both lists.
<rdar://problem/10336715>

llvm-svn: 142869
2011-10-24 23:05:43 +00:00
Douglas Gregor 0cc574eee7 Really unbreak CMake build
llvm-svn: 142822
2011-10-24 18:10:52 +00:00
Douglas Gregor d0800fda95 Unbreak CMake build
llvm-svn: 142821
2011-10-24 18:09:23 +00:00
Dan Gohman e2ff95e327 Delete the top-down "Latency" scheduler. Top-down scheduling doesn't handle
physreg dependencies, and upcoming codegen changes will require proper
physreg dependence handling.

llvm-svn: 142816
2011-10-24 18:01:06 +00:00
Dan Gohman d78fc160cc Delete the Latency scheduling preference.
llvm-svn: 142815
2011-10-24 17:56:48 +00:00
Dan Gohman 4ed1afa51d Change this overloaded use of Sched::Latency to be an overloaded
use of Sched::ILP instead, as Sched::Latency is going away.

llvm-svn: 142813
2011-10-24 17:55:11 +00:00
Dan Gohman c32af340fc Change the default scheduler from Latency to ILP, since Latency
is going away.

llvm-svn: 142810
2011-10-24 17:45:02 +00:00
Bill Wendling 38f86c505c Cleanup. Get rid of the old SjLj EH lowering code. No functionality change.
llvm-svn: 142800
2011-10-24 17:12:36 +00:00
Chandler Carruth 30b63c6430 Sink an otherwise unused variable's initializer into the asserts that
used it. Fixes an unused variable warning from GCC on release builds.

llvm-svn: 142799
2011-10-24 16:51:55 +00:00
Chandler Carruth fd7475e906 Now that we have comparison on probabilities, add some static functions
to get important constant branch probabilities and use them for finding
the best branch out of a set of possibilities.

llvm-svn: 142762
2011-10-23 20:10:34 +00:00
Chandler Carruth 446210b616 Remove a commented out line of code that snuck by my auditing.
llvm-svn: 142761
2011-10-23 20:10:30 +00:00
Chandler Carruth bd1be4d01c Completely re-write the algorithm behind MachineBlockPlacement based on
discussions with Andy. Fundamentally, the previous algorithm is both
counter productive on several fronts and prioritizing things which
aren't necessarily the most important: static branch prediction.

The new algorithm uses the existing loop CFG structure information to
walk through the CFG itself to layout blocks. It coalesces adjacent
blocks within the loop where the CFG allows based on the most likely
path taken. Finally, it topologically orders the block chains that have
been formed. This allows it to choose a (mostly) topologically valid
ordering which still priorizes fallthrough within the structural
constraints.

As a final twist in the algorithm, it does violate the CFG when it
discovers a "hot" edge, that is an edge that is more than 4x hotter than
the competing edges in the CFG. These are forcibly merged into
a fallthrough chain.

Future transformations that need te be added are rotation of loop exit
conditions to be fallthrough, and better isolation of cold block chains.
I'm also planning on adding statistics to model how well the algorithm
does at laying out blocks based on the probabilities it receives.

The old tests mostly still pass, and I have some new tests to add, but
the nested loops are still behaving very strangely. This almost seems
like working-as-intended as it rotated the exit branch to be
fallthrough, but I'm not convinced this is actually the best layout. It
is well supported by the probabilities for loops we currently get, but
those are pretty broken for nested loops, so this may change later.

llvm-svn: 142743
2011-10-23 09:18:45 +00:00
Bill Wendling b1c430886b Make sure that the landing pads themselves have no PHI instructions in them.
The assumption in the back-end is that PHIs are not allowed at the start of the
landing pad block for SjLj exceptions.
<rdar://problem/10313708>

llvm-svn: 142689
2011-10-21 22:08:56 +00:00
Nadav Rotem 5e00bb5feb Fix pr11194. When promoting and splitting integers we need to use
ZExtPromotedInteger and SExtPromotedInteger based on the operation we legalize.

SetCC return type needs to be legalized via PromoteTargetBoolean.

llvm-svn: 142660
2011-10-21 17:35:19 +00:00
Nadav Rotem d315157f12 1. Fix the widening of SETCC in WidenVecOp_SETCC. Use the correct return CC type.
2. Fix a typo in CONCAT_VECTORS which exposed the bug in #1.

llvm-svn: 142648
2011-10-21 11:42:07 +00:00
Chandler Carruth 8b9737cb54 Add loop aligning to MachineBlockPlacement based on review discussion so
it's a bit more plausible to use this instead of CodePlacementOpt. The
code for this was shamelessly stolen from CodePlacementOpt, and then
trimmed down a bit. There doesn't seem to be much utility in returning
true/false from this pass as we may or may not have rewritten all of the
blocks. Also, the statistic of counting how many loops were aligned
doesn't seem terribly important so I removed it. If folks would like it
to be included, I'm happy to add it back.

This was probably the most egregious of the missing features, and now
I'm going to start gathering some performance numbers and looking at
specific loop structures that have different layout between the two.

Test is updated to include both basic loop alignment and nested loop
alignment.

llvm-svn: 142645
2011-10-21 08:57:37 +00:00
Chandler Carruth 1028142564 Implement a block placement pass based on the branch probability and
block frequency analyses. This differs substantially from the existing
block-placement pass in LLVM:

1) It operates on the Machine-IR in the CodeGen layer. This exposes much
   more (and more precise) information and opportunities. Also, the
   results are more stable due to fewer transforms ocurring after the
   pass runs.
2) It uses the generalized probability and frequency analyses. These can
   model static heuristics, code annotation derived heuristics as well
   as eventual profile loading. By basing the optimization on the
   analysis interface it can work from any (or a combination) of these
   inputs.
3) It uses a more aggressive algorithm, both building chains from tho
   bottom up to maximize benefit, and using an SCC-based walk to layout
   chains of blocks in a profitable ordering without O(N^2) iterations
   which the old pass involves.

The pass is currently gated behind a flag, and not enabled by default
because it still needs to grow some important features. Most notably, it
needs to support loop aligning and careful layout of loop structures
much as done by hand currently in CodePlacementOpt. Once it supports
these, and has sufficient testing and quality tuning, it should replace
both of these passes.

Thanks to Nick Lewycky and Richard Smith for help authoring & debugging
this, and to Jakob, Andy, Eric, Jim, and probably a few others I'm
forgetting for reviewing and answering all my questions. Writing
a backend pass is *sooo* much better now than it used to be. =D

llvm-svn: 142641
2011-10-21 06:46:38 +00:00
Chandler Carruth 001153784a Remove a now dead function, fixing -Wunused-function warnings from
Clang.

llvm-svn: 142631
2011-10-21 01:23:41 +00:00
Dan Gohman 90fb55237b Delete the list-tdrr scheduler. Top-down schedulers are going away
because they don't support physical register dependencies.

llvm-svn: 142620
2011-10-20 21:44:34 +00:00
Chad Rosier 4236a63c3c Revert r142579, "Fix a type in the legalization of CONCAT_VECTORS". This is
causing one of the unit tests to infinitely loop, which resulted in the 
buildbots stalling.

llvm-svn: 142604
2011-10-20 19:19:10 +00:00
Devang Patel 1d8ab465bf As Evan suggested, loads from constant pool are safe to speculate.
llvm-svn: 142593
2011-10-20 17:42:23 +00:00
Devang Patel 830c776a94 Add a comment.
llvm-svn: 142592
2011-10-20 17:31:18 +00:00
Nadav Rotem fe3969293d Fix a type in the legalization of CONCAT_VECTORS.
llvm-svn: 142579
2011-10-20 13:38:16 +00:00
Nadav Rotem 8824472a25 Improve code generation for vselect on SSE2:
When checking the availability of instructions using the TLI, a 'promoted'
instruction IS available. It means that the value is bitcasted to another type
for which there is an operation. The correct check for the availablity of an
instruction is to check if it should be expanded.

llvm-svn: 142542
2011-10-19 20:43:16 +00:00
Nadav Rotem 6652e22bad Add support for the vector-widening of vselect and vector-setcc
llvm-svn: 142488
2011-10-19 09:45:11 +00:00
Nick Lewycky ac4c1860a3 Missed a spot!
llvm-svn: 142436
2011-10-18 22:40:18 +00:00
Nick Lewycky 5ca33ac926 Fix some typo/formatting issues. No functionality change.
llvm-svn: 142435
2011-10-18 22:39:43 +00:00
Nadav Rotem 75c2229f41 Fix a bug in the legalization of vector anyext-load and trunc-store. Mem Index starts with zero.
llvm-svn: 142434
2011-10-18 22:32:43 +00:00
Bob Wilson 681561901d Fix a DAG combiner assertion failure when constant folding BUILD_VECTORS.
svn r139159 caused SelectionDAG::getConstant() to promote BUILD_VECTOR operands
with illegal types, even before type legalization.  For this testcase, that led
to one BUILD_VECTOR with i16 operands and another with promoted i32 operands,
which triggered the assertion.

llvm-svn: 142370
2011-10-18 17:34:47 +00:00
Duncan Sands d278d35b13 Fix a bunch of unused variable warnings when doing a release
build with gcc-4.6.

llvm-svn: 142350
2011-10-18 12:44:00 +00:00
Hal Finkel bab66789d5 Fix comment to refer to correct instruction
llvm-svn: 142334
2011-10-18 03:51:57 +00:00
Nick Lewycky 479a8fe75e Minor style cleanup, no functionality change.
llvm-svn: 142307
2011-10-17 23:27:36 +00:00
Nick Lewycky 40f8f2ff24 Add support for a new extension to the .file directive:
.file filenumber "directory" "filename"

This removes one join+split of the directory+filename in MC internals. Because
bitcode files have independent fields for directory and filenames in debug info,
this patch may change the .o files written by existing .bc files.

llvm-svn: 142300
2011-10-17 23:05:28 +00:00
Bill Wendling aa9047d3f5 Now Igor, throw the switch...give my creation life!
Use the custom inserter for the ARM setjmp intrinsics. Instead of creating the
SjLj dispatch table in IR, where it frequently violates serveral assumptions --
in particular assumptions made by the landingpad instruction about what can
branch to a landing pad and what cannot. Performing this in the back-end allows
us to violate these assumptions without the IR getting angry at us.

It also allows us to perform a small optimization. We can shove the address of
the dispatch's basic block into the function context and not have to add code
around the setjmp to check for the return value and jump to the dispatch.

Neat, huh?
<rdar://problem/10116753>

llvm-svn: 142294
2011-10-17 22:26:23 +00:00
Cameron Zwarich d85bc104ef When deleting a phi cycle after looking through copies, constrain the register
to match its final use.

With this change, all of test-suite compiles for Thumb2 with -verify-coalescing
enabled.

llvm-svn: 142287
2011-10-17 21:54:46 +00:00
Evan Cheng aa563df759 Constraint register class with constrainRegClass() to CSE a virtual into another. rdar://10293289
llvm-svn: 142234
2011-10-17 19:50:12 +00:00
Bill Wendling 63a4ea1859 Correct over-zealous removal of hack.
Some code want to check that *any* call within a function has the 'returns
twice' attribute, not just that the current function has one.

llvm-svn: 142221
2011-10-17 18:43:40 +00:00
Bill Wendling 2a83a71c2a Now that we have the ReturnsTwice function attribute, this method is
obsolete. Check the attribute instead.
<rdar://problem/8031714>

llvm-svn: 142212
2011-10-17 18:22:52 +00:00
Chad Rosier c17257c4cb Removed set, but unused variable.
Patch by Joe Abbey <jabbey@arxan.com>.

llvm-svn: 142206
2011-10-17 18:01:59 +00:00
Devang Patel 69a4565e65 It is safe to speculate load from GOT. This fixes performance regression caused by r141689.
Radar 10281206.

llvm-svn: 142202
2011-10-17 17:35:01 +00:00
Nadav Rotem 486ff59a9f Enable element promotion type legalization by deafault.
Changed tests which assumed that vectors are legalized by widening them.

llvm-svn: 142152
2011-10-16 20:31:33 +00:00
Benjamin Kramer cc863b2bb6 Let printf do the formatting instead aligning strings ourselves.
While at it, merge some format strings.

llvm-svn: 142140
2011-10-16 16:30:34 +00:00
Benjamin Kramer cb6b02a086 Twinify better.
llvm-svn: 142139
2011-10-16 15:46:29 +00:00
Nadav Rotem ebe13bc3f1 Move the legalization of vector loads and stores into LegalizeVectorOps. In some
cases we need the second type-legalization pass in order to support all cases.

llvm-svn: 142060
2011-10-15 07:41:10 +00:00
Bill Wendling 2730a0099a Clear out the landing pad to call site map for each function.
This isn't put into the 'clear()' method because the information needs to stick
around (at least for a little bit) after the selection DAG is built.

llvm-svn: 142032
2011-10-15 01:00:26 +00:00
Evan Cheng 06fdaeb5d9 A few 80-col violations.
llvm-svn: 141988
2011-10-14 20:36:23 +00:00
Jakob Stoklund Olesen 06b6ccfe90 Update live-in lists when splitting critical edges.
Fixes PR10814. Patch by Jan Sjödin!

llvm-svn: 141960
2011-10-14 17:25:46 +00:00
Jim Grosbach 400907cc41 Fix typo. "__sync_fetch_and-xor_4" should be "__sync_fetch_and_xor_4".
Pointed out by George Russell.

llvm-svn: 141956
2011-10-14 15:53:48 +00:00
Jakob Stoklund Olesen 7fb5632e73 Add value numbers when spilling dead defs.
When spilling around an instruction with a dead def, remember to add a
value number for the def.

The missing value number wouldn't normally create problems since there
would be an incoming live range as well.  However, due to another bug
we could spill a dead V_SET0 instruction which doesn't read any values.

The missing value number caused an empty live range to be created which
is dangerous since it doesn't interfere with anything.

This fixes part of PR11125.

llvm-svn: 141923
2011-10-14 00:34:31 +00:00
Eric Christopher 76933f4c0b Don't forget to reconstruct D after changing the scope that we're
looking at.

llvm-svn: 141892
2011-10-13 21:43:44 +00:00
Cameron Zwarich 86f7d3556c Use an existing method.
llvm-svn: 141855
2011-10-13 07:36:41 +00:00
Nick Lewycky 594a545821 If MI is deleted then remove it from the set. If a new MI is created, it could
have the same address as the one we deleted, and we don't want that in the set
yet. Noticed by inspection.

llvm-svn: 141849
2011-10-13 02:16:18 +00:00
Nick Lewycky 404feb9973 Tabs to spaces.
llvm-svn: 141844
2011-10-13 01:09:50 +00:00
Nick Lewycky 8488225984 Add missing braces to pacify GCC's -Wparentheses.
llvm-svn: 141842
2011-10-13 00:54:59 +00:00
Jakob Stoklund Olesen 068dc91de9 Also inflate register classes around inline asm.
Now that MI->getRegClassConstraint() can also handle inline assembly,
don't bail when recomputing the register class of a virtual register
used by inline asm.

This fixes PR11078.

llvm-svn: 141836
2011-10-12 23:37:40 +00:00
Jakob Stoklund Olesen 35b362fab2 Add MachineInstr::getRegClassConstraint().
Most instructions have some requirements for their register operands.
Usually, this is expressed as register class constraints in the
MCInstrDesc, but for inline assembly the constraints are encoded in the
flag words.

llvm-svn: 141835
2011-10-12 23:37:36 +00:00
Jakob Stoklund Olesen 1e73716eae Extract a method for finding the inline asm flag operand.
llvm-svn: 141834
2011-10-12 23:37:33 +00:00
Jakob Stoklund Olesen 24abd9d9b6 Encode register class constreaints in inline asm instructions.
The inline asm operand constraint is initially encoded in the virtual
register for the operand, but that register class may change during
coalescing, and the original constraint is lost.

Encode the original register class as part of the flag word for each
inline asm operand.  This makes it possible to recover the actual
constraint required by inline asm, just like we can for normal
instructions.

llvm-svn: 141833
2011-10-12 23:37:29 +00:00
Bill Wendling 3e5409df77 We need to verify that the machine instruction we're using as a replacement for
our current machine instruction defines a register with the same register class
as what's being replaced. This showed up in the SPEC 403.gcc benchmark, where it
would ICE because a tail call was expecting one register class but was given
another. (The machine instruction verifier catches this situation.)
<rdar://problem/10270968>

llvm-svn: 141830
2011-10-12 23:03:40 +00:00
Eli Friedman 979009ea61 Use a utility from MathExtras to clarify a check and avoid undefined behavior. Based on patch by Ahmed Charles.
llvm-svn: 141829
2011-10-12 22:46:45 +00:00
Evan Cheng b35afcaa56 Disable machine LICM speculation check (for profitability) until I have time to investigate the regressions.
llvm-svn: 141813
2011-10-12 21:33:49 +00:00
Cameron Zwarich 2dffcebf77 To find the exiting VN of a LiveInterval from a block, use the previous slot
rather than the previous index. If a block has a single instruction, the
previous index may be in a different basic block.

I have no clue how this used to work on all of test-suite, because now this
failure is seen quite often when trying to compile code with -strong-phi-elim.
This fixes PR10252.

llvm-svn: 141812
2011-10-12 21:24:54 +00:00
Dan Gohman de239d2647 Fix a thinko that Nick noticed. The previous code actually worked as
intended, but only by accident.

llvm-svn: 141779
2011-10-12 15:56:56 +00:00
Bill Wendling 918cea2c27 Expand the check for a landing pad so that it looks at the basic block's
containing loop's header to see if that's a landing pad. If it is, then we don't
want to hoist instructions out of the loop and above the header.

llvm-svn: 141767
2011-10-12 02:58:01 +00:00
Jakob Stoklund Olesen 35163e21dc Use an existing function.
llvm-svn: 141763
2011-10-12 01:24:51 +00:00
Evan Cheng af1389546e Fix r141744.
1. The speculation check may not have been performed if the BB hasn't had a load
   LICM candidate.
2. If the candidate would be CSE'ed, then go ahead and speculatively LICM the
   instruction even if it's in high register pressure situation.

llvm-svn: 141747
2011-10-12 00:09:14 +00:00
Evan Cheng f192ca0761 Refine r141689 with a tri-state variable.
Also teach MachineLICM to avoid "speculation" when register pressure is high.

llvm-svn: 141744
2011-10-11 23:48:44 +00:00
Eric Christopher 6647b83087 Add a new wrapper node for a DILexicalBlock that encapsulates it and a
file. Since it should only be used when necessary propagate it through
the backend code generation and tweak testcases accordingly.

This helps with code like in clang's test/CodeGen/debug-info-line.c where
we have multiple #line directives within a single lexical block and want
to generate only a single block that contains each file change.

Part of rdar://10246360

llvm-svn: 141729
2011-10-11 22:59:11 +00:00
Eric Christopher 57d1692750 Formatting.
llvm-svn: 141728
2011-10-11 22:59:04 +00:00
Bill Wendling 579ff6c39c N.B. This is with the new EH scheme:
The blocks with invokes have branches to the dispatch block, because that more
correctly models the behavior of the CFG. The dispatch of course has edges to
the landing pads. Those landing pads could contain invokes, which then have
branches back to the dispatch. This creates a loop. The machine LICM pass looks
at this loop and thinks it can hoist elements out of it. But because the
dispatch is an alternate entry point into the program, the hoisted instructions
won't be executed.

I wasn't able to get a testcase which was small and could reproduce all of the
time. The function_try_block.cpp in llvm-test was where this showed up.

llvm-svn: 141726
2011-10-11 22:42:31 +00:00
Devang Patel 453d401a51 Add dominance check for the instruction being hoisted.
For example, MachineLICM should not hoist a load that is not guaranteed to be executed.
Radar 10254254.

llvm-svn: 141689
2011-10-11 18:09:58 +00:00
Nadav Rotem 3283793c9a Add support for legalization of vector SHL/SRA/SRL instructions
llvm-svn: 141667
2011-10-11 14:36:35 +00:00
Nadav Rotem 198fe81571 Add support for legalization of vector trunc-store where the saved scalar type is illegal (for example, v2i16 on systems where the smallest store size is i32)
llvm-svn: 141661
2011-10-11 11:25:16 +00:00
Nadav Rotem b521b6037b Cleanup the trunc-store legalization code and add asserts.
llvm-svn: 141659
2011-10-11 10:04:25 +00:00
Devang Patel 478d5bc0d0 Revert r141569 and r141576.
llvm-svn: 141594
2011-10-10 23:18:02 +00:00
Jakob Stoklund Olesen add0c43ebb Give targets a chance to expand even standard pseudos.
Allow targets to expand COPY and other standard pseudo-instructions
before they are expanded with copyPhysReg().

This allows the target to examine the COPY instruction for extra
operands indicating it can be widened to a preferable super-register
copy.  See the ARM -widen-vmovs option.

llvm-svn: 141578
2011-10-10 20:34:28 +00:00
Devang Patel 2689f95875 If loop header is also loop exiting block then it may not be safe to hoist instructions.
llvm-svn: 141576
2011-10-10 20:32:03 +00:00
Devang Patel e554d5995b Add dominance check for the instruction being hoisted.
For example, MachineLICM should not hoist a load that is not guaranteed to be executed.
Radar 10254254.

llvm-svn: 141569
2011-10-10 19:09:20 +00:00
Bill Wendling e9574be6a3 Use the code that lowers the arguments and spills any values which are alive
across unwind edges. This is for the back-end which expects such things.

The code is from the original SjLj EH pass.

llvm-svn: 141463
2011-10-08 00:56:47 +00:00
Bill Wendling 7ecfbd90ef Thread the chain through the eh.sjlj.setjmp intrinsic, like it's documented to
do. This will be useful later on with the new SJLJ stuff.

llvm-svn: 141416
2011-10-07 21:25:38 +00:00
Andrew Trick 35c9e51219 PostRA scheduler fix. Clear stale loop dependencies.
Fixes <rdar://problem/10235725>

llvm-svn: 141357
2011-10-07 06:33:09 +00:00
Andrew Trick 4ef158335b whitespace
llvm-svn: 141356
2011-10-07 06:27:02 +00:00
Eli Friedman 1456cd20b4 Remove the old atomic instrinsics. autoupgrade functionality is included with this patch.
llvm-svn: 141333
2011-10-06 23:20:49 +00:00
Bill Wendling 267f323d28 Modify the mapping from landing pad to call sites to accept more than one call
site.

llvm-svn: 141226
2011-10-05 22:24:35 +00:00
Bill Wendling c2d55b6e50 Add an ivar that maps a landing pad's EH symbol to the call sites that may jump
to the landing pad. This will be used by the back-end to generate the jump
tables for dispatching the arriving longjmp in sjlj eh.

llvm-svn: 141224
2011-10-05 22:20:38 +00:00
Bill Wendling e61c62533e Small refactoring. Cache the FunctionInfo->MBB into a local variable.
llvm-svn: 141221
2011-10-05 22:16:11 +00:00
Jakob Stoklund Olesen eb38bd8ced Fix sub-register operand verification.
PhysReg operands are not allowed to have sub-register indices at all.

For virtual registers with sub-reg indices, check that all registers in
the register class support the sub-reg index.

llvm-svn: 141220
2011-10-05 22:12:57 +00:00
Bill Wendling db1633530a Fix comment to reflect the new EH stuff.
llvm-svn: 141218
2011-10-05 22:04:08 +00:00
Jakob Stoklund Olesen 3abead76ea Remove unused DstSubIdx argument.
llvm-svn: 141214
2011-10-05 21:22:53 +00:00
Jakob Stoklund Olesen f7957a9819 Simplify EXTRACT_SUBREG emission.
EXTRACT_SUBREG is emitted as %dst = COPY %src:sub, so there is no need to
constrain the %dst register class.  RegisterCoalescer will apply the
necessary constraints if it decides to eliminate the COPY.

The %src register class does need to be constrained to something with
the right sub-registers, though.  This is currently done manually with
COPY_TO_REGCLASS nodes.  They can possibly be removed after this patch.

llvm-svn: 141207
2011-10-05 20:26:40 +00:00
Jakob Stoklund Olesen 8ff52c4135 Simplify INSERT_SUBREG emission.
The register class created by INSERT_SUBREG and SUBREG_TO_REG must be
legal and support the SubIdx sub-registers.

The new getSubClassWithSubReg() hook can compute that.

This may create INSERT_SUBREG instructions defining a larger register
class than the sub-register being inserted.  That is OK,
RegisterCoalescer will constrain the register class as needed when it
eliminates the INSERT_SUBREG instructions.

llvm-svn: 141198
2011-10-05 18:31:00 +00:00
Jakob Stoklund Olesen ccdfbfb5e5 Add a FIXME.
TwoAddressInstructionPass should annotate instructions with <undef>
flags when it lower REG_SEQUENCE instructions.  LiveIntervals should not
be in the business of modifying code (except for kill flags, perhaps).

llvm-svn: 141187
2011-10-05 16:51:21 +00:00
Jakob Stoklund Olesen d5d39bb098 Also add <imp-use,kill> flags for redefined super-registers.
For example:

  %vreg10:dsub_0<def,undef> = COPY %vreg1
  %vreg10:dsub_1<def> = COPY %vreg2

is rewritten as:

  %D2<def> = COPY %D0, %Q1<imp-def>
  %D3<def> = COPY %D1, %Q1<imp-use,kill>, %Q1<imp-def>

The first COPY doesn't care about the previous value of %Q1, so it
doesn't read that register.

The second COPY is a partial redefinition of %Q1, so it implicitly kills
and redefines that register.

This makes it possible to recognize instructions that can harmlessly
clobber the full super-register.  The write and don't read the
super-register.

llvm-svn: 141139
2011-10-05 00:01:48 +00:00
Jakob Stoklund Olesen 9d5bda9be1 Also add <def,undef> flags when coalescing sub-registers.
RegisterCoalescer can create sub-register defs when it is joining a
register with a sub-register.  Add <undef> flags to these new
sub-register defs where appropriate.

llvm-svn: 141138
2011-10-05 00:01:46 +00:00
Owen Anderson 0ca562ec4c Teach the MC to output code/data region marker labels in MachO and ELF modes. These are used by disassemblers to provide better disassembly, particularly on targets like ARM Thumb that like to intermingle data in the TEXT segment.
llvm-svn: 141135
2011-10-04 23:26:17 +00:00
Bill Wendling 3d11aa7e75 Create a mapping between the landing pad basic block and the call site index for later use.
llvm-svn: 141125
2011-10-04 22:00:35 +00:00
Jakob Stoklund Olesen 10f2de3261 Allow <undef> flags on def operands as well as uses.
The <undef> flag says that a MachineOperand doesn't read its register,
or doesn't depend on the previous value of its register.

A full register def never depends on the previous register value.  A
partial register def may depend on the previous value if it is intended
to update part of a register.

For example:

  %vreg10:dsub_0<def,undef> = COPY %vreg1
  %vreg10:dsub_1<def> = COPY %vreg2

The first copy instruction defines the full %vreg10 register with the
bits not covered by dsub_0 defined as <undef>.  It is not considered a
read of %vreg10.

The second copy modifies part of %vreg10 while preserving the rest.  It
has an implicit read of %vreg10.

This patch adds a MachineOperand::readsReg() method to determine if an
operand reads its register.

Previously, this was modelled by adding a full-register <imp-def>
operand to the instruction.  This approach makes it possible to
determine directly from a MachineOperand if it reads its register.  No
scanning of MI operands is required.

llvm-svn: 141124
2011-10-04 21:49:33 +00:00
Bill Wendling ac3fb4c078 Generic cleanup.
llvm-svn: 141050
2011-10-04 00:16:40 +00:00
Bill Wendling 97a8695fff Don't carry over the dispatchsetup hack from the old system.
llvm-svn: 141040
2011-10-03 22:42:40 +00:00
Bill Wendling 6f3e73d6ad Move the grabbing of the jump buffer into the caller function, eliminating the need for returning a std::pair.
llvm-svn: 141026
2011-10-03 21:15:28 +00:00
Eric Christopher cead033ced Whitespace.
llvm-svn: 141005
2011-10-03 15:49:20 +00:00
Eric Christopher f84354bfb1 Typo.
llvm-svn: 141004
2011-10-03 15:49:16 +00:00
Nadav Rotem 52e8ed9214 Moved type construction out of the loop and added an assert on the legality of the type. Formatted lines to the 80 char limit.
llvm-svn: 140952
2011-10-01 18:39:28 +00:00
Bill Wendling 9925f197cc When inferring the pointer alignment, if the global doesn't have an initializer
and the alignment is 0 (i.e., it's defined globally in one file and declared in
another file) it could get an alignment which is larger than the ABI allows for
that type, resulting in aligned moves being used for unaligned loads.

For instance, in file A.c:

   struct S s;

In file B.c:
   struct {
     // something long
   };
   extern S s;

   void foo() {
     struct S p = s;
     // ...
   }

this copy is a 'memcpy' which is turned into a series of 'movaps' instructions
on X86. But this is wrong, because 'struct S' has alignment of 4, not 16.

llvm-svn: 140902
2011-09-30 23:19:55 +00:00
Nick Lewycky f40df1d46c Promote comment to doxycomment. Adjust whitespace. No functionality change.
llvm-svn: 140899
2011-09-30 22:19:53 +00:00
Jakob Stoklund Olesen 1352be2bd3 Move getCommonSubClass() into TRI.
It will soon need the context.

llvm-svn: 140896
2011-09-30 22:18:51 +00:00
Torok Edwin be5020eb95 Comment grammar fixes.
thanks to Duncan.

llvm-svn: 140850
2011-09-30 13:07:47 +00:00
Torok Edwin 319a1415b8 Instead of crashing when MCAsmInfo is NULL, add an assert.
This helps with porting code from 2.9 to 3.0 as TargetSelect.h changed location,
and if you include the old one by accident you will trigger this assert.

llvm-svn: 140848
2011-09-30 12:31:57 +00:00
Eli Friedman 95031ed837 Clean up uses of switch instructions so they are not dependent on the operand ordering. Patch by Stepan Dyatkovskiy.
llvm-svn: 140803
2011-09-29 20:21:17 +00:00
Duncan Sands cac86805bf Place this bracket according to the LLVM style.
llvm-svn: 140784
2011-09-29 16:01:46 +00:00
Jakob Stoklund Olesen 463b05a2d0 Remove NumImplicitOps which is now unused.
llvm-svn: 140767
2011-09-29 01:47:36 +00:00
Eric Christopher d299dccf91 Use the local we already set up.
llvm-svn: 140745
2011-09-29 00:50:59 +00:00
Jakob Stoklund Olesen 2318d1e0e9 Rewrite MachineInstr::addOperand() to avoid NumImplicitOps.
The function needs to scan the implicit operands anyway, so no
performance is won by caching the number of implicit operands added to
an instruction.

This also fixes a bug when adding operands after an implicit operand has
been added manually.  The NumImplicitOps count wasn't kept up to date.

MachineInstr::addOperand() will now consistently place all explicit
operands before all the implicit operands, regardless of the order they
are added.  It is possible to change an MI opcode and add additional
explicit operands.  They will be inserted before any existing implicit
operands.

The only exception is inline asm instructions where operands are never
reordered.  This is because of a hack that marks explicit clobber regs
on inline asm as <implicit-def> to please the fast register allocator.
This hack can go away when InstrEmitter and FastIsel can add exact
<dead> flags to physreg defs.

llvm-svn: 140744
2011-09-29 00:40:51 +00:00
Bill Wendling 899da52d60 Have the SjLjEHPrepare pass do some more heavy lifting.
Upon further review, most of the EH code should remain written at the IR
level. The part which breaks SSA form is the dispatch table, so that part will
be moved to the back-end.

llvm-svn: 140730
2011-09-28 21:56:53 +00:00
Duncan Sands 2e67937f76 A typeid of zero means a cleanup, not a catch. This case occurs
when there is both a catch and a cleanup.  Correct the comment.

llvm-svn: 140686
2011-09-28 09:13:02 +00:00
Bill Wendling baf3941fde Strip off pointer casts when looking at the eh.sjlj.functioncontext's argument.
llvm-svn: 140678
2011-09-28 03:52:41 +00:00
Bill Wendling 225e8481b0 Bitcast the alloca to an i8* to match the intrinsic's signature.
llvm-svn: 140677
2011-09-28 03:47:11 +00:00
Bill Wendling 66b110f571 Create and use an llvm.eh.sjlj.functioncontext intrinsic.
This intrinsic is used to pass the index of the function context to the back-end
for further processing. The back-end is in charge of filling in the rest of the
entries.

llvm-svn: 140676
2011-09-28 03:36:43 +00:00
Bill Wendling 2e76ca9d9a In the new EH model, setup the function context and the call site info.
The DWARF exception pass uses the call site information, which is set up here. A
pre-RA pass is too late for it to use this information. So create and setup the
function context here, and then insert the call site values here (and map the
call sites for the DWARF EH pass). This is simpler than the original pass, and
doesn't make the CFG lose its SSA-ness.

It's a win-win-win-win-lose-win-win situation.

llvm-svn: 140675
2011-09-28 03:14:05 +00:00
Bill Wendling e6138e3ad1 Don't conditionalize execution of the SjLj EH prepare pass.
We may need an SjLj EH preparation pass for some call site information, at least
in the short term.

llvm-svn: 140674
2011-09-28 03:07:34 +00:00
Jakob Stoklund Olesen bd5109f14d Rename class and clean up source.
No functional change intended.

llvm-svn: 140664
2011-09-28 00:01:56 +00:00
Jakob Stoklund Olesen 934b7d7645 Rename SSEDomainFix -> lib/CodeGen/ExecutionDepsFix.
I'll clean up the source in the next commit.

llvm-svn: 140663
2011-09-28 00:01:54 +00:00
Bill Wendling 354ff9e348 This is the start of the new SjLj EH preparation pass, which will replace the
current IR-level pass.

The old SjLj EH pass has some problems, especially with the new EH model. Most
significantly, it violates some of the new restrictions the new model has. For
instance, the 'dispatch' table wants to jump to the landing pad, but we cannot
allow that because only an invoke's unwind edge can jump to a landing pad. This
requires us to mangle the code something awful. In addition, we need to keep the
now dead landingpad instructions around instead of CSE'ing them because the
DWARF emitter uses that information (they are dead because no control flow edge
will execute them - the control flow edge from an invoke's unwind is superceded
by the edge coming from the dispatch).

Basically, this pass belongs not at the IR level where SSA is king, but at the
code-gen level, where we have more flexibility.

llvm-svn: 140646
2011-09-27 22:14:12 +00:00
Cameron Zwarich 7a6e8f2c5d Remove an invalid assert that is really just asserting when the scheduler emits
a suboptimal schedule.

llvm-svn: 140643
2011-09-27 21:59:16 +00:00
Jim Grosbach af136f71ec Rename AddSelectionDAGCSEId() to addSelectionDAGCSEId().
Naming conventions consistency. No functional change.

llvm-svn: 140636
2011-09-27 20:59:33 +00:00
Nadav Rotem 38b3b83362 Cleanup PromoteIntOp_EXTRACT_VECTOR_ELT and PromoteIntRes_SETCC.
Add a new method: getAnyExtOrTrunc and use it to replace the manual check.

llvm-svn: 140603
2011-09-27 11:16:47 +00:00
Nadav Rotem 1b857d2762 Revert r140463; The patch assumes that <4 x i1> is saved to memory as 4 x i8,
while the decision is to bit-pack small values.

llvm-svn: 140601
2011-09-27 10:48:29 +00:00
James Molloy 0ceb8cadd2 Fix emission of debug data for global variables. getContext() on DIGlobalVariables is not valid any more.
llvm-svn: 140539
2011-09-26 17:40:42 +00:00
Jakob Stoklund Olesen df977fedb6 Add target hook for pseudo instruction expansion.
Many targets use pseudo instructions to help register allocation.  Like
the COPY instruction, these pseudos can be expanded after register
allocation.  The early expansion can make life easier for PEI and the
post-ra scheduler.

This patch adds a hook that is called for all remaining pseudo
instructions from the ExpandPostRAPseudos pass.

llvm-svn: 140472
2011-09-25 19:21:35 +00:00
Nadav Rotem 2279949129 [vector-select] Address one of the issues in pr10902. EXTRACT_VECTOR_ELEMENT
SDNodes may return values which are wider than the incoming element types. In
this patch we fix the integer promotion of these nodes.

Fixes spill-q.ll when running -promote-elements.

llvm-svn: 140471
2011-09-25 18:59:42 +00:00
Jakob Stoklund Olesen fd719d184e Clean up code after renaming LowerSubregs -> ExpandPostRAPseudos.
No functional change intended.

llvm-svn: 140470
2011-09-25 16:46:08 +00:00
Jakob Stoklund Olesen f152df1e6b Rename LowerSubregs to ExpandPostRAPseudos.
I'll fix the file contents in the next commit.

This pass is currently expanding the COPY and SUBREG_TO_REG pseudos. I
am going to add a hook so targets can expand more pseudo-instructions
after register allocation.

Many targets have pseudo-instructions that assist the register
allocator.  They can be expanded after register allocation, before PEI
and PostRA scheduling.

llvm-svn: 140469
2011-09-25 16:46:00 +00:00
Nadav Rotem c2deabd202 Implement Duncan's suggestion to use the result of getSetCCResultType if it is legal
(this is always the case for scalars), otherwise use the promoted result type.

Fix test/CodeGen/X86/vsplit-and.ll when promote-elements is enabled.

llvm-svn: 140464
2011-09-24 19:48:19 +00:00
Nadav Rotem 77426a754b [Vector-Select] Address one of the problems in 10902.
When generating the trunc-store of i1's, we need to use the vector type and not
the scalar type.

This patch fixes the assertion in CodeGen/Generic/bool-vector.ll when
running with -promote-elements.

llvm-svn: 140463
2011-09-24 18:32:19 +00:00
Jakob Stoklund Olesen 3bb99bc957 Verify that terminators follow non-terminators.
This exposes a -segmented-stacks bug.

llvm-svn: 140429
2011-09-23 22:45:39 +00:00
Eli Friedman 8a15a5aa93 PR10998: It is not legal to sink an instruction past the terminator of a block; make sure we don't do that.
llvm-svn: 140428
2011-09-23 22:41:57 +00:00