live across BBs before register allocation. This miscompiled 197.parser
when a cmp + b are optimized to a cbnz instruction even though the CPSR def
is live-in a successor.
cbnz r6, LBB89_12
...
LBB89_12:
ble LBB89_1
The fix consists of two parts. 1) Teach LiveVariables that some unallocatable
registers might be liveouts so don't mark their last use as kill if they are.
2) ARM constantpool island pass shouldn't form cbz / cbnz if the conditional
branch does not kill CPSR.
rdar://10676853
llvm-svn: 148168
This function runs after all constant islands have been placed, and may
shrink some instructions to their 2-byte forms. This can actually cause
some constant pool entries to move out of range because of growing
alignment padding.
Treat instructions that may be shrunk the same as inline asm - they
erode the known alignment bits.
Also reinstate an old assertion in verify(). It is correct now that
basic block offsets include alignments.
Add a single large test case that will hopefully exercise many parts of
the constant island pass.
<rdar://problem/10670199>
llvm-svn: 147885
On Thumb, the displacement computation hardware uses the address of the
current instruction rouned down to a multiple of 4. Include this
rounding in the UserOffset we compute for each instruction.
When inline asm is present, the instruction alignment may not be known.
Constrain the maximum displacement instead in that case.
This makes it possible for CreateNewWater() and OffsetIsInRange() to
agree about the valid displacements. When they disagree, infinite
looping happens.
As always, test cases for this stuff are insane.
<rdar://problem/10660175>
llvm-svn: 147825
This adjustment is already included in the block offsets computed by
BasicBlockInfo, and adjusting again here can cause the pass to loop.
When CreateNewWater splits a basic block, OffsetIsInRange would reject
the new CPE on the next pass because of the too conservative alignment
adjustment. This caused the block to be split again, and so on.
llvm-svn: 146751
The command line option should be removed, but not until the feature has
gotten a lot of testing. The ARMConstantIslandPass tends to have subtle
bugs that only show up after a while.
llvm-svn: 146739
An aligned constant pool entry may require extra alignment padding where
the new water is created. Take that into account when computing offset.
Also consider the alignment of other constant pool entries when
splitting a basic block. Alignment padding may make it necessary to
move the split point higher.
llvm-svn: 146609
Constant pool entries with different alignment may cause more alignment
padding to be inserted. Compute the amount of padding needed, and try to
pick the location that requires the least amount of padding.
Also take the extra padding into account when the water is above the
use.
llvm-svn: 146458
Downgrade the alignment of the initial constant island when constant
pool entries are moved elsewhere.
This is all gated by -arm-align-constant-islands.
llvm-svn: 146391
Order constant pool entries by descending alignment in the initial
island to ensure packing and correct alignment. When the command line
flag is set, also align the basic block containing the constant pool
entries.
This is only a partial implementation of constant island alignment. More
to come.
llvm-svn: 146375
The split point is picked such that the newly created water has the same
alignment as the function. This makes the island suitable for constant
pool entries with potentially higher alignment.
This also fixes an issue where the basic block was split one instruction
too late, causing nonconvergence of the algorithm.
<rdar://problem/10550705>
There is still an issue with correctly packing differently aligned
entries in the island.
llvm-svn: 146314
It is not used any more. We are tracking inline assembly misalignments
directly through the BBInfo.Unalign and KnownBits fields.
A simple conservative size estimate is not good enough since it can
cause alignment padding to be underestimated.
llvm-svn: 146124
Compute alignment padding before and after basic blocks dynamically.
Heed basic block alignment.
This simplifies bookkeeping because we don't have to constantly add and
remove padding from BBInfo.Size. It also makes it possible to track the
extra known alignment bits we get after a tBR_JTr terminator and when
entering an aligned basic block.
This makes the ARMConstantIslandPass aware of aligned basic blocks.
It is tricky to model block alignment correctly when dealing with inline
assembly and tBR_JTr instructions that have variable size. If inline
assembly turns out to be smaller than expected, that may cause following
alignment padding to be larger than expected. This could cause constant
pool entries to move out of range.
To avoid that problem, we use the worst case alignment padding following
inline assembly. This may cause slightly suboptimal constant island
placement in aligned basic blocks following inline assembly. Normal
functions should be unaffected.
llvm-svn: 146118
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.
For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.
llvm-svn: 146026
The block offset can be computed from the previous block. That is more
robust than keeping track of a delta.
Eliminate one redundant AdjustBBOffsetsAfter call.
llvm-svn: 146018
This pseudo-instruction contains a .align directive in its expansion, so
the total size may vary by 2 bytes.
It is too difficult to accurately keep track of this alignment
directive, just use the worst-case size instead.
llvm-svn: 145971
ARMConstantIslandPass may sometimes leave empty constant islands behind
(it really shouldn't). Remove the alignment from the empty islands so
the size calculations are still correct.
This should fix the many Thumb1 assembler errors in the nightly test
suite.
The reduced test case for this problem is way too big. That is to be
expected for ARMConstantIslandPass bugs.
<rdar://problem/10534709>
llvm-svn: 145970
Previously, all ARM::CONSTPOOL_ENTRY instructions had a hardwired
alignment of 4 bytes emitted by ARMAsmPrinter. Now the same alignment
is set on the basic block.
This is in preparation of supporting ARM constant pool islands with
different alignments.
llvm-svn: 145890
Original Log: Get rid of the separate opcodes for the Darwin versions of tBL, tBLXi, and tBLXr, using pseudo-instructions to lower to the single final opcode. Update the ARM disassembler for this change.
llvm-svn: 135414
The normal tBX instruction is predicable, so there's no reason the
pseudos for using it as a return shouldn't be. Gives us some nice code-gen
improvements as can be seen by the test changes. In particular, several
tests now have to disable if-conversion because it works too well and defeats
the test.
llvm-svn: 134746
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Chang
TargetInstrInfo so it's based off MCInstrInfo.
llvm-svn: 134021
t2LDRpci with t2LDRi12.
There are a couple of problems with this.
1. The encoding for the literal and immediate constant are different.
Note bit 7 of the literal case is 'U' so it can be negative.
2. t2LDRi12 is now narrowed to tLDRpci before constant island pass is run.
So we end up never using the Thumb2 instruction, which ends up creating a
lot more constant islands.
llvm-svn: 125074
movw r0, :lower16:(L_foo$non_lazy_ptr-(LPC0_0+4))
movt r0, :upper16:(L_foo$non_lazy_ptr-(LPC0_0+4))
LPC0_0:
add r0, pc, r0
It's not yet enabled by default as some tests are failing. I suspect bugs in
down stream tools.
llvm-svn: 123619
explicit about the operands. Split out the different variants into separate
instructions. This gives us the ability to, among other things, assign
different scheduling itineraries to the variants. rdar://8477752.
llvm-svn: 117409
comments explaining why it was wrong. 8225024.
Fix the real problem in 8213383: the code that splits very large
blocks when no other place to put constants can be found was not
considering the case that the block contained a Thumb tablejump.
llvm-svn: 109282
ARM/PPC/MSP430-specific code (which are the only targets that
implement the hook) can directly reference their target-specific
instrinfo classes.
llvm-svn: 109171
mov pc, r1
.align 2
LJTI0_0_0:
.long LBB0_14
This fixes rdar://8213383. No test case since it's not possible to come up with a suitable small one.
llvm-svn: 109076
address calculation instructions leading up to a jump table when we're trying
to convert them into a TB[H] instruction in Thumb2. This realistically
shouldn't happen much, if at all, for well formed inputs, but it's more correct
to handle it. rdar://7387682
llvm-svn: 107830
writebacks to the address register. This gets rid of the hack that the
first register on the list was the magic writeback register operand. There
was an implicit constraint that if that operand was not reg0 it had to match
the base register operand. The post-RA scheduler's antidependency breaker
did not understand that constraint and sometimes changed one without the
other. This also fixes Radar 7495976 and should help the verifier work
better for ARM code.
There are now new ld/st instructions explicit writeback operands and explicit
constraints that tie those registers together.
llvm-svn: 98409
into TargetOpcodes.h. #include the new TargetOpcodes.h
into MachineInstr. Add new inline accessors (like isPHI())
to MachineInstr, and start using them throughout the
codebase.
llvm-svn: 95687
constant pool ranges, as CPEIsInRange() makes conservative assumptions about
the potential alignment changes from branch adjustments. The verification,
on the other hand, runs after those branch adjustments are made, so the
effects on alignment are known and already taken into account. The sanity
check in verify should check the range directly instead.
llvm-svn: 89473
assembly can confuse things utterly, as it's assumed that instructions in
inline assembly are 4 bytes wide. For Thumb mode, that's often not true,
so the calculations for when alignment padding will be present get thrown off,
ultimately leading to out of range constant pool entry references. Making
more conservative assumptions that padding may be necessary when inline asm
is present avoids this situation.
llvm-svn: 89403
can only branch forward. To best take advantage of them, we'd like to adjust
the basic blocks around a bit when reasonable. This patch puts basics in place
to do that, with a super-simple algorithm for backwards jump table targets that
creates a new branch after the jump table which branches backwards. Real
heuristics for reordering blocks or other modifications rather than inserting
branches will follow.
llvm-svn: 86791
In the case where there are no good places to put constants and we fall back
upon inserting unconditional branches to make new blocks, allow all constant
pool references in range of those blocks to put constants there, even if that
means resetting the "high water marks" for those references. This will still
terminate because you can't keep splitting blocks forever, and in the bad
cases where we have to split blocks, it is important to avoid splitting more
than necessary.
llvm-svn: 84202
When ARMConstantIslandPass cannot find any good locations (i.e., "water") to
place constants, it falls back to inserting unconditional branches to make a
place to put them. My recent change exposed a problem in this area. We may
sometimes append to the same block more than one unconditional branch. The
symptoms of this are that the generated assembly has a branch to an undefined
label and running llc with -debug will cause a seg fault.
This happens more easily since my change to prevent CPEs from moving from
lower to higher addresses as the algorithm iterates, but it could have
happened before. The end of the block may be in range for various constant
pool references, but the insertion point for new CPEs is not right at the end
of the block -- it is at the end of the CPEs that have already been placed
at the end of the block. The insertion point could be out of range. When
that happens, the fallback code will always append another unconditional
branch if the end of the block is in range.
The fix is to only append an unconditional branch if the block does not
already end with one. I also removed a check to see if the constant pool load
instruction is at the end of the block, since that is redundant with
checking if the end of the block is in-range.
There is more to be done here, but I think this fixes the immediate problem.
llvm-svn: 84172
before its reference is only supported on ARM has not been true for a while.
In fact, until recently, that was only supported for Thumb. Besides that,
CPEs are always a multiple of 4 bytes in size, so inserting a CPE should have
no effect on Thumb alignment.
llvm-svn: 83916
MultiSource/Benchmarks/MiBench/automotive-susan test. The failure has
since been masked by an unrelated change (just randomly), so I don't have
a testcase for this now. Radar 7291928.
The situation where this happened is that a constant pool entry (CPE) was
placed at a lower address than the load that referenced it. There were in
fact 2 CPEs placed at adjacent addresses and referenced by 2 loads that were
close together in the code. The distance from the loads to the CPEs was
right at the limit of what they could handle, so that only one of the CPEs
could be placed within range. On every iteration, the first CPE was found
to be out of range, causing a new CPE to be inserted. The second CPE had
been in range but the newly inserted entry pushed it too far away. Thus the
second CPE was also replaced by a new entry, which in turn pushed the first
CPE out of range. Etc.
Judging from some comments in the code, the initial implementation of this
pass did not support CPEs placed _before_ their references. In the case
where the CPE is placed at a higher address, the key to making the algorithm
terminate is that new CPEs are only inserted at the end of a group of adjacent
CPEs. This is implemented by removing a basic block from the "WaterList"
once it has been used, and then adding the newly inserted CPE block to the
list so that the next insertion will come after it. This avoids the ping-pong
effect where CPEs are repeatedly moved to the beginning of a group of
adjacent CPEs. This does not work when going backwards, however, because the
entries at the end of an adjacent group of CPEs are closer than the CPEs
earlier in the group.
To make this pass terminate, we need to maintain a property that changes can
only happen in some sort of monotonic fashion. The fix used here is to require
that the CPE for a particular constant pool load can only move to lower
addresses. This is a very simple change to the code and should not cause
any significant degradation in the results.
llvm-svn: 83902
MachineInstr and MachineOperand. This required eliminating a
bunch of stuff that was using DOUT, I hope that bill doesn't
mind me stealing his fun. ;-)
llvm-svn: 79813
Before:
adr r12, #LJTI3_0_0
ldr pc, [r12, +r0, lsl #2]
LJTI3_0_0:
.long LBB3_24
.long LBB3_30
.long LBB3_31
.long LBB3_32
After:
adr r12, #LJTI3_0_0
add pc, r12, +r0, lsl #2
LJTI3_0_0:
b.w LBB3_24
b.w LBB3_30
b.w LBB3_31
b.w LBB3_32
This has several advantages.
1. This will make it easier to optimize this to a TBB / TBH instruction +
(smaller) table.
2. This eliminate the need for ugly asm printer hack to force the address
into thumb addresses (bit 0 is one).
3. Same codegen for pic and non-pic.
4. This eliminate the need to align the table so constantpool island pass
won't have to over-estimate the size.
Based on my calculation, the later is probably slightly faster as well since
ldr pc with shifter address is very slow. That is, it should be a win as long
as the HW implementation can do a reasonable job of branch predict the second
branch.
llvm-svn: 77024
This adds location info for all llvm_unreachable calls (which is a macro now) in
!NDEBUG builds.
In NDEBUG builds location info and the message is off (it only prints
"UREACHABLE executed").
llvm-svn: 75640
Make llvm_unreachable take an optional string, thus moving the cerr<< out of
line.
LLVM_UNREACHABLE is now a simple wrapper that makes the message go away for
NDEBUG builds.
llvm-svn: 75379
After much back and forth, I decided to deviate from ARM design and split LDR into 4 instructions (r + imm12, r + imm8, r + r << imm12, constantpool). The advantage of this is 1) it follows the latest ARM technical manual, and 2) makes it easier to reduce the width of the instruction later. The down side is this creates more inconsistency between the two sub-targets. We should split ARM LDR instruction in a similar fashion later. I've added a README entry for this.
llvm-svn: 74420