The order in which branches appear in ImmBranches is approximately their
order within the function body. By visiting later branches first, we reduce
the distance between earlier forward branches and their targets, making it
more likely that the cbn?z optimization, which can only apply to forward
branches, will succeed for those earlier branches.
Differential Revision: http://reviews.llvm.org/D9185
llvm-svn: 235640
This appears to have been introduced back in r76698 as part of an unrelated
change. I can find no official ARM documentation stating that Thumb-2 functions
require 4-byte alignment; in fact, ARM documentation appears to contradict
this (see, e.g., ARM Architecture Reference Manual Thumb-2 Supplement,
section 2.6.1: "Thumb-2 enforces 16-bit alignment on all instructions.").
Also remove code that sets alignment for ARM functions, which is redundant
with code in the MachineFunction constructor, and remove the hidden
-arm-align-constant-islands flag, which has been enabled by default since
r146739 (Dec 2011) and has probably received sufficient testing by now.
Differential Revision: http://reviews.llvm.org/D9138
llvm-svn: 235636
derived classes.
Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be layed out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.
*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.
llvm-svn: 227113
The assert was being triggered when the distance between a constant pool entry
and its user exceeded the maximally allowed distance after thumb2 branch
shortening. A padding was inserted after a thumb2 branch instruction was shrunk,
which caused the user to be out of range. This is wrong as the padding should
have been inserted by the layout algorithm so that the distance between two
instructions doesn't grow later during thumb2 instruction optimization.
This commit fixes the code in ARMConstantIslands::createNewWater to call
computeBlockSize and set BasicBlock::Unalign when a branch instruction is
inserted to create new water after a basic block. A non-zero Unalign causes
the worst-case padding to be inserted when adjustBBOffsetsAfter is called to
recompute the basic block offsets.
rdar://problem/19130476
llvm-svn: 225467
Normally entries can only move to a lower address, but when that wasn't viable,
the user's block was considered anyway. Unfortunately, it went via
createNewWater which wasn't designed to handle the case where there's already
an island after the block.
Unfortunately, the test we have is slow and fragile, and I couldn't reduce it
to anything sane even with the @llvm.arm.space intrinsic. The test change here
is recreating the previous one after the change.
rdar://problem/18545506
llvm-svn: 221905
We were using a naive heuristic to determine whether a basic block already had
an unconditional branch at the end. This mostly corresponded to reality
(assuming branches got optimised) because there's not much point in a branch to
the next block, but could go wrong.
llvm-svn: 221904
The bug is in ARMConstantIslands::createNewWater where the upper bound of the
new water split point is computed:
// This could point off the end of the block if we've already got constant
// pool entries following this block; only the last one is in the water list.
// Back past any possible branches (allow for a conditional and a maximally
// long unconditional).
if (BaseInsertOffset + 8 >= UserBBI.postOffset()) {
BaseInsertOffset = UserBBI.postOffset() - UPad - 8;
DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset));
}
The split point is supposed to be somewhere between the machine instruction that
loads from the constant pool entry and the end of the basic block, before branch
instructions. The code above is fine if the basic block is large enough and
there are a sufficient number of instructions following the machine instruction.
However, if the machine instruction is near the end of the basic block,
BaseInsertOffset can point to the machine instruction or another instruction
that precedes it, and this can lead to convergence failure.
This commit fixes this bug by ensuring BaseInsertOffset is larger than the
offset of the instruction following the constant-loading instruction.
rdar://problem/18581150
llvm-svn: 220015
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
llvm-svn: 214838
The ARM backend did not expect LDRBi12 to hold a constant pool operand.
Allow for LLVM to deal with the instruction similar to how it deals with
LDRi12.
This fixes PR16215.
llvm-svn: 183238
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
This was exposed by SingleSource/UnitTests/Vector/constpool.c.
The computed size of a basic block isn't always a multiple of its known
alignment, and that can introduce extra alignment padding after the
block.
<rdar://problem/11347135>
llvm-svn: 155845
The code could search past the end of the basic block when there was
already a constant pool entry after the block.
Test case with giant basic block in SingleSource/UnitTests/Vector/constpool.c
llvm-svn: 155753
Previously, ARMConstantIslandPass would conservatively compute the
address of an aligned basic block as:
RoundUpToAlignment(Offset + UnknownPadding)
This worked fine for the layout algorithm itself, but it could fool the
verify() function because it accounts for alignment padding twice: Once
when adding the worst case UnknownPadding, and again by rounding up the
fictional block offset. This meant that when optimizeThumb2Instructions
would shrink an instruction, the conservative distance estimate could
grow. That shouldn't be possible since the woorst case alignment padding
wss already included.
This patch drops the use of RoundUpToAlignment, and depends only on
worst case padding to compute conservative block offsets. This has the
weird effect that the computed offset for an aligned block may not be
aligned.
The important difference is that shrinking an instruction can never
cause the estimated distance between two instructions to grow. The
estimated distance is always larger than the real distance that only the
assembler knows.
<rdar://problem/11339352>
llvm-svn: 155744
ARMConstantIslandPass still has bugs where jump table compression can
cause constant pool entries to go out of range.
Add a safety margin of 2 bytes when placing constant islands, but use
the real max displacement for verification.
<rdar://problem/11156595>
llvm-svn: 153789
This pass splits basic blocks to insert constant islands, and it
doesn't recompute the live-in lists. No later passes depend on accurate
liveness information.
This fixes PR12410 where the machine code verifier was complaining.
llvm-svn: 153700
live across BBs before register allocation. This miscompiled 197.parser
when a cmp + b are optimized to a cbnz instruction even though the CPSR def
is live-in a successor.
cbnz r6, LBB89_12
...
LBB89_12:
ble LBB89_1
The fix consists of two parts. 1) Teach LiveVariables that some unallocatable
registers might be liveouts so don't mark their last use as kill if they are.
2) ARM constantpool island pass shouldn't form cbz / cbnz if the conditional
branch does not kill CPSR.
rdar://10676853
llvm-svn: 148168
This function runs after all constant islands have been placed, and may
shrink some instructions to their 2-byte forms. This can actually cause
some constant pool entries to move out of range because of growing
alignment padding.
Treat instructions that may be shrunk the same as inline asm - they
erode the known alignment bits.
Also reinstate an old assertion in verify(). It is correct now that
basic block offsets include alignments.
Add a single large test case that will hopefully exercise many parts of
the constant island pass.
<rdar://problem/10670199>
llvm-svn: 147885
On Thumb, the displacement computation hardware uses the address of the
current instruction rouned down to a multiple of 4. Include this
rounding in the UserOffset we compute for each instruction.
When inline asm is present, the instruction alignment may not be known.
Constrain the maximum displacement instead in that case.
This makes it possible for CreateNewWater() and OffsetIsInRange() to
agree about the valid displacements. When they disagree, infinite
looping happens.
As always, test cases for this stuff are insane.
<rdar://problem/10660175>
llvm-svn: 147825
This adjustment is already included in the block offsets computed by
BasicBlockInfo, and adjusting again here can cause the pass to loop.
When CreateNewWater splits a basic block, OffsetIsInRange would reject
the new CPE on the next pass because of the too conservative alignment
adjustment. This caused the block to be split again, and so on.
llvm-svn: 146751
The command line option should be removed, but not until the feature has
gotten a lot of testing. The ARMConstantIslandPass tends to have subtle
bugs that only show up after a while.
llvm-svn: 146739
An aligned constant pool entry may require extra alignment padding where
the new water is created. Take that into account when computing offset.
Also consider the alignment of other constant pool entries when
splitting a basic block. Alignment padding may make it necessary to
move the split point higher.
llvm-svn: 146609
Constant pool entries with different alignment may cause more alignment
padding to be inserted. Compute the amount of padding needed, and try to
pick the location that requires the least amount of padding.
Also take the extra padding into account when the water is above the
use.
llvm-svn: 146458
Downgrade the alignment of the initial constant island when constant
pool entries are moved elsewhere.
This is all gated by -arm-align-constant-islands.
llvm-svn: 146391
Order constant pool entries by descending alignment in the initial
island to ensure packing and correct alignment. When the command line
flag is set, also align the basic block containing the constant pool
entries.
This is only a partial implementation of constant island alignment. More
to come.
llvm-svn: 146375
The split point is picked such that the newly created water has the same
alignment as the function. This makes the island suitable for constant
pool entries with potentially higher alignment.
This also fixes an issue where the basic block was split one instruction
too late, causing nonconvergence of the algorithm.
<rdar://problem/10550705>
There is still an issue with correctly packing differently aligned
entries in the island.
llvm-svn: 146314
It is not used any more. We are tracking inline assembly misalignments
directly through the BBInfo.Unalign and KnownBits fields.
A simple conservative size estimate is not good enough since it can
cause alignment padding to be underestimated.
llvm-svn: 146124
Compute alignment padding before and after basic blocks dynamically.
Heed basic block alignment.
This simplifies bookkeeping because we don't have to constantly add and
remove padding from BBInfo.Size. It also makes it possible to track the
extra known alignment bits we get after a tBR_JTr terminator and when
entering an aligned basic block.
This makes the ARMConstantIslandPass aware of aligned basic blocks.
It is tricky to model block alignment correctly when dealing with inline
assembly and tBR_JTr instructions that have variable size. If inline
assembly turns out to be smaller than expected, that may cause following
alignment padding to be larger than expected. This could cause constant
pool entries to move out of range.
To avoid that problem, we use the worst case alignment padding following
inline assembly. This may cause slightly suboptimal constant island
placement in aligned basic blocks following inline assembly. Normal
functions should be unaffected.
llvm-svn: 146118
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.
For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.
llvm-svn: 146026
The block offset can be computed from the previous block. That is more
robust than keeping track of a delta.
Eliminate one redundant AdjustBBOffsetsAfter call.
llvm-svn: 146018
This pseudo-instruction contains a .align directive in its expansion, so
the total size may vary by 2 bytes.
It is too difficult to accurately keep track of this alignment
directive, just use the worst-case size instead.
llvm-svn: 145971
ARMConstantIslandPass may sometimes leave empty constant islands behind
(it really shouldn't). Remove the alignment from the empty islands so
the size calculations are still correct.
This should fix the many Thumb1 assembler errors in the nightly test
suite.
The reduced test case for this problem is way too big. That is to be
expected for ARMConstantIslandPass bugs.
<rdar://problem/10534709>
llvm-svn: 145970
Previously, all ARM::CONSTPOOL_ENTRY instructions had a hardwired
alignment of 4 bytes emitted by ARMAsmPrinter. Now the same alignment
is set on the basic block.
This is in preparation of supporting ARM constant pool islands with
different alignments.
llvm-svn: 145890
Original Log: Get rid of the separate opcodes for the Darwin versions of tBL, tBLXi, and tBLXr, using pseudo-instructions to lower to the single final opcode. Update the ARM disassembler for this change.
llvm-svn: 135414
The normal tBX instruction is predicable, so there's no reason the
pseudos for using it as a return shouldn't be. Gives us some nice code-gen
improvements as can be seen by the test changes. In particular, several
tests now have to disable if-conversion because it works too well and defeats
the test.
llvm-svn: 134746
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Chang
TargetInstrInfo so it's based off MCInstrInfo.
llvm-svn: 134021
t2LDRpci with t2LDRi12.
There are a couple of problems with this.
1. The encoding for the literal and immediate constant are different.
Note bit 7 of the literal case is 'U' so it can be negative.
2. t2LDRi12 is now narrowed to tLDRpci before constant island pass is run.
So we end up never using the Thumb2 instruction, which ends up creating a
lot more constant islands.
llvm-svn: 125074
movw r0, :lower16:(L_foo$non_lazy_ptr-(LPC0_0+4))
movt r0, :upper16:(L_foo$non_lazy_ptr-(LPC0_0+4))
LPC0_0:
add r0, pc, r0
It's not yet enabled by default as some tests are failing. I suspect bugs in
down stream tools.
llvm-svn: 123619
explicit about the operands. Split out the different variants into separate
instructions. This gives us the ability to, among other things, assign
different scheduling itineraries to the variants. rdar://8477752.
llvm-svn: 117409
comments explaining why it was wrong. 8225024.
Fix the real problem in 8213383: the code that splits very large
blocks when no other place to put constants can be found was not
considering the case that the block contained a Thumb tablejump.
llvm-svn: 109282
ARM/PPC/MSP430-specific code (which are the only targets that
implement the hook) can directly reference their target-specific
instrinfo classes.
llvm-svn: 109171
mov pc, r1
.align 2
LJTI0_0_0:
.long LBB0_14
This fixes rdar://8213383. No test case since it's not possible to come up with a suitable small one.
llvm-svn: 109076
address calculation instructions leading up to a jump table when we're trying
to convert them into a TB[H] instruction in Thumb2. This realistically
shouldn't happen much, if at all, for well formed inputs, but it's more correct
to handle it. rdar://7387682
llvm-svn: 107830
writebacks to the address register. This gets rid of the hack that the
first register on the list was the magic writeback register operand. There
was an implicit constraint that if that operand was not reg0 it had to match
the base register operand. The post-RA scheduler's antidependency breaker
did not understand that constraint and sometimes changed one without the
other. This also fixes Radar 7495976 and should help the verifier work
better for ARM code.
There are now new ld/st instructions explicit writeback operands and explicit
constraints that tie those registers together.
llvm-svn: 98409
into TargetOpcodes.h. #include the new TargetOpcodes.h
into MachineInstr. Add new inline accessors (like isPHI())
to MachineInstr, and start using them throughout the
codebase.
llvm-svn: 95687
constant pool ranges, as CPEIsInRange() makes conservative assumptions about
the potential alignment changes from branch adjustments. The verification,
on the other hand, runs after those branch adjustments are made, so the
effects on alignment are known and already taken into account. The sanity
check in verify should check the range directly instead.
llvm-svn: 89473
assembly can confuse things utterly, as it's assumed that instructions in
inline assembly are 4 bytes wide. For Thumb mode, that's often not true,
so the calculations for when alignment padding will be present get thrown off,
ultimately leading to out of range constant pool entry references. Making
more conservative assumptions that padding may be necessary when inline asm
is present avoids this situation.
llvm-svn: 89403
can only branch forward. To best take advantage of them, we'd like to adjust
the basic blocks around a bit when reasonable. This patch puts basics in place
to do that, with a super-simple algorithm for backwards jump table targets that
creates a new branch after the jump table which branches backwards. Real
heuristics for reordering blocks or other modifications rather than inserting
branches will follow.
llvm-svn: 86791
In the case where there are no good places to put constants and we fall back
upon inserting unconditional branches to make new blocks, allow all constant
pool references in range of those blocks to put constants there, even if that
means resetting the "high water marks" for those references. This will still
terminate because you can't keep splitting blocks forever, and in the bad
cases where we have to split blocks, it is important to avoid splitting more
than necessary.
llvm-svn: 84202
When ARMConstantIslandPass cannot find any good locations (i.e., "water") to
place constants, it falls back to inserting unconditional branches to make a
place to put them. My recent change exposed a problem in this area. We may
sometimes append to the same block more than one unconditional branch. The
symptoms of this are that the generated assembly has a branch to an undefined
label and running llc with -debug will cause a seg fault.
This happens more easily since my change to prevent CPEs from moving from
lower to higher addresses as the algorithm iterates, but it could have
happened before. The end of the block may be in range for various constant
pool references, but the insertion point for new CPEs is not right at the end
of the block -- it is at the end of the CPEs that have already been placed
at the end of the block. The insertion point could be out of range. When
that happens, the fallback code will always append another unconditional
branch if the end of the block is in range.
The fix is to only append an unconditional branch if the block does not
already end with one. I also removed a check to see if the constant pool load
instruction is at the end of the block, since that is redundant with
checking if the end of the block is in-range.
There is more to be done here, but I think this fixes the immediate problem.
llvm-svn: 84172
before its reference is only supported on ARM has not been true for a while.
In fact, until recently, that was only supported for Thumb. Besides that,
CPEs are always a multiple of 4 bytes in size, so inserting a CPE should have
no effect on Thumb alignment.
llvm-svn: 83916
MultiSource/Benchmarks/MiBench/automotive-susan test. The failure has
since been masked by an unrelated change (just randomly), so I don't have
a testcase for this now. Radar 7291928.
The situation where this happened is that a constant pool entry (CPE) was
placed at a lower address than the load that referenced it. There were in
fact 2 CPEs placed at adjacent addresses and referenced by 2 loads that were
close together in the code. The distance from the loads to the CPEs was
right at the limit of what they could handle, so that only one of the CPEs
could be placed within range. On every iteration, the first CPE was found
to be out of range, causing a new CPE to be inserted. The second CPE had
been in range but the newly inserted entry pushed it too far away. Thus the
second CPE was also replaced by a new entry, which in turn pushed the first
CPE out of range. Etc.
Judging from some comments in the code, the initial implementation of this
pass did not support CPEs placed _before_ their references. In the case
where the CPE is placed at a higher address, the key to making the algorithm
terminate is that new CPEs are only inserted at the end of a group of adjacent
CPEs. This is implemented by removing a basic block from the "WaterList"
once it has been used, and then adding the newly inserted CPE block to the
list so that the next insertion will come after it. This avoids the ping-pong
effect where CPEs are repeatedly moved to the beginning of a group of
adjacent CPEs. This does not work when going backwards, however, because the
entries at the end of an adjacent group of CPEs are closer than the CPEs
earlier in the group.
To make this pass terminate, we need to maintain a property that changes can
only happen in some sort of monotonic fashion. The fix used here is to require
that the CPE for a particular constant pool load can only move to lower
addresses. This is a very simple change to the code and should not cause
any significant degradation in the results.
llvm-svn: 83902
MachineInstr and MachineOperand. This required eliminating a
bunch of stuff that was using DOUT, I hope that bill doesn't
mind me stealing his fun. ;-)
llvm-svn: 79813