The remaining copyRegToReg calls actually check the return value (shock!), so we
cannot trivially replace them with COPY instructions.
llvm-svn: 108069
if a block is split (by a custom inserter), the insert point may be in a
different block than it was originally. This fixes 32-bit llvm-gcc
bootstrap builds, and I haven't been able to reproduce it otherwise.
llvm-svn: 108060
operation, but the way it's implemented requires the operation to also be
commutative. So add a check for commutativity (and tweak the corresponding
comments). This makes no difference in practice since every associative
LLVM instruction is also commutative! Here's an example to show the need
for commutativity: the accum_recursion.ll testcase calculates the factorial
function. Before the transformation the result of a call is
((((1*1)*2)*3)...)*x
while afterwards it is
(((1*x)*(x-1))...*2)*1
which clearly requires both associativity and commutativity of * to be equal
to the original.
llvm-svn: 108056
ScheduleDAGEmit, TwoAddressLowering, and PHIElimination.
This switches the bulk of register copies to using COPY, but many less used
copyRegToReg calls remain.
llvm-svn: 108050
Based on a patch by Rafael Espíndola.
Attempt to make the FpSET_ST1 hack more robust, but we are still relying on
FpSET_ST0 preceeding it. This is only for supporting really weird x87 inline
asm.
We support:
FpSET_ST0
INLINEASM
FpSET_ST0
FpSET_ST1
INLINEASM
with and without kills on the arguments. We don't support:
FpSET_ST1
FpSET_ST0
INLINEASM
nor
FpSET_ST1
INLINEASM
Just Don't Do It!
llvm-svn: 108047
- Check getBytesToPopOnReturn().
- Eschew ST0 and ST1 for return values.
- Fix the PIC base register initialization so that it doesn't ever
fail to end up the top of the entry block.
llvm-svn: 108039
it is popped, even if it is ununsed. A CopyFromReg node is too weak to represent
the required sideeffect, so insert an FpGET_ST0 instruction directly instead.
This will matter when CopyFromReg gets lowered to a generic COPY instruction.
llvm-svn: 108037
notes:
- The instructions are being added with dummy placeholder patterns using some 256
specifiers, this is not meant to work now, but since there are some multiclasses
generic enough to accept them, when we go for codegen, the stuff will be already
there.
- Add VEX encoding bits to support YMM
- Add MOVUPS and MOVAPS in the first round
- Use "Y" as suffix for those Instructions: MOVUPSYrr, ...
- All AVX instructions in X86InstrSSE.td will move soon to a new X86InstrAVX
file.
llvm-svn: 107996
inserted in a MBB, and return an already inserted MI.
This target API change is necessary to allow foldMemoryOperand to call
storeToStackSlot and loadFromStackSlot when folding a COPY to a stack slot
reference in a target independent way.
The foldMemoryOperandImpl hook is going to change in the same way, but I'll wait
until COPY folding is actually implemented. Most targets only fold copies and
won't need to specialize this hook at all.
llvm-svn: 107991
U utils/TableGen/FastISelEmitter.cpp
--- Reverse-merging r107943 into '.':
U test/CodeGen/X86/fast-isel.ll
U test/CodeGen/X86/fast-isel-loads.ll
U include/llvm/Target/TargetLowering.h
U include/llvm/Support/PassNameParser.h
U include/llvm/CodeGen/FunctionLoweringInfo.h
U include/llvm/CodeGen/CallingConvLower.h
U include/llvm/CodeGen/FastISel.h
U include/llvm/CodeGen/SelectionDAGISel.h
U lib/CodeGen/LLVMTargetMachine.cpp
U lib/CodeGen/CallingConvLower.cpp
U lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
U lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
U lib/CodeGen/SelectionDAG/FastISel.cpp
U lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
U lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
U lib/CodeGen/SelectionDAG/InstrEmitter.cpp
U lib/CodeGen/SelectionDAG/TargetLowering.cpp
U lib/Target/XCore/XCoreISelLowering.cpp
U lib/Target/XCore/XCoreISelLowering.h
U lib/Target/X86/X86ISelLowering.cpp
U lib/Target/X86/X86FastISel.cpp
U lib/Target/X86/X86ISelLowering.h
llvm-svn: 107987
jumps where possible and turning the TAILCALL marker in the instruction
asm string into a proper comment.
This eliminates a FIXME and is on the path to finishing:
rdar://7639610 - eliminate encoding and asm info for TAILJMPd TAILJMPr TAILJMPn, etc.
However, I can't eliminate the encodings for these instructions because the JIT
still exists and has its own copy of the encoder, sigh.
llvm-svn: 107946
disabled and then never turned back on again. Adjust some tests, one because
this change avoids an unnecessary instruction, and the other to make it
continue testing what it was intended to test.
llvm-svn: 107941
like all other instructions, even though a segment is not
allowed. This resolves a bunch of gross hacks in the
encoder and makes LEA more consistent with the rest of the
instruction set.
No functionality change.
llvm-svn: 107934
in memory operands at the same type as hard coded segments.
This fixes problems where we'd emit the segment override after
the REX prefix on instructions like:
mov %gs:(%rdi), %rax
This fixes rdar://8127102. I have several cleanup patches coming
next.
llvm-svn: 107917
returns the start of the memory operand for an instruction.
Introduce a new "X86AddrSegment" enum to reduce # magic numbers
referring to X86 memory operand layout.
llvm-svn: 107916
.weak_def_can_be_hidden directive. Chris pointed out that the MCAsmInfo.h/.cpp
chunks aren't needed for this until the compiler starts generating these. And
when that happens it will be more convenient for it to be a bool than a const
char*.
llvm-svn: 107906
This pass runs before COPY instructions are passed to copyPhysReg, so we simply
translate COPY to the proper pseudo instruction. Note that copyPhysReg does not
handle floating point stack copies.
Once COPY is used everywhere, this can be cleaned up a bit, and most of the
pseudo instructions can be removed.
llvm-svn: 107899
the simplification of frame index register scavenging to not have to check
for available registers directly and instead just let scavengeRegister()
handle it.
llvm-svn: 107880
EXTRACT_SUBREG no longer appears as a machine instruction. Use COPY instead.
Add isCopy() checks in many places using isMoveInstr() and isExtractSubreg().
The isMoveInstr hook will be removed later.
llvm-svn: 107879
(X >s -1) ? C1 : C2 and (X <s 0) ? C2 : C1
into ((X >>s 31) & (C2 - C1)) + C1, avoiding the conditional.
This optimization could be extended to take non-const C1 and C2 but we better
stay conservative to avoid code size bloat for now.
for
int sel(int n) {
return n >= 0 ? 60 : 100;
}
we now generate
sarl $31, %edi
andl $40, %edi
leal 60(%rdi), %eax
instead of
testl %edi, %edi
movl $60, %ecx
movl $100, %eax
cmovnsl %ecx, %eax
llvm-svn: 107866
This target hook is intended to replace copyRegToReg entirely, but for now it
calls copyRegToReg.
Any remaining calls to copyRegToReg wil be replaced by COPY instructions.
llvm-svn: 107854
1. The arguments are f32.
2. The arguments are loads and they have no uses other than the comparison.
3. The comparison code is EQ or NE.
e.g.
vldr.32 s0, [r1]
vldr.32 s1, [r0]
vcmpe.f32 s1, s0
vmrs apsr_nzcv, fpscr
beq LBB0_2
=>
ldr r1, [r1]
ldr r0, [r0]
cmp r0, r1
beq LBB0_2
More complicated cases will be implemented in subsequent patches.
llvm-svn: 107852
Add explicit testcases for tail calls within the same module.
Duplicate some code to humor those who think .w doesn't apply on ARM.
Leave this disabled on Thumb1, and add some comments explaining why it's hard
and won't gain much.
llvm-svn: 107851
(if there are any) and use the one which remains available for the longest
rather than just using the first one. This should help enable better re-use
of the loaded frame index values. rdar://7318760
llvm-svn: 107847
address calculation instructions leading up to a jump table when we're trying
to convert them into a TB[H] instruction in Thumb2. This realistically
shouldn't happen much, if at all, for well formed inputs, but it's more correct
to handle it. rdar://7387682
llvm-svn: 107830
around everywhere, and also give it an InsertPt member, to enable isel
to operate at an arbitrary position within a block, rather than just
appending to a block.
llvm-svn: 107791
interface needs implementations to be consistent, so any code which
wants to support different semantics must use a different interface.
It's not currently worthwhile to add a new interface for this new
concept.
Document that AliasAnalysis doesn't support cross-function queries.
llvm-svn: 107776
builds to "Release". The default build is unchanged (optimization on,
assertions on), however it is now called Release+Asserts. The intent
is that future LLVM releases released via llvm.org will be Release builds
in the new sense, i.e. will have assertions disabled (currently they have
assertions enabled, for a more than 20% slowdown). This will bring them
in line with MacOS releases, which ship with assertions disabled. It also
means that "Release" now means the same things in make and cmake builds:
cmake already disables assertions for "Release" builds AFAICS.
llvm-svn: 107758
INSERT_SUBREG will now only appear in SSA machine instructions.
Fix the handling of partial redefs in ProcessImplicitDefs. This is now relevant
since partial redef COPY instructions appear.
llvm-svn: 107726
It is OK for an alias live range to overlap if there is a copy to or from the
physical register. CoalescerPair can work out if the copy is coalescable
independently of the alias.
This means that we can join with the actual destination interval instead of
using the getOrigDstReg() hack. It is no longer necessary to merge clobber
ranges into subregisters.
llvm-svn: 107695
This way *only* debug sections can be discarded, but not the opposite. Seems like the copy-and-pasto from ELF code, since there it contains the reverse flag ('alloc').
llvm-svn: 107658
the example in the testcase, we now generate:
_test1: ## @test1
movss 4(%esp), %xmm0
addss 8(%esp), %xmm0
movl 12(%esp), %eax
movss %xmm0, (%eax)
ret
instead of:
_test1: ## @test1
subl $20, %esp
movl 24(%esp), %eax
movq %mm0, (%esp)
movq %mm0, 8(%esp)
movss (%esp), %xmm0
addss 12(%esp), %xmm0
movss %xmm0, (%eax)
addl $20, %esp
ret
v2f32 support did not work reliably because most of the X86
backend didn't know it was legal. It was apparently only added
to support returning source-level v2f32 values in MMX registers
in x86-32 mode. If ABI compatibility is important on this
GCC-extended-vector type for some reason, then the frontend
should generate IR that returns v2i32 instead of v2f32. However,
we generally don't try very hard to be abi compatible on gcc
extended vectors.
llvm-svn: 107601
v2f32 as legal in 32-bit mode. It is just as terrible there,
but I just care about x86-64 and noone claims it is valuable
in 64-bit mode.
llvm-svn: 107600
The COPY instruction is intended to replace the target specific copy
instructions for virtual registers as well as the EXTRACT_SUBREG and
INSERT_SUBREG instructions in MachineFunctions. It won't we used in a selection
DAG.
COPY is lowered to native register copies by LowerSubregs.
llvm-svn: 107529
new basic blocks, and if used as a function argument, that can cause call frame
setup / destroy pairs to be split across a basic block boundary. That prevents
us from doing a simple assertion to check that the pairs match and alloc/
dealloc the same amount of space. Modify the assertion to only check the
amount allocated when there are matching pairs in the same basic block.
rdar://8022442
llvm-svn: 107517
- X86 unfolding should check if the instructions being unfolded has memoperands.
If there is no memoperands, then it must assume conservative alignment. If this
would introduce an expensive sse unaligned load / store, then unfoldMemoryOperand
etc. should not unfold the instruction.
llvm-svn: 107509
PrologEpilog code, and use it to determine whether
the asm forces stack alignment or not. gcc consistently
does not do this for GCC-style asms; Apple gcc inconsistently
sometimes does it for asm blocks. There is no
convenient place to put a bit in either the SDNode or
the MachineInstr form, so I've added an extra operand
to each; unlovely, but it does allow for expansion for
more bits, should we need it. PR 5125. Some
existing testcases are affected.
The operand lists of the SDNode and MachineInstr forms
are indexed with awesome mnemonics, like "2"; I may
fix this someday, but not now. I'm not making it any
worse. If anyone is inspired I think you can find all
the right places from this patch.
llvm-svn: 107506
have any effect, and second, deleting stores can potentially invalidate
an AliasAnalysis, and there's currently no notification for this.
llvm-svn: 107496
This allows us to recognize the common case where all uses could be
rematerialized, and no stack slot allocation is necessary.
If some values could be fully rematerialized, remove them from the live range
before allocating a stack slot for the rest.
llvm-svn: 107492
getFunctionAlignment and the corresponding use of that value in the ARM
asm printer, but now we're using the standard asm printer. The result of
this was that function alignments were dropped completely for Thumb functions.
Radar 8143571.
llvm-svn: 107435
Objective-C metadata types which should be marked as "weak", but which the
linker will remove upon final linkage. However, this linkage isn't specific to
Objective-C.
For example, the "objc_msgSend_fixup_alloc" symbol is defined like this:
.globl l_objc_msgSend_fixup_alloc
.weak_definition l_objc_msgSend_fixup_alloc
.section __DATA, __objc_msgrefs, coalesced
.align 3
l_objc_msgSend_fixup_alloc:
.quad _objc_msgSend_fixup
.quad L_OBJC_METH_VAR_NAME_1
This is different from the "linker_private" linkage type, because it can't have
the metadata defined with ".weak_definition".
Currently only supported on Darwin platforms.
llvm-svn: 107433
such a way that debug info for symbols preserved even if symbols are
optimized away by the optimizer.
Add new special pass to remove debug info for such symbols.
llvm-svn: 107416
- Add encode bits for VEX_W
- All 128-bit SSE 1 & SSE2 instructions that are described
in the .td file now have a AVX encoded form already working.
llvm-svn: 107365
entries associated with the value being erased in the
folding set map. These entries used to be harmless, because
a SCEVUnknown doesn't store any information about its Value*,
so having a new Value allocated at the old Value's address
wasn't a problem. But now that ScalarEvolution is storing more
information about values, this is no longer safe.
llvm-svn: 107316
LocalRewriter::runOnMachineFunction uses this information to mark dead spill
slots.
This means that InlineSpiller now also works for functions that spill.
llvm-svn: 107302
replaced by a bigger array in SmallPtrSet (by overridding it), instead just use a
pointer to the start of the storage, and have SmallPtrSet pass in the value to use.
This has the disadvantage that SmallPtrSet becomes bigger by one pointer. It has
the advantage that it no longer uses tricky C++ rules, and is clearly correct while
I'm not sure the previous version was. This was inspired by g++-4.6 pointing out
that SmallPtrSetImpl was writing off the end of SmallArray, which it was. Since
SmallArray is replaced with a bigger array in SmallPtrSet, the write was still to
valid memory. But it was writing off the end of the declared array type - sounds
kind of dubious to me, like it sounded dubious to g++-4.6. Maybe g++-4.6 is wrong
and this construct is perfectly valid and correctly compiled by all compilers, but
I think it is better to avoid the whole can of worms by avoiding this construct.
llvm-svn: 107285
on ScalarEvolution successfully folding and preserving
range information for both A-B and B-A. Now, if it gets
either one, it's sufficient.
llvm-svn: 107249
InlineSpiller inserts loads and spills immediately instead of deferring to
VirtRegMap. This is possible now because SlotIndexes allows instructions to be
inserted and renumbered.
This is work in progress, and is mostly a copy of TrivialSpiller so far. It
works very well for functions that don't require spilling.
llvm-svn: 107227
metadata types which should be marked as "weak", but which the linker will
remove upon final linkage. For example, the "objc_msgSend_fixup_alloc" symbol is
defined like this:
.globl l_objc_msgSend_fixup_alloc
.weak_definition l_objc_msgSend_fixup_alloc
.section __DATA, __objc_msgrefs, coalesced
.align 3
l_objc_msgSend_fixup_alloc:
.quad _objc_msgSend_fixup
.quad L_OBJC_METH_VAR_NAME_1
This is different from the "linker_private" linkage type, because it can't have
the metadata defined with ".weak_definition".
llvm-svn: 107205
A partial redefine needs to be treated like a tied operand, and the register
must be reloaded while processing use operands.
This fixes a bug where partially redefined registers were processed as normal
defs with a reload added. The reload could clobber another use operand if it was
a kill that allowed register reuse.
llvm-svn: 107193
The LowerSubregs pass needs to preserve implicit def operands attached to
EXTRACT_SUBREG instructions when it replaces those instructions with copies.
llvm-svn: 107189
is stripped off. Currently set unconditionally, since the API
does not provide a way of working out if anything was actually
stripped off.
llvm-svn: 107142
of getPhysicalRegisterRegClass with it.
If we want to make a copy (or estimate its cost), it is better to use the
smallest class as more efficient operations might be possible.
llvm-svn: 107140
in terms of Op<> and ArgOffset. This works for
values of {0, 1} for ArgOffset.
Please note that ArgOffset will become 0 soon and
will go away eventually.
llvm-svn: 107129
instruction to an add scev, it's not safe to blindly transfer the
inbounds flag from a gep instruction to an nsw on the scev for the
gep.
llvm-svn: 107117
There are 2 changes relative to the previous version of the patch:
1) For the "simple" if-conversion case, there's no need to worry about
RemoveExtraEdges not handling an unanalyzable branch. Predicated terminators
are ignored in this context, so RemoveExtraEdges does the right thing.
This might break someday if we ever treat indirect branches (BRIND) as
predicable, but for now, I just removed this part of the patch, because
in the case where we do not add an unconditional branch, we rely on keeping
the fall-through edge to CvtBBI (which is empty after this transformation).
The change relative to the previous patch is:
@@ -1036,10 +1036,6 @@
IterIfcvt = false;
}
- // RemoveExtraEdges won't work if the block has an unanalyzable branch,
- // which is typically the case for IfConvertSimple, so explicitly remove
- // CvtBBI as a successor.
- BBI.BB->removeSuccessor(CvtBBI->BB);
RemoveExtraEdges(BBI);
// Update block info. BB can be iteratively if-converted.
2) My patch exposed a bug in the code for merging the tail of a "diamond",
which had previously never been exercised. The code was simply checking that
the tail had a single predecessor, but there was a case in
MultiSource/Benchmarks/VersaBench/dbms where that single predecessor was
neither edge of the diamond. I added the following change to check for
that:
@@ -1276,7 +1276,18 @@
// tail, add a unconditional branch to it.
if (TailBB) {
BBInfo TailBBI = BBAnalysis[TailBB->getNumber()];
- if (TailBB->pred_size() == 1 && !TailBBI.HasFallThrough) {
+ bool CanMergeTail = !TailBBI.HasFallThrough;
+ // There may still be a fall-through edge from BBI1 or BBI2 to TailBB;
+ // check if there are any other predecessors besides those.
+ unsigned NumPreds = TailBB->pred_size();
+ if (NumPreds > 1)
+ CanMergeTail = false;
+ else if (NumPreds == 1 && CanMergeTail) {
+ MachineBasicBlock::pred_iterator PI = TailBB->pred_begin();
+ if (*PI != BBI1->BB && *PI != BBI2->BB)
+ CanMergeTail = false;
+ }
+ if (CanMergeTail) {
MergeBlocks(BBI, TailBBI);
TailBBI.IsDone = true;
} else {
With these fixes, I was able to run all the SingleSource and MultiSource
tests successfully.
llvm-svn: 107110
have to be registers, per gcc documentation. This affects
the logic for determining what "g" should lower to. PR 7393.
A couple of existing testcases are affected.
llvm-svn: 107079
When an instruction has tied operands and physreg defines, we must take extra
care that the tied operands conflict with neither physreg defs nor uses.
The special treatment is given to inline asm and instructions with tied operands
/ early clobbers and physreg defines.
This fixes PR7509.
llvm-svn: 107043
large integers, the first inserted value would always create
an 'or X, 0'. Even though this is trivially zapped by
instcombine, don't bother creating this pointless instruction.
llvm-svn: 106979
the returned value after the tail call if it differs from other return
values. The optimal thing to do would be to introduce a phi node for
the return value, but for the moment just fix the miscompile.
llvm-svn: 106947
if-conversion. The RemoveExtraEdges function doesn't work for blocks that
end with unanalyzable branches, so in those cases, the "extra" edges must
be explicitly removed. The CopyAndPredicateBlock and MergeBlocks methods
can also avoid copying successor edges due to branches that have already
been removed. The latter case is especially helpful when MergeBlocks is
called for handling "diamond" if-conversions, where otherwise you can end
up with some weird intermediate states in the CFG. Unfortunately I've
been unable to find cases where this cleanup actually makes a significant
difference in the code. There is one test where we manage to remove an
empty block at the end of a function. Radar 6911268.
llvm-svn: 106939
CopyFromReg nodes for aliasing registers (AX and AL). This confuses the fast
register allocator.
Instead of CopyFromReg(AL), use ExtractSubReg(CopyFromReg(AX), sub_8bit).
This fixes PR7312.
llvm-svn: 106934
introduced in r106343, but only showed up recently (with a particular compiler &
linker combination) because of the particular check, and because we have no
builtin checking for dereferencing the end of an array, which is truly
unfortunate.
llvm-svn: 106908
The VNInfo.kills vector was almost unused except for all the code keeping it
updated. The few places using it were easily rewritten to check for interval
ends instead.
The two new methods LiveInterval::killedAt and killedInRange are replacements.
This brings us down to 3 independent data structures tracking kills.
llvm-svn: 106905
SCEVUnknown values which are loop-variant, as LSR can't do anything
interesting with these values in any case. This fixes very slow compile
times on loops which have large numbers of such values.
llvm-svn: 106897
for an "i" constraint should get lowered; PR 6309. While
this argument was passed around a lot, this is the only
place it was used, so it goes away from a lot of other
places.
llvm-svn: 106893
are dead, not just the def of this register. I.e., a register could be dead, but
it's subreg isn't.
Testcase to follow with a subsequent patch.
llvm-svn: 106878
with the following instructions. This is done via trickery by considering the
instruction preceding the IT to be the hazard. Care must be taken to ensure
it's the first non-debug instruction, or the presence of debug info will
affect codegen.
Part of the continuing work for rdar://7797940, making ARM code-gen unaffected
by the presence of debug information.
llvm-svn: 106871
buffer in the same chunk of memory.
2 less mallocs for every uninitialized MemoryBuffer and 1 less malloc for every
MemoryBuffer pointing to a memory range translate into 20% less mallocs on
clang -cc1 -Eonly Cocoa_h.m.
llvm-svn: 106839
and CallInst for getting hold
of the intrinsic's arguments
simplify along the way (at least for me this is much more legible now)
Bill, Baldrick or Anton, please review\!
llvm-svn: 106838
This fixes PR7479 and PR7485. The test cases from those PRs are big, so not
included. However, PR7485 comes from self hosting on FreeBSD, so we will surely
hear about any regression.
llvm-svn: 106811
address requires a register or secondary load to compute
(most PIC modes). This improves "g" constraint handling. 8015842.
The test from 2007 is attempting to test the fix for PR1761,
but since -relocation-model=static doesn't work on Darwin
x86-64, it was not testing what it was supposed to be testing
and was passing erroneously. Fixed to use Linux x86-64.
llvm-svn: 106779
which don't have a catch-all associated with them not just clean-ups. This fixes
the SingleSource/Benchmarks/Shootout-C++/except.cpp testcase that broke because
of my change r105902.
llvm-svn: 106772
CoalescerPair can determine if a copy can be coalesced, and which register gets
merged away. The old logic in SimpleRegisterCoalescing had evolved into
something a bit too convoluted.
This second attempt fixes some crashes that only occurred Linux.
llvm-svn: 106769
[L]oad, [u]se, [d]ef, or [S]tore slots.
This makes it easier to see if two indices refer to the same instruction,
avoiding mental mod 4 calculations.
llvm-svn: 106766
In this case it is essential that the kill is real because the spiller will
decide to omit a spill if it thinks there is a later kill.
llvm-svn: 106751
when the condition is constant. This optimization shouldn't be
necessary, because codegen shouldn't be able to find dead control
paths that the IR-level optimizer can't find. And it's undesirable,
because it encourages bugpoint to leave "br i1 false" branches
in its output. And it wasn't updating the CFG.
I updated all the tests I could, but some tests are too reduced
and I wasn't able to meaningfully preserve them.
llvm-svn: 106748
CoalescerPair can determine if a copy can be coalesced, and which register gets
merged away. The old logic in SimpleRegisterCoalescing had evolved into
something a bit too convoluted.
llvm-svn: 106701
void t(int *cp0, int *cp1, int *dp, int fmd) {
int c0, c1, d0, d1, d2, d3;
c0 = (*cp0++ & 0xffff) | ((*cp1++ << 16) & 0xffff0000);
c1 = (*cp0++ & 0xffff) | ((*cp1++ << 16) & 0xffff0000);
/* ... */
}
It code gens into something pretty bad. But with this change (analogous to the
X86 back-end), it will use ldm and generate few instructions.
llvm-svn: 106693
branch turns out to be ARM-to-Thumb or vice versa
the linker cannot resolve this. 8120438.
If this optimization is going to be useful we probably
need a compiler flag "assume callees are same architecture"
or something like that.
llvm-svn: 106662
atomic intrinsics, either because the use locking instructions for the
atomics, or because they perform the locking directly. Add support in the
DAG combiner to fold away the fences.
llvm-svn: 106630
Failure to seed metdata in such cases causes troubles when in a cloned module, metadata from a new module refers to values in old module. Usually this results in mysterious bugpoint crashes. For example,
Checking to see if we can delete global inits: Unknown constant!
UNREACHABLE executed at /d/g/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:904!
llvm-svn: 106592
instructions.
This does not affect codegen much because SUBREG_TO_REG is only used by X86 and
X86 does not use the register scavenger, but it prevents verifier errors.
llvm-svn: 106583
Measurements show that it does not speed up coalescing, so there is no reason
the keep the added complexity around.
Also clean out some unused methods and static functions.
llvm-svn: 106548
opportunities. For example, this lets it emit this:
movq (%rax), %rcx
addq %rdx, %rcx
instead of this:
movq %rdx, %rcx
addq (%rax), %rcx
in the case where %rdx has subsequent uses. It's the same number
of instructions, and usually the same encoding size on x86, but
it appears faster, and in general, it may allow better scheduling
for the load.
llvm-svn: 106493
This makes the output look consistent, as well as help some third party assemblers expecting the attributes to be in the second column."
Patch by Arnaud de Grandmaison!
llvm-svn: 106469
Split the code for materializing a value out of
SelectionDAGBuilder::getValue into a helper function, so that it can
be used in other ways. Add a new getNonRegisterValue function which
uses it, for use in code which doesn't want a CopyFromReg even
when FuncMap.ValueMap already has an entry for it.
llvm-svn: 106422
This allows the fast regiser allocator to remove redundant
register moves.
Update a set of tests that depend on the register allocator
to be linear scan.
llvm-svn: 106420
use sharing map. The reconcileNewOffset logic already forces a
separate use if the kinds differ, so incorporating the kind in the
key means we can track more sharing opportunities.
More sharing means fewer total uses to track, which means smaller
problem sizes, which means the conservative throttles don't kick
in as often.
llvm-svn: 106396
assuming that loops are in canonical form, as ScalarEvolution doesn't
depend on LoopSimplify itself. Also, with indirectbr not all loops can
be simplified. This fixes PR7416.
llvm-svn: 106389
and operand renaming to help.
The giant turn the constraints on and selectively turn it off
should probably be inverted at some point since it's just largely
50/50.
llvm-svn: 106367
- This fixed a number of bugs in if-converter, tail merging, and post-allocation
scheduler. If-converter now runs branch folding / tail merging first to
maximize if-conversion opportunities.
- Also changed the t2IT instruction slightly. It now defines the ITSTATE
register which is read by instructions in the IT block.
- Added Thumb2 specific hazard recognizer to ensure the scheduler doesn't
change the instruction ordering in the IT block (since IT mask has been
finalized). It also ensures no other instructions can be scheduled between
instructions in the IT block.
This is not yet enabled.
llvm-svn: 106344
instructions, but it doesn't really understand live ranges, so the first
INSERT_SUBREG uses an implicitly defined register.
Fix it in LiveVariableAnalysis by adding the <undef> flag.
llvm-svn: 106333
entries used by llvm-gcc. *_[U]MIN and such can be added later if needed.
This enables the front ends to simplify handling of the atomic intrinsics by
removing the target-specific decision about which targets can handle the
intrinsics.
llvm-svn: 106321
original basic block. This avoids trouble with examining
instructions in other basic blocks which haven't been
assigned registers yet.
llvm-svn: 106310
basic tests.
This has been well tested on Darwin but not elsewhere.
It should work provided the linker correctly resolves
B.W <label in other function>
which it has not seen before, at least from llvm-based
compilers. I'm leaving the arm-tail-calls switch in
until I see if there's any problems because of that;
it might need to be disabled for some environments.
llvm-svn: 106299
so when IfConverter::CopyAndPredicateBlock checks to see if it should ignore
an instruction because it is a branch, it should not check if the branch is
predicated.
This case (when IgnoreBr is true) is only relevant from IfConvertTriangle,
where new branches are inserted after the block has been copied and predicated.
If the original branch is not removed, we end up with multiple conditional
branches (possibly conflicting) at the end of the block. Aside from any
immediate errors resulting from that, this confuses the AnalyzeBranch functions
so that the branches are not analyzable. That in turn causes the IfConverter to
think that the "Simple" pattern can be applied, and things go downhill fast
because the "Simple" pattern does _not_ apply if the block can fall through.
This is pretty fragile. If there are other degenerate cases where AnalyzeBranch
fails, but where the block may still fall through, the IfConverter should not
perform its "Simple" if-conversion. But, I don't know how to do that with the
current AnalyzeBranch interface, so for now, the best thing seems to be to
avoid creating branches that AnalyzeBranch cannot handle.
Evan, please review!
llvm-svn: 106291
switch from this:
if (TimePassesIsEnabled) {
NamedRegionTimer T(Name, GroupName);
do_something();
} else {
do_something(); // duplicate the code, this time without a timer!
}
to this:
{
NamedRegionTimer T(Name, GroupName, TimePassesIsEnabled);
do_something();
}
llvm-svn: 106285
addresses a longstanding deficiency noted in many FIXMEs scattered
across all the targets.
This effectively moves the problem up one level, replacing eleven
FIXMEs in the targets with eight FIXMEs in CodeGen, plus one path
through FastISel where we actually supply a DebugLoc, fixing Radar
7421831.
llvm-svn: 106243
for the moment. The implementation of the libcall will follow.
Currently, the llvm-gcc knows when the intrinsics can be correctly handled by
the back end and only generates them in those cases, issuing libcalls directly
otherwise. That's too much coupling. The intrinsics should always be
generated and the back end decide how to handle them, be it with a libcall,
inline code, or whatever. This patch is a step in that direction.
rdar://8097623
llvm-svn: 106227
LiveVariableAnalysis was a bit picky about a register only being redefined once,
but that really isn't necessary.
Here is an example of chained INSERT_SUBREGs that we can handle now:
68 %reg1040<def> = INSERT_SUBREG %reg1040, %reg1028<kill>, 14
register: %reg1040 +[70,134:0)
76 %reg1040<def> = INSERT_SUBREG %reg1040, %reg1029<kill>, 13
register: %reg1040 replace range with [70,78:1) RESULT: %reg1040,0.000000e+00 = [70,78:1)[78,134:0) 0@78-(134) 1@70-(78)
84 %reg1040<def> = INSERT_SUBREG %reg1040, %reg1030<kill>, 12
register: %reg1040 replace range with [78,86:2) RESULT: %reg1040,0.000000e+00 = [70,78:1)[78,86:2)[86,134:0) 0@86-(134) 1@70-(78) 2@78-(86)
92 %reg1040<def> = INSERT_SUBREG %reg1040, %reg1031<kill>, 11
register: %reg1040 replace range with [86,94:3) RESULT: %reg1040,0.000000e+00 = [70,78:1)[78,86:2)[86,94:3)[94,134:0) 0@94-(134) 1@70-(78) 2@78-(86) 3@86-(94)
rdar://problem/8096390
llvm-svn: 106152
will conflict with another live range. The place which creates this scenerio is
the code in X86 that lowers a select instruction by splitting the MBBs. This
eliminates the need to check from the bottom up in an MBB for live pregs.
llvm-svn: 106066
call must not be callee-saved; following x86, add a new
regclass to represent this. Also fixes a couple of bugs.
Still disabled by default; Thumb doesn't work yet.
llvm-svn: 106053
flag argument to addReg is not the same format as flags attached
to MachineOperand, although both have the same info. I don't
think this actually mattered; the bootstrap failure did not
reproduce on the next run anyway.
llvm-svn: 106049
SimpleRegisterCoalescing::JoinIntervals() uses CoalescerPair to determine if a
copy is coalescable, and in very rare cases it can return true where LHS is not
live - the coalescable copy can come from an alias of the physreg in LHS.
llvm-svn: 106021
combined to an insert_subreg, i.e., where the destination register is larger
than the source. We need to check that the subregs can be composed for that
case in a symmetrical way to the case when the destination is smaller.
llvm-svn: 106004
Early clobbers defining a virtual register were first alocated to a physreg and
then processed as a physreg EC, spilling the virtreg.
This fixes PR7382.
llvm-svn: 105998
Given a copy instruction, CoalescerPair can determine which registers to
coalesce in order to eliminate the copy. It deals with all the subreg fun to
determine a tuple (DstReg, SrcReg, SubIdx) such that:
- SrcReg is a virtual register that will disappear after coalescing.
- DstReg is a virtual or physical register whose live range will be extended.
- SubIdx is 0 when DstReg is a physical register.
- SrcReg can be joined with DstReg:SubIdx.
CoalescerPair::isCoalescable() determines if another copy instruction is
compatible with the same tuple. This fixes some NEON miscompilations where
shuffles are getting coalesced as if they were copies.
The CoalescerPair class will replace a lot of the spaghetti logic in JoinCopy
later.
llvm-svn: 105997
replacing the overly conservative checks that I had introduced recently to
deal with correctness issues. This makes a pretty noticable difference
in our testcases where reg_sequences are used. I've updated one test to
check that we no longer emit the unnecessary subreg moves.
llvm-svn: 105991
containing the target address, an input, into an output. I don't
think this actually broke anything on x86 (it does on ARM), but
it's wrong.
llvm-svn: 105986
immediate" operands. These functions have so far only been used for VMOV
but they also apply to other NEON instructions with modified immediate
operands. No functional changes.
llvm-svn: 105969
symbols as declarations in the X86 backend. This would manifest
on darwin x86-32 as errors like this with -fvisibility=hidden:
symbol '__ZNSbIcED1Ev' can not be undefined in a subtraction expression
This fixes PR7353.
llvm-svn: 105954
clean-up to a catch-all after inlining, take into account that there could be
filter IDs as well. The presence of filters don't mean that the selector catches
anything. It's just metadata information.
llvm-svn: 105872
i64 and f64 types, but now it also handle Neon vector types, so the f64 result
of VMOVDRR may need to be converted to a Neon type. Radar 8084742.
llvm-svn: 105845
the machine instruction representation of the immediate value to be encoded
into an integer with similar fields as the actual VMOV instruction. This makes
things easier for the disassembler, since it can just stuff the bits into the
immediate operand, but harder for the asm printer since it has to decode the
value to be printed. Testcase for the encoding will follow later when MC has
more support for ARM.
llvm-svn: 105836
dbg_value immediately follows a sequence of ldr/str instructions that should
be combined into an ldm/stm and is the last instruction in the block, then
combine may end up being skipped.
llvm-svn: 105758
This is a bit of a hack to make inline asm look more like call instructions.
It would be better to produce correct dead flags during isel.
llvm-svn: 105749
%reg1025 = <sext> %reg1024
...
%reg1026 = SUBREG_TO_REG 0, %reg1024, 4
into this:
%reg1025 = <sext> %reg1024
...
%reg1027 = EXTRACT_SUBREG %reg1025, 4
%reg1026 = SUBREG_TO_REG 0, %reg1027, 4
The problem here is that SUBREG_TO_REG is there to assert that an implicit zext
occurs. It doesn't insert a zext instruction. If we allow the EXTRACT_SUBREG
here, it will give us the value after the <sext>, not the original value of
%reg1024 before <sext>.
llvm-svn: 105741
the same condition, it's important to make sure they are scheduled together
to avoid forming multiple IT blocks. I'm adding a pre-regalloc pass that forms
IT blocks early (by re-scheduling instructions and split basic blocks) to
attempt to fix this. This is not turned on by default since I am not sure this
is the right fix.
Another issue is llvm selects are modeled as two-address conditional moves.
This can be very bad when the copies before the conditional moves are not
coalesced away. Teach IT formation pass to move the copies above the IT block
(when legal) to avoid breaking the IT block.
llvm-svn: 105669
instruction. Added the 64-bit version "jrcxz" so it is recognized and also
added the checks for incorrect uses of "jcxz" in 64-bit mode and "jrcxz" in
32-bit mode. Still to do is to correctly handle the encoding of the
instruction adding the Address-size override prefix byte, 0x67, when the width
of the count register is not the same as the mode the machine is running in.
Which for example means the encoding of "jecxz" depends if you are assembling
as a 32-bit target or a 64-bit target.
llvm-svn: 105661
- change isShuffleMaskLegal to show that all shuffles with 32-bit and 64-bit
elements are legal
- the Neon shuffle instructions do not support 64-bit elements, but we were
not checking for that before lowering shuffles to use them
- remove some 64-bit element vduplane patterns that are no longer needed
llvm-svn: 105586
scrounging through SCEVUnknown contents and SCEVNAryExpr operands;
instead just do a simple deterministic comparison of the precomputed
hash data.
Also, since this is more precise, it eliminates the need for the slow
N^2 duplicate detection code.
llvm-svn: 105540
encapsulation to force the users of these classes to know about the internal
data structure of the Operands structure. It also can lead to errors, like in
the MSIL writer.
llvm-svn: 105539
In file included from X86InstrInfo.cpp:16:
X86GenInstrInfo.inc:2789: error: integer constant is too large for 'long' type
X86GenInstrInfo.inc:2790: error: integer constant is too large for 'long' type
X86GenInstrInfo.inc:2792: error: integer constant is too large for 'long' type
X86GenInstrInfo.inc:2793: error: integer constant is too large for 'long' type
X86GenInstrInfo.inc:2808: error: integer constant is too large for 'long' type
X86GenInstrInfo.inc:2809: error: integer constant is too large for 'long' type
X86GenInstrInfo.inc:2816: error: integer constant is too large for 'long' type
X86GenInstrInfo.inc:2817: error: integer constant is too large for 'long' type
llvm-svn: 105524
register allocation.
Process all of the clobber lists at the end of the function, marking the
registers as used in MachineRegisterInfo.
This is necessary in case the calls clobber callee-saved registers (sic).
llvm-svn: 105473
replace an OpA with a widened OpB, it is possible to get new uses of OpA due to CSE
when recursively updating nodes. Since OpA has been processed, the new uses are
not examined again. The patch checks if this occurred and it it did, updates the
new uses of OpA to use OpB.
llvm-svn: 105453
VECTOR_SHUFFLEs to REG_SEQUENCE instructions. The standard ISD::BUILD_VECTOR
node corresponds closely to REG_SEQUENCE but I couldn't use it here because
its operands do not get legalized. That is pretty awful, but I guess it
makes sense for other targets. Instead, I have added an ARM-specific version
of BUILD_VECTOR that will have its operands properly legalized.
This fixes the rest of Radar 7872877.
llvm-svn: 105439
Check that all the instructions are in the same basic block, that the
EXTRACT_SUBREGs write to the same subregs that are being extracted, and that
the source and destination registers are in the same regclass. Some of
these constraints can be relaxed with a bit more work. Jakob suggested
that the loop that checks for subregs when NewSubIdx != 0 should use the
"nodbg" iterator, so I made that change here, too.
llvm-svn: 105437
A temporary flag -arm-tail-calls defaults to off,
so there is no functional change by default.
Intrepid users may try this; simple cases work
but there are bugs.
llvm-svn: 105413
registers it defines then interfere with an existing preg live range.
For instance, if we had something like these machine instructions:
BB#0
... = imul ... EFLAGS<imp-def,dead>
test ..., EFLAGS<imp-def>
jcc BB#2 EFLAGS<imp-use>
BB#1
... ; fallthrough to BB#2
BB#2
... ; No code that defines EFLAGS
jcc ... EFLAGS<imp-use>
Machine sink will come along, see that imul implicitly defines EFLAGS, but
because it's "dead", it assumes that it can move imul into BB#2. But when it
does, imul's "dead" imp-def of EFLAGS is raised from the dead (a zombie) and
messes up the condition code for the jump (and pretty much anything else which
relies upon it being correct).
The solution is to know which pregs are live going into a basic block. However,
that information isn't calculated at this point. Nor does the LiveVariables pass
take into account non-allocatable physical registers. In lieu of this, we do a
*very* conservative pass through the basic block to determine if a preg is live
coming out of it.
llvm-svn: 105387
expansion is the same as that used by LegalizeDAG.
The resulting code sucks in terms of performance/codesize on x86-32 for a
64-bit operation; I haven't looked into whether different expansions might be
better in general.
llvm-svn: 105378
spills and reloads.
This means that a partial define of a register causes a reload so the other
parts of the register are preserved.
The reload can be prevented by adding an <imp-def> operand for the full
register. This is already done by the coalescer and live interval analysis where
relevant.
llvm-svn: 105369
register updates.
These operands tell the spiller that the other parts of the partially defined
register are don't-care, and a reload is not necessary.
llvm-svn: 105361
instruction defines subregisters.
Any existing subreg indices on the original instruction are preserved or
composed with the new subreg index.
Also substitute multiple operands mentioning the original register by using the
new MachineInstr::substituteRegister() function. This is necessary because there
will soon be <imp-def> operands added to non read-modify-write partial
definitions. This instruction:
%reg1234:foo = FLAP %reg1234<imp-def>
will reMaterialize(%reg3333, bar) like this:
%reg3333:bar-foo = FLAP %reg333:bar<imp-def>
Finally, replace the TargetRegisterInfo pointer argument with a reference to
indicate that it cannot be NULL.
llvm-svn: 105358
x86 backend currently doesn't know how to handle them.
This doesn't really fix anything because LegalizeTypes doesn't know how to
handle them either. We do get a better error message, though.
llvm-svn: 105305
that are too large. This causes the freebsd bootloader to be too
large apparently.
It's unclear if this should be an -Os or -Oz thing. Thoughts welcome.
llvm-svn: 105228
were overspecified when inheriting sub-subregisters, for instance:
R0Q:subreg_even32 = R0Q:subreg_32bit = R0Q:subreg_even:subreg_32bit.
This meant that composeSubRegIndices(subreg_even, subreg_32bit) was ambiguous.
llvm-svn: 105063
implementation that is correct for most targets. Tablegen will override where
needed.
Add MachineOperand::subst{Virt,Phys}Reg methods that correctly handle existing
subreg indices when sustituting registers.
llvm-svn: 104985
optimization level.
This only really affects llc for now because both the llvm-gcc and clang front
ends override the default register allocator. I intend to remove that code later.
llvm-svn: 104904
and change llvm::sys::RunInterruptHandlers to call that function directly
instead of calling SignalHandler, because the rest of SignalHandler
invokes side effects which aren't appropriate, including raising the
signal.
llvm-svn: 104896
the wrong level. Clients which need to leave the stream open but
which still require the bitcode bits to be on disk should call
flush themselves.
llvm-svn: 104885
should fall through to the 'H' case, but instead 'Q' was falling through to 'R'
so that it would do the wrong thing for a big-endian ARM target.
llvm-svn: 104883