alignment of globals with a specified alignment, we fix
common variables to obey their alignment. Add a comment
explaining why this behavior is important.
llvm-svn: 102365
form of DEBUG_VALUE, as it doesn't have reasonable default
behavior for unsupported targets. Add a new hook instead.
No functional change.
llvm-svn: 102320
FunctionLoweringInfo, as it isn't SelectionDAG-specific. This isn't
completely natural, as PHI node state is not per-function but rather
per-basic-block, however there's currently no other convenient
per-basic-block state to group it with.
llvm-svn: 102109
This actually makes everything slower, but the plan is to have isel add <kill>
flags the way it is already adding <dead> flags. Then LiveVariables can be
removed again.
When ignoring the time spent in LiveVariables, -regalloc=fast is now twice as
fast as -regalloc=local.
llvm-svn: 102034
So far this is just a clone of -regalloc=local that has been lobotomized to run
25% faster. It drops the least-recently-used calculations, and is just plain
stupid when it runs out of registers.
The plan is to make this go even faster for -O0 by taking advantage of the short
live intervals in unoptimized code. It should not be necessary to calculate
liveness when most virtual registers are killed 2-3 instructions after they are
born.
llvm-svn: 102006
user-defined operations that use MMX register types, but
the compiler shouldn't generate them on its own. This adds
a Synthesizable abstraction to represent this, and changes
the vector widening computation so it won't produce MMX types.
(The motivation is to remove noise from the ABI compatibility
part of the gcc test suite, which has some breakage right now.)
llvm-svn: 101951
register is not killed in the loop.
This fixes 188.ammp on ARM where the post-ra scheduler would grab a register
that looked available but wasn't.
A testcase would be huge and fragile, sorry.
llvm-svn: 101930
into SelectionDAGBuilder. This avoids a separate pass over the
instructions, and has the side effect of providing debug location
information to the copy.
llvm-svn: 101906
in other types. fix this by only bumping zero-byte globals
up to a single byte if the *entire global* is zero size,
fixing PR6340.
This also fixes empty arrays etc to be handled correctly,
and only does this on subsection-via-symbols targets (aka
darwin) which is the only place where this matters.
llvm-svn: 101879
const_casts, and it reinforces the design of the Target classes being
immutable.
SelectionDAGISel::IsLegalToFold is now a static member function, because
PIC16 uses it in an unconventional way. There is more room for API
cleanup here.
And PIC16's AsmPrinter no longer uses TargetLowering.
llvm-svn: 101635
the live-in sets of BBs in the loop. Otherwise later pass may end up using the
registers and override the invariant. rdar://7852937
No reasonablly sized test case possible.
llvm-svn: 101626
with a fix for self-hosting
rotate CallInst operands, i.e. move callee to the back
of the operand array
the motivation for this patch are laid out in my mail to llvm-commits:
more efficient access to operands and callee, faster callgraph-construction,
smaller compiler binary
llvm-svn: 101465
JIT doesn't use the MC back-end asm printer to emit labels that it uses, the
section for the MCSymbol is never set. And thus the MCSymbol for the EH label
isn't marked as "defined". Because of that, TidyLandingPads removes the needed
landing pads from the JIT output. This breaks EH for every JIT program.
This is a work-around for this limitation. We pass in the label locations
map. If the label has a non-zero value, then it was "emitted" by the JIT and
TidyLandingPads shouldn't remove that label.
A nicer solution would be to mark the MCSymbol as "used" by the JIT and not rely
upon the section being set to determine if it's defined or not.
llvm-svn: 101453
MachineLoopInfo is already available when MachineSinking runs, so the check is
free.
There is no test case because it would require a critical edge into a loop, and
CodeGenPrepare splits those. This check is just to be extra careful.
llvm-svn: 101420
with a fix
rotate CallInst operands, i.e. move callee to the back
of the operand array
the motivation for this patch are laid out in my mail to llvm-commits:
more efficient access to operands and callee, faster callgraph-construction,
smaller compiler binary
llvm-svn: 101397
of the operand array
the motivation for this patch are laid out in my mail to llvm-commits:
more efficient access to operands and callee, faster callgraph-construction,
smaller compiler binary
llvm-svn: 101364
This doesn't occur much at all, it only seems to formed in the case
when the trunc optimization kicks in due to phase ordering. In that
case it is saves a few bytes on x86-32.
llvm-svn: 101350
a load/or/and/store sequence into a narrower store when it is
safe. Daniel tells me that clang will start producing this sort
of thing with bitfields, and this does trigger a few dozen times
on 176.gcc produced by llvm-gcc even now.
This compiles code like CodeGen/X86/2009-05-28-DAGCombineCrash.ll
into:
movl %eax, 36(%rdi)
instead of:
movl $4294967295, %eax ## imm = 0xFFFFFFFF
andq 32(%rdi), %rax
shlq $32, %rcx
addq %rax, %rcx
movq %rcx, 32(%rdi)
and each of the testcases into a single store. Each of them used
to compile into craziness like this:
_test4:
movl $65535, %eax ## imm = 0xFFFF
andl (%rdi), %eax
shll $16, %esi
addl %eax, %esi
movl %esi, (%rdi)
ret
llvm-svn: 101343
Sometimes it is desirable to sink instructions along a critical edge:
x = ...
if (a && b) ...
else use(x);
The 'a && b' condition creates a critical edge to the else block, but we still
want to sink the computation of x into the block. The else block is dominated by
the parent block, so we are not pushing instructions into new code paths.
llvm-svn: 101165
MachineBasicBlock::livein_iterator a const_iterator, because
clients shouldn't ever be using the iterator interface to
mutate the livein set.
llvm-svn: 101147
so the user at least knows what inline asm is a problem. For example:
error: inline asm not supported yet: don't know how to handle tied indirect register inputs
pr8788-1.c:14:10: note: generated from here
asm ("\n" : "+r" (stack->regs)
^
Instead of:
fatal error: error in backend: Don't know how to handle tied indirect register inputs yet!
llvm-svn: 100731
and use it in one place in inline asm handling stuff. Before
we'd generate this for an invalid modifier letter:
$ clang asm.c -c -o t.o
fatal error: error in backend: Invalid operand found in inline asm: 'abc incl ${0:Z}'
INLINEASM <es:abc incl ${0:Z}>, 10, %EAX<def>, 2147483657, %EAX, 14, %EFLAGS<earlyclobber,def,dead>, <!-1>
Now we generate this:
$ clang asm.c -c -o t.o
error: invalid operand in inline asm: 'incl ${0:Z}'
asm.c:3:12: note: generated from here
__asm__ ("incl %Z0" : "+r" (X));
^
1 error generated.
This is much better but still admittedly not great ("why" is the operand
invalid??), codegen should try harder with its diagnostics :)
llvm-svn: 100723
1. Introduce some enums and accessors in the InlineAsm class
that eliminate a ton of magic numbers when handling inline
asm SDNode.
2. Add a new MDNodeSDNode selection dag node type that holds
a MDNode (shocking!)
3. Add a new argument to ISD::INLINEASM nodes that hold !srcloc
metadata, propagating it to the instruction emitter, which
drops it.
No functionality change.
llvm-svn: 100605
A certain GDB testsuite case (local.cc) has a function nested inside a
class nested inside another function. GCC presents the innermost
function to llvm-convert first. Heretofore, the debug info mistakenly
placed the inner function at module scope. This patch walks the GCC
context links and instantiates the outer class and function so the
debug info is properly nested. Radar 7426545.
llvm-svn: 100530
"Print" methods to "Emit". Emit is something that goes
to an mc streamer, Print is something that goes to a
raw_ostream (for inline asm)
llvm-svn: 100337
"asm printering" happens through MCStreamer. This also
Streamerizes PIC16 debug info, which escaped my attention.
This removes a leak from LLVMTargetMachine of the 'legacy'
output stream.
llvm-svn: 100327
raw_ostream to print an instruction to had to be specified
at MCInstPrinter construction time instead of being able
to pick at each call to printInstruction.
llvm-svn: 100307
Added support for address spaces and added a isVolatile field to memcpy, memmove, and memset,
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)
llvm-svn: 100304
EmitInlineAsm. However, this attempt is foiled by operands
being emitted directly to "O" so I'll have to do some surgery
and finish MCizing the world.
llvm-svn: 100291
source addition. Apparently the buildbots were wrong about failures.
---
Add some switches helpful for debugging:
-print-before=<Pass Name>
Dump IR before running pass <Pass Name>.
-print-before-all
Dump IR before running each pass.
-print-after-all
Dump IR after running each pass.
These are helpful when tracking down a miscompilation. It is easy to
get IR dumps and do diffs on them, etc.
To make this work well, add a new getPrinterPass API to Pass so that
each kind of pass (ModulePass, FunctionPass, etc.) can create a Pass
suitable for dumping out the kind of object the Pass works on.
llvm-svn: 100249
representation. This eliminates the 'DILocation' MDNodes for
file/line/col tuples from -O0 -g codegen.
This remove the old DebugLoc class, making it a typedef for DebugLoc,
I'll rename NewDebugLoc next.
I didn't update the JIT to use the new apis, so it will continue to
work, but be as slow as before. Someone should eventually do this
or, better yet, rip out the JIT debug info stuff and build the JIT
on top of MC.
llvm-svn: 100209
<string> include. For some reason the buildbot choked on this while my
builds did not. It's probably due to a difference in system headers.
---
Add some switches helpful for debugging:
-print-before=<Pass Name>
Dump IR before running pass <Pass Name>.
-print-before-all
Dump IR before running each pass.
-print-after-all
Dump IR after running each pass.
These are helpful when tracking down a miscompilation. It is easy to
get IR dumps and do diffs on them, etc.
To make this work well, add a new getPrinterPass API to Pass so that
each kind of pass (ModulePass, FunctionPass, etc.) can create a Pass
suitable for dumping out the kind of object the Pass works on.
llvm-svn: 100204
Added support for address spaces and added a isVolatile field to memcpy, memmove, and memset,
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)
llvm-svn: 100191
-print-before=<Pass Name>
Dump IR before running pass <Pass Name>.
-print-before-all
Dump IR before running each pass.
-print-after-all
Dump IR after running each pass.
These are helpful when tracking down a miscompilation. It is easy to
get IR dumps and do diffs on them, etc.
To make this work well, add a new getPrinterPass API to Pass so that
each kind of pass (ModulePass, FunctionPass, etc.) can create a Pass
suitable for dumping out the kind of object the Pass works on.
llvm-svn: 100143
1. Makes it possible to lower with floating point loads and stores.
2. Avoid unaligned loads / stores unless it's fast.
3. Fix some memcpy lowering logic bug related to when to optimize a
load from constant string into a constant.
4. Adjust x86 memcpy lowering threshold to make it more sane.
5. Fix x86 target hook so it uses vector and floating point memory
ops more effectively.
rdar://7774704
llvm-svn: 100090
* Set the "DestA" and "DestB" according to how they're understood by the
method. I.e., if one or both of them should point to the "fall through" block,
then point to the fall through block.
* Improve the loop that removes superfluous edges to be more understandable.
llvm-svn: 100056
POD-like anyway, so we don't even care about calling their d'tors (DIEBlock
being the exception).
~6% less mallocs and ~1% compile time improvement on clang -O0 -g oggenc.c
llvm-svn: 100035
instructions. In addition to being a convenience,
they are faster than the old apis, particularly when
not going from an MDKindID like people should be
doing.
llvm-svn: 99982
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)
A update of langref will occur in a subsequent checkin.
llvm-svn: 99928
create symbols. It is extremely error prone and a source of a lot
of the remaining integrated assembler bugs on x86-64.
This fixes rdar://7807601.
llvm-svn: 99902
on all objects it has allocated, if they are all of the same size and alignment.
Use this to destruct all VNInfos allocated in LiveIntervalAnalysis (PR6653).
valnos is not reliable for this purpose, as seen in r99400
(which still leaked, and sometimes caused double frees).
llvm-svn: 99881
transform. I.e., if a clean-up eh.selector call dominates the invoke of an
_Unwind_Resume_or_Rethrow, then we convert the eh.selector into a
catch-all. This patch, however, uses the DominatorTree information, and doesn't
go through the whole rigmarole of starting at the eh.exception call, finding the
corresponding URoR and eh.selector calls, and trying to trace through any number
of instruction types to get to them.
llvm-svn: 99846
and those derived from them. These are obnoxious because
they were written as: PatLeaf<(bitconvert). Not having an
argument was foiling adding better type checking for operand
count matching up with what was required (in this case,
bitconvert always requires an operand!)
llvm-svn: 99759
'invoke' instruction. You will get a situation like this:
bb:
%ehptr = eh.exception()
%sel = eh.selector(%ehptr, @per, 0);
...
bb2:
invoke _Unwind_Resume_or_Rethrow(%ehptr) %normal unwind to %lpad
lpad:
...
The unwinder will see the %sel call as a clean-up and, if it doesn't have a
catch further up the call stack, it will skip running it. But there *is* another
catch up the stack -- the catch for the %lpad. However, we can't see that. This
is fixed in code-gen, where we detect this situation, and convert the "clean-up"
selector call into a "catch-all" selector call. This gives us the correct
semantics.
llvm-svn: 99671
the custom insertion hook deletes the instruction, then we try to set dead
flags on it. Neither the code that I added nor the code that was there
before was safe.
llvm-svn: 99538
bytes instead of one byte. This is important because
we're running up to too many opcodes to fit in a byte
and it is aggrevated by FIRST_TARGET_MEMORY_OPCODE
making the numbering sparse. This just bites the
bullet and bloats out the table. In practice, this
increases the size of the x86 isel table from 74.5K
to 76K. I think we'll cope :)
This fixes rdar://7791648
llvm-svn: 99494
happening.
Enhance scheduling to set the DEAD flag on implicit defs
more aggressively. Before, we'd set an implicit def operand
to dead if it were present in the SDNode corresponding to
the machineinstr but had no use. Now we do it in this case
AND if the implicit def does not exist in the SDNode at all.
This exposes a couple of problems: one is the FIXME, which
causes a live intervals crash on CodeGen/X86/sibcall.ll.
The second is that it makes machinecse and licm more
aggressive (which is a good thing) but also exposes a case
where licm hoists a set0 and then it doesn't get resunk.
Talking to codegen folks about both these issues, but I need
this patch in in the meantime.
llvm-svn: 99485
Here is a theoretical example that illustrates why the placement is important.
tmp1 =
store tmp1 -> x
...
tmp2 = add ...
...
call
...
store tmp2 -> x
Now mem2reg comes along:
tmp1 =
dbg_value (tmp1 -> x)
...
tmp2 = add ...
...
call
...
dbg_value (tmp2 -> x)
When the debugger examine the value of x after the add instruction but before the call, it should have the value of tmp1.
Furthermore, for dbg_value's that reference constants, they should not be emitted at the beginning of the block (since they do not have "producers").
This patch also cleans up how SDISel manages DbgValue nodes. It allow a SDNode to be referenced by multiple SDDbgValue nodes. When a SDNode is deleted, it uses the information to find the SDDbgValues and invalidate them. They are not deleted until the corresponding SelectionDAG is destroyed.
llvm-svn: 99469
otherwise the SmallVector it contains doesn't free its memory.
In most cases LiveIntervalAnalysis could get away by not calling the destructor,
because VNInfos are bumpptr-allocated, and smallvectors usually don't grow.
However when the SmallVector does grow it always leaks.
This is the valgrind shown leak from the original testcase:
==8206== 18,304 bytes in 151 blocks are definitely lost in loss record 164 of 164
==8206== at 0x4A079C7: operator new(unsigned long) (vg_replace_malloc.c:220)
==8206== by 0x4DB7A7E: llvm::SmallVectorBase::grow_pod(unsigned long, unsigned long) (in /home/edwin/clam/git/builds/defaul
t/libclamav/.libs/libclamav.so.6.1.0)
==8206== by 0x4F90382: llvm::VNInfo::addKill(llvm::SlotIndex) (in /home/edwin/clam/git/builds/default/libclamav/.libs/libcl
amav.so.6.1.0)
==8206== by 0x5126B5C: llvm::LiveIntervals::handleVirtualRegisterDef(llvm::MachineBasicBlock*, llvm::ilist_iterator<llvm::M
achineInstr>, llvm::SlotIndex, llvm::MachineOperand&, unsigned int, llvm::LiveInterval&) (in /home/edwin/clam/git/builds/defau
lt/libclamav/.libs/libclamav.so.6.1.0)
==8206== by 0x512725E: llvm::LiveIntervals::handleRegisterDef(llvm::MachineBasicBlock*, llvm::ilist_iterator<llvm::MachineI
nstr>, llvm::SlotIndex, llvm::MachineOperand&, unsigned int) (in /home/edwin/clam/git/builds/default/libclamav/.libs/libclamav
.so.6.1.0)
==8206== by 0x51278A8: llvm::LiveIntervals::computeIntervals() (in /home/edwin/clam/git/builds/default/libclamav/.libs/libc
lamav.so.6.1.0)
==8206== by 0x5127CB4: llvm::LiveIntervals::runOnMachineFunction(llvm::MachineFunction&) (in /home/edwin/clam/git/builds/de
fault/libclamav/.libs/libclamav.so.6.1.0)
==8206== by 0x4DAE935: llvm::FPPassManager::runOnFunction(llvm::Function&) (in /home/edwin/clam/git/builds/default/libclama
v/.libs/libclamav.so.6.1.0)
==8206== by 0x4DAEB10: llvm::FunctionPassManagerImpl::run(llvm::Function&) (in /home/edwin/clam/git/builds/default/libclama
v/.libs/libclamav.so.6.1.0)
==8206== by 0x4DAED3D: llvm::FunctionPassManager::run(llvm::Function&) (in /home/edwin/clam/git/builds/default/libclamav/.l
ibs/libclamav.so.6.1.0)
==8206== by 0x4D8BE8E: llvm::JIT::runJITOnFunctionUnlocked(llvm::Function*, llvm::MutexGuard const&) (in /home/edwin/clam/git/builds/default/libclamav/.libs/libclamav.so.6.1.0)
==8206== by 0x4D8CA72: llvm::JIT::getPointerToFunction(llvm::Function*) (in /home/edwin/clam/git/builds/default/libclamav/.libs/libclamav.so.6.1.0)
llvm-svn: 99400
disabled for several months (since svn r88806) and no one noticed. My fix
for pr6543 yesterday reenabled it, but broke the ARM port's code for using
TBB/TBH. Rather than adding a target hook to disable merging for Thumb2 only,
I'm just taking this out. It is not common to have identical jump tables,
the code we used to merge them was O(N^2), and it only helps code size, not
performance.
llvm-svn: 98977
always create a new jump table. The intention was to avoid merging jump
tables in SelectionDAGBuilder, and to wait for the branch folding pass to
merge tables. Unfortunately, the same getJumpTableIndex() method is also
used to merge tables in branch folding, so as a result of this change
branch tables are never merged. Worse, the branch folding code is expecting
getJumpTableIndex to always return the index of an existing table, but with
this change, it never does so. In at least some cases, e.g., pr6543, this
creates references to non-existent tables.
I've fixed the problem by adding a new createJumpTableIndex function, which
will always create a new table, and I've changed getJumpTableIndex to only
look at existing tables.
llvm-svn: 98845
Remove ugly hack that aborted the coalescer before using N^2 time.
This affects functions with very complicated live intervals for physical
registers, i.e. functions with thousands of function calls.
llvm-svn: 98776
to LLVM IR changes with addr label weirdness. In the testcase, we
generate references to the two bb's when codegen'ing the first
function:
_test1: ## @test1
leaq Ltmp0(%rip), %rax
..
leaq Ltmp1(%rip), %rax
Then continue to codegen the second function where the blocks
get merged. We're now smart enough to emit both labels, producing
this code:
_test_fun: ## @test_fun
## BB#0: ## %entry
Ltmp1: ## Block address taken
Ltmp0:
## BB#1: ## %ret
movl $-1, %eax
ret
Rejoice.
llvm-svn: 98595
label is generated, but then the block is deleted. Since the
value is undefined, we just emit the label right after the entry
label of the function. It might matter that the label is in the
same section as the function was afterall.
llvm-svn: 98579
function, then the BB is RAUW'd before the definition is emitted. There
are still two cases not being handled, but this should improve us back to
the situation before I touched anything.
llvm-svn: 98566
an MCSymbol. Make the EH_LABEL MachineInstr hold its label
with an MCSymbol instead of ID. Fix a bug in MMI.cpp which
would return labels named "Label4" instead of "label4".
llvm-svn: 98463
instead of label ID's. This cleans up and regularizes a bunch
of code and makes way for future progress.
Unfortunately, this pointed out to me that JITDwarfEmitter.cpp
is largely copy and paste from DwarfException/MachineModuleInfo
and other places. This is very sad and disturbing. :(
One major change here is that TidyLandingPads moved from being
called in DwarfException::BeginFunction to being called in
DwarfException::EndFunction. There should not be any
functionality change from doing this, but I'm not an EH expert.
llvm-svn: 98459
and passing off ownership to AsmPrinter. Now MachineModuleInfo
creates it and owns it by value. This allows us to use MCSymbols
more consistently throughout the rest of the code generator, and
simplifies a bit of code. This also allows MachineFunction to
keep an MCContext reference handy, and cleans up the TargetRegistry
interfaces for AsmPrinters.
llvm-svn: 98450
(it seems that FreeBSD doesn't have copysignl). Done by
removing a bunch of assumptions from the code. This may also
help with sparc 128 bit floats.
llvm-svn: 98346
where we used ot create an MCSymbol for ".". Now emit an assembler
temporary label and reference it instead of "." textually.
rdar://7739457
llvm-svn: 98292
cl = EXTRACT_SUBREG reg1024, 1, is overly conservative. It should check
for overlaps of vr's live interval with the super registers of the
physical register (ECX in this case) and let JoinIntervals() handle checking
the coalescing feasibility against the physical register (cl in this case).
llvm-svn: 98251
indicates that an MCSymbol is external or not. (It's true if it's external.)
This will be used to specify the correct information to add to non-lazy
pointers. That will be explained further when this bit is used.
llvm-svn: 98199
1. Be careful with cse "cheap" expressions. e.g. constant materialization. Only cse them when the common expression is local or in a direct predecessor. We don't want cse of cheap instruction causing other expressions to be spilled.
2. Watch out for the case where the expression doesn't itself uses a virtual register. e.g. lea of frame object. If the common expression itself is used by copies (common for passing addresses to function calls), don't perform the cse. Since these expressions do not use a register, it creates a live range but doesn't close any, we want to be very careful with increasing register pressure.
Note these are heuristics so machine cse doesn't make register allocator unhappy. Once we have proper live range splitting and re-materialization support in place, these should be evaluated again.
Now machine cse is almost always a win on llvm nightly tests on x86 and x86_64.
llvm-svn: 98121
need to be MCized, but the last debug info thing are LEB and
cygwin specific (which the MC api doesn't support yet) and
one specific form of EmitReference which I'll tackle next.
llvm-svn: 98116
method. With this, comments should end up on the same lines as the .byte
directives (for example) and we now get no output with:
$ llc CodeGen/X86/2009-02-12-DebugInfoVLA.ll -o - -filetype=null -asm-verbose
woot.
llvm-svn: 98105
is preparatory to having PEI's scavenged frame index value reuse logic
properly distinguish types of frame values (e.g., whether the value is
stack-pointer relative or frame-pointer relative).
No functionality change.
llvm-svn: 98086
coalescer) handle sub-register classes.
- Add heuristics to avoid non-profitable cse. Given the current lack of live
range splitting, avoid cse when an expression has PHI use and the would be
new use is in a BB where the expression wasn't already being used.
llvm-svn: 98043
physreg becomes ridiculously high.
std::upper_bound may be log(N), but for sufficiently large live intervals, it
becomes log(N)*cachemiss = a long long time.
This patch improves coalescer time by 4500x for a function with 20000
function calls. The generated code is different, but not significantly worse -
the allocator hints are almost as good as physreg coalescing anyway.
llvm-svn: 98023
available, the only thing this affects is that we produce
.set in one case we didn't before, which shouldn't harm
anything. Make EmitSectionOffset call EmitDifference
instead of duplicating it.
llvm-svn: 98005
is a workaround for <rdar://problem/7672401/> (which I filed).
This let's us build Wine on Darwin, and it gets the Qt build there a little bit
further (so Doug says).
llvm-svn: 97845
CALL ... %RAX<imp-def>
... [not using %RAX]
%EAX = ..., %RAX<imp-use, kill>
RET %EAX<imp-use,kill>
Now we do this:
CALL ... %RAX<imp-def, dead>
... [not using %RAX]
%EAX = ...
RET %EAX<imp-use,kill>
By not artificially keeping %RAX alive, we lower register pressure a bit.
The correct number of instructions for 2008-08-05-SpillerBug.ll is obviously
55, anybody can see that. Sheesh.
llvm-svn: 97838
node which has a flag. That flag in turn was used by an
already-selected adde which turned into an ADC32ri8 which
used a selected load which was chained to the load we
folded. This flag use caused us to form a cycle. Fix
this by not ignoring chains in IsLegalToFold even in
cases where the isel thinks it can.
llvm-svn: 97791
as nounwind are marked with a -1 call-site value. This is necessary to, for
example, correctly process exceptions thrown from within an "unexpected"
execption handler (see SingleSource/Regression/C++/EH/expection_spec_test.cpp).
llvm-svn: 97757
as the very last thing before node emission. This should
dramatically reduce the number of times we do 'MatchAddress'
on X86, speeding up compile time. This also improves comments
in the tables and shrinks the table a bit, now down to
80506 bytes for x86.
llvm-svn: 97703
CSE and recursive RAUW calls delete a node from the use list,
invalidating the use list iterator. There's currently no known
way to reproduce this in an unmodified LLVM, however there's no
fundamental reason why a SelectionDAG couldn't be formed which
would trigger this case.
llvm-svn: 97665
entry we're about to process is obviously going to fail, don't
bother pushing a scope only to have it immediately be popped.
This avoids a lot of scope stack traffic in common cases.
Unfortunately, this requires duplicating some of the predicate
dispatch. To avoid duplicating the actual logic I pulled each
predicate out to its own static function which gets used in
both places.
llvm-svn: 97651
SwitchOpcodeMatcher) and have DAGISelMatcherOpt form it. This
speeds up selection, particularly for X86 which has lots of
variants of instructions with only type differences.
llvm-svn: 97645
- Eliminate TargetInstrInfo::isIdentical and replace it with produceSameValue. In the default case, produceSameValue just checks whether two machine instructions are identical (except for virtual register defs). But targets may override it to check for unusual cases (e.g. ARM pic loads from constant pools).
llvm-svn: 97628
long test(long x) { return (x & 123124) | 3; }
Currently compiles to:
_test:
orl $3, %edi
movq %rdi, %rax
andq $123127, %rax
ret
This is because instruction and DAG combiners canonicalize
(or (and x, C), D) -> (and (or, D), (C | D))
However, this is only profitable if (C & D) != 0. It gets in the way of the
3-addressification because the input bits are known to be zero.
llvm-svn: 97616
CopyToReg/CopyFromReg/INLINEASM. These are annoying because
they have the same opcode before an after isel. Fix this by
setting their NodeID to -1 to indicate that they are selected,
just like what automatically happens when selecting things that
end up being machine nodes.
With that done, give IsLegalToFold a new flag that causes it to
ignore chains. This lets the HandleMergeInputChains routine be
the one place that validates chains after a match is successful,
enabling the new hotness in chain processing. This smarter
chain processing eliminates the need for "PreprocessRMW" in the
X86 and MSP430 backends and enables MSP to start matching it's
multiple mem operand instructions more aggressively.
I currently #if out the dead code in the X86 backend and MSP
backend, I'll remove it for real in a follow-on patch.
The testcase changes are:
test/CodeGen/X86/sse3.ll: we generate better code
test/CodeGen/X86/store_op_load_fold2.ll: PreprocessRMW was
miscompiling this before, we now generate correct code
Convert it to filecheck while I'm at it.
test/CodeGen/MSP430/Inst16mm.ll: Add a testcase for mem/mem
folding to make anton happy. :)
llvm-svn: 97596
was that we weren't properly handling the case when interior
nodes of a matched pattern become dead after updating chain
and flag uses. Now we handle this explicitly in
UpdateChainsAndFlags.
llvm-svn: 97561
DoInstructionSelection. Inline "SelectRoot" into it from DAGISelHeader.
Sink some other stuff out of DAGISelHeader into SDISel.
Eliminate the various 'Indent' stuff from various targets, which dates
to when isel was recursive.
17 files changed, 114 insertions(+), 430 deletions(-)
llvm-svn: 97555
stuff now that we don't care about emulating the old broken
behavior of the old isel. This eliminates the
'CheckChainCompatible' check (along with IsChainCompatible) which
did an incorrect and inefficient scan *up* the chain nodes which
happened as the pattern was being formed and does the validation
at the end in HandleMergeInputChains when it forms a structural
pattern. This scans "down" the graph, which means that it is
quickly bounded by nodes already selected. This also handles
token factors that get "trapped" in the dag.
Removing the CheckChainCompatible nodes also shrinks the
generated tables by about 6K for X86 (down to 83K).
There are two pieces remaining before I can nuke PreprocessRMW:
1. I xfailed a test because we're now producing worse code in a
case that has nothing to do with the change: it turns out that
our use of MorphNodeTo will leave dead nodes in the graph
which (depending on how the graph is walked) end up causing
bogus uses of chains and blocking matches. This is really
bad for other reasons, so I'll fix this in a follow-up patch.
2. CheckFoldableChainNode needs to be improved to handle the TF.
llvm-svn: 97539
ComplexPattern at the root be generated multiple times, once
for each opcode they are part of. This encourages factoring
because the opcode checks get treated just like everything
else in the matcher.
llvm-svn: 97439
to a scope where every child starts with a CheckOpcode, but
executes more efficiently. Enhance DAGISelMatcherOpt to
form it.
This also fixes a bug in CheckOpcode: apparently the SDNodeInfo
objects are not pointer comparable, we have to compare the
enum name.
llvm-svn: 97438
defs or uses. The regular def and use checking below covers them, and
can be more precise. It's safe to hoist an instruction with a dead
implicit def if the register isn't live into the loop header.
llvm-svn: 97352
for alignment into the LSDA. If the TType base offset is emitted, then put the
padding there. Otherwise, put it in the call site table length. There will be no
conflict between the two sites when placing the padding in one place.
llvm-svn: 97277
The PowerPC floating point registers can represent both f32 and f64 via the
two register classes F4RC and F8RC. F8RC is considered a subclass of F4RC to
allow cross-class coalescing. This coalescing only affects whether registers
are spilled as f32 or f64.
Spill slots must be accessed with load/store instructions corresponding to the
class of the spilled register. PPCInstrInfo::foldMemoryOperandImpl was looking
at the instruction opcode which is wrong.
X86 has similar floating point register classes, but doesn't try to fold
memory operands, so there is no problem there.
llvm-svn: 97262
the alignment requirement, if it no longer makes the TType base offset overflow
into extra bytes, then we need to pad to those bytes ourselves.
llvm-svn: 97196
will eliminate the need for padding in the "Call site table length". E.g., if
we have this:
GCC_except_table1:
Lexception1:
.byte 0xff ## @LPStart Encoding = omit
.byte 0x9b ## @TType Encoding = indirect pcrel sdata4
.byte 0x7f ## @TType base offset
.byte 0x03 ## Call site Encoding = udata4
.byte 0x89 ## Call site table length
with padding of 1. We want to emit the padding like this:
GCC_except_table1:
Lexception1:
.byte 0xff ## @LPStart Encoding = omit
.byte 0x9b ## @TType Encoding = indirect pcrel sdata4
.byte 0xff ## @TType base offset
.space 1,0 ## Padding
.byte 0x03 ## Call site Encoding = udata4
.byte 0x89 ## Call site table length
and not with padding on the "Call site table length" entry.
llvm-svn: 97183
terms of store and load, which means bitcasting between scalar
integer and vector has endian-specific results, which undermines
this whole approach.
llvm-svn: 97137
GCC_except_table label but before the Lexception, which the FDE references.
This causes problems as the FDE does not point to the start of an LSDA chunk.
Use an unnormalized uleb128 for the call-site table length that includes the
padding.
llvm-svn: 97078
the number of value bits, not the number of bits of allocation for in-memory
storage.
Make getTypeStoreSize and getTypeAllocSize work consistently for arrays and
vectors.
Fix several places in CodeGen which compute offsets into in-memory vectors
to use TargetData information.
This fixes PR1784.
llvm-svn: 97064
necessary to swap the operands to handle NaN and negative zero properly.
Also, reintroduce logic for checking for NaN conditions when forming
SSE min and max instructions, fixed to take into consideration NaNs and
negative zeros. This allows forming min and max instructions in more
cases.
llvm-svn: 97025
to adding them in a determinstic order (bottom up from
the root) based on the structure of the graph itself.
This updates tests for some random changes, interesting
bits: CodeGen/Blackfin/promote-logic.ll no longer crashes.
I have no idea why, but that's good right?
CodeGen/X86/2009-07-16-LoadFoldingBug.ll also fails, but
now compiles to have one fewer constant pool entry, making
the expected load that was being folded disappear. Since it
is an unreduced mass of gnast, I just removed it.
This fixes PR6370
llvm-svn: 97023
Previously, LiveIntervalAnalysis would infer phi joins by looking for multiply
defined registers. That doesn't work if the phi join is implicitly defined in
all but one of the predecessors.
llvm-svn: 96994
no id's would cause early exit allowing IsLegalToFold to return true
instead of false, producing a cyclic dag.
This was striking the new isel because it isn't using SelectNodeTo yet,
which theoretically is just an optimization.
llvm-svn: 96972
126.gcc nightly tests. These failures uncovered latent bugs that machine DCE
could remove one half of a stack adjust down/up pair, causing PEI to assert.
This update fixes that, and the tests now pass.
llvm-svn: 96822
dragonegg self-host build. I reverted 96640 in order to revert
96556 (96640 goes on top of 96556), but it also looks like with
both of them applied the breakage happens even earlier. The
symptom of the 96556 miscompile is the following crash:
llvm[3]: Compiling AlphaISelLowering.cpp for Release build
cc1plus: /home/duncan/tmp/tmp/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:4982: void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*, llvm::SDNode*, llvm::SelectionDAG::DAGUpdateListener*): Assertion `(!From->hasAnyUseOfValue(i) || From->getValueType(i) == To->getValueType(i)) && "Cannot use this version of ReplaceAllUsesWith!"' failed.
Stack dump:
0. Running pass 'X86 DAG->DAG Instruction Selection' on function '@_ZN4llvm19AlphaTargetLowering14LowerOperationENS_7SDValueERNS_12SelectionDAGE'
g++: Internal error: Aborted (program cc1plus)
This occurs when building LLVM using LLVM built by LLVM (via
dragonegg). Probably LLVM has miscompiled itself, though it
may have miscompiled GCC and/or dragonegg itself: at this point
of the self-host build, all of GCC, LLVM and dragonegg were built
using LLVM. Unfortunately this kind of thing is extremely hard
to debug, and while I did rummage around a bit I didn't find any
smoking guns, aka obviously miscompiled code.
Found by bisection.
r96556 | evancheng | 2010-02-18 03:13:50 +0100 (Thu, 18 Feb 2010) | 5 lines
Some dag combiner goodness:
Transform br (xor (x, y)) -> br (x != y)
Transform br (xor (xor (x,y), 1)) -> br (x == y)
Also normalize (and (X, 1) == / != 1 -> (and (X, 1)) != / == 0 to match to "test on x86" and "tst on arm"
r96640 | evancheng | 2010-02-19 01:34:39 +0100 (Fri, 19 Feb 2010) | 16 lines
Transform (xor (setcc), (setcc)) == / != 1 to
(xor (setcc), (setcc)) != / == 1.
e.g. On x86_64
%0 = icmp eq i32 %x, 0
%1 = icmp eq i32 %y, 0
%2 = xor i1 %1, %0
br i1 %2, label %bb, label %return
=>
testl %edi, %edi
sete %al
testl %esi, %esi
sete %cl
cmpb %al, %cl
je LBB1_2
llvm-svn: 96672
for ARM to just check if a function has a FP to determine if it's safe
to simplify the stack adjustment pseudo ops prior to eliminating frame
indices. Allow targets to override the default behavior and does so for ARM
and Thumb2.
llvm-svn: 96634
Moderate the weight given to very small intervals.
The spill weight given to new intervals created when spilling was not
normalized in the same way as the original spill weights calculated by
CalcSpillWeights. That meant that restored registers would tend to hang around
because they had a much higher spill weight that unspilled registers.
This improves the runtime of a few tests by up to 10%, and there are no
significant regressions.
llvm-svn: 96613
checking whether AnalyzeBranch disagrees with the CFG
directly, rather than looking for EH_LABEL instructions.
EH_LABEL instructions aren't always at the end of the
block, due to FP_REG_KILL and other things. This fixes
an infinite loop compiling MultiSource/Benchmarks/Bullet.
llvm-svn: 96611
A virtual register can be used before it is defined in the same MBB if the MBB
is part of a loop. Teach the implicit-def pass about this case.
llvm-svn: 96279
IsLegalToFold and IsProfitableToFold. The generic version of the later simply checks whether the folding candidate has a single use.
This allows the target isel routines more flexibility in deciding whether folding makes sense. The specific case we are interested in is folding constant pool loads with multiple uses.
llvm-svn: 96255
When coalescing with a physreg, remember to add imp-def and imp-kill when
dealing with sub-registers.
Also fix a related bug in VirtRegRewriter where substitutePhysReg may
reallocate the operand list on an instruction and invalidate the reg_iterator.
This can happen when a register is mentioned twice on the same instruction.
llvm-svn: 96072
created. This ensures it's updated at all time. It means targets which perform
dynamic stack alignment would know whether it is required and whether frame
pointer register cannot be made available register allocation.
This is a fix for rdar://7625239. Sorry, I can't create a reasonably sized test
case.
llvm-svn: 96069
phi cycles. Adjust a few tests to keep dead instructions from being optimized
away. This (together with my previous change for phi cycles) fixes Apple
radar 7627077.
llvm-svn: 96057
bug fixes, and with improved heuristics for analyzing foreign-loop
addrecs.
This change also flattens IVUsers, eliminating the stride-oriented
groupings, which makes it easier to work with.
llvm-svn: 95975
* Enabled R1/R2 application for nodes with infinite spill costs in the Briggs heuristic (made
safe by the changes to the normalization proceedure).
* Removed a redundant header.
llvm-svn: 95973
reduce down to a single value. InstCombine already does this transformation
but DAG legalization may introduce new opportunities. This has turned out to
be important for ARM where 64-bit values are split up during type legalization:
InstCombine is not able to remove the PHI cycles on the 64-bit values but
the separate 32-bit values can be optimized. I measured the compile time
impact of this (running llc on 176.gcc) and it was not significant.
llvm-svn: 95951
Calling RemoveOperand is very expensive on huge PHI instructions. This makes
early tail duplication run twice as fast on the Firefox JavaScript
interpreter.
llvm-svn: 95832
lowering and requires that certain types exist in ValueTypes.h. Modified widening to
check if an op can trap and if so, the widening algorithm will apply only the op on
the defined elements. It is safer to do this in widening because the optimizer can't
guarantee removing unused ops in some cases.
llvm-svn: 95823
legalization even when the IR-level optimizer has removed dead phis, such
as when the high half of an i64 value is unused on a 32-bit target.
I had to adjust a few test cases that had dead phis.
This is a partial fix for Radar 7627077.
llvm-svn: 95816
register coalescing. This fixes many crashes and
places where debug info affects codegen (when
dbg.value is lowered to machine instructions, which
it isn't yet in TOT).
llvm-svn: 95739
The major win of this is that the code is simpler and they
print on the same line as the instruction again:
movl %eax, 96(%esp) ## 4-byte Spill
movl 96(%esp), %eax ## 4-byte Reload
cmpl 92(%esp), %eax ## 4-byte Folded Reload
jl LBB7_86
llvm-svn: 95738
into TargetOpcodes.h. #include the new TargetOpcodes.h
into MachineInstr. Add new inline accessors (like isPHI())
to MachineInstr, and start using them throughout the
codebase.
llvm-svn: 95687
Previously spill registers, whose def indexes are not defined, would sometimes be improperly marked as coalescable with conflicting registers. The new findCoalesces routine conservatively assumes that any register with at least one undefined def is not coalescable with any register it interferes with.
llvm-svn: 95636
in global initializers. Instead of aborting, attempt to fold them on the
spot. If folding succeeds, emit the folded expression instead.
This fixes PR6255.
llvm-svn: 95583
warns about this base class not having a virtual destructor, but since
this class has no virtual methods and neither it or the types derived
from it has a destructor, a protected trivial destructor will do (and
shuts cppcheck up) the trick without the cost of introducing a vtable.
llvm-svn: 95526
only run for x86 with fastisel. I've found it being very effective in
eliminating some obvious dead code as result of formal parameter lowering
especially when tail call optimization eliminated the need for some of the loads
from fixed frame objects. It also shrinks a number of the tests. A couple of
tests no longer make sense and are now eliminated.
llvm-svn: 95493
following it. However, the EmitGlobalConstant method wasn't emitting a body for
the constant. The assembler doesn't like that. Before, we were generating this:
.zerofill __DATA, __common, __cmd, 1, 3
This fix puts us back to that semantic.
llvm-svn: 95336
the end of the instruction instead of expecting the caller to
do it. This currently causes the asm-verbose instruction
comments to be on the next line.
llvm-svn: 95178
than DEBUG_VALUE :( ) into the target indep AsmPrinter.cpp
file. This allows elimination of the
NO_ASM_WRITER_BOILERPLATE hack among other things.
llvm-svn: 95177
stderr if in filetype=obj mode. This is a hack, and will
live until dwarf emission and other random stuff that is
not yet going through MCStreamer is upgraded. It only
impacts filetype=obj mode.
llvm-svn: 95166
$ cat t.ll
@g = global i32 42
$ llc t.ll -o t.o -filetype=obj
$ nm t.o
00000000 D _g
There is still a ton of work left. Instructions are not being encoded
yet apparently.
llvm-svn: 95162
the one used by the JIT. Remove all forms of
addPassesToEmitFileFinish except the one used by the static
code generator. Inline the remaining version of
addPassesToEmitFileFinish into its only caller.
llvm-svn: 95109
type is the same as the element type of the vector. EXTRACT_VECTOR_ELT can
be used to extended the width of an integer type. This fixes a bug for
Generic/vector-casts.ll on a ppc750.
llvm-svn: 94990
"visit*" method is called, take the newly created nodes, walk them in a DFS
fashion, and if they don't have an ordering set, then give it one.
llvm-svn: 94757
This allows code gen and the exception table writer to cooperate to make sure
landing pads are associated with the correct invoke locations.
llvm-svn: 94726
Move the X86 implementation of function body emission up to
AsmPrinter::EmitFunctionBody, which works by calling the virtual
EmitInstruction method.
llvm-svn: 94716
Target independent isel should always pass along the "tail call" property. Change
target hook LowerCall's parameter "isTailCall" into a refernce. If the target
decides it's impossible to honor the tail call request, it should set isTailCall
to false to make target independent isel happy.
llvm-svn: 94626
change is that we now use ".linkonce discard" for global variables
instead of ".linkonce samesize". These should be the same, just less
strict. If anyone is interested in mcizing MCSection for COFF targets,
this should be easy to fix.
llvm-svn: 94623
Default HasSetDirective to true, since most targets have it.
The targets that claim to not have it probably do, or it is
spelled differently. These include Blackfin, Mips, Alpha, and
PIC16. All of these except pic16 are normal ELF targets, so
they almost certainly have it.
llvm-svn: 94585
which is more convenient, and change getPICJumpTableRelocBaseExpr
to take a MachineFunction to match.
Next, move the X86 code that create a PICBase symbol to
X86TargetLowering::getPICBaseSymbol from
X86MCInstLower::GetPICBaseSymbol, which was an asmprinter specific
library. This eliminates a 'gross hack', and allows us to
implement X86ISelLowering::getPICJumpTableRelocBaseExpr which now
calls it.
This in turn allows us to eliminate the
X86AsmPrinter::printPICJumpTableSetLabel method, which was the
only overload of printPICJumpTableSetLabel.
llvm-svn: 94526
EK_LabelDifference32 kind and the target has .set support. Simplify
X86AsmPrinter::printPICJumpTableSetLabel to make use of recent helpers.
llvm-svn: 94518
* Fixed a reduction bug which occasionally led to infinite-cost (invalid)
register allocation solutions despite the existence finite-cost solutions.
* Significantly reduced memory usage (>50% reduction).
* Simplified a lot of the solver code.
llvm-svn: 94514
MachineFunctionAnalysis dole them out, instead of having
AsmPrinter do both. Have the AsmPrinter::SetupMachineFunction
method set the 'AsmPrinter::MF' variable.
llvm-svn: 94509