FPROUND_F80_F32, FPROUND_PPCF128_F32,
FPROUND_F80_F64, FPROUND_PPCF128_F64
Support for soften float fp_round operands is added, Mips
needs this to round f64->f32.
Also added support to soften float FABS result, Mips doesn't
support double fabs results while in 'single float only' mode.
llvm-svn: 54484
LowerSubregs, and fix an x86-64 isel bug that this exposed.
SUBREG_TO_REG for x86-64 implicit zero extension is only safe for
isel to generate when the source is known to always have zeros in
the high 32 bits. The EXTRACT_SUBREG instruction does not clear
the high 32 bits.
llvm-svn: 54444
- Add a basic machine-level dead block eliminator.
These two have to go together, since many other parts of the code generator are unable to handle the unreachable blocks otherwise created.
llvm-svn: 54333
switches use the binary search algorithm) for
environments that don't support it. PPC64 JIT
is such an environment; turn the flag on for that.
llvm-svn: 54248
a new ilist_node class, and remove them. Unlike alist_node,
ilist_node doesn't attempt to manage storage itself, so it avoids
the associated problems, including being opaque in gdb.
Adjust the Recycler class so that it doesn't depend on alist_node.
Also, change it to use explicit Size and Align parameters, allowing
it to work when the largest-sized node doesn't have the greatest
alignment requirement.
Change MachineInstr's MachineMemOperand list from a pool-backed
alist to a std::list for now.
llvm-svn: 54146
is to have StrongPHIElimination use its knowledge of the PHIs before they're erased to update the intervals appropriate. This is
both simpler and more accurate than the alternative, which was having LIA figure it out when it renumbered things, plus it's just
the right thing to do!
llvm-svn: 54077
by the PHI needs to be extended to the beginning of its basic block, and the intervals that were inputs need to be trimmed to the end
of their basic blocks.
llvm-svn: 54070
and knowledge of PseudoSourceValues. This unfortunately isn't sufficient to allow
constants to be rematerialized in PIC mode -- the extra indirection is a
complication.
llvm-svn: 54000
to multiply the instruction count by a constant factor in a few places, which
caused the register allocator to require many more iterations.
llvm-svn: 53959
Remove the GetResultInst instruction. It is still accepted in LLVM assembly
and bitcode, where it is now auto-upgraded to ExtractValueInst. Also, remove
support for return instructions with multiple values. These are auto-upgraded
to use InsertValueInst instructions.
The IRBuilder still accepts multiple-value returns, and auto-upgrades them
to InsertValueInst instructions.
llvm-svn: 53941
SelectionDAG graph writer to make use of them. Now, nodes with multiple
values are displayed as such, with incoming edges pointing to the
specific value they use.
llvm-svn: 53875
generic SDNode's (nodes with their own constructors
should do sanity checking in the constructor). Add
sanity checks for BUILD_VECTOR and fix all the places
that were producing bogus BUILD_VECTORs, as found by
"make check". My favorite is the BUILD_VECTOR with
only two operands that was being used to build a
vector with four elements!
llvm-svn: 53850
If .loc and .file aren't used, always emit the "debug_line" section. This
requires at least one entry in the line matrix. So if there's nothing to emit
into the matrix, emit an end of matrix value anyway.
llvm-svn: 53803
the night realising that it was wrong :) I
think the reason the same type was being used
for the shufflevec of indices as for the actual
indices is so that if one of them needs splitting
then so does the other. After my patch it might
be that the indices need splitting but not the
rest, yet there is no good way of handling that.
I think the right solution is to not have the
shufflevec be an operand at all: just have it
be the list of numbers it actually is, stored
as extra info in the node.
llvm-svn: 53768
PseudoSourceValue values, which never have names. Use getName()
for all other values, because we want to print just a short summary
of the value, not the entire instruction.
llvm-svn: 53738
mask. These are just indices into the shuffled vector
so their type is unrelated to the type of the
shuffled elements (which is what was being used before).
This fixes vec_shuffle-11.ll when using LegalizeTypes.
What seems to have happened is that Dan's recent change
r53687, which corrected the result type of the shuffle,
somehow caused LegalizeTypes to notice that the mask
operand was a BUILD_VECTOR with a legal type but elements
of an illegal type (i64). LegalizeTypes legalized this
by introducing a new BUILD_VECTOR of i32 and bitcasting
it to the old type. But the mask operand is not supposed
to be a bitcast but a straight BUILD_VECTOR of constants,
causing a crash.
llvm-svn: 53729
replacement of multiple values. This is slightly more efficient
than doing multiple ReplaceAllUsesOfValueWith calls, and theoretically
could be optimized even further. However, an important property of this
new function is that it handles the case where the source value set and
destination value set overlap. This makes it feasible for isel to use
SelectNodeTo in many very common cases, which is advantageous because
SelectNodeTo avoids a temporary node and it doesn't require CSEMap
updates for users of values that don't change position.
Revamp MorphNodeTo, which is what does all the work of SelectNodeTo, to
handle operand lists more efficiently, and to correctly handle a number
of corner cases to which its new wider use exposes it.
This commit also includes a change to the encoding of post-isel opcodes
in SDNodes; now instead of being sandwiched between the target-independent
pre-isel opcodes and the target-dependent pre-isel opcodes, post-isel
opcodes are now represented as negative values. This makes it possible
to test if an opcode is pre-isel or post-isel without having to know
the size of the current target's post-isel instruction set.
These changes speed up llc overall by 3% and reduce memory usage by 10%
on the InstructionCombining.cpp testcase with -fast and -regalloc=local.
llvm-svn: 53728
In LegalizeDAG the value is zero-extended to
the new type before byte swapping. It doesn't
matter how the extension is done since the new
bits are shifted off anyway after the swap, so
extend by any old rubbish bits. This results
in the final assembler for the testcase being
one line shorter.
llvm-svn: 53604
extending load of a vector. Handle this case when
splitting vector loads. I'm not completely sure
what is supposed to happen, but I think it means
hi should be set to undef. LegalizeDAG does not
consider this case.
llvm-svn: 53555
stores of one-element vectors. Also, neaten the
handling of INSERT_VECTOR_ELT when the inserted
type is larger than the vector element type.
llvm-svn: 53554
are used for passing huge immediates in inline ASM
from the front-end straight down to the ASM writer.
Of course this is a hack, but it is simple, limited
in scope, works in practice, and is what LegalizeDAG
does.
llvm-svn: 53553
8 %reg1024<def> = IMPLICIT_DEF
12 %reg1024<def> = INSERT_SUBREG %reg1024<kill>, %reg1025, 2
The live range [12, 14) are not part of the r1024 live interval since it's defined by an implicit def. It will not conflicts with live interval of r1025. Now suppose both registers are spilled, you can easily see a situation where both registers are reloaded before the INSERT_SUBREG and both target registers that would overlap.
llvm-svn: 53503
SINT_TO_FP libcall plus additional operations:
it might as well be a direct UINT_TO_FP libcall.
So only turn it into an SINT_TO_FP if the target
has special handling for SINT_TO_FP.
llvm-svn: 53461
Lack of these caused a bootstrap failure with Fortran
on x86-64 with LegalizeTypes turned on. While there,
be nice to 16 bit machines and support expansion of
i32 too.
llvm-svn: 53408
makes their special-case checks of use_size() less beneficial,
so remove them. This eliminates all but one use of use_size(),
which is in AssignTopologicalOrder, which uses it only once for
each node, and so can reasonably afford to recompute it, as
this allows the UsesSize field of SDNode to be removed
altogether.
llvm-svn: 53377
class, and store IsVolatile and Alignment in a more compact form.
This makes AtomicSDNode slightly larger, but it shrinks LoadSDNode
and StoreSDNode, which are much more common and are the largest of
the SDNode subclasses. Also, this lets the isVolatile() and
getAlignment() accessors be non-virtual.
llvm-svn: 53361
SINT_TO_FP and UINT_TO_FP. This now produces
the same code as LegalizeDAG (the previous
code was based on a mistaken idea of what
LegalizeDAG did in this case).
llvm-svn: 53288
soft float: experiments show that targets aren't
expecting this for results or for operands. Add
support select/select_cc result soft float and
correct operand soft float for these.
llvm-svn: 53245
MachineMemOperands. The pools are owned by MachineFunctions.
This drastically reduces the number of calls to malloc/free made
during the "Emit" phase of scheduling, as well as later phases
in CodeGen. Combined with other changes, this speeds up the
"instruction selection" phase of CodeGen by 10% in some cases.
llvm-svn: 53212
and reused across SelectionDAGs.
This drastically reduces the number of calls to malloc/free made during
instruction selection, and improves memory locality.
llvm-svn: 53211
simple const SDOperand*, which is what's usually needed.
For AddNodeIDOperands, which is small, just duplicate the function to
accept an SDUse*.
For SelectionDAG::getNode - Add an overload that accepts SDUse* that
copies the operands into a temporary SDOperand array, but also has
special-case checks for 0 through 3 operands to avoid the copy in
the common cases.
llvm-svn: 53183
that fixed problems in EmitStackConvert where the source and target type
have different alignment by creating a stack slot with the max
alignment of source and target type.
llvm-svn: 53150
hook for each way in which a result type can be
legalized (promotion, expansion, softening etc),
just use one: ReplaceNodeResults, which returns
a node with exactly the same result types as the
node passed to it, but presumably with a bunch of
custom code behind the scenes. No change if the
new LegalizeTypes infrastructure is not turned on.
llvm-svn: 53137
moves in order to get correct debug info. Since
I can't imagine how any target could possibly
be any different, I've just stripped out the
option: now all the world's like Darwin!
llvm-svn: 53134
SelectionDAG::SelectNodeTo in the instruction selector. This
updates existing nodes in place instead of creating new ones.
Go back to selecting ISD::DBG_LABEL nodes into
TargetInstrInfo::DBG_LABEL nodes instead of leaving them
unselected, now that SelectNodeTo allows us to update them
in place.
llvm-svn: 53057
to be passed the list of value types, and use this
where appropriate. Inappropriate places are where
the value type list is already known and may be
long, in which case the existing method is more
efficient.
llvm-svn: 53035
the need for a flavor operand, and add a new SDNode subclass,
LabelSDNode, for use with them to eliminate the need for a label id
operand.
Change instruction selection to let these label nodes through
unmodified instead of creating copies of them. Teach the MachineInstr
emitter how to emit a MachineInstr directly from an ISD label node.
This avoids the need for allocating SDNodes for the label id and
flavor value, as well as SDNodes for each of the post-isel label,
label id, and label flavor.
llvm-svn: 52943
SelectionDAG::allnodes_size is linear, but that doesn't appear to
outweigh the benefit of reducing heap traffic. If it does become a
problem, we should teach SelectionDAG to keep a count of how many
nodes are live, because there are several other places where that
information would be useful as well.
llvm-svn: 52926
purpose, and give it a custom SDNode subclass so that it doesn't
need to have line number, column number, filename string, and
directory string, all existing as individual SDNodes to be the
operands.
This was the only user of ISD::STRING, StringSDNode, etc., so
remove those and some associated code.
This makes stop-points considerably easier to read in
-view-legalize-dags output, and reduces overhead (creating new
nodes and copying std::strings into them) on code containing
debugging information.
llvm-svn: 52924
SmallVectors. Change the signature of TargetLowering::LowerArguments
to avoid returning a vector by value, and update the two targets
which still use this directly, Sparc and IA64, accordingly.
llvm-svn: 52917
only needs one bit for each register. UsedRegs is a SmallVector
sized at 16, so this eliminates a heap allocation/free for every
call and return processed by Legalize on most targets.
llvm-svn: 52915
it impossible to create a MERGE_VALUES node with
only one result: sometimes it is useful to be able
to create a node with only one result out of one of
the results of a node with more than one result, for
example because the new node will eventually be used
to replace a one-result node using ReplaceAllUsesWith,
cf X86TargetLowering::ExpandFP_TO_SINT. On the other
hand, most users of MERGE_VALUES don't need this and
for them the optimization was valuable. So add a new
utility method getMergeValues for creating MERGE_VALUES
nodes which by default performs the optimization.
Change almost everywhere to use getMergeValues (and
tidy some stuff up at the same time).
llvm-svn: 52893
Move GetConstantStringInfo to lib/Analysis. Remove
string output routine from Constant. Update all
callers. Change debug intrinsic api slightly to
accomodate move of routine, these now return values
instead of strings.
This unbreaks llvm-gcc bootstrap.
llvm-svn: 52884
<16 x float> is 64-byte aligned (for some reason),
which gets us into the stack realignment code. The
computation changing FP-relative offsets to SP-relative
was broken, assiging a spill temp to a location
also used for parameter passing. This
fixes it by rounding up the stack frame to a multiple
of the largest alignment (I concluded it wasn't fixable
without doing this, but I'm not very sure.)
llvm-svn: 52750
string output routine from Constant. Update all
callers. Change debug intrinsic api slightly to
accomodate move of routine, these now return values
instead of strings.
llvm-svn: 52748
For this it is convenient to permit floats to
be used with EXTRACT_ELEMENT, so I tweaked
things to allow that. I also added libcalls
for ppcf128 to i32 forms of FP_TO_XINT, since
they exist in libgcc and this case can certainly
occur (and does occur in the testsuite) - before
the i64 libcall was being used. Also, the
XINT_TO_FP result seemed to be wrong when
the argument is an i128: the wrong fudge
factor was added (the i32 and i64 cases were
handled directly, but the i128 code fell
through to some generic softening code which
seemed to think it was i64 to f32!). So I
fixed it by adding a fudge factor that I
found in my breakfast cereal.
llvm-svn: 52739
Added abstract class MemSDNode for any Node that have an associated MemOperand
Changed atomic.lcs => atomic.cmp.swap, atomic.las => atomic.load.add, and
atomic.lss => atomic.load.sub
llvm-svn: 52706
,------.
| |
| v
| t2 = phi ... t1 ...
| |
| v
| t1 = ...
| ... = ... t1 ...
| |
`------'
where there is a use in a PHI node that's a predecessor to the defining
block. We don't want to mark all predecessors as having the value "alive" in
this case. Also, the assert was too restrictive and didn't handle this case.
llvm-svn: 52655
fixes PR2476; patch by Richard Osborne. The same
problem exists for a bunch of other operators, but
I'm ignoring this because they will be automagically
fixed when the new LegalizeTypes infrastructure lands,
since it already solves this problem centrally.
llvm-svn: 52610
to DenseMap<SDNode*, SUnit*>, and adjust the way cloned SUnit nodes are
handled so that only the original node needs to be in the map.
This speeds up llc on 447.dealII.llvm.bc by about 2%.
llvm-svn: 52576
integer of the same type. Before it was "promotion",
but this is confusing because it is quite different
to promotion of integers. Call it "softening" instead,
inspired by "soft float".
llvm-svn: 52546
According to DWARF-2 specification, the line information is provided through an offset in the .debug_line section.
Replace the label reference that is used with a section offset.
llvm-svn: 52468
rather than bundling them together. Rename FloatToInt
to PromoteFloat (better, if not perfect). Reorganize
files by types rather than by operations.
llvm-svn: 52408
of value info (sign/zero ext info) from one MBB to another. This doesn't
handle much right now because of two limitations:
1) only handles zext/sext, not random bit propagation (no assert exists
for this)
2) doesn't handle phis.
llvm-svn: 52383
still excluding types like i1 (not byte sized)
and i120 (loading an i120 requires loading an i64,
an i32, an i16 and an i8, which is expensive).
llvm-svn: 52310
wrong for volatile loads and stores. In fact this
is almost all of them! There are three types of
problems: (1) it is wrong to change the width of
a volatile memory access. These may be used to
do memory mapped i/o, in which case a load can have
an effect even if the result is not used. Consider
loading an i32 but only using the lower 8 bits. It
is wrong to change this into a load of an i8, because
you are no longer tickling the other three bytes. It
is also unwise to make a load/store wider. For
example, changing an i16 load into an i32 load is
wrong no matter how aligned things are, since the
fact of loading an additional 2 bytes can have
i/o side-effects. (2) it is wrong to change the
number of volatile load/stores: they may be counted
by the hardware. (3) it is wrong to change a volatile
load/store that requires one memory access into one
that requires several. For example on x86-32, you
can store a double in one processor operation, but to
store an i64 requires two (two i32 stores). In a
multi-threaded program you may want to bitcast an i64
to a double and store as a double because that will
occur atomically, and be indivisible to other threads.
So it would be wrong to convert the store-of-double
into a store of an i64, because this will become two
i32 stores - no longer atomic. My policy here is
to say that the number of processor operations for
an illegal operation is undefined. So it is alright
to change a store of an i64 (requires at least two
stores; but could be validly lowered to memcpy for
example) into a store of double (one processor op).
In short, if the new store is legal and has the same
size then I say that the transform is ok. It would
also be possible to say that transforms are always
ok if before they were illegal, whether after they
are illegal or not, but that's more awkward to do
and I doubt it buys us anything much.
However this exposed an interesting thing - on x86-32
a store of i64 is considered legal! That is because
operations are marked legal by default, regardless of
whether the type is legal or not. In some ways this
is clever: before type legalization this means that
operations on illegal types are considered legal;
after type legalization there are no illegal types
so now operations are only legal if they really are.
But I consider this to be too cunning for mere mortals.
Better to do things explicitly by testing AfterLegalize.
So I have changed things so that operations with illegal
types are considered illegal - indeed they can never
map to a machine operation. However this means that
the DAG combiner is more conservative because before
it was "accidentally" performing transforms where the
type was illegal because the operation was nonetheless
marked legal. So in a few such places I added a check
on AfterLegalize, which I suppose was actually just
forgotten before. This causes the DAG combiner to do
slightly more than it used to, which resulted in the X86
backend blowing up because it got a slightly surprising
node it wasn't expecting, so I tweaked it.
llvm-svn: 52254
maps can be deleted. This happens when RAUW
replaces a node N with another equivalent node
E, deleting the first node. Solve this by
adding (N, E) to ReplacedNodes, which is already
used to remap nodes to replacements. This means
that deleted nodes are being allowed in maps,
which can be delicate: the memory may be reused
for a new node which might get confused with the
old deleted node pointer hanging around in the
maps, so detect this and flush out maps if it
occurs (ExpungeNode). The expunging operation
is expensive, however it never occurs during
a llvm-gcc bootstrap or anywhere in the nightly
testsuite. It occurs three times in "make check":
Alpha/illegal-element-type.ll,
PowerPC/illegal-element-type.ll and
X86/mmx-shift.ll. If expunging proves to be too
expensive then there are other more complicated
ways of solving the problem.
In the normal case this patch adds the overhead
of a few more map lookups, which is hopefully
negligable.
llvm-svn: 52214
of integer types. Fix the isMask APInt method to
actually work (hopefully) rather than crashing
because it adds apints of different bitwidths.
It looks like isShiftedMask is also broken, but
I'm leaving that one to the APInt people (it is
not used anywhere).
llvm-svn: 52142
of apint codegen failure is the DAG combiner doing
the wrong thing because it was comparing MVT's using
< rather than comparing the number of bits. Removing
the < method makes this mistake impossible to commit.
Instead, add helper methods for comparing bits and use
them.
llvm-svn: 52098
no visible functionality change, but enables a future patch where node creation
will update the CFG if it decides to create an unconditional rather than a conditional branch.
llvm-svn: 52067
and better control the abstraction. Rename the type
to MVT. To update out-of-tree patches, the main
thing to do is to rename MVT::ValueType to MVT, and
rewrite expressions like MVT::getSizeInBits(VT) in
the form VT.getSizeInBits(). Use VT.getSimpleVT()
to extract a MVT::SimpleValueType for use in switch
statements (you will get an assert failure if VT is
an extended value type - these shouldn't exist after
type legalization).
This results in a small speedup of codegen and no
new testsuite failures (x86-64 linux).
llvm-svn: 52044
are the same as in unpacked structs, only field
positions differ. This only matters for structs
containing x86 long double or an apint; it may
cause backwards compatibility problems if someone
has bitcode containing a packed struct with a
field of one of those types.
The issue is that only 10 bytes are needed to
hold an x86 long double: the store size is 10
bytes, but the ABI size is 12 or 16 bytes (linux/
darwin) which comes from rounding the store size
up by the alignment. Because it seemed silly not
to pack an x86 long double into 10 bytes in a
packed struct, this is what was done. I now
think this was a mistake. Reserving the ABI size
for an x86 long double field even in a packed
struct makes things more uniform: the ABI size is
now always used when reserving space for a type.
This means that developers are less likely to
make mistakes. It also makes life easier for the
CBE which otherwise could not represent all LLVM
packed structs (PR2402).
Front-end people might need to adjust the way
they create LLVM structs - see following change
to llvm-gcc.
llvm-svn: 51928
instruction to execute. This can be used for transformations (like two-address
conversion) to remat an instruction instead of generating a "move"
instruction. The idea is to decrease the live ranges and register pressure and
all that jazz.
llvm-svn: 51660
Running /Users/void/llvm/llvm.src/test/CodeGen/X86/dg.exp ...
FAIL: /Users/void/llvm/llvm.src/test/CodeGen/X86/2007-11-30-LoadFolding-Bug.ll
Failed with exit(1) at line 1
while running: llvm-as < /Users/void/llvm/llvm.src/test/CodeGen/X86/2007-11-30-LoadFolding-Bug.ll | llc -march=x86 -mattr=+sse2 -stats |& grep {1 .*folded into instructions}
child process exited abnormally
Make this conditional for now.
llvm-svn: 51563
live interval to infinity if the instruction being rewritten is an
original remat def instruction. We were only checking against the clone
of the remat def which doesn't actually appear in the IR at all.
llvm-svn: 51440
BB1:
vr1025 = copy vr1024
..
BB2:
vr1024 = op
= op vr1025
<loop eventually branch back to BB1>
Even though vr1025 is copied from vr1024, it's not safe to coalesced them since live range of vr1025 intersects the def of vr1024. This happens when vr1025 is assigned the value of the previous iteration of vr1024 in the loop.
llvm-svn: 51394
If local spiller optimization turns some instruction into an identity copy, it will be removed. If the output register happens to be dead (and source is obviously killed), transfer the kill / dead information to last use / def in the same MBB.
llvm-svn: 51306
are represented as "weak", but there are subtle differences
in some cases on Darwin, so we need both. The intent
is that "common" will behave identically to "weak" unless
somebody changes their target to do something else.
No functional change as yet.
llvm-svn: 51118
address of the PassInfo directly instead of calling getPassInfo.
This eliminates a bunch of dynamic initializations of static data.
Also, fold RegisterPassBase into PassInfo, make a bunch of its
data members const, and rearrange some code to initialize data
members in constructors instead of using setter member functions.
llvm-svn: 51022
Darwin. This is a hack of course, but it does
at least look at the right thing: gotpcrel means
that this is already an offset, so an explicit
offset is not needed (and wrong). I think this
is good enough for the moment: Anton is working
on something better.
llvm-svn: 50850
on x86-64 linux. This causes no regressions on
32 bit linux and 32 bit ppc. More tests pass
on 64 bit ppc with no regressions. I didn't
turn on eh on 64 bit linux because the intrinsics
needed to compile the eh runtime aren't done
yet. But if you turn it on and link with the
mainline runtime then eh seems to work fine
on x86-64 linux with this patch. Thanks to
Dale for testing. The main point of the patch
is that if you output that some object is
encoded using 4 bytes you had better not output
8 bytes for it: the patch makes everything
consistent.
llvm-svn: 50825
%ecx = op
store %cl<kill>, (addr)
(addr) = op %al
It's not safe to unfold the last operand and eliminate store even though %cl is marked kill. It's a sub-register use which means one of its super-register(s) may be used below.
llvm-svn: 50794
Move platform independent code (lowering of possibly overwritten
arguments, check for tail call optimization eligibility) from
target X86ISelectionLowering.cpp to TargetLowering.h and
SelectionDAGISel.cpp.
Initial PowerPC tail call implementation:
Support ppc32 implemented and tested (passes my tests and
test-suite llvm-test).
Support ppc64 implemented and half tested (passes my tests).
On ppc tail call optimization is performed if
caller and callee are fastcc
call is a tail call (in tail call position, call followed by ret)
no variable argument lists or byval arguments
option -tailcallopt is enabled
Supported:
* non pic tail calls on linux/darwin
* module-local tail calls on linux(PIC/GOT)/darwin(PIC)
* inter-module tail calls on darwin(PIC)
If constraints are not met a normal call will be emitted.
A test checking the argument lowering behaviour on x86-64 was added.
llvm-svn: 50477
This removes the existing bottleneck related to the removal of elements from
the middle of the queue.
Also fixes a subtle bug in ScheduleDAGRRList::CapturePred:
It was updating the state of the SUnit before removing it. As a result, the
comparison operators were working incorrectly and this SUnit could not be removed
from the queue properly.
Reviewed by Evan and Dan. Approved by Dan.
llvm-svn: 50412
We now compile test2/test3 to:
_test2:
## InlineAsm Start
set %xmm0, %xmm1
## InlineAsm End
addps %xmm1, %xmm0
ret
_test3:
## InlineAsm Start
set %xmm0, %xmm1
## InlineAsm End
paddd %xmm1, %xmm0
ret
as expected.
llvm-svn: 50389
towards PR2094. It now compiles the attached .ll file to:
_sad16_sse2:
movslq %ecx, %rax
## InlineAsm Start
%ecx %rdx %rax %rax %r8d %rdx %rsi
## InlineAsm End
## InlineAsm Start
set %eax
## InlineAsm End
ret
which is pretty decent for a 3 output, 4 input asm.
llvm-svn: 50386
e.g.
vr1024<2> extract_subreg vr1025, 2
If vr1024 do not have the same register class as vr1025, it's not safe to coalesce this away. For example, vr1024 might be a GPR32 while vr1025 might be a GPR64.
llvm-svn: 50385
c1, f1 = CopyToReg
c2, f2 = CopyToReg
c3 = TokenFactor c1, c2
...
= user c3, ..., f2
Now that the two CopyToReg's and the user are "flagged" together. They effectively forms a single scheduling unit. The TokenFactor is now both an operand and a successor of the Flagged nodes.
llvm-svn: 50376
ComputeMaskedBits knows about cttz, ctlz, and ctpop. Teach
SelectionDAG's ComputeMaskedBits what InstCombine's knows
about SRem. And teach them both some things about high bits
in Mul, UDiv, URem, and Sub. This allows instcombine and
dagcombine to eliminate sign-extension operations in
several new cases.
llvm-svn: 50358
conversion open the door for many nasty implicit conversion issues, and
can be easily solved by initializing with (V.begin(), V.end()) when
needed.
This patch includes many small cleanups for sdisel also.
llvm-svn: 50340
When choosing between constraints with multiple options,
like "ir", test to see if we can use the 'i' constraint and
go with that if possible. This produces more optimal ASM in
all cases (sparing a register and an instruction to load it),
and fixes inline asm like this:
void test () {
asm volatile (" %c0 %1 " : : "imr" (42), "imr"(14));
}
Previously we would dump "42" into a memory location (which
is ok for the 'm' constraint) which would cause a problem
because the 'c' modifier is not valid on memory operands.
Isn't it great how inline asm turns 'missed optimization'
into 'compile failed'??
Incidentally, this was the todo in
PowerPC/2007-04-24-InlineAsm-I-Modifier.ll
Please do NOT pull this into Tak.
llvm-svn: 50315
- Make targetlowering.h fit in 80 cols.
- Make LowerAsmOperandForConstraint const.
- Make lowerXConstraint -> LowerXConstraint
- Make LowerXConstraint return a const char* instead of taking a string byref.
llvm-svn: 50312
to the block that defines their operands. This doesn't work in the
case that the operand is an invoke, because invoke is a terminator
and must be the last instruction in a block.
Replace it with support in SelectionDAGISel for copying struct values
into sequences of virtual registers.
llvm-svn: 50279
rather than having it suck them out of a node. Add
a bunch of new libcalls, and remove dead softfloat
code (dead, because FloatToInt is used not Expand
in this case). Note that indexed stores probably
aren't handled properly, likewise for loads.
llvm-svn: 49915
Rename SDOperandImpl back to SDOperand.
Introduce the SDUse class that represents a use of the SDNode referred by
an SDOperand. Now it is more similar to Use/Value classes.
Patch is approved by Dan Gohman.
llvm-svn: 49795
ScheduleDAG; they don't correspond to any actual instructions so they
don't need to be scheduled.
This fixes a bug where the EntryToken was being scheduled multiple
times in some cases, though it ended up not causing any trouble because
EntryToken doesn't expand into anything. With this fixed the schedulers
reliably schedule the expected number of units, so we can check this
with an assertion.
This requires a tweak to test/CodeGen/X86/loop-hoist.ll because it
ends up getting scheduled differently in a trivial way, though it was
enough to fool the prcontext+grep that the test does.
llvm-svn: 49701
much simpler than in LegalizeDAG because calls are
not yet expanded into call sequences: that happens
after type legalization has finished.
llvm-svn: 49634
in its maps. Add some sanity checks that catch
this kind of thing. Hopefully these can be
removed one day (once all problems are fixed!)
but for the moment it seems wise to have them in.
llvm-svn: 49612
on any current target and aren't optimized in DAGCombiner. Instead
of using intermediate nodes, expand the operations, choosing between
simple loads/stores, target-specific code, and library calls,
immediately.
Previously, the code to emit optimized code for these operations
was only used at initial SelectionDAG construction time; now it is
used at all times. This fixes some cases where rep;movs was being
used for small copies where simple loads/stores would be better.
This also cleans up code that checks for alignments less than 4;
let the targets make that decision instead of doing it in
target-independent code. This allows x86 to use rep;movs in
low-alignment cases.
Also, this fixes a bug that resulted in the use of rep;stos for
memsets of 0 with non-constant memory size when the alignment was
at least 4. It's better to use the library in this case, which
can be significantly faster when the size is large.
This also preserves more SourceValue information when memory
intrinsics are lowered into simple loads/stores.
llvm-svn: 49572
before an invoke. Failure to do this causes references in
the landing pad to variables that were not set. Fixes
g++.dg/eh/delayslot1.C
g++.dg/eh/fp-regs.C
g++.old-deja/g++.brendan/eh1.C
llvm-svn: 49243
There is no point in creating a long live range defined by an implicit_def. Scheduler now duplicates implicit_def instruction for each of its uses. Therefore, if an implicit_def node has multiple uses, it will become a number of very short live ranges, rather than a long one. This will make coalescer's job easier.
llvm-svn: 49164
review feedback.
-enable-eh is still accepted but doesn't do anything.
EH intrinsics use Dwarf EH if the target supports that,
and are handled by LowerInvoke otherwise.
The separation of the EH table and frame move data is,
I think, logically figured out, but either one still
causes full EH info to be generated (not sure how to
split the metadata correctly).
MachineModuleInfo::needsFrameInfo is no longer used and
is removed.
llvm-svn: 49064
not marked nounwind, or for all functions when -enable-eh
is set, provided the target supports Dwarf EH.
llvm-gcc generates nounwind in the right places; other FEs
will need to do so also. Given such a FE, -enable-eh should
no longer be needed.
llvm-svn: 49006
In order to handle indexed nodes I had to introduce
a new constructor, and since I was there I factorized
the code in the various load constructors.
llvm-svn: 48894
nodes. This doesn't currently have much impact the generated code, but it
does produce simpler-looking SelectionDAGs, and consequently
simpler-looking ScheduleDAGs, because there are fewer spurious
dependencies.
In particular, CopyValueToVirtualRegister now uses the entry node as the
input chain dependency for new CopyToReg nodes instead of calling getRoot
and depending on the most recent memory reference.
Also, rename UnorderedChains to PendingExports and pull it up from being
a local variable in SelectionDAGISel::BuildSelectionDAG to being a
member variable of SelectionDAGISel, so that it doesn't have to be
passed around to all the places that need it.
llvm-svn: 48893
called LimitedSumOfUnscheduledPredsOfSuccs. It terminates the computation
after a given treshold is reached. This new function is always faster, but
brings real wins only on bigger test-cases.
The old function SumOfUnscheduledPredsOfSuccs is left in-place for now and therefore a warning about an unused static function is produced.
llvm-svn: 48872
LLVM Value/Use does and MachineRegisterInfo/MachineOperand does.
This allows constant time for all uses list maintenance operations.
The idea was suggested by Chris. Reviewed by Evan and Dan.
Patch is tested and approved by Dan.
On normal use-cases compilation speed is not affected. On very big basic
blocks there are compilation speedups in the range of 15-20% or even better.
llvm-svn: 48822
This fixes Bugzilla #1835 (http://llvm.org/bugs/show_bug.cgi?id=1835).
This patched is reviewed by Tanya and Dan. Dan tested and approved it.
The reason for the bad performance of the old algorithm is that it is very naive and scans every
time all nodes of the DAG in the worst case.
This patch introduces a new algorithm based on the paper "Online algorithms
for maintaining the topological order of a directed acyclic graph" by
David J.Pearce and Paul H.J.Kelly. This is the MNR algorithm. It has a
linear time worst-case and performs much better in most situations.
The paper can be found here:
http://fano.ics.uci.edu/cites/Document/Online-algorithms-for-maintaining-the-topological-order-of-a-directed-acyclic-graph.html
The main idea of the new algorithm is to compute the topological ordering of the SNodes in the
DAG and to maintain it even after DAG modifications. The topological ordering allows for very fast
node reachability checks.
Tests on very big input files with tens of thousands of instructions in a BB indicate huge
speed-ups (up to 10x compilation time improvement) compared to the old version.
llvm-svn: 48817
With this pass, StrongPHIElim can compile very simple testcases correctly. There's still a ways
to go before it's ready for prime time, though.
llvm-svn: 48719
flags. This is needed by the new legalize types
infrastructure which wants to expand the 64 bit
constants previously used to hold the flags on
32 bit machines. There are two functional changes:
(1) in LowerArguments, if a parameter has the zext
attribute set then that is marked in the flags;
before it was being ignored; (2) PPC had some bogus
code for handling two word arguments when using the
ELF 32 ABI, which was hard to convert because of
the bogusness. As suggested by the original author
(Nicolas Geoffray), I've disabled it for the moment.
Tested with "make check" and the Ada ACATS testsuite.
llvm-svn: 48640
1. If part of a register is re-defined, an implicit kill and an implicit def are added to denote read / mod / write. However, this should only be necessary if the register is actually read later. This is a performance issue.
2. If a sub-register is being defined, and it doesn't have a previous use, do not add a implicit kill to the last use of a super-register:
= EAX, AX<imp-use,kill>
...
AX =
In this case, EAX is live but AX is killed, this is wrong and will cause the coalescer to do bad things.
llvm-svn: 48521
the fcopysign expansion from LegalizeDAG to get rid of
what seems to be a bug: the use of sign extension means
that when copying the sign bit from an f32 to an f64,
the upper 32 bits of the f64 (now an i64) are set, not
just the top bit... I also generalized it to work for
any sized floating point types, and removed the bogosity:
SDOperand Mask1 = (SrcVT == MVT::f64)
? DAG.getConstantFP(BitsToDouble(1ULL << 63), SrcVT)
: DAG.getConstantFP(BitsToFloat(1U << 31), SrcVT);
Mask1 = DAG.getNode(ISD::BIT_CONVERT, SrcNVT, Mask1);
(here SrcNVT is an integer with the same size as SrcVT).
As far as I can see this takes a 1 << 63, converts to
a double, converts that to a floating point constant
then converts that to an integer constant, ending up
with... 1 << 63 as an integer constant! So I just
generate this integer constant directly.
llvm-svn: 48305
getCopyToParts problem was noticed by the new
LegalizeTypes infrastructure. In order to avoid
this kind of thing in the future I've added a
check that EXTRACT_ELEMENT is only used with
integers. Once LegalizeTypes is up and running
most likely BUILD_PAIR and EXTRACT_ELEMENT can
be removed, in favour of using apints instead.
llvm-svn: 48294
X86 lowering normalize vector 0 to v4i32. However DAGCombine can fold (sub x, x) -> 0 after legalization. It can create a zero vector of a type that's not expected (e.g. v8i16). We don't want to disable the optimization since leaving a (sub x, x) is really bad. Add isel patterns for other types of vector 0 to ensure correctness. It's highly unlikely to happen other than in bugpoint reduced test cases.
llvm-svn: 48279
that merely add passes. This allows them to be used with either
FunctionPassManager or PassManager, or even with a custom new
kind of pass manager.
llvm-svn: 48256
and it's the result that requires expansion. This code is a little confusing
because the TargetLoweringInfo tables for [US]INT_TO_FP use the operand type
(the integer type) rather than the result type.
llvm-svn: 48206
If ALR and BLR overlaps and end of BLR extends beyond end of ALR, e.g.
A = or A, B
...
B = A
...
C = A<kill>
...
= B
then do not add kills of A to the newly created B interval.
- Also fix some kill info update bug.
llvm-svn: 48141
Change insert/extract subreg instructions to be able to be used in TableGen patterns.
Use the above features to reimplement an x86-64 pseudo instruction as a pattern.
llvm-svn: 48130
field to 32 bits, thus enabling correct handling of ByVal
structs bigger than 0x1ffff. Abstract interface a bit.
Fixes gcc.c-torture/execute/pr23135.c and
gcc.c-torture/execute/pr28982b.c in gcc testsuite (were ICE'ing
on ppc32, quietly producing wrong code on x86-32.)
llvm-svn: 48122
they are produced by calls (which are known exact) and by cross block copies
which are known to be produced by extends.
This improves:
define double @test2() {
%tmp85 = call double asm sideeffect "fld0", "={st(0)}"()
ret double %tmp85
}
from:
_test2:
subl $20, %esp
# InlineAsm Start
fld0
# InlineAsm End
fstpl 8(%esp)
movsd 8(%esp), %xmm0
movsd %xmm0, (%esp)
fldl (%esp)
addl $20, %esp
#FP_REG_KILL
ret
to:
_test2:
# InlineAsm Start
fld0
# InlineAsm End
#FP_REG_KILL
ret
by avoiding a f64 <-> f80 trip
llvm-svn: 48108
an RFP register class.
Teach ScheduleDAG how to handle CopyToReg with different src/dst
reg classes.
This allows us to compile trivial inline asms that expect stuff
on the top of x87-fp stack.
llvm-svn: 48107
in different register classes, e.g. copy of ST(0) to RFP*. This gets
some really trivial inline asm working that plops things on the top of
stack (PR879)
llvm-svn: 48105
of BUILD_VECTORS that only have two unique elements:
1. The previous code was nondeterminstic, because it walked a map in
SDOperand order, which isn't determinstic.
2. The previous code didn't handle the case when one element was undef
very well. Now we ensure that the generated shuffle mask has the
undef vector on the RHS (instead of potentially being on the LHS)
and that any elements that refer to it are themselves undef. This
allows us to compile CodeGen/X86/vec_set-9.ll into:
_test3:
movd %rdi, %xmm0
punpcklqdq %xmm0, %xmm0
ret
instead of:
_test3:
movd %rdi, %xmm1
#IMPLICIT_DEF %xmm0
punpcklqdq %xmm1, %xmm0
ret
... saving a register.
llvm-svn: 48060
_test3:
movd %rdi, %xmm1
#IMPLICIT_DEF %xmm0
punpcklqdq %xmm1, %xmm0
ret
instead of:
_test3:
#IMPLICIT_DEF %rax
movd %rax, %xmm0
movd %rdi, %xmm1
punpcklqdq %xmm1, %xmm0
ret
This is still not ideal. There is no reason to two xmm regs.
llvm-svn: 48058
except ppc long double. This allows us to shrink constant pool
entries for x86 long double constants, which in turn allows us to
use flds/fldl instead of fldt.
llvm-svn: 47938
bug in r47928 (Int64Ty is the correct type for the constant
pool entry here) and removes the asserts, now that the code
is capable of handling i128.
llvm-svn: 47932
For x86, if sse2 is available, it's not a good idea since cvtss2sd is slower than a movsd load and it prevents load folding. On x87, it's important to shrink fp constant since fldt is very expensive.
llvm-svn: 47931
The basic idea is that all these algorithms are computing the longest paths from the root node or to the exit node. Therefore the existing implementation that uses and iterative and potentially
exponential algorithm was changed to a well-known graph algorithm based on dynamic programming. It has a linear run-time.
llvm-svn: 47884
- Cleaned up how the prologue-epilogue inserter loops over the instructions.
- Instead of restarting the processing of an instruction if we remove an
implicit kill, just update the end iterator and make sure that the iterator
isn't incremented.
llvm-svn: 47870
same size as an int type by doing a bitconvert of
load/store of the int type (same algorithm as floating point).
This makes them work for ppc Altivec. There was some
code that purported to handle loads of (some) vectors
by splitting them into two smaller vectors, but getExtLoad
rejects subvector loads, so this could never have worked;
the patch removes it.
llvm-svn: 47696
approach taken is different to that in LegalizeDAG
when it is a question of expanding or promoting the
result type: for example, if extracting an i64 from
a <2 x i64>, when i64 needs expanding, it bitcasts
the vector to <4 x i32>, extracts the appropriate
two i32's, and uses those for the Lo and Hi parts.
Likewise, when extracting an i16 from a <4 x i16>,
and i16 needs promoting, it bitcasts the vector to
<2 x i32>, extracts the appropriate i32, twiddles
the bits if necessary, and uses that as the promoted
value. This puts more pressure on bitcast legalization,
and I've added the appropriate cases. They needed to
be added anyway since users can generate such bitcasts
too if they want to. Also, when considering various
cases (Legal, Promote, Expand, Scalarize, Split) it is
a pain that expand can correspond to Expand, Scalarize
or Split, so I've changed the LegalizeTypes enum so it
lists those different cases - now Expand only means
splitting a scalar in two.
The code produced is the same as by LegalizeDAG for
all relevant testcases, except for
2007-10-31-extractelement-i64.ll, where the code seems
to have improved (see below; can an expert please tell
me if it is better or not).
Before < vs after >.
< subl $92, %esp
< movaps %xmm0, 64(%esp)
< movaps %xmm0, (%esp)
< movl 4(%esp), %eax
< movl %eax, 28(%esp)
< movl (%esp), %eax
< movl %eax, 24(%esp)
< movq 24(%esp), %mm0
< movq %mm0, 56(%esp)
---
> subl $44, %esp
> movaps %xmm0, 16(%esp)
> pshufd $1, %xmm0, %xmm1
> movd %xmm1, 4(%esp)
> movd %xmm0, (%esp)
> movq (%esp), %mm0
> movq %mm0, 8(%esp)
< subl $92, %esp
< movaps %xmm0, 64(%esp)
< movaps %xmm0, (%esp)
< movl 12(%esp), %eax
< movl %eax, 28(%esp)
< movl 8(%esp), %eax
< movl %eax, 24(%esp)
< movq 24(%esp), %mm0
< movq %mm0, 56(%esp)
---
> subl $44, %esp
> movaps %xmm0, 16(%esp)
> pshufd $3, %xmm0, %xmm1
> movd %xmm1, 4(%esp)
> movhlps %xmm0, %xmm0
> movd %xmm0, (%esp)
> movq (%esp), %mm0
> movq %mm0, 8(%esp)
< subl $92, %esp
< movaps %xmm0, 64(%esp)
---
> subl $44, %esp
< movl 16(%esp), %eax
< movl %eax, 48(%esp)
< movl 20(%esp), %eax
< movl %eax, 52(%esp)
< movaps %xmm0, (%esp)
< movl 4(%esp), %eax
< movl %eax, 60(%esp)
< movl (%esp), %eax
< movl %eax, 56(%esp)
---
> pshufd $1, %xmm0, %xmm1
> movd %xmm1, 4(%esp)
> movd %xmm0, (%esp)
> movd %xmm1, 12(%esp)
> movd %xmm0, 8(%esp)
< subl $92, %esp
< movaps %xmm0, 64(%esp)
---
> subl $44, %esp
< movl 24(%esp), %eax
< movl %eax, 48(%esp)
< movl 28(%esp), %eax
< movl %eax, 52(%esp)
< movaps %xmm0, (%esp)
< movl 12(%esp), %eax
< movl %eax, 60(%esp)
< movl 8(%esp), %eax
< movl %eax, 56(%esp)
---
> pshufd $3, %xmm0, %xmm1
> movd %xmm1, 4(%esp)
> movhlps %xmm0, %xmm0
> movd %xmm0, (%esp)
> movd %xmm1, 12(%esp)
> movd %xmm0, 8(%esp)
llvm-svn: 47672
operand of a VECTOR_SHUFFLE. The mask is a
vector of constant integers. The code in
LegalizeDAG doesn't bother to legalize the
mask, since it's basically just storage for
a bunch of constants, however LegalizeTypes
is more picky. The problem is that there may
not exist any legal vector-of-integers type
with a legal element type, so it is impossible
to create a legal mask! Unless of course you
cheat by creating a BUILD_VECTOR where the
operands have a different type to the element
type of the vector being built... This is
pretty ugly but works - all relevant tests in
the testsuite pass, and produce the same
assembler with and without LegalizeTypes.
llvm-svn: 47670
Change several cases in SimplifyDemandedMask that don't ever do any
simplifying to reuse the logic in ComputeMaskedBits instead of
duplicating it.
llvm-svn: 47648
instead of init'ing it maximally to zeros on entry. getFreePhysReg
is pretty hot and only a few elements are typically used. This speeds
up linscan by 5% on 176.gcc.
llvm-svn: 47631
CodeGen/PowerPC/illegal-element-type.ll): suppose
a node X is processed, and processing maps it to
a node Y. Then X continues to exist in the DAG,
but with no users. While processing some other
node, a new node may be created that happens to
be equal to X, and thus X will be reused rather
than a truly new node. This can cause X to
"magically reappear", and since it is in the
Processed state in will not be reprocessed, so
at the end of type legalization the illegal node
X can still be present. The solution is to replace
X with Y whenever X gets resurrected like this.
llvm-svn: 47601
after legalize. Just because a constant is legal (e.g. 0.0 in SSE)
doesn't mean that its negated value is legal (-0.0). We could make
this stronger by checking to see if the negated constant is actually
legal post negation, but it doesn't seem like a big deal.
llvm-svn: 47591
out of illegal elements (BUILD_VECTOR). Uses and beefs
up BUILD_PAIR, though it didn't really have to. Like
most of LegalizeTypes, does not support soft-float.
This cures all "make check" vector building failures.
llvm-svn: 47537
%r3 on PPC) in their ASM files. However, it's hard for humans to read
during debugging. Adding a new field to the register data that lets you
specify a different name to be printed than the one that goes into the
ASM file -- %x3 instead of %r3, for instance.
llvm-svn: 47534
it checks if ESI is available, it then looks at registers aliases to ESI. SIL is marked -2 (not allocatable) but isPhysRegAvailable() incorrectly assumes it is in use and returns false for ESI.
llvm-svn: 47499
inline asms.
Fix PR2078 by marking aliases of registers used when a register is
marked used. This prevents EAX from being allocated when AX is listed
in the clobber set for the asm.
llvm-svn: 47426
No need to go up more levels. A def of a register also sets its sub-registers
(so if PhysRegInfo[SuperReg] is NULL, it means SuperReg's super registers are
not previously defined).
llvm-svn: 47399
and splitting extract_subvector. This fixes nine
"make check" testcases, for example
2008-02-04-ExtractSubvector.ll and (partially)
CodeGen/Generic/vector.ll.
llvm-svn: 47384
AddNodeIDNode does profiling for a ConstantSDNode, but so does
SelectionDAG::getConstant. This profiling should be moved to a common
static function in ConstantSDNode.
llvm-svn: 47359
- For now, conservatively ignore copy MI whose source is a physical register. Commuting its def MI can cause a physical register live interval to be live through a loop (since we know it's live coming into the def MI).
llvm-svn: 47281
tblgen will complain if a sign-extended constant does not fit into a
data type smaller than i32, e.g., i16. This causes a problem when certain
hex constants are used, such as 0xff for byte masks or immediate xor
values.
tblgen will try the sign-extended value first and, if the sign extended
value would overflow, it tries to see if the unsigned value will fit.
Consequently, a software developer can now safely incant:
(XORHIr16 R16C:$rA, 0xffff)
which is somewhat clearer and more informative than incanting:
(XORHIr16 R16C:$rA, (i16 -1))
even if the two are bitwise equivalent.
Tblgen also outputs the 64-bit unsigned constant in the generated ISel code
when getTargetConstant() is invoked.
llvm-svn: 47188
in a ret node. These are created as i32 constants
but on some platforms i32 is not legal. This
fixes 26 "make check" failures, for example
Alpha/2005-07-12-TwoMallocCalls.ll.
llvm-svn: 47172
the return value is zero-extended if it isn't
sign-extended. It may also be any-extended.
Also, if a floating point value was returned
in a larger floating point type, pass 1 as the
second operand to FP_ROUND, which tells it
that all the precision is in the original type.
I think this is right but I could be wrong.
Finally, when doing libcalls, set isZExt on
a parameter if it is "unsigned". Currently
isSExt is set when signed, and nothing is
set otherwise. This should be right for all
calls to standard library routines.
llvm-svn: 47122
1) ConstantFP is now expand by default
2) ConstantFP is not turned into TargetConstantFP during Legalize
if it is legal.
This allows ConstantFP to be handled like Constant, allowing for
targets that can encode FP immediates as MachineOperands.
As a bonus, fix up Itanium FP constants, which now correctly match,
and match more constants! Hooray.
llvm-svn: 47121
CTTZ and CTPOP. The expansion code differs from
that in LegalizeDAG in that it chooses to take the
CTLZ/CTTZ count from the Hi/Lo part depending on
whether the Hi/Lo value is zero, not on whether
CTLZ/CTTZ of Hi/Lo returned 32 (or whatever the
width of the type is) for it. I made this change
because the optimizers may well know that Hi/Lo
is zero and exploit it. The promotion code for
CTTZ also differs from that in LegalizeDAG: it
uses an "or" to get the right result when the
original value is zero, rather than using a compare
and select. This also means the value doesn't
need to be zero extended.
llvm-svn: 47075
node as soon as we create it in SDISel. Previously we would lower it in
legalize. The problem with this is that it only exposes the argument
loads implied by FORMAL_ARGUMENTs after legalize, so that only dag combine 2
can hack on them. This causes us to miss some optimizations because
datatype expansion also happens here.
Exposing the loads early allows us to do optimizations on them. For example
we now compile arg-cast.ll to:
_foo:
movl $2147483647, %eax
andl 8(%esp), %eax
ret
where we previously produced:
_foo:
subl $12, %esp
movsd 16(%esp), %xmm0
movsd %xmm0, (%esp)
movl $2147483647, %eax
andl 4(%esp), %eax
addl $12, %esp
ret
It might also make sense to do this for ISD::CALL nodes, which have implicit
stores on many targets.
llvm-svn: 47054
PR1877.
A3 = op A2 B0<kill>
...
B1 = A3 <- this copy
...
= op A3 <- more uses
==>
B2 = op B0 A2<kill>
...
B1 = B2 <- now an identify copy
...
= op B2 <- more uses
This speeds up FreeBench/neural by 29%, Olden/bh by 12%, oopack_v1p8 by 53%.
llvm-svn: 47046
handle arbitrary precision integers and any number
of parts. For example, on a 32 bit machine an i50
corresponds to two i32 parts. getCopyToParts will
extend the i50 to an i64 then write half of the i64
to each part; getCopyFromParts will combine the two
i32 parts into an i64 then truncate the result to
i50.
llvm-svn: 47024
Added member template "Add" to FoldingSetNodeID that allows "adding" arbitrary
objects to a profile via dispatch to FoldingSetTrait<T>::Profile().
Removed FoldingSetNodeID::AddAPFloat and FoldingSetNodeID::APInt, as their
functionality is now replaced using the above mentioned member template.
llvm-svn: 46957
initializer problem, a minor tweak to the way the
DAGISelEmitter finds load/store nodes, and a renaming of the
new PseudoSourceValue objects.
llvm-svn: 46827
ReadyToProcess node - add an assertion to check
this. Add an assertion to NodeDeleted that checks
that processed/ready nodes are indeed not deleted.
It is because they are never deleted that none of
the maps can have a deleted node as the source of
a mapping. It does however seem to be possible in
theory to have a deleted value as the target of a
mapping, however this has not yet been spotted in
the wild. Still mulling on what to do about this.
[The theoretical situation is this: a node A is
expanded/promoted/whatever to a newly created node
B. Thus A->B is added to a map. When the subtree
rooted at B is legalized it is conceivable that B
is deleted due to RAUW on a node somewhere above
it].
llvm-svn: 46705
keep the LegalizeTypes node flags up to date when doing a RAUW.
This fixes a nasty bug that Duncan ran into and makes the
previous (nonbuggy case) more efficent.
llvm-svn: 46679
DAGUpdateListener object pointer instead of just returning a vector
of deleted nodes. This makes the interfaces more efficient (no more
allocating a vector [at least a malloc], filling it in, then walking
it) and more clean. This also allows the client to be notified of
nodes that are *changed* but not deleted.
llvm-svn: 46677
SelectionDAG::ReplaceAllUsesWith to handle replacement of
an SDOperand with *any* sdoperand, not just one for a node with
a single result. Note that this has a horrible FIXME'd hack in it
to work around PR1975. This should be removed when PR1975 is fixed.
llvm-svn: 46674
Added ISD::DECLARE node type to represent llvm.dbg.declare intrinsic. Now the intrinsic calls are lowered into a SDNode and lives on through out the codegen passes.
For now, since all the debugging information recording is done at isel time, when a ISD::DECLARE node is selected, it has the side effect of also recording the variable. This is a short term solution that should be fixed in time.
llvm-svn: 46659
Replace getLocation() with getFrameIndexOffset() which returns the delta from frame pointer to stack slot. Dwarf writer can then use the information for whatever it wants.
llvm-svn: 46597
in the backend. Introduce a new SDNode type, MemOperandSDNode, for
holding a MemOperand in the SelectionDAG IR, and add a MemOperand
list to MachineInstr, and code to manage them. Remove the offset
field from SrcValueSDNode; uses of SrcValueSDNode that were using
it are all all using MemOperandSDNode now.
Also, begin updating some getLoad and getStore calls to use the
PseudoSourceValue objects.
Most of this was written by Florian Brander, some
reorganization and updating to TOT by me.
llvm-svn: 46585
Note this solution might be somewhat fragile since ISD::LABEL may be used for other
purposes. If that ends up to be an issue, we may need to introduce a different node
for debug labels.
llvm-svn: 46571
and StoreSDNode into their common base class LSBaseSDNode. Member
functions getLoadedVT and getStoredVT are replaced with the common
getMemoryVT to simplify code that will handle both loads and stores.
llvm-svn: 46538
type that matters but the operand type. This fixes
2008-01-08-IllegalCMP.ll which crashed with the new
legalize infrastructure because SETCC with result
type i8 and operand type i64 was being custom expanded
by the X86 backend. With this fix, the gcc build gets
as far as the first libcall.
llvm-svn: 46525
registers if used by a bitconvert or using a bitconvert. This allows us to
avoid constant pool loads and use cheaper integer instructions when the
values come from or end up in integer regs anyway. For example, we now
compile CodeGen/X86/fp-in-intregs.ll to:
_test1:
movl $2147483648, %eax
xorl 4(%esp), %eax
ret
_test2:
movl $1065353216, %eax
orl 4(%esp), %eax
andl $3212836864, %eax
ret
Instead of:
_test1:
movss 4(%esp), %xmm0
xorps LCPI2_0, %xmm0
movd %xmm0, %eax
ret
_test2:
movss 4(%esp), %xmm0
andps LCPI3_0, %xmm0
movss LCPI3_1, %xmm1
andps LCPI3_2, %xmm1
orps %xmm0, %xmm1
movd %xmm1, %eax
ret
bitconverts can happen due to various calling conventions that require
fp values to passed in integer regs in some cases, e.g. when returning
a complex.
llvm-svn: 46414
delete a node even if it was not dead in some cases. Instead, just add it to
the worklist. Also, make sure to use the CombineTo methods, as it was doing
things that were unsafe: the top level combine loop could touch dangling memory.
This fixes CodeGen/Generic/2008-01-25-dag-combine-mul.ll
llvm-svn: 46384
1. we already know the value is dead, so don't bother replacing
it with undef.
2. The very case the comment describes actually makes the load
live which asserts in deletenode. If we do the replacement
and the node becomes live, just treat it as new. This fixes
a failure on X86/2008-01-16-InvalidDAGCombineXform.ll with
some local changes in my tree.
llvm-svn: 46306
dead stuff around. This gets fed into the isel pass and causes certain foldings from
happening because nodes have extraneous uses floating around. For example, if we turned
foo(bar(x)) -> baz(x), we sometimes left bar(x) around.
llvm-svn: 46305
precision integers. This won't actually work
(and most of the code is dead) unless the new
legalization machinery is turned on. While
there, I rationalized the handling of i1, and
removed some bogus (and unused) sextload patterns.
For i1, this could result in microscopically
better code for some architectures (not X86).
It might also result in worse code if annotating
with AssertZExt nodes turns out to be more harmful
than helpful.
llvm-svn: 46280
NDEBUG. This is in response to a really nasty bug I introduced that
Dale tracked down, hopefully this won't happen in the future.
Many thanks Dale.
llvm-svn: 46254
integers. Handle truncstore of a legal type to an unusual
number of bits. Most of this code is not reachable unless
the new legalize infrastructure is turned on.
llvm-svn: 46249
1. Legalize now always promotes truncstore of i1 to i8.
2. Remove patterns and gunk related to truncstore i1 from targets.
3. Rename the StoreXAction stuff to TruncStoreAction in TLI.
4. Make the TLI TruncStoreAction table a 2d table to handle from/to conversions.
5. Mark a wide variety of invalid truncstores as such in various targets, e.g.
X86 currently doesn't support truncstore of any of its integer types.
6. Add legalize support for truncstores with invalid value input types.
7. Add a dag combine transform to turn store(truncate) into truncstore when
safe.
The later allows us to compile CodeGen/X86/storetrunc-fp.ll to:
_foo:
fldt 20(%esp)
fldt 4(%esp)
faddp %st(1)
movl 36(%esp), %eax
fstps (%eax)
ret
instead of:
_foo:
subl $4, %esp
fldt 24(%esp)
fldt 8(%esp)
faddp %st(1)
fstps (%esp)
movl 40(%esp), %eax
movss (%esp), %xmm0
movss %xmm0, (%eax)
addl $4, %esp
ret
llvm-svn: 46140
and switch various codegen pieces and the X86 backend over
to using it.
* Add some comments to SelectionDAGNodes.h
* Introduce a second argument to FP_ROUND, which indicates
whether the FP_ROUND changes the value of its input. If
not it is safe to xform things like fp_extend(fp_round(x)) -> x.
llvm-svn: 46125
and the spill is its kill. However, if the local allocator has determined the
register has not been modified (possible when its value was reloaded), it would
not issue a restore. In that case, mark the last use of the virtual register as
kill.
llvm-svn: 46111
It's not safe to use the two value CombineTo variant to combine away a dead load.
e.g.
v1, chain2 = load chain1, loc
v2, chain3 = load chain2, loc
v3 = add v2, c
Now we replace use of v1 with undef, use of chain2 with chain1.
ReplaceAllUsesWith() will iterate through uses of the first load and update operands:
v1, chain2 = load chain1, loc
v2, chain3 = load chain1, loc
v3 = add v2, c
Now the second load is the same as the first load, SelectionDAG cse will ensure
the use of second load is replaced with the first load.
v1, chain2 = load chain1, loc
v3 = add v1, c
Then v1 is replaced with undef and bad things happen.
llvm-svn: 46099
into the ANY_EXTEND/ZERO_EXTEND/SIGN_EXTEND code to simplify it.
Unmerge the code for FP_ROUND and FP_EXTEND from each other to
make each one simpler.
llvm-svn: 46061
ShortenEHDataFor64Bits as a not-very-accurate
abstraction to cover all the changes in DwarfWriter.
Some cosmetic changes to Darwin assembly code for
gcc testsuite compatibility.
llvm-svn: 46029
has no stores between the load and the end of block. This works
great and sinks hundreds of stores, but we can't turn it on because
machineinstrs don't have volatility information and we don't want to
sink volatile stores :(
llvm-svn: 45894
both work right according to the new flags.
This removes the TII::isReallySideEffectFree predicate, and adds
TII::isInvariantLoad.
It removes NeverHasSideEffects+MayHaveSideEffects and adds
UnmodeledSideEffects as machine instr flags. Now the clients
can decide everything they need.
I think isRematerializable can be implemented in terms of the
flags we have now, though I will let others tackle that.
llvm-svn: 45843
Likewise fix up a bunch of other libcalls. While
there I remove NEG_F32 and NEG_F64 since they are
not used anywhere. This fixes 9 Ada ACATS failures.
llvm-svn: 45833
all clients over to using predicates instead of these flags directly.
These are now private values which are only to be used to statically
initialize the tables.
llvm-svn: 45692
flags that can be set. Add predicates for the ones lacking it, and switch
some clients over to using the predicates instead of Flags directly.
llvm-svn: 45690
that it is cheap and efficient to get.
Move a variety of predicates from TargetInstrInfo into
TargetInstrDescriptor, which makes it much easier to query a predicate
when you don't have TII around. Now you can use MI->getDesc()->isBranch()
instead of going through TII, and this is much more efficient anyway. Not
all of the predicates have been moved over yet.
Update old code that used MI->getInstrDescriptor()->Flags to use the
new predicates in many places.
llvm-svn: 45674
up to the various compiler pipelines.
This doesn't actually add support for any GC algorithms, which means it
temporarily breaks a few tests. To be fixed shortly.
llvm-svn: 45669
it now returns the machineinstr of the use. To get the operand, use I.getOperand().
Add a new MachineRegisterInfo::replaceRegWith, which is basically like
Value::replaceAllUsesWith.
llvm-svn: 45482
a header file from libcodegen. This violates a layering order: codegen
depends on target, not the other way around. The fix to this is to
split TII into two classes, TII and TargetInstrInfoImpl, which defines
stuff that depends on libcodegen. It is defined in libcodegen, where
the base is not.
llvm-svn: 45475
values, which means doing extra legalization work.
It would be easier to get this kind of thing right if
there was some documentation...
llvm-svn: 45472
that "machine" classes are used to represent the current state of
the code being compiled. Given this expanded name, we can start
moving other stuff into it. For now, move the UsedPhysRegs and
LiveIn/LoveOuts vectors from MachineFunction into it.
Update all the clients to match.
This also reduces some needless #includes, such as MachineModuleInfo
from MachineFunction.
llvm-svn: 45467
e.g. MO.isMBB() instead of MO.isMachineBasicBlock(). I don't plan on
switching everything over, so new clients should just start using the
shorter names.
Remove old long accessors, switching everything over to use the short
accessor: getMachineBasicBlock() -> getMBB(),
getConstantPoolIndex() -> getIndex(), setMachineBasicBlock -> setMBB(), etc.
llvm-svn: 45464
- Eliminate the static "print" method for operands, moving it
into MachineOperand::print.
- Change various set* methods for register flags to take a bool
for the value to set it to. Remove unset* methods.
- Group methods more logically by operand flavor in MachineOperand.h
llvm-svn: 45461
- Add getParent() accessors.
- Move SubReg out of the AuxInfo union, to make way for future changes.
- Remove the getImmedValue/setImmedValue methods.
- in some MachineOperand::Create* methods, stop initializing fields that are dead.
MachineInstr:
- Delete one copy of the MachineInstr printing code, now there is only one dump
format and one copy of the code.
- Make MachineOperand use the parent field to get info about preg register names if
no target info is otherwise available.
- Move def/use/kill/dead flag printing to the machineoperand printer, so they are
always printed for an operand.
llvm-svn: 45460
to know about calls that cannot throw ('nounwind'):
if such a call does throw for some reason then the
personality will terminate the program. The distinction
between an ordinary call and a nounwind call is that
an ordinary call gets an entry in the exception table
but a nounwind call does not. This patch sets up the
exception table appropriately. One oddity is that
I've chosen to bracket nounwind calls with labels (like
invokes) - the other choice would have been to bracket
ordinary calls with labels. While bracketing
ordinary calls is more natural (because bracketing
by labels would then correspond exactly to getting an
entry in the exception table), I didn't do it because
introducing labels impedes some optimizations and I'm
guessing that ordinary calls occur more often than
nounwind calls. This fixes the gcc filter2 eh test,
at least at -O0 (the inliner needs some tweaking at
higher optimization levels).
llvm-svn: 45197
SelectionDAG::getConstant, in the same way as vector floating-point
constants. This allows the legalize expansion code for @llvm.ctpop and
friends to be usable with vector types.
llvm-svn: 44954
_foo:
movl $12, %eax
andl 4(%esp), %eax
movl _array(%eax), %eax
ret
instead of:
_foo:
movl 4(%esp), %eax
shrl $2, %eax
andl $3, %eax
movl _array(,%eax,4), %eax
ret
As it turns out, this triggers all the time, in a wide variety of
situations, for example, I see diffs like this in various programs:
- movl 8(%eax), %eax
- shll $2, %eax
- andl $1020, %eax
- movl (%esi,%eax), %eax
+ movzbl 8(%eax), %eax
+ movl (%esi,%eax,4), %eax
- shll $2, %edx
- andl $1020, %edx
- movl (%edi,%edx), %edx
+ andl $255, %edx
+ movl (%edi,%edx,4), %edx
Unfortunately, I also see stuff like this, which can be fixed in the
X86 backend:
- andl $85, %ebx
- addl _bit_count(,%ebx,4), %ebp
+ shll $2, %ebx
+ andl $340, %ebx
+ addl _bit_count(%ebx), %ebp
llvm-svn: 44656
This allows an important optimization to be re-enabled.
- If all uses / defs of a split interval can be folded, give the interval a
low spill weight so it would not be picked in case spilling is needed (avoid
pushing other intervals in the same BB to be spilled).
llvm-svn: 44601
throw exceptions", just mark intrinsics with the nounwind
attribute. Likewise, mark intrinsics as readnone/readonly
and get rid of special aliasing logic (which didn't use
anything more than this anyway).
llvm-svn: 44544
in the middle of a split basic block, create a new live interval starting at
the def. This avoid artifically extending the live interval over a number of
cycles where it is dead. e.g.
bb1:
= vr1204 (use / kill) <= new interval starts and ends here.
...
...
vr1204 = (new def) <= start a new interval here.
= vr1204 (use)
llvm-svn: 44436
the function type, instead they belong to functions
and function calls. This is an updated and slightly
corrected version of Reid Spencer's original patch.
The only known problem is that auto-upgrading of
bitcode files doesn't seem to work properly (see
test/Bitcode/AutoUpgradeIntrinsics.ll). Hopefully
a bitcode guru (who might that be? :) ) will fix it.
llvm-svn: 44359
optimized. This avoids creating illegal divisions when the combiner is
running after legalize; this fixes PR1815. Also, it produces better
code in the included testcase by avoiding the subtract and multiply
when the division isn't optimized.
llvm-svn: 44341
1) Change the interface to TargetLowering::ExpandOperationResult to
take and return entire NODES that need a result expanded, not just
the value. This allows us to handle things like READCYCLECOUNTER,
which returns two values.
2) Implement (extremely limited) support in LegalizeDAG::ExpandOp for MERGE_VALUES.
3) Reimplement custom lowering in LegalizeDAGTypes in terms of the new
ExpandOperationResult. This makes the result simpler and fully
general.
4) Implement (fully general) expand support for MERGE_VALUES in LegalizeDAGTypes.
5) Implement ExpandOperationResult support for ARM f64->i64 bitconvert and ARM
i64 shifts, allowing them to work with LegalizeDAGTypes.
6) Implement ExpandOperationResult support for X86 READCYCLECOUNTER and FP_TO_SINT,
allowing them to work with LegalizeDAGTypes.
LegalizeDAGTypes now passes several more X86 codegen tests when enabled and when
type legalization in LegalizeDAG is ifdef'd out.
llvm-svn: 44300
node A gets back into the DAG again because it was hiding in
one of the node maps: make sure that node replacement happens
in those maps too.
llvm-svn: 44263
Fix a couple of problems:
1. Don't assume the VT-1 is a VT that is half the size.
2. Treat vectors of FP in the vector path, not the FP path.
This has a couple of remaining problems before it will work with
the code in PR1811: the code below this change assumes that it can
use extload/shift/or to construct the result, which isn't right for
vectors.
This also doesn't handle vectors of 1 or vectors that aren't pow-2.
llvm-svn: 44243
When a live interval is being spilled, rather than creating short, non-spillable
intervals for every def / use, split the interval at BB boundaries. That is, for
every BB where the live interval is defined or used, create a new interval that
covers all the defs and uses in the BB.
This is designed to eliminate one common problem: multiple reloads of the same
value in a single basic block. Note, it does *not* decrease the number of spills
since no copies are inserted so the split intervals are *connected* through
spill and reloads (or rematerialization). The newly created intervals can be
spilled again, in that case, since it does not span multiple basic blocks, it's
spilled in the usual manner. However, it can reuse the same stack slot as the
previously split interval.
This is currently controlled by -split-intervals-at-bb.
llvm-svn: 44198
MachineOperand auxInfo. Previous clunky implementation uses an external map
to track sub-register uses. That works because register allocator uses
a new virtual register for each spilled use. With interval splitting (coming
soon), we may have multiple uses of the same register some of which are
of using different sub-registers from others. It's too fragile to constantly
update the information.
llvm-svn: 44104
adjustment fields, and an optional flag. If there is a "dynamic_stackalloc" in
the code, make sure that it's bracketed by CALLSEQ_START and CALLSEQ_END. If
not, then there is the potential for the stack to be changed while the stack's
being used by another instruction (like a call).
This can only result in tears...
llvm-svn: 44037
should only effect x86 when using long double. Now
12/16 bytes are output for long double globals (the
exact amount depends on the alignment). This brings
globals in line with the rest of LLVM: the space
reserved for an object is now always the ABI size.
One tricky point is that only 10 bytes should be
output for long double if it is a field in a packed
struct, which is the reason for the additional
argument to EmitGlobalConstant.
llvm-svn: 43688
can be eliminated by the allocator is the destination and source targets the
same register. The most common case is when the source and destination registers
are in different class. For example, on x86 mov32to32_ targets GR32_ which
contains a subset of the registers in GR32.
The allocator can do 2 things:
1. Set the preferred allocation for the destination of a copy to that of its source.
2. After allocation is done, change the allocation of a copy destination (if
legal) so the copy can be eliminated.
This eliminates 443 extra moves from 403.gcc.
llvm-svn: 43662
The meaning of getTypeSize was not clear - clarifying it is important
now that we have x86 long double and arbitrary precision integers.
The issue with long double is that it requires 80 bits, and this is
not a multiple of its alignment. This gives a primitive type for
which getTypeSize differed from getABITypeSize. For arbitrary precision
integers it is even worse: there is the minimum number of bits needed to
hold the type (eg: 36 for an i36), the maximum number of bits that will
be overwriten when storing the type (40 bits for i36) and the ABI size
(i.e. the storage size rounded up to a multiple of the alignment; 64 bits
for i36).
This patch removes getTypeSize (not really - it is still there but
deprecated to allow for a gradual transition). Instead there is:
(1) getTypeSizeInBits - a number of bits that suffices to hold all
values of the type. For a primitive type, this is the minimum number
of bits. For an i36 this is 36 bits. For x86 long double it is 80.
This corresponds to gcc's TYPE_PRECISION.
(2) getTypeStoreSizeInBits - the maximum number of bits that is
written when storing the type (or read when reading it). For an
i36 this is 40 bits, for an x86 long double it is 80 bits. This
is the size alias analysis is interested in (getTypeStoreSize
returns the number of bytes). There doesn't seem to be anything
corresponding to this in gcc.
(3) getABITypeSizeInBits - this is getTypeStoreSizeInBits rounded
up to a multiple of the alignment. For an i36 this is 64, for an
x86 long double this is 96 or 128 depending on the OS. This is the
spacing between consecutive elements when you form an array out of
this type (getABITypeSize returns the number of bytes). This is
TYPE_SIZE in gcc.
Since successive elements in a SequentialType (arrays, pointers
and vectors) need to be aligned, the spacing between them will be
given by getABITypeSize. This means that the size of an array
is the length times the getABITypeSize. It also means that GEP
computations need to use getABITypeSize when computing offsets.
Furthermore, if an alloca allocates several elements at once then
these too need to be aligned, so the size of the alloca has to be
the number of elements multiplied by getABITypeSize. Logically
speaking this doesn't have to be the case when allocating just
one element, but it is simpler to also use getABITypeSize in this
case. So alloca's and mallocs should use getABITypeSize. Finally,
since gcc's only notion of size is that given by getABITypeSize, if
you want to output assembler etc the same as gcc then getABITypeSize
is the size you want.
Since a store will overwrite no more than getTypeStoreSize bytes,
and a read will read no more than that many bytes, this is the
notion of size appropriate for alias analysis calculations.
In this patch I have corrected all type size uses except some of
those in ScalarReplAggregates, lib/Codegen, lib/Target (the hard
cases). I will get around to auditing these too at some point,
but I could do with some help.
Finally, I made one change which I think wise but others might
consider pointless and suboptimal: in an unpacked struct the
amount of space allocated for a field is now given by the ABI
size rather than getTypeStoreSize. I did this because every
other place that reserves memory for a type (eg: alloca) now
uses getABITypeSize, and I didn't want to make an exception
for unpacked structs, i.e. I did it to make things more uniform.
This only effects structs containing long doubles and arbitrary
precision integers. If someone wants to pack these types more
tightly they can always use a packed struct.
llvm-svn: 43620
storing an i170 on a 32 bit machine. This is first
promoted to a trunc-i170 store of an i256. On a
little-endian machine this expands to a store of
an i128 and a trunc-i42 store of an i128. The
trunc-i42 store is further expanded to a trunc-i42
store of an i64, then to a store of an i32 and a
trunc-i10 store of an i32. At this point the operand
type is legal (i32) and expansion stops (legalization
of the trunc-i10 needs to be handled in LegalizeDAG.cpp).
On big-endian machines the high bits are stored first,
and some bit-fiddling is needed in order to generate
aligned stores.
llvm-svn: 43499
offload to getStore rather than trying to handle
both cases at once (the assertions for example
assume the store really is truncating).
llvm-svn: 43498
transformation. Previously, it's restricted by ensuring the number of load uses
is one. Now the restriction is loosened up by allowing setcc uses to be
"extended" (e.g. setcc x, c, eq -> setcc sext(x), sext(c), eq).
llvm-svn: 43465
of offset and the alignment of ptr if these are both powers of
2. While the ptr alignment is guaranteed to be a power of 2,
there is no reason to think that offset is. For example, if
offset is 12 (the size of a long double on x86-32 linux) and
the alignment of ptr is 8, then the alignment of ptr+offset
will in general be 4, not 8. Introduce a function MinAlign,
lifted from gcc, for computing the minimum guaranteed alignment.
I've tried to fix up everywhere under lib/CodeGen/SelectionDAG/.
I also changed some places that weren't wrong (because both values
were a power of 2), as a defensive change against people copying
and pasting the code.
Hopefully someone who cares about alignment will review the rest
of LLVM and fix up the remaining places. Since I'm on x86 I'm
not very motivated to do this myself...
llvm-svn: 43421
FE.
- Explicitly pass in the alignment of the load & store.
- XFAIL 2007-10-23-UnalignedMemcpy.ll because llc has a bug that crashes on
unaligned pointers.
llvm-svn: 43398
have their own custom memcpy lowering code. This code needs to be factored out
into a target-independent lowering method with hooks to the backend. In the
meantime, just call memcpy if we're trying to copy onto a stack.
llvm-svn: 43262
operations so they work right for integers with funky
bit-widths. For example, consider extending i48 to i64
on a 32 bit machine. The i64 result is expanded to 2 x i32.
We know that the i48 operand will be promoted to i64, then
also expanded to 2 x i32. If we had the expanded promoted
operand to hand, then expanding the result would be trivial.
Unfortunately at this stage we can only get hold of the
promoted operand. So instead we kind of hand-expand, doing
explicit shifting and truncating to get the top and bottom
halves of the i64 operand into 2 x i32, which are then used
to expand the result. This is harmless, because when the
promoted operand is finally expanded all this bit fiddling
turns into trivial operations which are eliminated either
by the expansion code itself or the DAG combiner.
llvm-svn: 43223
Turn a store folding instruction into a load folding instruction. e.g.
xorl %edi, %eax
movl %eax, -32(%ebp)
movl -36(%ebp), %eax
orl %eax, -32(%ebp)
=>
xorl %edi, %eax
orl -36(%ebp), %eax
mov %eax, -32(%ebp)
This enables the unfolding optimization for a subsequent instruction which will
also eliminate the newly introduced store instruction.
llvm-svn: 43192
asserts in later checks rather than producing
the ordinary load it is supposed to. Avoid all
such hassles by directly returning an ordinary
load in this case.
llvm-svn: 43174
To do this it is necessary to add a "always inline" argument to the
memcpy node. For completeness I have also added this node to memmove
and memset. I have also added getMem* functions, because the extra
argument makes it cumbersome to use getNode and because I get confused
by it :-)
llvm-svn: 43172
types. This is needed for SIGN_EXTEND_INREG at least.
It is not clear if this is correct for other operations.
On the other hand, for the various load/store actions
it seems to correct to return the type action, as is
currently done.
Also, it seems that SelectionDAG::getValueType can be
called for extended value types; introduce a map for
holding these, since we don't really want to extend
the vector to be 2^32 pointers long!
Generalize DAGTypeLegalizer::PromoteResult_TRUNCATE
and DAGTypeLegalizer::PromoteResult_INT_EXTEND to handle
the various funky possibilities that apints introduce,
for example that you can promote to a type that needs
to be expanded.
llvm-svn: 43071
codegen support. This should have no effect on codegen
for other types. Debatable bits: (1) the use (abuse?)
of a set in SDNode::getValueTypeList; (2) the length of
getTypeToTransformTo, which maybe should be refactored
with a non-inline part for extended value types.
llvm-svn: 43030
getTypeToExpandTo. The difference is that
getTypeToExpandTo gives the final result of expansion
(eg: i128 -> i32 on a 32 bit machine) while
getTypeToTransformTo does just one step (i128 -> i64).
llvm-svn: 42982
take a deleted nodes vector, instead of requiring it.
One more significant change: Implement the start of a legalizer that
just works on types. This legalizer is designed to run before the
operation legalizer and ensure just that the input dag is transformed
into an output dag whose operand and result types are all legal, even
if the operations on those types are not.
This design/impl has the following advantages:
1. When finished, this will *significantly* reduce the amount of code in
LegalizeDAG.cpp. It will remove all the code related to promotion and
expansion as well as splitting and scalarizing vectors.
2. The new code is very simple, idiomatic, and modular: unlike
LegalizeDAG.cpp, it has no 3000 line long functions. :)
3. The implementation is completely iterative instead of recursive, good
for hacking on large dags without blowing out your stack.
4. The implementation updates nodes in place when possible instead of
deallocating and reallocating the entire graph that points to some
mutated node.
5. The code nicely separates out handling of operations with invalid
results from operations with invalid operands, making some cases
simpler and easier to understand.
6. The new -debug-only=legalize-types option is very very handy :),
allowing you to easily understand what legalize types is doing.
This is not yet done. Until the ifdef added to SelectionDAGISel.cpp is
enabled, this does nothing. However, this code is sufficient to legalize
all of the code in 186.crafty, olden and freebench on an x86 machine. The
biggest issues are:
1. Vectors aren't implemented at all yet
2. SoftFP is a mess, I need to talk to Evan about it.
3. No lowering to libcalls is implemented yet.
4. Various operations are missing etc.
5. There are FIXME's for stuff I hax0r'd out, like softfp.
Hey, at least it is a step in the right direction :). If you'd like to help,
just enable the #ifdef in SelectionDAGISel.cpp and compile code with it. If
this explodes it will tell you what needs to be implemented. Help is
certainly appreciated.
Once this goes in, we can do three things:
1. Add a new pass of dag combine between the "type legalizer" and "operation
legalizer" passes. This will let us catch some long-standing isel issues
that we miss because operation legalization often obfuscates the dag with
target-specific nodes.
2. We can rip out all of the type legalization code from LegalizeDAG.cpp,
making it much smaller and simpler. When that happens we can then
reimplement the core functionality left in it in a much more efficient and
non-recursive way.
3. Once the whole legalizer is non-recursive, we can implement whole-function
selectiondags maybe...
llvm-svn: 42981
Make two changes:
1) only xform "store of f32" if i32 is a legal type for the target.
2) only xform "store of f64" if either i64 or i32 are legal for the target.
3) if i64 isn't legal, manually lower to 2 stores of i32 instead of letting a
later pass of legalize do it. This is ugly, but helps future changes I'm
about to commit.
llvm-svn: 42980
the source register will be coalesced to the super register of the LHS. Properly
merge in the live ranges of the resulting coalesced interval that were part of
the original source interval to the live interval of the super-register.
llvm-svn: 42961
Turn this:
movswl %ax, %eax
movl %eax, -36(%ebp)
xorl %edi, -36(%ebp)
into
movswl %ax, %eax
xorl %edi, %eax
movl %eax, -36(%ebp)
by unfolding the load / store xorl into an xorl and a store when we know the
value in the spill slot is available in a register. This doesn't change the
number of instructions but reduce the number of times memory is accessed.
Also unfold some load folding instructions and reuse the value when similar
situation presents itself.
llvm-svn: 42947
(almost) a register copy. However, it always coalesced to the register of the
RHS (the super-register). All uses of the result of a EXTRACT_SUBREG are sub-
register uses which adds subtle complications to load folding, spiller rewrite,
etc.
llvm-svn: 42899
Factor out the code that expands the "nasty scalar code" for unrolling
vectors into a separate routine, teach it how to handle mixed
vector/scalar operands, as seen in powi, and use it for several operators,
including sin, cos, powi, and pow.
Add support in SplitVectorOp for fpow, fpowi and for several unary
operators.
llvm-svn: 42884
enabled by passing -tailcallopt to llc. The optimization is
performed if the following conditions are satisfied:
* caller/callee are fastcc
* elf/pic is disabled OR
elf/pic enabled + callee is in module + callee has
visibility protected or hidden
llvm-svn: 42870
No compile-time support for constant operations yet,
just format transformations. Make readers and
writers work. Split constants into 2 doubles in
Legalize.
llvm-svn: 42865
use ISD::{S,U}DIVREM and ISD::{S,U}MUL_HIO. Move the lowering code
associated with these operators into target-independent in LegalizeDAG.cpp
and TargetLowering.cpp.
llvm-svn: 42762
Check if one of the two results unneeded so see if a simpler operator
could bs used. Also check to see if each of the two computations could be
simplified if they were split into separate operators. Factor out the code
that calls visit() so that it can be used for this purpose.
llvm-svn: 42759
input. APInt unfortunately zero-extends signed integers, so Dale
modified the function to expect zero-extended input. Make this
assumption explicit in the function name.
llvm-svn: 42732
basic arithmetic works.
Rename RTLIB long double functions to distinguish
different flavors of long double; the lib functions
have different names, alas.
llvm-svn: 42644
scheduler will try a number of tricks in order to avoid generating the
copies. This may not be possible in case the node produces a chain value
that prevent movement. Try unfolding the load from the node before to allow
it to be moved / cloned.
llvm-svn: 42625
This version enhances the previous patch to add root initialization
as discussed here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070910/053455.html
Collector gives its subclasses control over generic algorithms:
unsigned NeededSafePoints; //< Bitmask of required safe points.
bool CustomReadBarriers; //< Default is to insert loads.
bool CustomWriteBarriers; //< Default is to insert stores.
bool CustomRoots; //< Default is to pass through to backend.
bool InitRoots; //< If set, roots are nulled during lowering.
It also has callbacks which collectors can hook:
/// If any of the actions are set to Custom, this is expected to
/// be overriden to create a transform to lower those actions to
/// LLVM IR.
virtual Pass *createCustomLoweringPass() const;
/// beginAssembly/finishAssembly - Emit module metadata as
/// assembly code.
virtual void beginAssembly(Module &M, std::ostream &OS,
AsmPrinter &AP,
const TargetAsmInfo &TAI) const;
virtual void finishAssembly(Module &M,
CollectorModuleMetadata &CMM,
std::ostream &OS, AsmPrinter &AP,
const TargetAsmInfo &TAI) const;
Various other independent algorithms could be implemented, but were
not necessary for the initial two collectors. Some examples are
listed here:
http://llvm.org/docs/GarbageCollection.html#collector-algos
llvm-svn: 42466
other than PPC64. Instead of fixing it, just remove it and fix all the
places that use it to use TargetData::getPointerSize() instead, as there
aren't very many. Most of the references were in DwarfWriter.cpp.
llvm-svn: 42419
It includes:
- location and of each safe point in machine code (identified by a
label)
- location of each root within the stack frame (identified by an
offset), including the metadata tag provided to llvm.gcroot in
the user program
- size of the stack frame (for collectors which want to cheat on
stack crawling :)
- and eventually will include liveness
It is to be populated by back-ends during code-generation.
CollectorModuleMetadata aggregates this information across the
entire module.
llvm-svn: 42418