ADDC/ADDE use MVT::i1 (later, whatever it gets legalized to)
instead of MVT::Flag. Remove CARRY_FALSE in favor of 0; adjust
all target-independent code to use this format.
Most targets will still produce a Flag-setting target-dependent
version when selection is done. X86 is converted to use i32
instead, which means TableGen needs to produce different code
in xxxGenDAGISel.inc. This keys off the new supportsHasI1 bit
in xxxInstrInfo, currently set only for X86; in principle this
is temporary and should go away when all other targets have
been converted. All relevant X86 instruction patterns are
modified to represent setting and using EFLAGS explicitly. The
same can be done on other targets.
The immediate behavior change is that an ADC/ADD pair are no
longer tightly coupled in the X86 scheduler; they can be
separated by instructions that don't clobber the flags (MOV).
I will soon add some peephole optimizations based on using
other instructions that set the flags to feed into ADC.
llvm-svn: 72707
e.g.
orl $65536, 8(%rax)
=>
orb $1, 10(%rax)
Since narrowing is not always a win, e.g. i32 -> i16 is a loss on x86, dag combiner consults with the target before performing the optimization.
llvm-svn: 72507
FP_TO_XINT. Necessary for some cleanups I'm working on. Updated
from the previous version (r72431) to fix a bug and make some things a
bit clearer.
llvm-svn: 72445
systems instead of attempting to promote them to a 64-bit SINT_TO_FP or
FP_TO_SINT. This is in preparation for removing the type legalization
code from LegalizeDAG: once type legalization is gone from LegalizeDAG,
it won't be able to handle the i64 operand/result correctly.
This isn't quite ideal, but I don't think any other operation for any
target ends up in this situation, so treating this case specially seems
reasonable.
llvm-svn: 72324
PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
llvm-svn: 70225
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.
llvm-svn: 69952
in the MachineFunction class, renaming it to addLiveIn for consistency with
the same method in MachineBasicBlock. Thanks for Anton for suggesting this.
llvm-svn: 69615
leaq foo@TLSGD(%rip), %rdi
as part of the instruction sequence. Using a register other than %rdi and then
copying it to %rdi is not valid.
llvm-svn: 69350
with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
instructions), and teach the DAGCombiner to take advantage of this on
targets which support it. This eliminates many redundant
zero-extension operations on x86-64.
This adds a new TargetLowering hook, isZExtFree. It's similar to
isTruncateFree, except it only applies to actual definitions, and not
no-op truncates which may not zero the high bits.
Also, this adds a new optimization to SimplifyDemandedBits: transform
operations like x+y into (zext (add (trunc x), (trunc y))) on targets
where all the casts are no-ops. In contexts where the high part of the
add is explicitly masked off, this allows the mask operation to be
eliminated. Fix the DAGCombiner to avoid undoing these transformations
to eliminate casts on targets where the casts are no-ops.
Also, this adds a new two-address lowering heuristic. Since
two-address lowering runs before coalescing, it helps to be able to
look through copies when deciding whether commuting and/or
three-address conversion are profitable.
Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
the case that a clobber range extended both before and beyond an
existing live range. In that case, multiple live ranges need to be
added. This was exposed by the new subreg coalescing code.
Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
spiller behavior it was looking for no longer occurrs with the new
instruction selection.
llvm-svn: 68576
builds.
--- Reverse-merging (from foreign repository) r68552 into '.':
U test/CodeGen/X86/tls8.ll
U test/CodeGen/X86/tls10.ll
U test/CodeGen/X86/tls2.ll
U test/CodeGen/X86/tls6.ll
U lib/Target/X86/X86Instr64bit.td
U lib/Target/X86/X86InstrSSE.td
U lib/Target/X86/X86InstrInfo.td
U lib/Target/X86/X86RegisterInfo.cpp
U lib/Target/X86/X86ISelLowering.cpp
U lib/Target/X86/X86CodeEmitter.cpp
U lib/Target/X86/X86FastISel.cpp
U lib/Target/X86/X86InstrInfo.h
U lib/Target/X86/X86ISelDAGToDAG.cpp
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U lib/Target/X86/X86ISelLowering.h
U lib/Target/X86/X86InstrInfo.cpp
U lib/Target/X86/X86InstrBuilder.h
U lib/Target/X86/X86RegisterInfo.td
llvm-svn: 68560
This introduces a small regression on the generated code
quality in the case we are just computing addresses, not
loading values.
Will work on it and on X86-64 support.
llvm-svn: 68552
x * 40
=>
shlq $3, %rdi
leaq (%rdi,%rdi,4), %rax
This has the added benefit of allowing more multiply to be folded into addressing mode. e.g.
a * 24 + b
=>
leaq (%rdi,%rdi,2), %rax
leaq (%rsi,%rax,8), %rax
llvm-svn: 67917
%a = ...
%b = and i32 %a, 2
%c = srl i32 %b, 1
%d = br i32 %c,
into
%a = ...
%b = and %a, 2
%c = X86ISD::CMP %b, 0
%d = X86ISD::BRCOND %c ...
This applies only when the AND constant value has one bit set and the SRL
constant is equal to the log2 of the AND constant. The back-end is smart enough
to convert the result into a TEST/JMP sequence.
llvm-svn: 67728
1. ConstantPoolSDNode alignment field is log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. MachineConstantPool alignment field is also a log2 value.
3. However, some places are creating ConstantPoolSDNode with alignment value rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when they are created. However, asm printer group them by sections. That means the offsets are no longer valid. However, asm printer uses them to determine size of padding between entries.
5. Asm printer uses expensive data structure multimap to track constant pool entries by sections.
6. Asm printer iterate over SmallPtrSet when it's emitting constant pool entries. This is non-deterministic.
Solutions:
1. ConstantPoolSDNode alignment field is changed to keep non-log2 value.
2. MachineConstantPool alignment field is also changed to keep non-log2 value.
3. Functions that create ConstantPool nodes are passing in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in asm printer and JIT.
5. Asm printer uses cheaper data structure to group constant pool entries.
6. Asm printer compute entry offsets after grouping is done.
7. Change JIT code to compute entry offsets on the fly.
llvm-svn: 66875
for i32/i64 expressions (we could also do i16 on cpus where
i16 lea is fast, but I didn't add this). On the example, we now
generate:
_test:
movl 4(%esp), %eax
cmpl $42, (%eax)
setl %al
movzbl %al, %eax
leal 4(%eax,%eax,8), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
cmpl $41, (%eax)
movl $4, %ecx
movl $13, %eax
cmovg %ecx, %eax
ret
llvm-svn: 66869
related transformations out of target-specific dag combine into the
ARM backend. These were added by Evan in r37685 with no testcases
and only seems to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0 -> (zext(cond) << 3). This happens frequently
with the recently added cp constant select optimization, but is a
very general xform. For example, we now compile the second example
in const-select.ll to:
_test:
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
seta %al
movzbl %al, %eax
movl 4(%esp), %ecx
movsbl (%ecx,%eax,4), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
leal 4(%eax), %ecx
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
cmovbe %eax, %ecx
movsbl (%ecx), %eax
ret
This passes multisource and dejagnu.
llvm-svn: 66779
the same say the "test" instruction does in overflow cases,
so eliminating the test is only safe when those bits aren't
needed, as is the case for COND_E and COND_NE, or if it
can be proven that no overflow will occur. For now, just
restrict the optimization to COND_E and COND_NE and don't
do any overflow analysis.
llvm-svn: 66318
instruction. The class also consolidates the code for detecting constant
splats that's shared across PowerPC and the CellSPU backends (and might be
useful for other backends.) Also introduces SelectionDAG::getBUID_VECTOR() for
generating new BUILD_VECTOR nodes.
llvm-svn: 65296
(Note: Eventually, commits like this will be handled via a pre-commit hook that
does this automagically, as well as expand tabs to spaces and look for 80-col
violations.)
llvm-svn: 64827
in inline asm as signed (what gcc does). Add partial support
for x86-specific "e" and "Z" constraints, with appropriate
signedness for printing.
llvm-svn: 64400
Many targets build placeholder nodes for special operands, e.g.
GlobalBaseReg on X86 and PPC for the PIC base. There's no
sensible way to associate debug info with these. I've left
them built with getNode calls with explicit DebugLoc::getUnknownLoc operands.
I'm not too happy about this but don't see a good improvement;
I considered adding a getPseudoOperand or something, but it
seems to me that'll just make it harder to read.
llvm-svn: 63992
crashes or wrong code with codegen of large integers:
eliminate the legacy getIntegerVTBitMask and
getIntegerVTSignBit methods, which returned their
value as a uint64_t, so couldn't handle huge types.
llvm-svn: 63494
dagcombines that help it match in several more cases. Add
several more cases to test/CodeGen/X86/bt.ll. This doesn't
yet include matching for BT with an immediate operand, it
just covers more register+register cases.
llvm-svn: 63266
tidy up SDUse and related code.
- Replace the operator= member functions with a set method, like
LLVM Use has, and variants setInitial and setNode, which take
care up updating use lists, like LLVM Use's does. This simplifies
code that calls these functions.
- getSDValue() is renamed to get(), as in LLVM Use, though most
places can either use the implicit conversion to SDValue or the
convenience functions instead.
- Fix some more node vs. value terminology issues.
Also, eliminate the one remaining use of SDOperandPtr, and
SDOperandPtr itself.
llvm-svn: 62995
X86. This code:
void f() {
uint32_t x;
float y = (float)x;
}
used to be:
movl %eax, -8(%ebp)
movl [2^52 double], -4(%ebp)
movsd -8(%ebp), %xmm0
subsd [2^52 double], %xmm0
cvtsd2ss %xmm0, %xmm0
Is now:
movsd [2^52 double], %xmm0
movsd %xmm0, %xmm1
movd %ecx, %xmm2
orps %xmm2, %xmm1
subsd %xmm0, %xmm1
cvtsd2ss %xmm1, %xmm0
This is faster on X86. Note that there's an extra load of %xmm0 into %xmm1. That
will be fixed in a later coalescer fix.
llvm-svn: 62404
promote from i1 all the way up to the canonical SetCC type.
In order to discover an appropriate type to use, pass
MVT::Other to getSetCCResultType. In order to be able to
do this, change getSetCCResultType to take a type as an
argument, not a value (this is also more logical).
llvm-svn: 61542
This removes all the _8, _16, _32, and _64 opcodes and replaces each
group with an unsuffixed opcode. The MemoryVT field of the AtomicSDNode
is now used to carry the size information. In tablegen, the size-specific
opcodes are replaced by size-independent opcodes that utilize the
ability to compose them with predicates.
This shrinks the per-opcode tables and makes the code that handles
atomics much more concise.
llvm-svn: 61389
which are identical to the original patterns.
- Change the multiply with overflow so that we distinguish between signed and
unsigned multiplication. Currently, unsigned multiplication with overflow
isn't working!
llvm-svn: 60963
ISD::ADD to emit an implicit EFLAGS. This was horribly broken. Instead, replace
the intrinsic with an ISD::SADDO node. Then custom lower that into an
X86ISD::ADD node with a associated SETCC that checks the correct condition code
(overflow or carry). Then that gets lowered into the correct X86::ADDOvf
instruction.
Similar for SUB and MUL instructions.
llvm-svn: 60915
loops when they can be subsumed into addressing modes.
Change X86 addressing mode check to realize that
some PIC references need an extra register.
(I believe this is correct for Linux, if not, I'm sure
someone will tell me.)
llvm-svn: 60608
- LowerXADDO lowers [SU]ADDO into an ADD with an implicit EFLAGS define. The
EFLAGS are fed into a SETCC node which has the conditional COND_O or COND_C,
depending on the type of ADDO requested.
- LowerBRCOND now recognizes if it's coming from a SETCC node with COND_O or
COND_C set.
llvm-svn: 60388
ReplaceNodeResults: rather than returning a node which
must have the same number of results as the original
node (which means mucking around with MERGE_VALUES,
and which is also easy to get wrong since SelectionDAG
folding may mean you don't get the node you expect),
return the results in a vector.
llvm-svn: 60348
the conditional for the BRCOND statement. For instance, it will generate:
addl %eax, %ecx
jo LOF
instead of
addl %eax, %ecx
; About 10 instructions to compare the signs of LHS, RHS, and sum.
jl LOF
llvm-svn: 60123
a memset using 16-byte XMM stores, but where the stack realignment code
didn't work. Until it does (PR2962) disable use of xmm regs in memcpy
and memset formation for linux and other targets with insufficiently
aligned stacks.
This is part of PR2888
llvm-svn: 58317
LHS is a foldable load, then LHS and RHS are swapped
and SetCCOpcode is changed to SETUGT. But the later
code is expecting operands to be the wrong way round
for SETUGT, but they are not in this case, resulting
in an inverted compare. The solution is to move the
load normalization before the correction for SETUGT.
This bug was tickled by LegalizeTypes which happened
to legalize the testcase slightly differently to
LegalizeDAG.
llvm-svn: 58092
assume that i64 has been turned into a BUILD_PAIR
node (when called from LegalizeTypes this hasn't
happened yet) and don't use a vector shuffle mask
with an illegal element type.
llvm-svn: 57972
The same one Apple gcc uses, faster. Also gets the
extreme case in gcc.c-torture/execute/ieee/rbug.c
correct which we weren't before; this is not
sufficient to get the test to pass though, there
is another bug.
llvm-svn: 57926
in the 32-bit signed offset field of addresses. Even though this
may be intended, some linkers refuse to relocate code where the
relocated address computation overflows.
Also, fix the sign-extension of constant offsets to use the
actual pointer size, rather than the size of the GlobalAddress
node, which may be different, for example on x86-64 where MVT::i32
is used when the address is being fit into the 32-bit displacement
field.
llvm-svn: 57885
Where previously LLVM might emit code like this:
ucomisd %xmm1, %xmm0
setne %al
setp %cl
orb %al, %cl
jne .LBB4_2
it now emits this:
ucomisd %xmm1, %xmm0
jne .LBB4_2
jp .LBB4_2
It has fewer instructions and uses fewer registers, but it does
have more branches. And in the case that this code is followed by
a non-fallthrough edge, it may be followed by a jmp instruction,
resulting in three branch instructions in sequence. Some effort
is made to avoid this situation.
To achieve this, X86ISelLowering.cpp now recognizes FCMP_OEQ and
FCMP_UNE in lowered form, and replace them with code that emits
two branches, except in the case where it would require converting
a fall-through edge to an explicit branch.
Also, X86InstrInfo.cpp's branch analysis and transform code now
knows now to handle blocks with multiple conditional branches. It
uses loops instead of having fixed checks for up to two
instructions. It can now analyze and transform code generated
from FCMP_OEQ and FCMP_UNE.
llvm-svn: 57873
LowerOperation if it doesn't know what else to do.
This methods should probably be factorized some,
but this is good enough for the moment. Have
LowerATOMIC_BINARY_64 use EXTRACT_ELEMENT rather
than assuming the operand is a BUILD_PAIR (if it
is then getNode will automagically simplify the
EXTRACT_ELEMENT). This way LowerATOMIC_BINARY_64
usable from LegalizeTypes.
llvm-svn: 57831
and add a TargetLowering hook for it to use to determine when this
is legal (i.e. not in PIC mode, etc.)
This allows instruction selection to emit folded constant offsets
in more cases, such as the included testcase, eliminating the need
for explicit arithmetic instructions.
This eliminates the need for the C++ code in X86ISelDAGToDAG.cpp
that attempted to achieve the same effect, but wasn't as effective.
Also, fix handling of offsets in GlobalAddressSDNodes in several
places, including changing GlobalAddressSDNode's offset from
int to int64_t.
The Mips, Alpha, Sparc, and CellSPU targets appear to be
unaware of GlobalAddress offsets currently, so set the hook to
false on those targets.
llvm-svn: 57748
in 32-bit mode instead of assigning a register pair. This has nothing to
do with PR2356, but I happened to notice it while working on it.
llvm-svn: 57704
i.e. conditions that cannot be checked with a single instruction. For example,
SETONE and SETUEQ on x86.
- Teach legalizer to implement *illegal* setcc as a and / or of a number of
legal setcc nodes. For now, only implement FP conditions. e.g. SETONE is
implemented as SETO & SETNE, SETUEQ is SETUO | SETEQ.
- Move x86 target over.
llvm-svn: 57542
- Move the EH landing-pad code and adjust it so that it works
with FastISel as well as with SDISel.
- Add FastISel support for @llvm.eh.exception and
@llvm.eh.selector.
llvm-svn: 57539
`-fno-builtin' flag. Currently, it's used to replace "memset" with "_bzero"
instead of "__bzero" on Darwin10+. This arguably violates the meaning of this
flag, but is currently sufficient. The meaning of this flag should become more
specific over time.
llvm-svn: 56885
its size). Adjust various lowering functions to
pass this info through from CallInst. Use it to
implement sseregparm returns on X86. Remove
X86_ssecall calling convention.
llvm-svn: 56677
s/ParamAttr/Attribute/g
s/PAList/AttrList/g
s/FnAttributeWithIndex/AttributeWithIndex/g
s/FnAttr/Attribute/g
This sets the stage
- to implement function notes as function attributes and
- to distinguish between function attributes and return value attributes.
This requires corresponding changes in llvm-gcc and clang.
llvm-svn: 56622
- Add linkage to SymbolSDNode (default to external).
- Change ISD::ExternalSymbol to ISD::Symbol.
- Change ISD::TargetExternalSymbol to ISD::TargetSymbol
These changes pave the way to allowing SymbolSDNodes with non-external linkage.
llvm-svn: 56249
isImmediate(), isRegister(), and friends, to avoid confusion
about having two different names with the same meaning. I'm
not attached to the longer names, and would be ok with
changing to the shorter names if others prefer it.
llvm-svn: 56189
Currently it just holds the calling convention and flags
for isVarArgs and isTailCall.
And it has several utility methods, which eliminate magic
5+2*i and similar index computations in several places.
CallSDNodes are not CSE'd. Teach UpdateNodeOperands to handle
nodes that are not CSE'd gracefully.
llvm-svn: 56183
cmp-and-swap reversed the Cmp and Swap arguments; comments
make it clear this is unintentional. Unfortunately, the
x86 BE had a compensating reversal, which is removed here.
PPC is OK.
From inspection of the Alpha code I think it is OK, but
if somebody has that platform please check it out. I
cannot test on that platform.
llvm-svn: 56091
HandlePHINodesInSuccessorBlocks that works FastISel-style. This
allows PHI nodes to be updated correctly while using FastISel.
This also involves some code reorganization; ValueMap and
MBBMap are now members of the FastISel class, so they needn't
be passed around explicitly anymore. Also, SelectInstructions
is changed to SelectInstruction, and only does one instruction
at a time.
llvm-svn: 55746
ATOMIC_LOAD_ADD_{8,16,32,64} instead of ATOMIC_LOAD_ADD.
Increased the Hardcoded Constant OpActionsCapacity to match.
Large but boring; no functional change.
This is to support partial-word atomics on ppc; i8 is
not a valid type there, so by the time we get to lowering, the
ATOMIC_LOAD nodes looks the same whether the type was i8 or i32.
The information can be added to the AtomicSDNode, but that is the
largest SDNode; I don't fully understand the SDNode allocation,
but it is sensitive to the largest node size, so increasing
that must be bad. This is the alternative.
llvm-svn: 55457
assign it to a version of the xmm register with the regclass that matches its
type. This fixes PR2715, a bug handling some crazy xpcom case in mozilla.
llvm-svn: 55358
1. x86-64 byval alignment should be max of 8 and alignment of type. Previously the code was not doing what the commit message was saying.
2. Do not use byte repeat move and store operations. These are slow.
llvm-svn: 55139
class hold a MachineRegisterInfo member, and make the
MachineBasicBlock be passed in to SelectInstructions rather
than the FastISel constructor.
llvm-svn: 55076
generic SDNode's (nodes with their own constructors
should do sanity checking in the constructor). Add
sanity checks for BUILD_VECTOR and fix all the places
that were producing bogus BUILD_VECTORs, as found by
"make check". My favorite is the BUILD_VECTOR with
only two operands that was being used to build a
vector with four elements!
llvm-svn: 53850
MachineMemOperands. The pools are owned by MachineFunctions.
This drastically reduces the number of calls to malloc/free made
during the "Emit" phase of scheduling, as well as later phases
in CodeGen. Combined with other changes, this speeds up the
"instruction selection" phase of CodeGen by 10% in some cases.
llvm-svn: 53212
hook for each way in which a result type can be
legalized (promotion, expansion, softening etc),
just use one: ReplaceNodeResults, which returns
a node with exactly the same result types as the
node passed to it, but presumably with a bunch of
custom code behind the scenes. No change if the
new LegalizeTypes infrastructure is not turned on.
llvm-svn: 53137
to be passed the list of value types, and use this
where appropriate. Inappropriate places are where
the value type list is already known and may be
long, in which case the existing method is more
efficient.
llvm-svn: 53035
the need for a flavor operand, and add a new SDNode subclass,
LabelSDNode, for use with them to eliminate the need for a label id
operand.
Change instruction selection to let these label nodes through
unmodified instead of creating copies of them. Teach the MachineInstr
emitter how to emit a MachineInstr directly from an ISD label node.
This avoids the need for allocating SDNodes for the label id and
flavor value, as well as SDNodes for each of the post-isel label,
label id, and label flavor.
llvm-svn: 52943
purpose, and give it a custom SDNode subclass so that it doesn't
need to have line number, column number, filename string, and
directory string, all existing as individual SDNodes to be the
operands.
This was the only user of ISD::STRING, StringSDNode, etc., so
remove those and some associated code.
This makes stop-points considerably easier to read in
-view-legalize-dags output, and reduces overhead (creating new
nodes and copying std::strings into them) on code containing
debugging information.
llvm-svn: 52924
it impossible to create a MERGE_VALUES node with
only one result: sometimes it is useful to be able
to create a node with only one result out of one of
the results of a node with more than one result, for
example because the new node will eventually be used
to replace a one-result node using ReplaceAllUsesWith,
cf X86TargetLowering::ExpandFP_TO_SINT. On the other
hand, most users of MERGE_VALUES don't need this and
for them the optimization was valuable. So add a new
utility method getMergeValues for creating MERGE_VALUES
nodes which by default performs the optimization.
Change almost everywhere to use getMergeValues (and
tidy some stuff up at the same time).
llvm-svn: 52893
Added abstract class MemSDNode for any Node that have an associated MemOperand
Changed atomic.lcs => atomic.cmp.swap, atomic.las => atomic.load.add, and
atomic.lss => atomic.load.sub
llvm-svn: 52706
shuffle could be skipped. The check is invalid because the loop index i
doesn't correspond to the element actually inserted. The correct check is
already done a few lines earlier, for whether the element is already in
the right spot, so this shouldn't have any effect on the codegen for
code that was already correct.
llvm-svn: 52486
of apint codegen failure is the DAG combiner doing
the wrong thing because it was comparing MVT's using
< rather than comparing the number of bits. Removing
the < method makes this mistake impossible to commit.
Instead, add helper methods for comparing bits and use
them.
llvm-svn: 52098
and better control the abstraction. Rename the type
to MVT. To update out-of-tree patches, the main
thing to do is to rename MVT::ValueType to MVT, and
rewrite expressions like MVT::getSizeInBits(VT) in
the form VT.getSizeInBits(). Use VT.getSimpleVT()
to extract a MVT::SimpleValueType for use in switch
statements (you will get an assert failure if VT is
an extended value type - these shouldn't exist after
type legalization).
This results in a small speedup of codegen and no
new testsuite failures (x86-64 linux).
llvm-svn: 52044
code generator would do something like this:
f64 = load f32 <anyext>, f32mem
v2f64 = insertelt undef, %0, 0
v2f64 = insertelt %1, 0.0, 1
into
v2f64 = vzext_load f32mem
which on x86 is movsd, when you really wanted a cvtss2sd/movsd pair.
llvm-svn: 51624
Move platform independent code (lowering of possibly overwritten
arguments, check for tail call optimization eligibility) from
target X86ISelectionLowering.cpp to TargetLowering.h and
SelectionDAGISel.cpp.
Initial PowerPC tail call implementation:
Support ppc32 implemented and tested (passes my tests and
test-suite llvm-test).
Support ppc64 implemented and half tested (passes my tests).
On ppc tail call optimization is performed if
caller and callee are fastcc
call is a tail call (in tail call position, call followed by ret)
no variable argument lists or byval arguments
option -tailcallopt is enabled
Supported:
* non pic tail calls on linux/darwin
* module-local tail calls on linux(PIC/GOT)/darwin(PIC)
* inter-module tail calls on darwin(PIC)
If constraints are not met a normal call will be emitted.
A test checking the argument lowering behaviour on x86-64 was added.
llvm-svn: 50477
- Make targetlowering.h fit in 80 cols.
- Make LowerAsmOperandForConstraint const.
- Make lowerXConstraint -> LowerXConstraint
- Make LowerXConstraint return a const char* instead of taking a string byref.
llvm-svn: 50312
On Darwin / Linux x86-32, v8i8, v4i16, v2i32 values are passed in MM[0-2].
On Darwin / Linux x86-32, v1i64 values are passed in memory.
On Darwin x86-64, v8i8, v4i16, v2i32 values are passed in XMM[0-7].
On Darwin x86-64, v1i64 values are passed in 64-bit GPRs.
llvm-svn: 50257
argument. The x86-64 ABI requires the incoming value of %rdi to
be copied to %rax on exit from a function that is returning a
large C struct.
Also, add a README-X86-64 entry detailing the missed optimization
opportunity and proposing an alternative approach.
llvm-svn: 50075
Rename SDOperandImpl back to SDOperand.
Introduce the SDUse class that represents a use of the SDNode referred by
an SDOperand. Now it is more similar to Use/Value classes.
Patch is approved by Dan Gohman.
llvm-svn: 49795
memcpy lowering code; this ensures that the size node has the desired
result type. This fixes a regression from r49572 with @llvm.memcpy.i64
on x86-32.
llvm-svn: 49761
optimized x86-64 (and x86) calls so that they work (... at least for
my test cases).
Should fix the following problems:
Problem 1: When i introduced the optimized handling of arguments for
tail called functions (using a sequence of copyto/copyfrom virtual
registers instead of always lowering to top of the stack) i did not
handle byval arguments correctly e.g they did not work at all :).
Problem 2: On x86-64 after the arguments of the tail called function
are moved to their registers (which include ESI/RSI etc), tail call
optimization performs byval lowering which causes xSI,xDI, xCX
registers to be overwritten. This is handled in this patch by moving
the arguments to virtual registers first and after the byval lowering
the arguments are moved from those virtual registers back to
RSI/RDI/RCX.
llvm-svn: 49584
on any current target and aren't optimized in DAGCombiner. Instead
of using intermediate nodes, expand the operations, choosing between
simple loads/stores, target-specific code, and library calls,
immediately.
Previously, the code to emit optimized code for these operations
was only used at initial SelectionDAG construction time; now it is
used at all times. This fixes some cases where rep;movs was being
used for small copies where simple loads/stores would be better.
This also cleans up code that checks for alignments less than 4;
let the targets make that decision instead of doing it in
target-independent code. This allows x86 to use rep;movs in
low-alignment cases.
Also, this fixes a bug that resulted in the use of rep;stos for
memsets of 0 with non-constant memory size when the alignment was
at least 4. It's better to use the library in this case, which
can be significantly faster when the size is large.
This also preserves more SourceValue information when memory
intrinsics are lowered into simple loads/stores.
llvm-svn: 49572
LLVM Value/Use does and MachineRegisterInfo/MachineOperand does.
This allows constant time for all uses list maintenance operations.
The idea was suggested by Chris. Reviewed by Evan and Dan.
Patch is tested and approved by Dan.
On normal use-cases compilation speed is not affected. On very big basic
blocks there are compilation speedups in the range of 15-20% or even better.
llvm-svn: 48822
flags. This is needed by the new legalize types
infrastructure which wants to expand the 64 bit
constants previously used to hold the flags on
32 bit machines. There are two functional changes:
(1) in LowerArguments, if a parameter has the zext
attribute set then that is marked in the flags;
before it was being ignored; (2) PPC had some bogus
code for handling two word arguments when using the
ELF 32 ABI, which was hard to convert because of
the bogusness. As suggested by the original author
(Nicolas Geoffray), I've disabled it for the moment.
Tested with "make check" and the Ada ACATS testsuite.
llvm-svn: 48640
1. There is now a "PAListPtr" class, which is a smart pointer around
the underlying uniqued parameter attribute list object, and manages
its refcount. It is now impossible to mess up the refcount.
2. PAListPtr is now the main interface to the underlying object, and
the underlying object is now completely opaque.
3. Implementation details like SmallVector and FoldingSet are now no
longer part of the interface.
4. You can create a PAListPtr with an arbitrary sequence of
ParamAttrsWithIndex's, no need to make a SmallVector of a specific
size (you can just use an array or scalar or vector if you wish).
5. All the client code that had to check for a null pointer before
dereferencing the pointer is simplified to just access the
PAListPtr directly.
6. The interfaces for adding attrs to a list and removing them is a
bit simpler.
Phase #2 will rename some stuff (e.g. PAListPtr) and do other less
invasive changes.
llvm-svn: 48289
field to 32 bits, thus enabling correct handling of ByVal
structs bigger than 0x1ffff. Abstract interface a bit.
Fixes gcc.c-torture/execute/pr23135.c and
gcc.c-torture/execute/pr28982b.c in gcc testsuite (were ICE'ing
on ppc32, quietly producing wrong code on x86-32.)
llvm-svn: 48122
into a vector of zeros or undef, and when the top part is obviously
zero, we can just use movd + shuffle. This allows us to compile
vec_set-B.ll into:
_test3:
movl $1234567, %eax
andl 4(%esp), %eax
movd %eax, %xmm0
ret
instead of:
_test3:
subl $28, %esp
movl $1234567, %eax
andl 32(%esp), %eax
movl %eax, (%esp)
movl $0, 4(%esp)
movq (%esp), %xmm0
addl $28, %esp
ret
llvm-svn: 48090
2) Don't try to insert an i64 value into the low part of a
vector with movq on an x86-32 target. This allows us to
compile:
__m128i doload64(short x) {return _mm_set_epi16(0,0,0,0,0,0,0,1);}
into:
_doload64:
movaps LCPI1_0, %xmm0
ret
instead of:
_doload64:
subl $28, %esp
movl $0, 4(%esp)
movl $1, (%esp)
movq (%esp), %xmm0
addl $28, %esp
ret
llvm-svn: 48057
stack slot and store if the SINT_TO_FP is actually legal. This allows
us to compile:
double a(double b) {return (unsigned)b;}
to:
_a:
cvttsd2siq %xmm0, %rax
movl %eax, %eax
cvtsi2sdq %rax, %xmm0
ret
instead of:
_a:
subq $8, %rsp
cvttsd2siq %xmm0, %rax
movl %eax, %eax
cvtsi2sdq %rax, %xmm0
addq $8, %rsp
ret
crazy.
llvm-svn: 47660
GOT-style position independent code. Before only tail calls to
protected/hidden functions within the same module were optimized.
Now all function calls are tail call optimized.
llvm-svn: 47594
calls. Before arguments that could overwrite each other were
explicitly lowered to a stack slot, not giving the register allocator
a chance to optimize. Now a sequence of copyto/copyfrom virtual
registers ensures that arguments are loaded in (virtual) registers
before they are lowered to the stack slot (and might overwrite each
other). Also parameter stack slots are marked mutable for
(potentially) tail calling functions.
llvm-svn: 47593
instead of with mmx registers. This horribleness is apparently
done by gcc to avoid having to insert emms in places that really
should have it. This is the second half of rdar://5741668.
llvm-svn: 47474
GCC apparently does this, and code depends on not having to do
emms when this happens. This is x86-64 only so far, second half
should handle x86-32.
rdar://5741668
llvm-svn: 47470
has plain one-result scalar integer multiplication instructions.
This avoids expanding such instructions into MUL_LOHI sequences that
must be special-cased at isel time, and avoids the problem with that
code that provented memory operands from being folded.
This fixes PR1874, addressesing the most common case. The uncommon
cases of optimizing multiply-high operations will require work
in DAGCombiner.
llvm-svn: 47277
the return value is zero-extended if it isn't
sign-extended. It may also be any-extended.
Also, if a floating point value was returned
in a larger floating point type, pass 1 as the
second operand to FP_ROUND, which tells it
that all the precision is in the original type.
I think this is right but I could be wrong.
Finally, when doing libcalls, set isZExt on
a parameter if it is "unsigned". Currently
isSExt is set when signed, and nothing is
set otherwise. This should be right for all
calls to standard library routines.
llvm-svn: 47122
1) ConstantFP is now expand by default
2) ConstantFP is not turned into TargetConstantFP during Legalize
if it is legal.
This allows ConstantFP to be handled like Constant, allowing for
targets that can encode FP immediates as MachineOperands.
As a bonus, fix up Itanium FP constants, which now correctly match,
and match more constants! Hooray.
llvm-svn: 47121
initializer problem, a minor tweak to the way the
DAGISelEmitter finds load/store nodes, and a renaming of the
new PseudoSourceValue objects.
llvm-svn: 46827
Added ISD::DECLARE node type to represent llvm.dbg.declare intrinsic. Now the intrinsic calls are lowered into a SDNode and lives on through out the codegen passes.
For now, since all the debugging information recording is done at isel time, when a ISD::DECLARE node is selected, it has the side effect of also recording the variable. This is a short term solution that should be fixed in time.
llvm-svn: 46659
in the backend. Introduce a new SDNode type, MemOperandSDNode, for
holding a MemOperand in the SelectionDAG IR, and add a MemOperand
list to MachineInstr, and code to manage them. Remove the offset
field from SrcValueSDNode; uses of SrcValueSDNode that were using
it are all all using MemOperandSDNode now.
Also, begin updating some getLoad and getStore calls to use the
PseudoSourceValue objects.
Most of this was written by Florian Brander, some
reorganization and updating to TOT by me.
llvm-svn: 46585
This case returns the value in ST(0) and then has to convert it to an SSE
register. This causes significant codegen ugliness in some cases. For
example in the trivial fp-stack-direct-ret.ll testcase we used to generate:
_bar:
subl $28, %esp
call L_foo$stub
fstpl 16(%esp)
movsd 16(%esp), %xmm0
movsd %xmm0, 8(%esp)
fldl 8(%esp)
addl $28, %esp
ret
because we move the result of foo() into an XMM register, then have to
move it back for the return of bar.
Instead of hacking ever-more special cases into the call result lowering code
we take a much simpler approach: on x86-32, fp return is modeled as always
returning into an f80 register which is then truncated to f32 or f64 as needed.
Similarly for a result, we model it as an extension to f80 + return.
This exposes the truncate and extensions to the dag combiner, allowing target
independent code to hack on them, eliminating them in this case. This gives
us this code for the example above:
_bar:
subl $12, %esp
call L_foo$stub
addl $12, %esp
ret
The nasty aspect of this is that these conversions are not legal, but we want
the second pass of dag combiner (post-legalize) to be able to hack on them.
To handle this, we lie to legalize and say they are legal, then custom expand
them on entry to the isel pass (PreprocessForFPConvert). This is gross, but
less gross than the code it is replacing :)
This also allows us to generate better code in several other cases. For
example on fp-stack-ret-conv.ll, we now generate:
_test:
subl $12, %esp
call L_foo$stub
fstps 8(%esp)
movl 16(%esp), %eax
cvtss2sd 8(%esp), %xmm0
movsd %xmm0, (%eax)
addl $12, %esp
ret
where before we produced (incidentally, the old bad code is identical to what
gcc produces):
_test:
subl $12, %esp
call L_foo$stub
fstpl (%esp)
cvtsd2ss (%esp), %xmm0
cvtss2sd %xmm0, %xmm0
movl 16(%esp), %eax
movsd %xmm0, (%eax)
addl $12, %esp
ret
Note that we generate slightly worse code on pr1505b.ll due to a scheduling
deficiency that is unrelated to this patch.
llvm-svn: 46307
precision integers. This won't actually work
(and most of the code is dead) unless the new
legalization machinery is turned on. While
there, I rationalized the handling of i1, and
removed some bogus (and unused) sextload patterns.
For i1, this could result in microscopically
better code for some architectures (not X86).
It might also result in worse code if annotating
with AssertZExt nodes turns out to be more harmful
than helpful.
llvm-svn: 46280
1. Legalize now always promotes truncstore of i1 to i8.
2. Remove patterns and gunk related to truncstore i1 from targets.
3. Rename the StoreXAction stuff to TruncStoreAction in TLI.
4. Make the TLI TruncStoreAction table a 2d table to handle from/to conversions.
5. Mark a wide variety of invalid truncstores as such in various targets, e.g.
X86 currently doesn't support truncstore of any of its integer types.
6. Add legalize support for truncstores with invalid value input types.
7. Add a dag combine transform to turn store(truncate) into truncstore when
safe.
The later allows us to compile CodeGen/X86/storetrunc-fp.ll to:
_foo:
fldt 20(%esp)
fldt 4(%esp)
faddp %st(1)
movl 36(%esp), %eax
fstps (%eax)
ret
instead of:
_foo:
subl $4, %esp
fldt 24(%esp)
fldt 8(%esp)
faddp %st(1)
fstps (%esp)
movl 40(%esp), %eax
movss (%esp), %xmm0
movss %xmm0, (%eax)
addl $4, %esp
ret
llvm-svn: 46140
and switch various codegen pieces and the X86 backend over
to using it.
* Add some comments to SelectionDAGNodes.h
* Introduce a second argument to FP_ROUND, which indicates
whether the FP_ROUND changes the value of its input. If
not it is safe to xform things like fp_extend(fp_round(x)) -> x.
llvm-svn: 46125
it should work, but I have no machine to test
it on. Committed because it will at least
cause no harm, and maybe someone can test it
for me!
llvm-svn: 46098
make the 'fp return in ST(0)' optimization smart enough to
look through token factor nodes. THis allows us to compile
testcases like CodeGen/X86/fp-stack-retcopy.ll into:
_carg:
subl $12, %esp
call L_foo$stub
fstpl (%esp)
fldl (%esp)
addl $12, %esp
ret
instead of:
_carg:
subl $28, %esp
call L_foo$stub
fstpl 16(%esp)
movsd 16(%esp), %xmm0
movsd %xmm0, 8(%esp)
fldl 8(%esp)
addl $28, %esp
ret
Still not optimal, but much better and this is a trivial patch. Fixing
the rest requires invasive surgery that is is not llvm 2.2 material.
llvm-svn: 46054
commit all arguments where moved to the stack slot where they would
reside on a normal function call before the lowering to the tail call
stack slot. This was done to prevent arguments overwriting each other.
Now only arguments sourcing from a FORMAL_ARGUMENTS node or a
CopyFromReg node with virtual register (could also be a caller's
argument) are lowered indirectly.
--This line, and those below, will be ignored--
M X86/X86ISelLowering.cpp
M X86/README.txt
llvm-svn: 45867
unifying the copied algorithms and saving over 500 LOC. There should
be no functionality change, but please test on your favorite x86
target.
llvm-svn: 45627
that "machine" classes are used to represent the current state of
the code being compiled. Given this expanded name, we can start
moving other stuff into it. For now, move the UsedPhysRegs and
LiveIn/LoveOuts vectors from MachineFunction into it.
Update all the clients to match.
This also reduces some needless #includes, such as MachineModuleInfo
from MachineFunction.
llvm-svn: 45467
e.g. MO.isMBB() instead of MO.isMachineBasicBlock(). I don't plan on
switching everything over, so new clients should just start using the
shorter names.
Remove old long accessors, switching everything over to use the short
accessor: getMachineBasicBlock() -> getMBB(),
getConstantPoolIndex() -> getIndex(), setMachineBasicBlock -> setMBB(), etc.
llvm-svn: 45464
SelectionDAG::getConstant, in the same way as vector floating-point
constants. This allows the legalize expansion code for @llvm.ctpop and
friends to be usable with vector types.
llvm-svn: 44954
possible before resorting to pextrw and pinsrw.
- Better codegen for v4i32 shuffles masquerading as v8i16 or v16i8 shuffles.
- Improves (i16 extract_vector_element 0) codegen by recognizing
(i32 extract_vector_element 0) does not require a pextrw.
llvm-svn: 44836
the function type, instead they belong to functions
and function calls. This is an updated and slightly
corrected version of Reid Spencer's original patch.
The only known problem is that auto-upgrading of
bitcode files doesn't seem to work properly (see
test/Bitcode/AutoUpgradeIntrinsics.ll). Hopefully
a bitcode guru (who might that be? :) ) will fix it.
llvm-svn: 44359
sometimes emit "zero" and "all one" vectors multiple times,
for example:
_test2:
pcmpeqd %mm0, %mm0
movq %mm0, _M1
pcmpeqd %mm0, %mm0
movq %mm0, _M2
ret
instead of:
_test2:
pcmpeqd %mm0, %mm0
movq %mm0, _M1
movq %mm0, _M2
ret
This patch fixes this by always arranging for zero/one vectors
to be defined as v4i32 or v2i32 (SSE/MMX) instead of letting them be
any random type. This ensures they get trivially CSE'd on the dag.
This fix is also important for LegalizeDAGTypes, as it gets unhappy
when the x86 backend wants BUILD_VECTOR(i64 0) to be legal even when
'i64' isn't legal.
This patch makes the following changes:
1) X86TargetLowering::LowerBUILD_VECTOR now lowers 0/1 vectors into
their canonical types.
2) The now-dead patterns are removed from the SSE/MMX .td files.
3) All the patterns in the .td file that referred to immAllOnesV or
immAllZerosV in the wrong form now use *_bc to match them with a
bitcast wrapped around them.
4) X86DAGToDAGISel::SelectScalarSSELoad is generalized to handle
bitcast'd zero vectors, which simplifies the code actually.
5) getShuffleVectorZeroOrUndef is updated to generate a shuffle that
is legal, instead of generating one that is illegal and expecting
a later legalize pass to clean it up.
6) isZeroShuffle is generalized to handle bitcast of zeros.
7) several other minor tweaks.
This patch is definite goodness, but has the potential to cause random
code quality regressions. Please be on the lookout for these and let
me know if they happen.
llvm-svn: 44310
1) Change the interface to TargetLowering::ExpandOperationResult to
take and return entire NODES that need a result expanded, not just
the value. This allows us to handle things like READCYCLECOUNTER,
which returns two values.
2) Implement (extremely limited) support in LegalizeDAG::ExpandOp for MERGE_VALUES.
3) Reimplement custom lowering in LegalizeDAGTypes in terms of the new
ExpandOperationResult. This makes the result simpler and fully
general.
4) Implement (fully general) expand support for MERGE_VALUES in LegalizeDAGTypes.
5) Implement ExpandOperationResult support for ARM f64->i64 bitconvert and ARM
i64 shifts, allowing them to work with LegalizeDAGTypes.
6) Implement ExpandOperationResult support for X86 READCYCLECOUNTER and FP_TO_SINT,
allowing them to work with LegalizeDAGTypes.
LegalizeDAGTypes now passes several more X86 codegen tests when enabled and when
type legalization in LegalizeDAG is ifdef'd out.
llvm-svn: 44300
adjustment fields, and an optional flag. If there is a "dynamic_stackalloc" in
the code, make sure that it's bracketed by CALLSEQ_START and CALLSEQ_END. If
not, then there is the potential for the stack to be changed while the stack's
being used by another instruction (like a call).
This can only result in tears...
llvm-svn: 44037
transformation. Previously, it's restricted by ensuring the number of load uses
is one. Now the restriction is loosened up by allowing setcc uses to be
"extended" (e.g. setcc x, c, eq -> setcc sext(x), sext(c), eq).
llvm-svn: 43465
To do this it is necessary to add a "always inline" argument to the
memcpy node. For completeness I have also added this node to memmove
and memset. I have also added getMem* functions, because the extra
argument makes it cumbersome to use getNode and because I get confused
by it :-)
llvm-svn: 43172