support for x86, and UMULO/SMULO for many architectures, including PPC
(PR4201), ARM, and Cell. The resulting expansion isn't perfect, but it's
not bad.
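For reference, a standalone C++ sketch of the widening idea behind such an expansion (illustrative only, not the actual DAG expansion code):

#include <cassert>
#include <cstdint>

// Unsigned multiply-with-overflow: multiply in a wider type and check
// whether anything landed in the high half.
bool umulo32(uint32_t a, uint32_t b, uint32_t &lo) {
  uint64_t wide = (uint64_t)a * b;
  lo = (uint32_t)wide;
  return (wide >> 32) != 0;
}

// Signed multiply-with-overflow: overflow iff the wide result no longer
// equals the sign-extension of its low half.
bool smulo32(int32_t a, int32_t b, int32_t &lo) {
  int64_t wide = (int64_t)a * b;
  lo = (int32_t)wide;
  return wide != (int64_t)lo;
}

int main() {
  uint32_t u; int32_t s;
  assert(!umulo32(65535u, 65535u, u));   // 0xFFFE0001 fits in 32 bits
  assert(umulo32(65536u, 65536u, u));    // 2^32 overflows
  assert(!smulo32(46340, 46340, s));     // 2147395600 fits in an i32
  assert(smulo32(46341, 46341, s));      // 2147488281 does not
}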
llvm-svn: 73477
incoming chain of the RETURN node. The incoming chain must
be the outgoing chain of the CALL node. This causes the
backend to identify tail calls that are not tail calls. This
patch fixes this.
llvm-svn: 73387
on x86 to handle more cases. Fix a bug in said code that would cause it
to read past the end of an object. Rewrite the code in
SelectionDAGLegalize::ExpandBUILD_VECTOR to be a bit more general.
Remove PerformBuildVectorCombine, which is no longer necessary with
these changes. In addition to simplifying the code, with this change,
we can now catch a few more cases of consecutive loads.
llvm-svn: 73012
integer type to be consistent with normal operation legalization. No visible
change because nothing is actually using this at the moment.
llvm-svn: 72980
Update code generator to use this attribute and remove NoImplicitFloat target option.
Update llc to set this attribute when -no-implicit-float command line option is used.
llvm-svn: 72959
build vectors with i64 elements will only appear on 32b x86 before legalize.
Since vector widening occurs during legalize, and produces i64 build_vector
elements, the dag combiner is never run on these before legalize splits them
into 32b elements.
Teach the build_vector dag combine in the x86 backend to recognize consecutive
loads producing the low part of the vector.
Convert the two uses of TLI's consecutive load recognizer to pass LoadSDNodes
since that was required implicitly.
Add a testcase for the transform.
Old:
subl $28, %esp
movl 32(%esp), %eax
movl 4(%eax), %ecx
movl %ecx, 4(%esp)
movl (%eax), %eax
movl %eax, (%esp)
movaps (%esp), %xmm0
pmovzxwd %xmm0, %xmm0
movl 36(%esp), %eax
movaps %xmm0, (%eax)
addl $28, %esp
ret
New:
movl 4(%esp), %eax
pmovzxwd (%eax), %xmm0
movl 8(%esp), %eax
movaps %xmm0, (%eax)
ret
llvm-svn: 72957
integer and floating-point opcodes, introducing
FAdd, FSub, and FMul.
For now, the AsmParser, BitcodeReader, and IRBuilder all preserve
backwards compatibility, and the Core LLVM APIs preserve backwards
compatibility for IR producers. Most front-ends won't need to change
immediately.
This implements the first step of the plan outlined here:
http://nondot.org/sabre/LLVMNotes/IntegerOverflow.txt
llvm-svn: 72897
using Promote which won't work because i64 isn't
a legal type. It's easy enough to use Custom, but
then we have the problem that when the type
legalizer is promoting FP_TO_UINT->i16, it has no
way of telling that it should prefer FP_TO_SINT->i32
to FP_TO_UINT->i32. I have uncomfortably hacked
this by making the type legalizer choose FP_TO_SINT
when both are Custom.
This fixes several regressions in the testsuite.
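A standalone C++ illustration of why FP_TO_SINT->i32 is sufficient for the promoted FP_TO_UINT->i16 (in-range values only):

#include <cstdint>

// Every value a float->u16 conversion can legally produce lies in
// [0, 65535], and that whole range fits in a signed i32, so the natively
// supported signed conversion (cvttss2si) plus a truncate gives the same
// result.
uint16_t fp_to_u16(float f) {
  int32_t wide = (int32_t)f;   // FP_TO_SINT -> i32
  return (uint16_t)wide;       // truncate to the promoted i16 result
}

int main() { return fp_to_u16(65535.0f) == 65535 ? 0 : 1; }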
llvm-svn: 72891
instcombine doesn't know when it's safe. To partially compensate
for this, introduce new code to do this transformation in
dagcombine, which can use UnsafeFPMath.
llvm-svn: 72872
ADDC/ADDE use MVT::i1 (later, whatever it gets legalized to)
instead of MVT::Flag. Remove CARRY_FALSE in favor of 0; adjust
all target-independent code to use this format.
Most targets will still produce a Flag-setting target-dependent
version when selection is done. X86 is converted to use i32
instead, which means TableGen needs to produce different code
in xxxGenDAGISel.inc. This keys off the new supportsHasI1 bit
in xxxInstrInfo, currently set only for X86; in principle this
is temporary and should go away when all other targets have
been converted. All relevant X86 instruction patterns are
modified to represent setting and using EFLAGS explicitly. The
same can be done on other targets.
The immediate behavior change is that an ADC/ADD pair is no
longer tightly coupled in the X86 scheduler; they can be
separated by instructions that don't clobber the flags (MOV).
I will soon add some peephole optimizations based on using
other instructions that set the flags to feed into ADC.
llvm-svn: 72707
failure during llvm-gcc bootstrap:
Assertion failed: (!Tmp2.getNode() && "Can't legalize BR_CC with legal condition!"), function ExpandNode, file /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmCore.roots/llvmCore~obj/src/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp, line 2923.
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmgcc42.roots/llvmgcc42~obj/src/gcc/libgcc2.c:1727: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
llvm-svn: 72530
This is basically the end of this series of patches for LegalizeDAG; the
remaining special cases can't be removed without more infrastructure
work. There's a FIXME for each relevant opcode near the beginning of
SelectionDAGLegalize::LegalizeOp.
llvm-svn: 72514
e.g.
orl $65536, 8(%rax)
=>
orb $1, 10(%rax)
Since narrowing is not always a win (e.g. i32 -> i16 is a loss on x86), the dag combiner consults the target before performing the optimization.
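A standalone C++ sketch of the offset/immediate arithmetic behind the example above (little-endian layout assumed; hypothetical helper name, not the dag combiner code):

#include <cassert>
#include <cstdint>

// If a 32-bit OR immediate only touches a single byte, the store can be
// narrowed to a byte OR at the corresponding byte offset.
bool narrowOr32ToOr8(uint32_t imm, unsigned &byteOff, uint8_t &imm8) {
  for (unsigned i = 0; i != 4; ++i) {
    uint32_t byteMask = 0xFFu << (8 * i);
    if (imm != 0 && (imm & ~byteMask) == 0) {
      byteOff = i;                      // added to the original displacement
      imm8 = (uint8_t)(imm >> (8 * i));
      return true;
    }
  }
  return false;
}

int main() {
  unsigned off; uint8_t v;
  assert(narrowOr32ToOr8(65536u, off, v) && off == 2 && v == 1);
  // orl $65536, 8(%rax)  ==>  orb $1, 10(%rax)   (displacement 8 + 2)
}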
llvm-svn: 72507
doesn't split legal vector operands. This is necessary because the
type legalization (and therefore, vector splitting) code will be going
away soon.
llvm-svn: 72349
The DAGCombiner created a negative shift amount, stored in an
unsigned variable. Later the optimizer eliminated the shift entirely as being
undefined.
Example: (srl (shl X, 56) 48). ShiftAmt is 4294967288.
Fix it by checking that the shift amount is positive, and storing it in a signed
variable.
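A tiny standalone C++ reproduction of the unsigned wraparound (hypothetical variable names):

#include <iostream>

int main() {
  // For (srl (shl X, 56), 48) the combine computes the difference of the
  // two shift amounts.
  unsigned c1 = 56, c2 = 48;

  unsigned BadShiftAmt  = c2 - c1;             // wraps to 4294967288
  int      GoodShiftAmt = (int)c2 - (int)c1;   // -8: negative, so don't shift by it

  std::cout << BadShiftAmt << " " << GoodShiftAmt << "\n";   // 4294967288 -8
}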
llvm-svn: 72331
will allow simplifying LegalizeDAG to eliminate type legalization. (I
have a patch to do that, but it's not quite finished; I'll commit it
once it's finished and I've fixed any review comments for this patch.)
See the comment at the beginning of
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp for more details on the
motivation for this patch.
llvm-svn: 72325
code in preparation for code generation. The main thing it does
is handle the case when eh.exception calls (and, in a future
patch, eh.selector calls) are far away from landing pads. Right
now in practice you only find eh.exception calls close to landing
pads: either in a landing pad (the common case) or in a landing
pad successor, due to loop passes shifting them about. However
future exception handling improvements will result in calls far
from landing pads:
(1) Inlining of rewinds. Consider the following case:
In function @f:
...
invoke @g to label %normal unwind label %unwinds
...
unwinds:
%ex = call i8* @llvm.eh.exception()
...
In function @g:
...
invoke @something to label %continue unwind label %handler
...
handler:
%ex = call i8* @llvm.eh.exception()
... perform cleanups ...
"rethrow exception"
Now inline @g into @f. Currently this is turned into:
In function @f:
...
invoke @something to label %continue unwind label %handler
...
handler:
%ex = call i8* @llvm.eh.exception()
... perform cleanups ...
invoke "rethrow exception" to label %normal unwind label %unwinds
unwinds:
%ex = call i8* @llvm.eh.exception()
...
However we would like to simplify invoke of "rethrow exception" into
a branch to the %unwinds label. Then %unwinds is no longer a landing
pad, and the eh.exception call there is then far away from any landing
pads.
(2) Using the unwind instruction for cleanups.
It would be nice to have codegen handle the following case:
invoke @something to label %continue unwind label %run_cleanups
...
handler:
... perform cleanups ...
unwind
This requires turning "unwind" into a library call, which
necessarily takes a pointer to the exception as an argument
(this patch also does this unwind lowering). But that means
you are using eh.exception again far from a landing pad.
(3) Bugpoint simplifications. When bugpoint is simplifying
exception handling code it often generates eh.exception calls
far from a landing pad, which then causes codegen to assert.
Bugpoint then latches on to this assertion and loses sight
of the original problem.
Note that it is currently rare for this pass to actually do
anything. And in fact it normally shouldn't do anything at
all given the code coming out of llvm-gcc! But it does fire
a few times in the testsuite. As far as I can see this is
almost always due to the LoopStrengthReduce codegen pass
introducing pointless loop preheader blocks which are landing
pads and only contain a branch to another block. This other
block contains an eh.exception call. So probably by tweaking
LoopStrengthReduce a bit this can be avoided.
llvm-svn: 72276
build an integer and cast that to a float. This fixes a crash
caused by trying to split an f32 into two f16's.
This changes the behavior in test/CodeGen/XCore/fneg.ll because that
testcase now triggers a DAGCombine which converts the fneg into an integer
operation. If someone is interested, it's probably possible to tweak
the test to generate an actual fneg.
llvm-svn: 72162
of exception handling builtin sjlj targets in functions turns out not to
be necessary. Marking the intrinsic implementation in the .td file as
defining all registers is sufficient to get the context saved properly by
the containing function.
llvm-svn: 71743
a supporting preliminary patch for GCC-compatible SjLJ exception handling. Note that these intrinsics are not designed to be invoked directly by the user, but
rather used by the front-end as target hooks for exception handling.
llvm-svn: 71610
type, rather than assume that it does. If the operand is not a vector, it
shouldn't be run through ScalarizeVectorOp. This fixes one of the
testcases in PR3886.
llvm-svn: 71453
None. However, we were always recording the region end. There's no longer a good
reason for this code to be separated out between the different opt levels, as it
was doing pretty much the same thing anyway.
llvm-svn: 71370
inlined function or the end of a function. Before, this was never executing the
"inlined" version of the Record method.
This will become important once the inlined Dwarf writer patch lands.
llvm-svn: 71268
checking for bcopy... no
checking for getc_unlocked... Assertion failed: (0 && "Unknown SCEV kind!"), function operator(), file /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmCore.roots/llvmCore~obj/src/lib/Analysis/ScalarEvolution.cpp, line 511.
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmgcc42.roots/llvmgcc42~obj/src/libdecnumber/decUtility.c:360: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
make[4]: *** [decUtility.o] Error 1
make[4]: *** Waiting for unfinished jobs....
Assertion failed: (0 && "Unknown SCEV kind!"), function operator(), file /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmCore.roots/llvmCore~obj/src/lib/Analysis/ScalarEvolution.cpp, line 511.
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmgcc42.roots/llvmgcc42~obj/src/libdecnumber/decNumber.c:5591: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
make[4]: *** [decNumber.o] Error 1
make[3]: *** [all-stage2-libdecnumber] Error 2
make[3]: *** Waiting for unfinished jobs....
llvm-svn: 71165
shows up in -print-machineinstrs. This doesn't appear to affect anything, but it was
weird for some DBG_LABELs to have DebugLocs but not all of them.
llvm-svn: 70921
-Replace DebugLocTuple's Source ID with CompileUnit's GlobalVariable*
-Remove DwarfWriter::getOrCreateSourceID
-Make necessary changes for the above (fix callsites, etc.)
llvm-svn: 70520
memory operands otherwise the writebacks get lost when the inline asm
doesn't otherwise have side effects. This fixes rdar://6839427, though
clang really shouldn't generate these anymore.
llvm-svn: 70455
anything larger than 64-bits, avoiding a crash. This should
really be fixed to use APInts, though type legalization happens
to help us out and we get good code on the attached testcase at
least.
This fixes rdar://6836460
llvm-svn: 70360
Massive check in. This changes the "-fast" flag to "-O#" in llc. If you want to
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.
Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'll change the JIT with a follow-up patch.
llvm-svn: 70343
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.
Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'm not 100% sure if it's necessary to change it there...
llvm-svn: 70270
PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
llvm-svn: 70225
This particular one is undefined behavior (although this
isn't related to the crash), so it will no longer do it
at compile time, which seems better.
llvm-svn: 69990
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.
llvm-svn: 69952
use ISD::EXTRACT_ELEMENT. SelectionDAG has a special fast-path for
the cast of an EXTRACT_ELEMENT with a BUILD_PAIR operand, for the
common case.
llvm-svn: 69948
type as the vector element type: allow them to be of
a wider integer type than the element type all the way
through the system, and not just as far as LegalizeDAG.
This should be safe because it used to be this way
(the old type legalizer would produce such nodes), so
backends should be able to handle it. In fact only
targets which have legal vector types with an illegal
promoted element type will ever see this (e.g. <4 x i16>
on ppc). This fixes a regression with the new type
legalizer (vec_splat.ll). Also, treat SCALAR_TO_VECTOR
the same as BUILD_VECTOR. After all, it is just a
special case of BUILD_VECTOR.
llvm-svn: 69467
add dependencies on nodes with exactly one successor which is a
COPY_TO_REGCLASS node. In the case that the copy is coalesced
away, the dependence should be on the user of the copy, rather
than the copy itself.
llvm-svn: 69309
Instead of doing ...
  if (inlined_subroutine && known_location)
    DW_TAG_inline_subroutine
  else
    DW_TAG_subprogram
do
  if (inlined_subroutine) {
    if (known_location)
      DW_TAG_inline_subroutine
  } else {
    DW_TAG_subprogram
  }
llvm-svn: 69300
to support replacing a node with another that has a superset of
the result types. Use this instead of calling
ReplaceAllUsesOfValueWith for each value.
llvm-svn: 69209
operator is used by a CopyToReg to export the value to a different
block, don't reuse the CopyToReg's register for the subreg operation
result if the register isn't precisely the right class for the
subreg operation.
Also, rename the h-registers.ll test, now that there is more
than one.
llvm-svn: 69087
promoted to legal types without changing the type of the vector. This is
following a suggestion from Duncan
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-February/019923.html).
The transformation that used to be done during type legalization is now
postponed to DAG legalization. This allows the BUILD_VECTORs to be optimized
and potentially handled specially by target-specific code.
It turns out that this is also consistent with an optimization done by the
DAG combiner: a BUILD_VECTOR and INSERT_VECTOR_ELT may be combined by
replacing one of the BUILD_VECTOR operands with the newly inserted element;
but INSERT_VECTOR_ELT allows its scalar operand to be larger than the
element type, with any extra high bits being implicitly truncated. The
result is a BUILD_VECTOR where one of the operands has a type larger than
the vector element type.
Any code that operates on BUILD_VECTORs may now need to be aware of the
potential type discrepancy between the vector element type and the
BUILD_VECTOR operands. This patch updates all of the places that I could
find to handle that case.
llvm-svn: 68996
This will be used to replace things like X86's MOV32to32_.
Enhance ScheduleDAGSDNodesEmit to be more flexible and robust
in the presence of subregister superclasses and subclasses. It
can now cope with the definition of a virtual register being in
a subclass of a use.
Re-introduce the code for recording register superreg classes and
subreg classes. This is needed because when subreg extracts and
inserts get coalesced away, the virtual registers are left in
the correct subclass.
llvm-svn: 68961
Create the debug_inlined dwarf section using this information. This info is used by gdb, at least on Darwin, to enable a better experience when debugging inlined functions. See DwarfWriter.cpp for more information on the structure of the debug_inlined section.
llvm-svn: 68847
in addition to ZERO_EXTEND and SIGN_EXTEND. Fix a bug in the
way it checked for live-out values, and simplify the way it
finds users by using SDNode::use_iterator's (relatively) new
features. Also, make it slightly more permissive on targets
with free truncates.
In SelectionDAGBuild, avoid creating ANY_EXTEND nodes that are
larger than necessary. If the target's SwitchAmountTy has
enough bits, use it. This exposes the truncate to optimization
early, enabling more optimizations.
llvm-svn: 68670
eagerly. This helps avoid CopyToReg nodes in some cases where they
aren't needed, and also helps subsequent optimizer heuristics
in cases where the extra nodes would cause the node to appear
to have multiple results. This doesn't have a significant impact
currently; it'll help an upcoming change.
llvm-svn: 68667
with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
instructions), and teach the DAGCombiner to take advantage of this on
targets which support it. This eliminates many redundant
zero-extension operations on x86-64.
This adds a new TargetLowering hook, isZExtFree. It's similar to
isTruncateFree, except it only applies to actual definitions, and not
no-op truncates which may not zero the high bits.
Also, this adds a new optimization to SimplifyDemandedBits: transform
operations like x+y into (zext (add (trunc x), (trunc y))) on targets
where all the casts are no-ops. In contexts where the high part of the
add is explicitly masked off, this allows the mask operation to be
eliminated. Fix the DAGCombiner to avoid undoing these transformations
to eliminate casts on targets where the casts are no-ops.
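The scalar identity behind the SimplifyDemandedBits transformation, checked in standalone C++ (masked-high-part case only):

#include <cassert>
#include <cstdint>

int main() {
  // When only the low 32 bits of a 64-bit add are demanded, the add can be
  // performed on truncated operands and zero-extended afterwards.
  uint64_t x = 0xDEADBEEF12345678ull, y = 0x0BADF00DCAFEBABEull;

  uint64_t masked = (x + y) & 0xFFFFFFFFull;                // original, high part masked off
  uint64_t narrow = (uint64_t)((uint32_t)x + (uint32_t)y);  // zext(add(trunc x, trunc y))

  assert(masked == narrow);  // addition mod 2^32 is the low half of addition mod 2^64
}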
Also, this adds a new two-address lowering heuristic. Since
two-address lowering runs before coalescing, it helps to be able to
look through copies when deciding whether commuting and/or
three-address conversion are profitable.
Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
the case that a clobber range extended both before and beyond an
existing live range. In that case, multiple live ranges need to be
added. This was exposed by the new subreg coalescing code.
Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
spiller behavior it was looking for no longer occurs with the new
instruction selection.
llvm-svn: 68576
x * 40
=>
shlq $3, %rdi
leaq (%rdi,%rdi,4), %rax
This has the added benefit of allowing more multiplies to be folded into addressing modes, e.g.
a * 24 + b
=>
leaq (%rdi,%rdi,2), %rax
leaq (%rsi,%rax,8), %rax
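The scalar identities behind the two LEA/shift sequences, as a standalone C++ check (hypothetical helper names):

#include <cassert>
#include <cstdint>

// x * 40:  shlq $3, %rdi ; leaq (%rdi,%rdi,4), %rax
uint64_t mul40(uint64_t x) {
  uint64_t t = x << 3;      // 8*x
  return t + t * 4;         // lea: t + 4*t == 5*t == 40*x
}

// a * 24 + b:  leaq (%rdi,%rdi,2), %rax ; leaq (%rsi,%rax,8), %rax
uint64_t mul24add(uint64_t a, uint64_t b) {
  uint64_t t = a + a * 2;   // lea: 3*a
  return b + t * 8;         // lea: b + 24*a
}

int main() {
  assert(mul40(7) == 280);
  assert(mul24add(7, 5) == 173);
}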
llvm-svn: 67917
stoppoint nodes around until Legalize; doing this
imposed an ordering on a sequence of loads that
came from different lines, interfering with scheduling.
llvm-svn: 67692
help out the register pressure reduction heuristics in the case of
nodes with multiple uses. Currently this uses very conservative
heuristics, so it doesn't have a broad impact, but in cases where it
does help it can make a big difference.
llvm-svn: 67586
a data dependency on the load node, so it really needs a
data-dependence edge to the load node, even if the load previously
existed.
And add a few comments.
llvm-svn: 67554
size by the array amount as an i32 value instead of promoting from
i32 to i64 then doing the multiply. Not doing this broke wrap-around
assumptions that the optimizers (validly) made. The ultimate real
fix for this is to introduce an i64 version of alloca and remove mallocinst.
This fixes PR3829
llvm-svn: 67093
vector shuffle mask. Forced the mask to be built using i32. Note: this will
be irrelevant once vector_shuffle no longer takes a build vector for the
shuffle mask.
llvm-svn: 67076
if FPConstant is legal because if the FPConstant doesn't need to be stored
in a constant pool, the transformation is unlikely to be profitable.
llvm-svn: 66994
ptrtoint and inttoptr in X86FastISel. These casts aren't always
handled in the generic FastISel code because X86 sometimes needs
custom code to do truncation and zero-extension.
llvm-svn: 66988
by inserting explicit zero extensions where necessary. Included
is a testcase where SelectionDAG produces a virtual register
holding an i1 value which FastISel previously mistakenly assumed
to be zero-extended.
llvm-svn: 66941
1. The ConstantPoolSDNode alignment field is the log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. The MachineConstantPool alignment field is also a log2 value.
3. However, some places create ConstantPoolSDNodes with plain alignment values rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when the entries are created. However, the asm printer groups them by section, so the offsets are no longer valid, yet the asm printer still uses them to determine the size of the padding between entries.
5. The asm printer uses an expensive data structure (a multimap) to track constant pool entries by section.
6. The asm printer iterates over a SmallPtrSet when emitting constant pool entries. This is non-deterministic.
Solutions:
1. The ConstantPoolSDNode alignment field is changed to keep a non-log2 value.
2. The MachineConstantPool alignment field is also changed to keep a non-log2 value.
3. Functions that create ConstantPool nodes now pass in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field; it's replaced with an alignment field. Offsets are not computed when constant pool entries are created; they are computed on the fly in the asm printer and the JIT.
5. The asm printer uses a cheaper data structure to group constant pool entries.
6. The asm printer computes entry offsets after grouping is done (see the sketch below).
7. Change the JIT code to compute entry offsets on the fly.
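A standalone C++ sketch of the on-the-fly offset computation described in points 4 and 6 (hypothetical names, illustrative only):

#include <vector>

struct CPEntry { unsigned Size, Align; };   // Align is now in bytes, not log2

// After grouping, walk the entries of one section and assign offsets,
// padding each entry up to its byte alignment.
std::vector<unsigned> computeOffsets(const std::vector<CPEntry> &Entries) {
  std::vector<unsigned> Offsets;
  unsigned Offset = 0;
  for (const CPEntry &E : Entries) {
    Offset = (Offset + E.Align - 1) & ~(E.Align - 1);  // round up to alignment
    Offsets.push_back(Offset);
    Offset += E.Size;
  }
  return Offsets;
}

int main() {
  // Offsets come out {0, 16, 32}: 12 bytes of padding precede the
  // 16-byte-aligned entry.
  std::vector<CPEntry> CP = {{4, 4}, {16, 16}, {8, 8}};
  std::vector<unsigned> Off = computeOffsets(CP);
  return Off[1] == 16 ? 0 : 1;
}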
llvm-svn: 66875
related transformations out of target-specific dag combine into the
ARM backend. These were added by Evan in r37685 with no testcases
and only seem to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0 -> (zext(cond) << 3). This happens frequently
with the recently added cp constant select optimization, but is a
very general xform. For example, we now compile the second example
in const-select.ll to:
_test:
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
seta %al
movzbl %al, %eax
movl 4(%esp), %ecx
movsbl (%ecx,%eax,4), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
leal 4(%eax), %ecx
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
cmovbe %eax, %ecx
movsbl (%ecx), %eax
ret
This passes multisource and dejagnu.
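For reference, the scalar identity behind the new zext-and-shift combine, as standalone C++:

#include <cassert>

int selectOrZero(bool cond) { return cond ? 8 : 0; }     // what the source computes
int zextShift(bool cond)    { return (int)cond << 3; }   // zext(cond) << 3, since 8 == 1 << 3

int main() {
  assert(selectOrZero(true)  == zextShift(true));
  assert(selectOrZero(false) == zextShift(false));
}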
llvm-svn: 66779
alignment of the generated constant pool entry to the
desired alignment of a type. If we don't do this, we end up
trying to do movsd from 4-byte aligned memory. This fixes
450.soplex and 456.hmmer.
llvm-svn: 66641
and extern_weak_odr. These are the same as the non-odr versions,
except that they indicate that the global will only be overridden
by an *equivalent* global. In C, a function with weak linkage can
be overridden by a function which behaves completely differently.
This means that IP passes have to skip weak functions, since any
deductions made from the function definition might be wrong, since
the definition could be replaced by something completely different
at link time. This is not allowed in C++, thanks to the ODR
(One-Definition-Rule): if a function is replaced by another at
link-time, then the new function must be the same as the original
function. If a language knows that a function or other global can
only be overridden by an equivalent global, it can give it the
weak_odr linkage type, and the optimizers will understand that it
is alright to make deductions based on the function body. The
code generators on the other hand map weak and weak_odr linkage
to the same thing.
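To illustrate the distinction, a hedged C++ sketch using the GCC/Clang weak attribute (not part of this patch):

// weak.cpp: the definition below has plain weak linkage, so another
// translation unit may replace it with an arbitrary strong definition at
// link time -- interprocedural passes must therefore ignore this body.
extern "C" __attribute__((weak)) int answer() { return 42; }

int caller() {
  // With 'weak', the optimizer may NOT fold this call to 42.
  // With 'weak_odr', any replacement is guaranteed to be equivalent, so
  // deductions from the body (including folding to 42) remain valid.
  return answer();
}

int main() { return caller(); }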
llvm-svn: 66339
with multiple chain operands. This can occur when the scheduler
has added chain operands to a node that already has a chain
operand, in order to handle physical register dependencies.
This fixes an llvm-gcc bootstrap failure on x86-64 introduced
in r66058.
llvm-svn: 66240
so it changed it into a 31 via the TLO.ShrinkDemandedConstant() call. Then it
would go through the DAG combiner again. This time it had a value of 31, which
was turned into a -1 by TLI.SimplifyDemandedBits(). This would ping pong
forever.
Teach the TLO.ShrinkDemandedConstant() call not to lower a value if the demanded
value is an XOR of all ones.
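The demanded-bits equivalence that let the two combines fight forever, checked in standalone C++ (assuming a low-five-bits demanded mask, as the 31/-1 pair suggests):

#include <cassert>
#include <cstdint>

int main() {
  // With only the low 5 bits demanded, xor'ing with -1 and xor'ing with 31
  // are indistinguishable, so one combine shrank -1 to 31 while the other
  // canonicalized the "not" back to -1, forever.
  uint32_t Demanded = 0x1F;
  for (uint32_t x = 0; x != 64; ++x)
    assert(((x ^ 0xFFFFFFFFu) & Demanded) == ((x ^ 31u) & Demanded));
}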
llvm-svn: 65985
arbitrary vector sizes. Add an optional MinSplatBits parameter to specify
a minimum for the splat element size. Update the PPC target to use the
revised interface.
llvm-svn: 65899
extracts + build_vector into a shuffle would fail, because the
type of the new build_vector would not be legal. Try harder to
create a legal build_vector type. Note: this will be totally
irrelevant once vector_shuffle no longer takes a build_vector for
shuffle mask.
New:
_foo:
xorps %xmm0, %xmm0
xorps %xmm1, %xmm1
subps %xmm1, %xmm1
mulps %xmm0, %xmm1
addps %xmm0, %xmm1
movaps %xmm1, 0
Old:
_foo:
xorps %xmm0, %xmm0
movss %xmm0, %xmm1
xorps %xmm2, %xmm2
unpcklps %xmm1, %xmm2
pshufd $80, %xmm1, %xmm1
unpcklps %xmm1, %xmm2
pslldq $16, %xmm2
pshufd $57, %xmm2, %xmm1
subps %xmm0, %xmm1
mulps %xmm0, %xmm1
addps %xmm0, %xmm1
movaps %xmm1, 0
llvm-svn: 65791
overly long ints, e.g. i96, into pieces at PHIs
and the nodes that feed into them; however big-endian
reverses the order of the pieces (for some reason), and
wasn't doing it the same way on both sides, so
the pieces didn't match and runtime failures ensued.
Fixes 188.ammp and sqlite3 on ppc32.
llvm-svn: 65481
them are generic changes.
- Use the "fast" flag that's already being passed into the asm printers instead
of shoving it into the DwarfWriter.
- Instead of calling "MI->getParent()->getParent()" for every MI, set the
machine function when calling "runOnMachineFunction" in the asm printers.
llvm-svn: 65379
a DBG_LABEL or not. We want to fall back to the original way of emitting debug
info when we're in -O0/-fast mode.
- Add plumbing in to pass the "Fast" flag to places that need it.
- XFAIL DebugInfo/deaddebuglabel.ll. This is finding 11 labels instead of 8. I
need to investigate still.
llvm-svn: 65367
instruction. The class also consolidates the code for detecting constant
splats that's shared across PowerPC and the CellSPU backends (and might be
useful for other backends.) Also introduces SelectionDAG::getBUILD_VECTOR() for
generating new BUILD_VECTOR nodes.
llvm-svn: 65296
(Note: Eventually, commits like this will be handled via a pre-commit hook that
does this automagically, as well as expand tabs to spaces and look for 80-col
violations.)
llvm-svn: 64827
U include/llvm/CodeGen/DebugLoc.h
U lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
U lib/CodeGen/SelectionDAG/SelectionDAGBuild.cpp
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
Enable debug location generation at -Os. This goes with the reapplication of the
r63639 patch.
llvm-svn: 64715
one bit set, because the bit may be shifted off the end. Instead,
just check for a constant 1 being shifted. This is still sufficient
to handle all the cases in test/CodeGen/X86/bt.ll. This fixes PR3583.
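A standalone C++ note on the difference (illustrative, not the combiner code):

#include <cassert>
#include <cstdint>

// The pattern that is now matched: a constant 1 shifted by a variable
// amount, i.e. exactly "test bit n of x" (x86 BT).
bool testBit(uint64_t x, unsigned n) { return (x & (1ULL << n)) != 0; }

int main() {
  assert(testBit(0x8000000000000000ull, 63));

  // Why "any constant with a single bit set" was unsafe: for c = 2 and
  // n = 63 the set bit is shifted off the end, so the AND is always false;
  // that is not the same as testing bit n + 1 (which BT would take mod 64).
  uint64_t c = 2;
  unsigned n = 63;
  assert((0xFFFFFFFFFFFFFFFFull & (c << n)) == 0);
}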
llvm-svn: 64622
Clean up some warnings.
Remark: when a struct/class is declared differently from how it is defined, this causes problems for VC++, since it seems to mangle classes differently from structs. These errors are very hard to understand and find. So please try to keep your declarations and definitions in sync.
Only tested with VS2008; hope it does not break anything. Feel free to revert.
llvm-svn: 64554
in inline asm as signed (what gcc does). Add partial support
for x86-specific "e" and "Z" constraints, with appropriate
signedness for printing.
llvm-svn: 64400
is determined by whether the node has a Flag operand. However, if the
node does have a Flag operand, it will be glued to its register's
def, so the heuristic would end up spuriously applying to whatever
node is the def.
llvm-svn: 64319
It was transforming (x&y)==y to (x&y)!=0 in the case where
y is variable and known to have at most one bit set (e.g. z&1).
This is not correct; the expressions are not equivalent when y==0.
I believe this patch salvages what can be salvaged, including
all the cases in bt.ll. Dan, please review.
Fixes gcc.c-torture/execute/20040709-[12].c
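The y == 0 counterexample, spelled out as standalone C++:

#include <cassert>

int main() {
  // The reverted fold rewrote (x & y) == y as (x & y) != 0 when y was known
  // to have at most one bit set; y == 0 breaks the equivalence:
  unsigned x = 0, y = 0;
  assert(((x & y) == y) == true);    // 0 == 0 holds
  assert(((x & y) != 0) == false);   // but the rewritten form is false
}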
llvm-svn: 64314
instruction index across each part. Instruction indices are used
to make live range queries, and live ranges can extend beyond
scheduling region boundaries.
Refactor the ScheduleDAGSDNodes class some more so that it
doesn't have to worry about this additional information.
llvm-svn: 64288
scheduling, and generalize it so that it preserves state across
scheduling regions. This fixes incorrect live-range information around
terminators and labels, which are effective region boundaries.
In place of looking for terminators to anchor inter-block dependencies,
introduce special entry and exit scheduling units for this purpose.
llvm-svn: 64254
Adjust derived classes to pass UnknownLoc where
a DebugLoc does not make sense. Pick one of
DebugLoc and non-DebugLoc variants to survive
for all such classes.
llvm-svn: 64000
Many targets build placeholder nodes for special operands, e.g.
GlobalBaseReg on X86 and PPC for the PIC base. There's no
sensible way to associate debug info with these. I've left
them built with getNode calls with explicit DebugLoc::getUnknownLoc operands.
I'm not too happy about this but don't see a good improvement;
I considered adding a getPseudoOperand or something, but it
seems to me that'll just make it harder to read.
llvm-svn: 63992
SelectionDAGISel::CreateScheduler, and make it just create the
scheduler. Leave running the scheduler to the higher-level code.
This makes the higher-level code a little more explicit and
easier to follow, and will help enable some future refactoring.
llvm-svn: 63944
target directories themselves. This also means that VMCore no longer
needs to know about every target's list of intrinsics. Future work
will include converting the PowerPC target to this interface as an
example implementation.
llvm-svn: 63765
but when legalizing the operation, we split the vector type and generate a library
call whose type needs to be promoted. For example, on X86 with SSE on but MMX off,
a v2i64 divide will be scalarized to two library calls using i64.
llvm-svn: 63760
support GraphViz, I've been using the foo->dump() facility. This
patch is a minor rewrite to the SelectionDAG dump() stuff to make it a
little more helpful. The existing foo->dump() functionality does not
change; this patch adds foo->dumpr(). All of this is only useful when
running LLVM under a debugger.
llvm-svn: 63736
in any old order. Since analyzing a node analyzes its
operands also, this can mean that when we pop a node
off the list of nodes to be analyzed, it may already
have been analyzed.
llvm-svn: 63632
information. This eliminates the need for the Flags field in MemSDNode,
so this makes LoadSDNode and StoreSDNode smaller. Also, it makes
FoldingSetNodeIDs for loads and stores two AddIntegers smaller.
llvm-svn: 63577
crashes or wrong code with codegen of large integers:
eliminate the legacy getIntegerVTBitMask and
getIntegerVTSignBit methods, which returned their
value as a uint64_t, so couldn't handle huge types.
llvm-svn: 63494
returned by getShiftAmountTy may be too small
to hold shift values (it is an i8 on x86-32).
Before and during type legalization, use a large
but legal type for shift amounts: getPointerTy;
afterwards use getShiftAmountTy, fixing up any
shift amounts with a big type during operation
legalization. Thanks to Dan for writing the
original patch (which I shamelessly pillaged).
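A standalone C++ illustration of why an i8 shift-amount type is a problem before legalization (hypothetical numbers):

#include <cstdint>
#include <iostream>

int main() {
  // Pre-legalization integers can be very wide (say i512), and their shift
  // amounts do not fit x86-32's i8 getShiftAmountTy:
  unsigned wanted = 300;                // shift amount for an i512 value
  uint8_t  asI8   = (uint8_t)wanted;    // silently becomes 44
  std::cout << (unsigned)asI8 << "\n";  // 44 -- a completely different shift
  // Hence: use a big-but-legal type (getPointerTy) until operation
  // legalization can safely fix the amounts up to getShiftAmountTy.
}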
llvm-svn: 63482
- Modify TableGen to add the DebugLoc when calling getTargetNode.
(The light-weight wrappers are only temporary. The non-DebugLoc version will be
removed once the whole debug info stuff is finished with.)
llvm-svn: 63273
dagcombines that help it match in several more cases. Add
several more cases to test/CodeGen/X86/bt.ll. This doesn't
yet include matching for BT with an immediate operand, it
just covers more register+register cases.
llvm-svn: 63266
new isOperationLegalOrCustom, which does what isOperationLegal
previously did.
Update a bunch of callers to use isOperationLegalOrCustom
instead of isOperationLegal. In some cases it wasn't obvious
which behavior is desired; when in doubt I changed them to
isOperationLegalOrCustom, as that preserves their previous
behavior.
This is for the second half of PR3376.
llvm-svn: 63212
a uint64_t to verify that the value is in range for the given type,
to help catch accidental overflow. Fix a few places that relied on
getConstant implicitly truncating the value.
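What the implicit truncation looked like from a caller's perspective, as a standalone C++ sketch:

#include <cassert>
#include <cstdint>

int main() {
  // The new assertion catches values that silently lose bits when stuffed
  // into a narrower constant; 0x1FFFFFFFF does not fit an i32:
  uint64_t val = 0x1FFFFFFFFull;
  uint32_t asI32 = (uint32_t)val;       // implicit truncation -> 0xFFFFFFFF
  assert((uint64_t)asI32 != val);       // such callers now truncate explicitly
}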
llvm-svn: 63128
checking logic. Rather than make the checking more
complicated, I've tweaked some logic to make things
conform to how the checking thought things ought to
be, since this results in a simpler "mental model".
llvm-svn: 63048
tidy up SDUse and related code.
- Replace the operator= member functions with a set method, like
LLVM Use has, and variants setInitial and setNode, which take
care of updating use lists, like LLVM Use's does. This simplifies
code that calls these functions.
- getSDValue() is renamed to get(), as in LLVM Use, though most
places can either use the implicit conversion to SDValue or the
convenience functions instead.
- Fix some more node vs. value terminology issues.
Also, eliminate the one remaining use of SDOperandPtr, and
SDOperandPtr itself.
llvm-svn: 62995
of each use in the SelectionDAG ReplaceAllUses* functions. Thanks
to Chris for spotting this opportunity.
Also, factor out code from all 5 of the ReplaceAllUses* functions
into AddNonLeafNodeToCSEMaps, which is now renamed
AddModifiedNodeToCSEMaps to more accurately reflect its purpose.
llvm-svn: 62964
testcase from PR3376, and in fact is sufficient to completely
avoid the problem in that testcase.
There's an underlying problem though; TLI.isOperationLegal
considers Custom to be Legal, which might be ok in some
cases, but that's what DAGCombiner is using in many places
to test if something is legal when LegalOperations is true.
When DAGCombiner is running after legalize, this isn't
sufficient. I'll address this in a separate commit.
llvm-svn: 62860
to "C ^ 1" is only valid when C is known to be either 0 or 1. Most of the
similar foldings in this function only handle "i1" types, but this one appears
intentionally written to handle larger integer types. If C has an integer
type larger than "i1", this needs to check if the high bits of a boolean
are known to be zero. I also changed the comment to describe this folding as
"C ^ 1" instead of "~C", since that is what the code does and since the latter
would only be valid for "i1" types. The good news is that most LLVM targets
use TargetLowering::ZeroOrOneBooleanContent so this change will not disable
the optimization; the bad news is that I've been unable to come up with a
testcase to demonstrate the problem.
I have also removed a "FIXME" comment for folding "select C, X, 0" to "C & X",
since the code looks correct to me. It could be made more aggressive by not
limiting the type to "i1", but that would then require checking for
TargetLowering::ZeroOrNegativeOneBooleanContent. Similar changes could be
done for the other SELECT foldings, but it was decided to be not worth the
trouble and complexity (see e.g., r44663).
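For the record, why C must be known to be 0 or 1, in standalone C++ (the fold treats the result as the logical negation of C):

#include <cassert>

int main() {
  // On {0, 1}, C ^ 1 really is the logical negation of C.
  for (int C = 0; C <= 1; ++C)
    assert((C ? 0 : 1) == (C ^ 1));

  // If the high bits of the "boolean" are not known to be zero, they diverge:
  int C = 2;
  assert((C ? 0 : 1) == 0);   // logical negation of a nonzero value
  assert((C ^ 1) == 3);       // not a boolean at all
}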
llvm-svn: 62790
Simplify x+0 to x in unsafe-fp-math mode. This avoids a bunch of
redundant work in many cases, because in unsafe-fp-math mode,
ISD::FADD with a constant is considered free to negate, so the
DAGCombiner often negates x+0 to -0-x thinking it's free, when
in reality the end result is -x, which is more expensive than x.
Also, combine x*0 to 0.
This fixes PR3374.
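The IEEE corner cases that keep both folds behind unsafe-fp-math, as a small standalone C++ check:

#include <cmath>
#include <cstdio>

int main() {
  // x + 0.0 == x fails only for x == -0.0 (the sum is +0.0), and
  // x * 0.0 == +0.0 fails for NaNs, infinities, and negative x.
  double negzero = -0.0;
  std::printf("%d\n", std::signbit(negzero + 0.0));  // 0: the sign of zero is lost
  std::printf("%g\n", -1.0 * 0.0);                   // -0, not +0
  std::printf("%g\n", INFINITY * 0.0);               // nan, not 0
}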
llvm-svn: 62789
special cases after producing the new reduced-width load, because the
new load already has the needed adjustments built into it. This fixes
several bugs due to the special cases, including PR3317.
llvm-svn: 62692
- Ensure that (operation) legalization emits proper FDIV libcall when needed.
- Fix various bugs encountered during llvm-spu-gcc build, along with various
cleanups.
- Start supporting double precision comparisons for remaining libgcc2 build.
Discovered an interesting DAGCombiner feature, which is currently worked around via
custom lowering (64-bit constants are not legal on CellSPU, but the DAGCombiner
insists on inserting one anyway).
- Update README.
llvm-svn: 62664
SDNode subclasses to keep state that requires non-trivial
destructors, however it was already effectively impossible,
since the destructor isn't actually ever called. There currently
aren't any SDNode subclasses affected by this, and in general
it's desirable to keep SDNode objects light-weight.
This eliminates the last virtual member function in the SDNode
class, so it eliminates the need for a vtable pointer, making
SDNode smaller.
llvm-svn: 62539
uses are added to the From node while it is processing From's
use list, because of automatic local CSE. The fix is to avoid
visiting any new uses.
Fix a few places in the DAGCombiner that assumed that after
a RAUW call, the From node has no users and may be deleted.
This fixes PR3018.
llvm-svn: 62533