in addition to ZERO_EXTEND and SIGN_EXTEND. Fix a bug in the
way it checked for live-out values, and simplify the way it
find users by using SDNode::use_iterator's (relatively) new
features. Also, make it slightly more permissive on targets
with free truncates.
In SelectionDAGBuild, avoid creating ANY_EXTEND nodes that are
larger than necessary. If the target's SwitchAmountTy has
enough bits, use it. This exposes the truncate to optimization
early, enabling more optimizations.
llvm-svn: 68670
with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
instructions), and teach the DAGCombiner to take advantage of this on
targets which support it. This eliminates many redundant
zero-extension operations on x86-64.
This adds a new TargetLowering hook, isZExtFree. It's similar to
isTruncateFree, except it only applies to actual definitions, and not
no-op truncates which may not zero the high bits.
Also, this adds a new optimization to SimplifyDemandedBits: transform
operations like x+y into (zext (add (trunc x), (trunc y))) on targets
where all the casts are no-ops. In contexts where the high part of the
add is explicitly masked off, this allows the mask operation to be
eliminated. Fix the DAGCombiner to avoid undoing these transformations
to eliminate casts on targets where the casts are no-ops.
Also, this adds a new two-address lowering heuristic. Since
two-address lowering runs before coalescing, it helps to be able to
look through copies when deciding whether commuting and/or
three-address conversion are profitable.
Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
the case that a clobber range extended both before and beyond an
existing live range. In that case, multiple live ranges need to be
added. This was exposed by the new subreg coalescing code.
Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
spiller behavior it was looking for no longer occurrs with the new
instruction selection.
llvm-svn: 68576
x * 40
=>
shlq $3, %rdi
leaq (%rdi,%rdi,4), %rax
This has the added benefit of allowing more multiply to be folded into addressing mode. e.g.
a * 24 + b
=>
leaq (%rdi,%rdi,2), %rax
leaq (%rsi,%rax,8), %rax
llvm-svn: 67917
vector shuffle mask. Forced the mask to be built using i32. Note: this will
be irrelevant once vector_shuffle no longer takes a build vector for the
shuffle mask.
llvm-svn: 67076
if FPConstant is legal because if the FPConstant doesn't need to be stored
in a constant pool, the transformation is unlikely to be profitable.
llvm-svn: 66994
1. ConstantPoolSDNode alignment field is log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. MachineConstantPool alignment field is also a log2 value.
3. However, some places are creating ConstantPoolSDNode with alignment value rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when they are created. However, asm printer group them by sections. That means the offsets are no longer valid. However, asm printer uses them to determine size of padding between entries.
5. Asm printer uses expensive data structure multimap to track constant pool entries by sections.
6. Asm printer iterate over SmallPtrSet when it's emitting constant pool entries. This is non-deterministic.
Solutions:
1. ConstantPoolSDNode alignment field is changed to keep non-log2 value.
2. MachineConstantPool alignment field is also changed to keep non-log2 value.
3. Functions that create ConstantPool nodes are passing in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in asm printer and JIT.
5. Asm printer uses cheaper data structure to group constant pool entries.
6. Asm printer compute entry offsets after grouping is done.
7. Change JIT code to compute entry offsets on the fly.
llvm-svn: 66875
related transformations out of target-specific dag combine into the
ARM backend. These were added by Evan in r37685 with no testcases
and only seems to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0 -> (zext(cond) << 3). This happens frequently
with the recently added cp constant select optimization, but is a
very general xform. For example, we now compile the second example
in const-select.ll to:
_test:
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
seta %al
movzbl %al, %eax
movl 4(%esp), %ecx
movsbl (%ecx,%eax,4), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
leal 4(%eax), %ecx
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
cmovbe %eax, %ecx
movsbl (%ecx), %eax
ret
This passes multisource and dejagnu.
llvm-svn: 66779
alignment of the generated constant pool entry to the
desired alignment of a type. If we don't do this, we end up
trying to do movsd from 4-byte alignment memory. This fixes
450.soplex and 456.hmmer.
llvm-svn: 66641
extracts + build_vector into a shuffle would fail, because the
type of the new build_vector would not be legal. Try harder to
create a legal build_vector type. Note: this will be totally
irrelevant once vector_shuffle no longer takes a build_vector for
shuffle mask.
New:
_foo:
xorps %xmm0, %xmm0
xorps %xmm1, %xmm1
subps %xmm1, %xmm1
mulps %xmm0, %xmm1
addps %xmm0, %xmm1
movaps %xmm1, 0
Old:
_foo:
xorps %xmm0, %xmm0
movss %xmm0, %xmm1
xorps %xmm2, %xmm2
unpcklps %xmm1, %xmm2
pshufd $80, %xmm1, %xmm1
unpcklps %xmm1, %xmm2
pslldq $16, %xmm2
pshufd $57, %xmm2, %xmm1
subps %xmm0, %xmm1
mulps %xmm0, %xmm1
addps %xmm0, %xmm1
movaps %xmm1, 0
llvm-svn: 65791
instruction. The class also consolidates the code for detecting constant
splats that's shared across PowerPC and the CellSPU backends (and might be
useful for other backends.) Also introduces SelectionDAG::getBUID_VECTOR() for
generating new BUILD_VECTOR nodes.
llvm-svn: 65296
(Note: Eventually, commits like this will be handled via a pre-commit hook that
does this automagically, as well as expand tabs to spaces and look for 80-col
violations.)
llvm-svn: 64827
crashes or wrong code with codegen of large integers:
eliminate the legacy getIntegerVTBitMask and
getIntegerVTSignBit methods, which returned their
value as a uint64_t, so couldn't handle huge types.
llvm-svn: 63494
returned by getShiftAmountTy may be too small
to hold shift values (it is an i8 on x86-32).
Before and during type legalization, use a large
but legal type for shift amounts: getPointerTy;
afterwards use getShiftAmountTy, fixing up any
shift amounts with a big type during operation
legalization. Thanks to Dan for writing the
original patch (which I shamelessly pillaged).
llvm-svn: 63482
dagcombines that help it match in several more cases. Add
several more cases to test/CodeGen/X86/bt.ll. This doesn't
yet include matching for BT with an immediate operand, it
just covers more register+register cases.
llvm-svn: 63266
new isOperationLegalOrCustom, which does what isOperationLegal
previously did.
Update a bunch of callers to use isOperationLegalOrCustom
instead of isOperationLegal. In some case it wasn't obvious
which behavior is desired; when in doubt I changed then to
isOperationLegalOrCustom as that preserves their previous
behavior.
This is for the second half of PR3376.
llvm-svn: 63212
a uint64_t to verify that the value is in range for the given type,
to help catch accidental overflow. Fix a few places that relied on
getConstant implicitly truncating the value.
llvm-svn: 63128
tidy up SDUse and related code.
- Replace the operator= member functions with a set method, like
LLVM Use has, and variants setInitial and setNode, which take
care up updating use lists, like LLVM Use's does. This simplifies
code that calls these functions.
- getSDValue() is renamed to get(), as in LLVM Use, though most
places can either use the implicit conversion to SDValue or the
convenience functions instead.
- Fix some more node vs. value terminology issues.
Also, eliminate the one remaining use of SDOperandPtr, and
SDOperandPtr itself.
llvm-svn: 62995
testcase from PR3376, and in fact is sufficient to completely
avoid the problem in that testcase.
There's an underlying problem though; TLI.isOperationLegal
considers Custom to be Legal, which might be ok in some
cases, but that's what DAGCombiner is using in many places
to test if something is legal when LegalOperations is true.
When DAGCombiner is running after legalize, this isn't
sufficient. I'll address this in a separate commit.
llvm-svn: 62860
to "C ^ 1" is only valid when C is known to be either 0 or 1. Most of the
similar foldings in this function only handle "i1" types, but this one appears
intentionally written to handle larger integer types. If C has an integer
type larger than "i1", this needs to check if the high bits of a boolean
are known to be zero. I also changed the comment to describe this folding as
"C ^ 1" instead of "~C", since that is what the code does and since the latter
would only be valid for "i1" types. The good news is that most LLVM targets
use TargetLowering::ZeroOrOneBooleanContent so this change will not disable
the optimization; the bad news is that I've been unable to come up with a
testcase to demonstrate the problem.
I have also removed a "FIXME" comment for folding "select C, X, 0" to "C & X",
since the code looks correct to me. It could be made more aggressive by not
limiting the type to "i1", but that would then require checking for
TargetLowering::ZeroOrNegativeOneBooleanContent. Similar changes could be
done for the other SELECT foldings, but it was decided to be not worth the
trouble and complexity (see e.g., r44663).
llvm-svn: 62790
Simplify x+0 to x in unsafe-fp-math mode. This avoids a bunch of
redundant work in many cases, because in unsafe-fp-math mode,
ISD::FADD with a constant is considered free to negate, so the
DAGCombiner often negates x+0 to -0-x thinking it's free, when
in reality the end result is -x, which is more expensive than x.
Also, combine x*0 to 0.
This fixes PR3374.
llvm-svn: 62789
special cases after producing the new reduced-width load, because the
new load already has the needed adjustments built into it. This fixes
several bugs due to the special cases, including PR3317.
llvm-svn: 62692
uses are added to the From node while it is processing From's
use list, because of automatic local CSE. The fix is to avoid
visiting any new uses.
Fix a few places in the DAGCombiner that assumed that after
a RAUW call, the From node has no users and may be deleted.
This fixes PR3018.
llvm-svn: 62533
and into the ScheduleDAGInstrs class, so that they don't get
destructed and re-constructed for each block. This fixes a
compile-time hot spot in the post-pass scheduler.
To help facilitate this, tidy and do some minor reorganization
in the scheduler constructor functions.
llvm-svn: 62275
promote from i1 all the way up to the canonical SetCC type.
In order to discover an appropriate type to use, pass
MVT::Other to getSetCCResultType. In order to be able to
do this, change getSetCCResultType to take a type as an
argument, not a value (this is also more logical).
llvm-svn: 61542
ISD::ADD to emit an implicit EFLAGS. This was horribly broken. Instead, replace
the intrinsic with an ISD::SADDO node. Then custom lower that into an
X86ISD::ADD node with a associated SETCC that checks the correct condition code
(overflow or carry). Then that gets lowered into the correct X86::ADDOvf
instruction.
Similar for SUB and MUL instructions.
llvm-svn: 60915
(this doesn't happen that often, since most code
does not use illegal types) then follow it by a
DAG combiner run that is allowed to generate
illegal operations but not illegal types. I didn't
modify the target combiner code to distinguish like
this between illegal operations and illegal types,
so it will not produce illegal operations as well
as not producing illegal types.
llvm-svn: 59960
"ISD::ADDO". ISD::ADDO is lowered into a target-independent form that does the
addition and then checks if the result is less than one of the operands. (If it
is, then there was an overflow.)
llvm-svn: 59779
The CC was changed, but wasn't checked to see if it was legal if the DAG
combiner was being run after legalization. Threw in a couple of checks just to
make sure that it's okay. As far as the PR is concerned, no back-end target
actually exhibited this problem, so there isn't an associated testcase.
llvm-svn: 59035
elements. Otherwise LegalizeTypes will, reasonably
enough, legalize the mask, which may result in it
no longer being a BUILD_VECTOR node (LegalizeDAG
simply ignores the legality or not of vector masks).
llvm-svn: 57782
and add a TargetLowering hook for it to use to determine when this
is legal (i.e. not in PIC mode, etc.)
This allows instruction selection to emit folded constant offsets
in more cases, such as the included testcase, eliminating the need
for explicit arithmetic instructions.
This eliminates the need for the C++ code in X86ISelDAGToDAG.cpp
that attempted to achieve the same effect, but wasn't as effective.
Also, fix handling of offsets in GlobalAddressSDNodes in several
places, including changing GlobalAddressSDNode's offset from
int to int64_t.
The Mips, Alpha, Sparc, and CellSPU targets appear to be
unaware of GlobalAddress offsets currently, so set the hook to
false on those targets.
llvm-svn: 57748
shift counts, and patterns that match dynamic shift counts
when the subtract is obscured by a truncate node.
Add DAGCombiner support for recognizing rotate patterns
when the shift counts are defined by truncate nodes.
Fix and simplify the code for commuting shld and shrd
instructions to work even when the given instruction doesn't
have a parent, and when the caller needs a new instruction.
These changes allow LLVM to use the shld, shrd, rol, and ror
instructions on x86 to replace equivalent code using two
shifts and an or in many more cases.
llvm-svn: 57662
the SelectionDAG and DAGCombiner code. The only functionality change is that now
the DAG combiner is performing the constant folding for these operations instead
of being a no-op.
This is *not* in response to a bug, so there isn't a testcase.
llvm-svn: 56550
ConstantFP* instead of APInt and APFloat directly.
This reduces the amount of time to create ConstantSDNode
and ConstantFPSDNode nodes when ConstantInt* and ConstantFP*
respectively are already available, as is the case in
SelectionDAGBuild.cpp. Also, it reduces the amount of time
to legalize constants into constant pools, and the amount of
time to add ConstantFP operands to MachineInstrs, due to
eliminating ConstantInt::get and ConstantFP::get calls.
It increases the amount of work needed to create new constants
in cases where the client doesn't already have a ConstantInt*
or ConstantFP*, such as legalize expanding 64-bit integer constants
to 32-bit constants. And it adds a layer of indirection for the
accessor methods. But these appear to be outweight by the benefits
in most cases.
It will also make it easier to make ConstantSDNode and
ConstantFPNode more consistent with ConstantInt and ConstantFP.
llvm-svn: 56162
before. This is taken care of in the selection DAG pass. In my opinion, this
should be in one place or the other. I.e., it should probably be removed from
the DAG combiner (along with the other arithmetic transformations on constants
that are essentially no-ops).
llvm-svn: 55889
its work by putting all nodes in the worklist, requiring a big
dynamic allocation. Now, DAGCombiner just iterates over the AllNodes
list and maintains a worklist for nodes that are newly created or
need to be revisited. This allows the worklist to stay small in most
cases, so it can be a SmallVector.
This has the side effect of making DAGCombine not miss a folding
opportunity in alloca-align-rounding.ll.
llvm-svn: 55498
generic SDNode's (nodes with their own constructors
should do sanity checking in the constructor). Add
sanity checks for BUILD_VECTOR and fix all the places
that were producing bogus BUILD_VECTORs, as found by
"make check". My favorite is the BUILD_VECTOR with
only two operands that was being used to build a
vector with four elements!
llvm-svn: 53850
the night realising that it was wrong :) I
think the reason the same type was being used
for the shufflevec of indices as for the actual
indices is so that if one of them needs splitting
then so does the other. After my patch it might
be that the indices need splitting but not the
rest, yet there is no good way of handling that.
I think the right solution is to not have the
shufflevec be an operand at all: just have it
be the list of numbers it actually is, stored
as extra info in the node.
llvm-svn: 53768
mask. These are just indices into the shuffled vector
so their type is unrelated to the type of the
shuffled elements (which is what was being used before).
This fixes vec_shuffle-11.ll when using LegalizeTypes.
What seems to have happened is that Dan's recent change
r53687, which corrected the result type of the shuffle,
somehow caused LegalizeTypes to notice that the mask
operand was a BUILD_VECTOR with a legal type but elements
of an illegal type (i64). LegalizeTypes legalized this
by introducing a new BUILD_VECTOR of i32 and bitcasting
it to the old type. But the mask operand is not supposed
to be a bitcast but a straight BUILD_VECTOR of constants,
causing a crash.
llvm-svn: 53729
SelectionDAG::allnodes_size is linear, but that doesn't appear to
outweigh the benefit of reducing heap traffic. If it does become a
problem, we should teach SelectionDAG to keep a count of how many
nodes are live, because there are several other places where that
information would be useful as well.
llvm-svn: 52926
still excluding types like i1 (not byte sized)
and i120 (loading an i120 requires loading an i64,
an i32, an i16 and an i8, which is expensive).
llvm-svn: 52310
wrong for volatile loads and stores. In fact this
is almost all of them! There are three types of
problems: (1) it is wrong to change the width of
a volatile memory access. These may be used to
do memory mapped i/o, in which case a load can have
an effect even if the result is not used. Consider
loading an i32 but only using the lower 8 bits. It
is wrong to change this into a load of an i8, because
you are no longer tickling the other three bytes. It
is also unwise to make a load/store wider. For
example, changing an i16 load into an i32 load is
wrong no matter how aligned things are, since the
fact of loading an additional 2 bytes can have
i/o side-effects. (2) it is wrong to change the
number of volatile load/stores: they may be counted
by the hardware. (3) it is wrong to change a volatile
load/store that requires one memory access into one
that requires several. For example on x86-32, you
can store a double in one processor operation, but to
store an i64 requires two (two i32 stores). In a
multi-threaded program you may want to bitcast an i64
to a double and store as a double because that will
occur atomically, and be indivisible to other threads.
So it would be wrong to convert the store-of-double
into a store of an i64, because this will become two
i32 stores - no longer atomic. My policy here is
to say that the number of processor operations for
an illegal operation is undefined. So it is alright
to change a store of an i64 (requires at least two
stores; but could be validly lowered to memcpy for
example) into a store of double (one processor op).
In short, if the new store is legal and has the same
size then I say that the transform is ok. It would
also be possible to say that transforms are always
ok if before they were illegal, whether after they
are illegal or not, but that's more awkward to do
and I doubt it buys us anything much.
However this exposed an interesting thing - on x86-32
a store of i64 is considered legal! That is because
operations are marked legal by default, regardless of
whether the type is legal or not. In some ways this
is clever: before type legalization this means that
operations on illegal types are considered legal;
after type legalization there are no illegal types
so now operations are only legal if they really are.
But I consider this to be too cunning for mere mortals.
Better to do things explicitly by testing AfterLegalize.
So I have changed things so that operations with illegal
types are considered illegal - indeed they can never
map to a machine operation. However this means that
the DAG combiner is more conservative because before
it was "accidentally" performing transforms where the
type was illegal because the operation was nonetheless
marked legal. So in a few such places I added a check
on AfterLegalize, which I suppose was actually just
forgotten before. This causes the DAG combiner to do
slightly more than it used to, which resulted in the X86
backend blowing up because it got a slightly surprising
node it wasn't expecting, so I tweaked it.
llvm-svn: 52254
maps can be deleted. This happens when RAUW
replaces a node N with another equivalent node
E, deleting the first node. Solve this by
adding (N, E) to ReplacedNodes, which is already
used to remap nodes to replacements. This means
that deleted nodes are being allowed in maps,
which can be delicate: the memory may be reused
for a new node which might get confused with the
old deleted node pointer hanging around in the
maps, so detect this and flush out maps if it
occurs (ExpungeNode). The expunging operation
is expensive, however it never occurs during
a llvm-gcc bootstrap or anywhere in the nightly
testsuite. It occurs three times in "make check":
Alpha/illegal-element-type.ll,
PowerPC/illegal-element-type.ll and
X86/mmx-shift.ll. If expunging proves to be too
expensive then there are other more complicated
ways of solving the problem.
In the normal case this patch adds the overhead
of a few more map lookups, which is hopefully
negligable.
llvm-svn: 52214
of integer types. Fix the isMask APInt method to
actually work (hopefully) rather than crashing
because it adds apints of different bitwidths.
It looks like isShiftedMask is also broken, but
I'm leaving that one to the APInt people (it is
not used anywhere).
llvm-svn: 52142
of apint codegen failure is the DAG combiner doing
the wrong thing because it was comparing MVT's using
< rather than comparing the number of bits. Removing
the < method makes this mistake impossible to commit.
Instead, add helper methods for comparing bits and use
them.
llvm-svn: 52098
and better control the abstraction. Rename the type
to MVT. To update out-of-tree patches, the main
thing to do is to rename MVT::ValueType to MVT, and
rewrite expressions like MVT::getSizeInBits(VT) in
the form VT.getSizeInBits(). Use VT.getSimpleVT()
to extract a MVT::SimpleValueType for use in switch
statements (you will get an assert failure if VT is
an extended value type - these shouldn't exist after
type legalization).
This results in a small speedup of codegen and no
new testsuite failures (x86-64 linux).
llvm-svn: 52044
Rename SDOperandImpl back to SDOperand.
Introduce the SDUse class that represents a use of the SDNode referred by
an SDOperand. Now it is more similar to Use/Value classes.
Patch is approved by Dan Gohman.
llvm-svn: 49795
LLVM Value/Use does and MachineRegisterInfo/MachineOperand does.
This allows constant time for all uses list maintenance operations.
The idea was suggested by Chris. Reviewed by Evan and Dan.
Patch is tested and approved by Dan.
On normal use-cases compilation speed is not affected. On very big basic
blocks there are compilation speedups in the range of 15-20% or even better.
llvm-svn: 48822
X86 lowering normalize vector 0 to v4i32. However DAGCombine can fold (sub x, x) -> 0 after legalization. It can create a zero vector of a type that's not expected (e.g. v8i16). We don't want to disable the optimization since leaving a (sub x, x) is really bad. Add isel patterns for other types of vector 0 to ensure correctness. It's highly unlikely to happen other than in bugpoint reduced test cases.
llvm-svn: 48279
Change several cases in SimplifyDemandedMask that don't ever do any
simplifying to reuse the logic in ComputeMaskedBits instead of
duplicating it.
llvm-svn: 47648
after legalize. Just because a constant is legal (e.g. 0.0 in SSE)
doesn't mean that its negated value is legal (-0.0). We could make
this stronger by checking to see if the negated constant is actually
legal post negation, but it doesn't seem like a big deal.
llvm-svn: 47591
DAGUpdateListener object pointer instead of just returning a vector
of deleted nodes. This makes the interfaces more efficient (no more
allocating a vector [at least a malloc], filling it in, then walking
it) and more clean. This also allows the client to be notified of
nodes that are *changed* but not deleted.
llvm-svn: 46677
and StoreSDNode into their common base class LSBaseSDNode. Member
functions getLoadedVT and getStoredVT are replaced with the common
getMemoryVT to simplify code that will handle both loads and stores.
llvm-svn: 46538
registers if used by a bitconvert or using a bitconvert. This allows us to
avoid constant pool loads and use cheaper integer instructions when the
values come from or end up in integer regs anyway. For example, we now
compile CodeGen/X86/fp-in-intregs.ll to:
_test1:
movl $2147483648, %eax
xorl 4(%esp), %eax
ret
_test2:
movl $1065353216, %eax
orl 4(%esp), %eax
andl $3212836864, %eax
ret
Instead of:
_test1:
movss 4(%esp), %xmm0
xorps LCPI2_0, %xmm0
movd %xmm0, %eax
ret
_test2:
movss 4(%esp), %xmm0
andps LCPI3_0, %xmm0
movss LCPI3_1, %xmm1
andps LCPI3_2, %xmm1
orps %xmm0, %xmm1
movd %xmm1, %eax
ret
bitconverts can happen due to various calling conventions that require
fp values to passed in integer regs in some cases, e.g. when returning
a complex.
llvm-svn: 46414
delete a node even if it was not dead in some cases. Instead, just add it to
the worklist. Also, make sure to use the CombineTo methods, as it was doing
things that were unsafe: the top level combine loop could touch dangling memory.
This fixes CodeGen/Generic/2008-01-25-dag-combine-mul.ll
llvm-svn: 46384
1. we already know the value is dead, so don't bother replacing
it with undef.
2. The very case the comment describes actually makes the load
live which asserts in deletenode. If we do the replacement
and the node becomes live, just treat it as new. This fixes
a failure on X86/2008-01-16-InvalidDAGCombineXform.ll with
some local changes in my tree.
llvm-svn: 46306
dead stuff around. This gets fed into the isel pass and causes certain foldings from
happening because nodes have extraneous uses floating around. For example, if we turned
foo(bar(x)) -> baz(x), we sometimes left bar(x) around.
llvm-svn: 46305
1. Legalize now always promotes truncstore of i1 to i8.
2. Remove patterns and gunk related to truncstore i1 from targets.
3. Rename the StoreXAction stuff to TruncStoreAction in TLI.
4. Make the TLI TruncStoreAction table a 2d table to handle from/to conversions.
5. Mark a wide variety of invalid truncstores as such in various targets, e.g.
X86 currently doesn't support truncstore of any of its integer types.
6. Add legalize support for truncstores with invalid value input types.
7. Add a dag combine transform to turn store(truncate) into truncstore when
safe.
The later allows us to compile CodeGen/X86/storetrunc-fp.ll to:
_foo:
fldt 20(%esp)
fldt 4(%esp)
faddp %st(1)
movl 36(%esp), %eax
fstps (%eax)
ret
instead of:
_foo:
subl $4, %esp
fldt 24(%esp)
fldt 8(%esp)
faddp %st(1)
fstps (%esp)
movl 40(%esp), %eax
movss (%esp), %xmm0
movss %xmm0, (%eax)
addl $4, %esp
ret
llvm-svn: 46140
and switch various codegen pieces and the X86 backend over
to using it.
* Add some comments to SelectionDAGNodes.h
* Introduce a second argument to FP_ROUND, which indicates
whether the FP_ROUND changes the value of its input. If
not it is safe to xform things like fp_extend(fp_round(x)) -> x.
llvm-svn: 46125
It's not safe to use the two value CombineTo variant to combine away a dead load.
e.g.
v1, chain2 = load chain1, loc
v2, chain3 = load chain2, loc
v3 = add v2, c
Now we replace use of v1 with undef, use of chain2 with chain1.
ReplaceAllUsesWith() will iterate through uses of the first load and update operands:
v1, chain2 = load chain1, loc
v2, chain3 = load chain1, loc
v3 = add v2, c
Now the second load is the same as the first load, SelectionDAG cse will ensure
the use of second load is replaced with the first load.
v1, chain2 = load chain1, loc
v3 = add v1, c
Then v1 is replaced with undef and bad things happen.
llvm-svn: 46099
_foo:
movl $12, %eax
andl 4(%esp), %eax
movl _array(%eax), %eax
ret
instead of:
_foo:
movl 4(%esp), %eax
shrl $2, %eax
andl $3, %eax
movl _array(,%eax,4), %eax
ret
As it turns out, this triggers all the time, in a wide variety of
situations, for example, I see diffs like this in various programs:
- movl 8(%eax), %eax
- shll $2, %eax
- andl $1020, %eax
- movl (%esi,%eax), %eax
+ movzbl 8(%eax), %eax
+ movl (%esi,%eax,4), %eax
- shll $2, %edx
- andl $1020, %edx
- movl (%edi,%edx), %edx
+ andl $255, %edx
+ movl (%edi,%edx,4), %edx
Unfortunately, I also see stuff like this, which can be fixed in the
X86 backend:
- andl $85, %ebx
- addl _bit_count(,%ebx,4), %ebp
+ shll $2, %ebx
+ andl $340, %ebx
+ addl _bit_count(%ebx), %ebp
llvm-svn: 44656
optimized. This avoids creating illegal divisions when the combiner is
running after legalize; this fixes PR1815. Also, it produces better
code in the included testcase by avoiding the subtract and multiply
when the division isn't optimized.
llvm-svn: 44341
transformation. Previously, it's restricted by ensuring the number of load uses
is one. Now the restriction is loosened up by allowing setcc uses to be
"extended" (e.g. setcc x, c, eq -> setcc sext(x), sext(c), eq).
llvm-svn: 43465
of offset and the alignment of ptr if these are both powers of
2. While the ptr alignment is guaranteed to be a power of 2,
there is no reason to think that offset is. For example, if
offset is 12 (the size of a long double on x86-32 linux) and
the alignment of ptr is 8, then the alignment of ptr+offset
will in general be 4, not 8. Introduce a function MinAlign,
lifted from gcc, for computing the minimum guaranteed alignment.
I've tried to fix up everywhere under lib/CodeGen/SelectionDAG/.
I also changed some places that weren't wrong (because both values
were a power of 2), as a defensive change against people copying
and pasting the code.
Hopefully someone who cares about alignment will review the rest
of LLVM and fix up the remaining places. Since I'm on x86 I'm
not very motivated to do this myself...
llvm-svn: 43421
take a deleted nodes vector, instead of requiring it.
One more significant change: Implement the start of a legalizer that
just works on types. This legalizer is designed to run before the
operation legalizer and ensure just that the input dag is transformed
into an output dag whose operand and result types are all legal, even
if the operations on those types are not.
This design/impl has the following advantages:
1. When finished, this will *significantly* reduce the amount of code in
LegalizeDAG.cpp. It will remove all the code related to promotion and
expansion as well as splitting and scalarizing vectors.
2. The new code is very simple, idiomatic, and modular: unlike
LegalizeDAG.cpp, it has no 3000 line long functions. :)
3. The implementation is completely iterative instead of recursive, good
for hacking on large dags without blowing out your stack.
4. The implementation updates nodes in place when possible instead of
deallocating and reallocating the entire graph that points to some
mutated node.
5. The code nicely separates out handling of operations with invalid
results from operations with invalid operands, making some cases
simpler and easier to understand.
6. The new -debug-only=legalize-types option is very very handy :),
allowing you to easily understand what legalize types is doing.
This is not yet done. Until the ifdef added to SelectionDAGISel.cpp is
enabled, this does nothing. However, this code is sufficient to legalize
all of the code in 186.crafty, olden and freebench on an x86 machine. The
biggest issues are:
1. Vectors aren't implemented at all yet
2. SoftFP is a mess, I need to talk to Evan about it.
3. No lowering to libcalls is implemented yet.
4. Various operations are missing etc.
5. There are FIXME's for stuff I hax0r'd out, like softfp.
Hey, at least it is a step in the right direction :). If you'd like to help,
just enable the #ifdef in SelectionDAGISel.cpp and compile code with it. If
this explodes it will tell you what needs to be implemented. Help is
certainly appreciated.
Once this goes in, we can do three things:
1. Add a new pass of dag combine between the "type legalizer" and "operation
legalizer" passes. This will let us catch some long-standing isel issues
that we miss because operation legalization often obfuscates the dag with
target-specific nodes.
2. We can rip out all of the type legalization code from LegalizeDAG.cpp,
making it much smaller and simpler. When that happens we can then
reimplement the core functionality left in it in a much more efficient and
non-recursive way.
3. Once the whole legalizer is non-recursive, we can implement whole-function
selectiondags maybe...
llvm-svn: 42981
Check if one of the two results unneeded so see if a simpler operator
could bs used. Also check to see if each of the two computations could be
simplified if they were split into separate operators. Factor out the code
that calls visit() so that it can be used for this purpose.
llvm-svn: 42759
access to bits). Use them in place of float and
double interfaces where appropriate.
First bits of x86 long double constants handling
(untested, probably does not work).
llvm-svn: 41858
Implement some constant folding in SelectionDAG and
DAGCombiner using APFloat. Remove double versions
of constructor and getValue from ConstantFPSDNode.
llvm-svn: 41664
DAGCombiner.cpp: In member function 'llvm::SDOperand<unnamed>::DAGCombiner::visitOR(llvm::SDNode*)':
DAGCombiner.cpp:1608: warning: passing negative value '-0x00000000000000001' for argument 1 to 'llvm::SDOperand llvm::SelectionDAG::getConstant(uint64_t, llvm::MVT::ValueType, bool)'
oiy.
llvm-svn: 38458
visitFSUB to fold 0-B to -B in UnsafeFPMath mode. Also change visitFNEG
to use isNegatibleForFree/GetNegatedExpression instead of doing a subset
of the same thing manually.
This fixes test/CodeGen/X86/negative-sin.ll.
llvm-svn: 37842
extended vector types. Remove the special SDNode opcodes used for pre-legalize
vector operations, and the special MVT::Vector type used with them. Adjust
lowering and legalize to work with the normal SDNode kinds instead, and to
use the normal MVT functions to work with vector types instead of using the
two special operands that the pre-legalize nodes held.
This allows pre-legalize and post-legalize DAGs, and the code that operates
on them, to be more consistent. Pre-legalize vector operators can be handled
more consistently with scalar operators. And, -view-dag-combine1-dags and
-view-legalize-dags now look prettier for vector code.
llvm-svn: 37719
TargetLowering to SelectionDAG so that they have more convenient
access to the current DAG, in preparation for the ValueType routines
being changed from standalone functions to members of SelectionDAG for
the pre-legalize vector type changes.
llvm-svn: 37704
- (store (bitconvert v)) -> (store v) if resultant store does not require
higher alignment
- (bitconvert (load v)) -> (load (bitconvert*)v) if resultant load does not
require higher alignment
llvm-svn: 36908
single-use nodes, they will be dead soon. Make sure to remove them before
processing other nodes. This implements CodeGen/X86/shl_elim.ll
llvm-svn: 36244
2. Help DAGCombiner recognize zero/sign/any-extended versions of ROTR and ROTL
patterns. This was motivated by the X86/rotate.ll testcase, which should now
generate code for other platforms (and soon-to-come platforms.) Rewrote code
slightly to make it easier to read.
llvm-svn: 35605
addc, turn it into add.
This allows us to compile:
long long test(long long A, unsigned B) {
return (A + ((long long)B << 32)) & 123;
}
into:
_test:
movl $123, %eax
andl 4(%esp), %eax
xorl %edx, %edx
ret
instead of:
_test:
xorl %edx, %edx
movl %edx, %eax
addl 4(%esp), %eax ;; add of zero
andl $123, %eax
ret
llvm-svn: 34909
sextinreg if not needed. This is useful in two cases: before legalize,
it avoids creating a sextinreg that will be trivially removed. After legalize
if the target doesn't support sextinreg, the trunc/sext would not have been
removed before.
llvm-svn: 34621
careful when folding "c ? load p : load q" that C doesn't reach either load.
If so, folding this into load (c ? p : q) will induce a cycle in the graph.
llvm-svn: 33251
about whether the new base ptr would be live below the load/store. Let two
address pass split it back to non-indexed ops.
- Minor tweaks / fixes.
llvm-svn: 31544
Turn on -Wunused and -Wno-unused-parameter. Clean up most of the resulting
fall out by removing unused variables. Remaining warnings have to do with
unused functions (I didn't want to delete code without review) and unused
variables in generated code. Maintainers should clean up the remaining
issues when they see them. All changes pass DejaGnu tests and Olden.
llvm-svn: 31380
(vector_shuffle
(vbitconvert (vbuildvector (copyfromreg v4f32), 1, v4f32), 4, f32),
(undef, undef, undef, undef), (0, 0, 0, 0), 4, f32)
to the
vbitconvert
is a very bad idea.
llvm-svn: 30989
SelectionDAG and it has since bitrotted. Remove the copy from SelectionDAG.
Next, remove the constant folding piece of DAGCombiner::SimplifySetCC into
a new FoldSetCC method which can be used by getNode() and SimplifySetCC.
This fixes obscure bugs.
llvm-svn: 30952
the src/dst are not the same size. This catches things like "truncate
32-bit X to 8 bits, then zext to 16", which happens a bit on X86.
llvm-svn: 30557
in the start of an array and a count of operands where applicable. In many
cases, the number of operands is known, so this static array can be allocated
on the stack, avoiding the heap. In many other cases, a SmallVector can be
used, which has the same benefit in the common cases.
I updated a lot of code calling getNode that takes a vector, but ran out of
time. The rest of the code should be updated, and these methods should be
removed.
We should also do the same thing to eliminate the methods that take a
vector of MVT::ValueTypes.
It would be extra nice to convert the dagiselemitter to avoid creating vectors
for operands when calling getTargetNode.
llvm-svn: 29566
SimplifySelectOps would eliminate a Select, delete it, then return true.
The clients would see that it did something and return null.
The top level would see a null return, and decide that nothing happened,
proceeding to process the node in other ways: boom.
The fix is simple: clients of SimplifySelectOps should return the select
node itself.
In order to catch really obnoxious boogs like this in the future, add an
assert that nodes are not deleted. We do this by checking for a sentry node
type that the SDNode dtor sets when a node is destroyed.
llvm-svn: 28514
llvm-gcc4 boostrap. Whenever a node is deleted by the dag combiner, it
*must* be returned by the visit function, or the dag combiner will not
know that the node has been processed (and will, e.g., send it to the
target dag combine xforms).
llvm-svn: 27922
DAG combiner can turn a VAND V, <-1, 0, -1, -1>, i.e. vector clear elements,
into a vector shuffle with a zero vector. It only does so when TLI tells it
the xform is profitable.
llvm-svn: 27874
sure to build it as SHUFFLE(X, undef, mask), not SHUFFLE(X, X, mask).
The later is not canonical form, and prevents the PPC splat pattern from
matching. For a particular splat, we go from generating this:
li r10, lo16(LCPI1_0)
lis r11, ha16(LCPI1_0)
lvx v3, r11, r10
vperm v3, v2, v2, v3
to generating:
vspltw v3, v2, 3
llvm-svn: 27236
Make the PPC backend not dependent on BRTWOWAY_CC and make the branch
selector smarter about the code it generates, fixing a case in the
readme.
llvm-svn: 26814
and instruction. This allows us to compile stuff like this:
bool %X(int %X) {
%Y = add int %X, 14
%Z = setne int %Y, 12345
ret bool %Z
}
to this:
_X:
cmpl $12331, 4(%esp)
setne %al
movzbl %al, %eax
ret
instead of this:
_X:
cmpl $12331, 4(%esp)
setne %al
movzbl %al, %eax
andl $1, %eax
ret
This occurs quite a bit with the X86 backend. For example, 25 times in
lambda, 30 times in 177.mesa, 14 times in galgel, 70 times in fma3d,
25 times in vpr, several hundred times in gcc, ~45 times in crafty,
~60 times in parser, ~140 times in eon, 110 times in perlbmk, 55 on gap,
16 times on bzip2, 14 times on twolf, and 1-2 times in many other SPEC2K
programs.
llvm-svn: 25901
void bar(double Y, double *X) {
*X = Y;
}
to this:
bar:
save -96, %o6, %o6
st %i1, [%i2+4]
st %i0, [%i2]
restore %g0, %g0, %g0
retl
nop
instead of this:
bar:
save -104, %o6, %o6
st %i1, [%i6+-4]
st %i0, [%i6+-8]
ldd [%i6+-8], %f0
std %f0, [%i2]
restore %g0, %g0, %g0
retl
nop
on sparcv8.
llvm-svn: 24983
load. This reduces number of worklist iterations and avoid missing optimizations
depending on folding of things into sext_inreg nodes (which aren't supported by
all targets).
Tested by Regression/CodeGen/X86/extend.ll:test2
llvm-svn: 24712
Allow (zext (truncate)) to apply after legalize if the target supports
AND (which all do).
This compiles
short %foo() {
%tmp.0 = load ubyte* %X ; <ubyte> [#uses=1]
%tmp.3 = cast ubyte %tmp.0 to short ; <short> [#uses=1]
ret short %tmp.3
}
to:
_foo:
movzbl _X, %eax
ret
instead of:
_foo:
movzbl _X, %eax
movzbl %al, %eax
ret
thanks to Evan for pointing this out.
llvm-svn: 24709
the input is that type, this caused a failure on gs on X86 last night.
Move the hard checks into Build[US]Div since that is where decisions like
this should be made.
llvm-svn: 23881
Add a new flag to TargetLowering indicating if the target has really cheap
signed division by powers of two, make ppc use it. This will probably go
away in the future.
Implement some more ISD::SDIV folds in the dag combiner
Remove now dead code in the x86 backend.
llvm-svn: 23853
a lot throughout many programs. In particular, specfp triggers it a bunch for
constant FP nodes when you have code like cond ? 1.0 : -1.0.
If the PPC ISel exposed the loads implicit in pic references to external globals,
we would be able to eliminate a load in cases like this as well:
%X = external global int
%Y = external global int
int* %test4(bool %C) {
%G = select bool %C, int* %X, int* %Y
ret int* %G
}
Note that this breaks things that use SrcValue's (see the fixme), but since nothing
uses them yet, this is ok.
Also, simplify some code to use hasOneUse() on an SDOperand instead of hasNUsesOfValue directly.
llvm-svn: 23781
you could be AND'ing with the result of a shift that shifts out all the
bits you care about, in addition to a constant.
Also, move over an add/sub_parts fold from legalize to the dag combiner,
where it works for things other than constants. Woot!
llvm-svn: 23720
out, where after the first CombineTo() call, the node the second CombineTo
wishes to replace may no longer exist.
Fix a very real bug with the truncated load optimization on little endian
targets, which do not need a byte offset added to the load.
llvm-svn: 23704
like turning:
_foo:
fctiwz f0, f1
stfd f0, -8(r1)
lwz r2, -4(r1)
rlwinm r3, r2, 0, 16, 31
blr
into
_foo:
fctiwz f0,f1
stfd f0,-8(r1)
lhz r3,-2(r1)
blr
Also removed an unncessary constraint from sra -> srl conversion, which
should take care of hte only reason we would ever need to handle sra in
MaskedValueIsZero, AFAIK.
llvm-svn: 23703
location, replace them with a new store of the last value. This occurs
in the same neighborhood in 197.parser, speeding it up about 1.5%
llvm-svn: 23691
multiple results.
Use this support to implement trivial store->load forwarding, implementing
CodeGen/PowerPC/store-load-fwd.ll. Though this is the most simple case and
can be extended in the future, it is still useful. For example, it speeds
up 197.parser by 6.2% by avoiding an LSU reject in xalloc:
stw r6, lo16(l5_end_of_array)(r2)
addi r2, r5, -4
stwx r5, r4, r2
- lwzx r5, r4, r2
- rlwinm r5, r5, 0, 0, 30
stwx r5, r4, r2
lwz r2, -4(r4)
ori r2, r2, 1
llvm-svn: 23690