Commit Graph

2273 Commits

Author SHA1 Message Date
Sanjay Patel 0f9dcf8b90 move DAGCombiner's allowableAlignment() helper function into the TLI
Making allowableAlignment() more accessible was suggested as a predecessor patch
for D10662, so I've pulled it into TargetLowering. This let's us remove 4 instances
of duplicate logic in LegalizeDAG.

There's a subtle functional change in the implementation: the existing 
allowableAlignment() code was using getPrefTypeAlignment() when checking 
alignment with the DataLayout and assumed that was fast. In this implementation,
we use getABITypeAlignment() and assume that is fast. See the TODO comment or the
discussion in the Phab review for future improvements in this implementation
(don't use the data layout at all).

There are no regression test changes from this difference, and I'm not sure how to
expose it via a test. I think we actually do want to provide the 'Fast' param when
checking this from DAGCombiner::MergeConsecutiveStores(). Ie, we shouldn't merge 
stores if the new stores are not going to be fast. But that change will require 
fixing allowsMisalignedMemoryAccess() overrides as noted in D10662.

Differential Revision: http://reviews.llvm.org/D10905

llvm-svn: 243549
2015-07-29 18:24:18 +00:00
Sanjay Patel 133e68b45c ignore duplicate divisor uses when transforming into reciprocal multiplies (PR24141)
PR24141: https://llvm.org/bugs/show_bug.cgi?id=24141
contains a test case where we have duplicate entries in a node's uses() list.

After r241826, we use CombineTo() to delete dead nodes when combining the uses into
reciprocal multiplies, but this fails if we encounter the just-deleted node again in
the list.

The solution in this patch is to not add duplicate entries to the list of users that
we will subsequently iterate over. For the test case, this avoids triggering the
combine divisors logic entirely because there really is only one user of the divisor.

Differential Revision: http://reviews.llvm.org/D11345

llvm-svn: 243500
2015-07-28 23:28:22 +00:00
Sanjay Patel 1dd15598cf fix TLI's combineRepeatedFPDivisors interface to return the minimum user threshold
This fix was suggested as part of D11345 and is part of fixing PR24141.

With this change, we can avoid walking the uses of a divisor node if the target
doesn't want the combineRepeatedFPDivisors transform in the first place.

There is no NFC-intended other than that.

Differential Revision: http://reviews.llvm.org/D11531

llvm-svn: 243498
2015-07-28 23:05:48 +00:00
Sanjay Patel c1c2b87001 move combineRepeatedFPDivisors logic into a helper function; NFCI
llvm-svn: 243293
2015-07-27 17:58:49 +00:00
Simon Pilgrim 4ef0576c40 [DAGCombiner] Fixed minor typo that was missed in D9097.
We don't bitcast the UNDEFs - that is done in visitVECTOR_SHUFFLE, and the getValueType should come from the operand's SDValue not the SDNode.

llvm-svn: 242640
2015-07-19 11:31:40 +00:00
Simon Pilgrim 3aca32ea4a Use SDValue bool check. NFCI.
llvm-svn: 242636
2015-07-19 09:56:36 +00:00
Matt Arsenault cabe02e141 Only do fmul (fadd x, x), c combine if the fadd only has one use
This was increasing the instruction count if the fadd has multiple uses.

llvm-svn: 242498
2015-07-17 01:14:35 +00:00
Pete Cooper 7e64ef06e6 Use more foreach loops in SelectionDAG. NFC
llvm-svn: 242249
2015-07-14 23:43:29 +00:00
Matt Arsenault f54dc2384d DAGCombiner: Assume invariant load cannot alias a store
The motivation is to allow GatherAllAliases / FindBetterChain
to not give up on dependent loads of a pointer from constant memory.

This is important for AMDGPU, because most loads are pointers
derived from a load of a kernel argument from constant memory.

llvm-svn: 241948
2015-07-10 22:17:40 +00:00
Sanjay Patel e2361d4a18 fix an invisible bug when combining repeated FP divisors
This patch fixes bugs that were exposed by the addition of fast-math-flags in the DAG:
r237046 ( http://reviews.llvm.org/rL237046 ):

1. When replacing a division node, it's not enough to RAUW.
   We should call CombineTo() to delete dead nodes and combine again.
2. Because we are changing the DAG, we can't return an empty SDValue
   after the transform. As the code comments say:

    Visitation implementation - Implement dag node combining for different node types.
    The semantics are as follows: Return Value:
      SDValue.getNode() == 0 - No change was made
      SDValue.getNode() == N - N was replaced, is dead and has been handled.
      otherwise - N should be replaced by the returned Operand.

The new test case shows no difference with or without this patch, but it will crash if
we re-apply r237046 or enable FMF via the current -enable-fmf-dag cl::opt.

Differential Revision: http://reviews.llvm.org/D9893

llvm-svn: 241826
2015-07-09 17:28:37 +00:00
Mehdi Amini eaabc51e78 Re-instate the EVT parameter to getScalarShiftAmountTy() for OOT user
A documentation for this function would be nice by the way.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241807
2015-07-09 15:12:23 +00:00
Mehdi Amini 0cdec1e2ab Make isLegalAddressingMode() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11040

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241778
2015-07-09 02:09:40 +00:00
Mehdi Amini 9639d650bb Make TargetLowering::getShiftAmountTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11037

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241776
2015-07-09 02:09:20 +00:00
Mehdi Amini 44ede33a69 Make TargetLowering::getPointerTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D11028

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241775
2015-07-09 02:09:04 +00:00
Sanjay Patel c1afa95a51 early exits -> less indenting; NFCI
llvm-svn: 241716
2015-07-08 19:32:39 +00:00
Mehdi Amini ffc1402fad Remove IsLittleEndian from TargetLowering and redirect to DataLayout
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11017

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241655
2015-07-08 01:00:38 +00:00
Mehdi Amini 8ac7a9d57a Redirect DataLayout from TargetMachine to Module in SelectionDAG
Summary:
SelectionDAG itself is not invoking directly the DataLayout in the
TargetMachine, but the "TargetLowering" class is still using it. I'll
address it in a following commit.

This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11000

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241618
2015-07-07 19:07:19 +00:00
Pawel Bylica c52eabb285 Reapply r240291: Fix shl folding in DAG combiner.
The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared.

It has been reverted previously because of some problems with comparing APInt with raw uint64_t. That has been fixed/changed with r241204.

llvm-svn: 241254
2015-07-02 11:44:54 +00:00
Pawel Bylica 143ceb6d46 [DAGCombiner] Fix & simplify constant folding of sext/zext.
Summary: This patch fixes the cases of sext/zext constant folding in DAG combiner where constans do not fit 64 bits. The fix simply removes un$

Test Plan: New regression test included.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: http://reviews.llvm.org/D10607

llvm-svn: 240991
2015-06-29 20:28:47 +00:00
Benjamin Kramer 5b455f0b62 [SDAG] Now that we have a way to communicate the exact bit on sdiv use it to simplify sdiv by a constant.
We had a hack in SDAGBuilder in place to work around this but now we
can avoid that. Call BuildExactSDIV from BuildSDIV so DAGCombiner can
perform this trick automatically.

The added check in DAGCombiner is necessary to prevent exact sdiv by pow2
from regressing as the target-specific pow2 lowering is not aware of
exact bits yet.

This is mostly covered by existing tests. One side effect is that we
get the better lowering for exact vector sdivs now too :)

llvm-svn: 240891
2015-06-27 20:33:26 +00:00
Pete Cooper 8c0a710995 Convert a bunch of loops to foreach. NFC.
This uses the new SDNode::op_values() iterator range committed in r240805.

llvm-svn: 240809
2015-06-26 18:41:54 +00:00
Benjamin Kramer 07e70b4fa4 [DAGCombine] fold (X >>?,exact C1) << C2 --> X << (C2-C1)
Instcombine also does this but many opportunities only become visible
after GEPs are lowered.

llvm-svn: 240787
2015-06-26 14:51:36 +00:00
Matt Arsenault f735cab986 DAGCombiner: Use pop_back_val()
llvm-svn: 240709
2015-06-25 22:15:05 +00:00
Matt Arsenault c244dcb804 DAGCombiner: Remove redundant check
MemIntrinsicSDNode is already a subclass of MemSDNode,
so the MemSDNode check is sufficient.

llvm-svn: 240672
2015-06-25 18:47:02 +00:00
Alexander Kornienko f00654e31b Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.

llvm-svn: 240390
2015-06-23 09:49:53 +00:00
Pawel Bylica e6fd8c4232 Revert r240291: causes problems in self-hosted builds.
llvm-svn: 240343
2015-06-22 21:54:07 +00:00
Pawel Bylica 06407c0320 Fix shl folding in DAG combiner.
Summary: The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared.

Test Plan: A regression test included.

Reviewers: andreadb

Reviewed By: andreadb

Subscribers: andreadb, test, llvm-commits

Differential Revision: http://reviews.llvm.org/D10602

llvm-svn: 240291
2015-06-22 15:58:11 +00:00
Chandler Carruth c3f49eb451 [PM/AA] Hoist the AliasResult enum out of the AliasAnalysis class.
This will allow classes to implement the AA interface without deriving
from the class or referencing an internal enum of some other class as
their return types.

Also, to a pretty fundamental extent, concepts such as 'NoAlias',
'MayAlias', and 'MustAlias' are first class concepts in LLVM and we
aren't saving anything by scoping them heavily.

My mild preference would have been to use a scoped enum, but that
feature is essentially completely broken AFAICT. I'm extremely
disappointed. For example, we cannot through any reasonable[1] means
construct an enum class (or analog) which has scoped names but converts
to a boolean in order to test for the possibility of aliasing.

[1]: Richard Smith came up with a "solution", but it requires class
templates, and lots of boilerplate setting up the enumeration multiple
times. Something like Boost.PP could potentially bundle this up, but
even that would be quite painful and it doesn't seem realistically worth
it. The enum class solution would probably work without the need for
a bool conversion.

Differential Revision: http://reviews.llvm.org/D10495

llvm-svn: 240255
2015-06-22 02:16:51 +00:00
Alexander Kornienko 70bc5f1398 Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
  -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
  llvm/lib/


Thanks to Eugene Kosov for the original patch!

llvm-svn: 240137
2015-06-19 15:57:42 +00:00
Chandler Carruth ac80dc7532 [PM/AA] Remove the Location typedef from the AliasAnalysis class now
that it is its own entity in the form of MemoryLocation, and update all
the callers.

This is an entirely mechanical change. References to "Location" within
AA subclases become "MemoryLocation", and elsewhere
"AliasAnalysis::Location" becomes "MemoryLocation". Hope that helps
out-of-tree folks update.

llvm-svn: 239885
2015-06-17 07:18:54 +00:00
Sanjay Patel 0fcc53f6d6 rename variables; NFC
...because I see 'StoreBW' and read it as 'store bandwidth'

llvm-svn: 239850
2015-06-16 20:47:19 +00:00
Sanjay Patel bb385ed454 extract some code into a helper function for MergeConsecutiveStores(); NFCI
llvm-svn: 239847
2015-06-16 20:05:00 +00:00
Sanjay Patel f134048b1d propagate IR-level fast-math-flags to DAG nodes, disabled by default
This is an updated version of the patch that was checked in at:
http://reviews.llvm.org/rL237046

but subsequently reverted because it exposed a bug in the DAG Combiner:
http://reviews.llvm.org/D9893

This time, there's an enablement flag ("EnableFMFInDAG") around the code in
SelectionDAGBuilder where we copy the set of FP optimization flags from IR
instructions to DAG nodes. So, in theory, there should be no functional change
from this patch as-is, but it will allow testing with the added functionality
to proceed via "-enable-fmf-dag" passed to llc.

This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997

Differential Revision: http://reviews.llvm.org/D10403

llvm-svn: 239828
2015-06-16 16:25:43 +00:00
Matt Arsenault ed891b5561 Revert "Revert "Fix merges of non-zero vector stores""
Reapply r239539. Don't assume the collected number of
stores is the same vector size. Just take the first N
stores to fill the vector.

llvm-svn: 239825
2015-06-16 15:51:48 +00:00
Simon Pilgrim d3f6427446 [DAGCombiner] Added BSWAP(BSWAP(x)) -> x combine pattern.
llvm-svn: 239682
2015-06-13 16:25:12 +00:00
Simon Pilgrim 011381d48b [DAGCombiner] Added BSWAP vector constant folding support.
llvm-svn: 239675
2015-06-13 14:08:15 +00:00
Simon Pilgrim 096cccd01a Stripped trailing whitespace. NFC.
llvm-svn: 239674
2015-06-13 12:57:36 +00:00
Reid Kleckner 2691c59e97 Revert "Fix merges of non-zero vector stores"
This reverts commit r239539.

It was causing SDAG assertions while building freetype.

llvm-svn: 239543
2015-06-11 17:25:24 +00:00
Matt Arsenault e23a063dc3 Fix merges of non-zero vector stores
Now actually stores the non-zero constant instead of 0.
I somehow forgot to include this part of r238108.

The test change was just an independent instruction order swap,
so just add another check line to satisfy CHECK-NEXT.

llvm-svn: 239539
2015-06-11 16:03:52 +00:00
Simon Pilgrim 4791f6d89b [DAGCombiner] Added CTLZ vector constant folding support.
llvm-svn: 239305
2015-06-08 16:19:00 +00:00
Simon Pilgrim c789e1d57b [DAGCombiner] Added CTTZ vector constant folding support.
llvm-svn: 239293
2015-06-08 09:57:09 +00:00
Simon Pilgrim 68cd237f57 [DAGCombiner] Added CTPOP vector constant folding support.
Added tests to the existing SSE/AVX test files.

llvm-svn: 239252
2015-06-07 15:37:14 +00:00
Fiona Glaser 666e352440 DAGCombiner: don't duplicate (fmul x, c) in visitFNEG if fneg is free
For targets with a free fneg, this fold is always a net loss if it
ends up duplicating the multiply, so definitely avoid it.

This might be true for some targets without a free fneg too, but
I'll leave that for future investigation.

llvm-svn: 239167
2015-06-05 17:52:34 +00:00
Andrea Di Biagio eb33134ce7 Simplify code; NFC.
Also, moved test cases from CodeGen/X86/fold-buildvector-bug.ll into
CodeGen/X86/buildvec-insertvec.ll and regenerated CHECK lines using
update_llc_test_checks.py.

llvm-svn: 239142
2015-06-05 10:29:55 +00:00
Andrea Di Biagio 9ac8a6b13d [DAGCombiner] Fix wrong folding of a build_vector into a blend with zero.
Method 'visitBUILD_VECTOR' in the DAGCombiner knows how to combine a
build_vector of a bunch of extract_vector_elt nodes and constant zero nodes
into a shuffle blend with a zero vector.

However, method 'visitBUILD_VECTOR' forgot that a floating point
build_vector may contain negative zero as well as positive zero.

Example:

define <2 x double> @example(<2 x double> %A) {
entry:
  %0 = extractelement <2 x double> %A, i32 0
  %1 = insertelement <2 x double> undef, double %0, i32 0
  %2 = insertelement <2 x double> %1, double -0.0, i32 1
  ret <2 x double> %2
}

Before this patch, llc (with -mattr=+sse4.1) wrongly generated
  movq   %xmm0, %xmm0  # xmm0 = xmm0[0],zero

So, the sign bit of the negative zero was effectively lost.

This patch fixes the problem by adding explicit checks for positive zero.

With this patch, llc produces the following code for the example above:
  movhpd .LCPI0_0(%rip), %xmm0

where .LCPI0_0 referes to a 'double -0'.

llvm-svn: 239070
2015-06-04 19:15:01 +00:00
Matt Arsenault ca519dc28b Pass address space to isLegalAddressingMode in DAGCombiner
No test because I don't know of a target that makes use
of address spaces and indexed load / store.

llvm-svn: 239051
2015-06-04 16:17:34 +00:00
Matt Arsenault 65ad1602b0 Add target hook to allow merging stores of nonzero constants
On GPU targets, materializing constants is cheap and stores are
expensive, so only doing this for zero vectors was silly.

Most of the new testcases aren't optimally merged, and are for
later improvements.

llvm-svn: 238108
2015-05-24 00:51:27 +00:00
Simon Pilgrim e054199354 [X86][SSE] Improve support for 128-bit vector sign extension
This patch improves support for sign extension of the lower lanes of vectors of integers by making use of the SSE41 pmovsx* sign extension instructions where possible, and optimizing the sign extension by shifts on pre-SSE41 targets (avoiding the use of i64 arithmetic shifts which require scalarization).

It converts SIGN_EXTEND nodes to SIGN_EXTEND_VECTOR_INREG where necessary, that more closely matches the pmovsx* instruction than the default approach of using SIGN_EXTEND_INREG which splits the operation (into an ANY_EXTEND lowered to a shuffle followed by shifts) making instruction matching difficult during lowering. Necessary support for SIGN_EXTEND_VECTOR_INREG has been added to the DAGCombiner.

Differential Revision: http://reviews.llvm.org/D9848

llvm-svn: 237885
2015-05-21 10:05:03 +00:00
Matthias Braun 56a781495a DAGCombiner: Continue combining if FoldConstantArithmetic() fails.
DAG.FoldConstantArithmetic() can fail even though both operands are
Constants if OpaqueConstants are involved. Continue trying other combine
possibilities in tis case.

Differential Revision: http://reviews.llvm.org/D6946

Somewhat related to PR21801 / rdar://19211454

llvm-svn: 237822
2015-05-20 18:54:02 +00:00
Sanjay Patel 03abbb48a4 use 'auto *' for pointers; clearer usage, no deep copying
llvm-svn: 237719
2015-05-19 20:10:16 +00:00
Sanjay Patel ad11415962 tidy up
1. remove duplicate local variable
2. add local variable with name to match comment
3. remove useless comment

llvm-svn: 237715
2015-05-19 19:10:57 +00:00
Sanjay Patel 64a6da947a use range-based for-loop
llvm-svn: 237711
2015-05-19 18:24:33 +00:00
Sanjay Patel 3c9e370ec0 use range-based for loop
llvm-svn: 237705
2015-05-19 17:49:14 +00:00
Matthias Braun 887fdfb759 DAGCombiner: Factor common pattern into isOneConstant() function. NFC
llvm-svn: 237645
2015-05-19 00:25:21 +00:00
Matthias Braun 033121981d DAGCombiner: Factor common pattern into isAllOnesConstant() function. NFC
llvm-svn: 237644
2015-05-19 00:25:20 +00:00
Matthias Braun 0542b5d1db DAGCombiner: Use isNullConstant() where possible
llvm-svn: 237643
2015-05-19 00:25:17 +00:00
Matthias Braun c545234772 Revert accidental change in r237633
llvm-svn: 237635
2015-05-18 23:18:13 +00:00
Matthias Braun 1505efb0bb DAGCombiner: Factor common pattern into isNullConstant() function. NFC
llvm-svn: 237633
2015-05-18 23:07:27 +00:00
Hal Finkel a60e633fdd [DAGCombine] Be more pedantic about use iteration in CombineToPreIndexedLoadStore
In CombineToPreIndexedLoadStore, when the offset is a constant, we have code
that looks for other uses of the pointer which are constant offset computations
so that they can be rewritten in terms of the updated pointer so that we don't
need to keep a copy of the base pointer to compute these constant offsets.

Unfortunately, when it iterated over the uses, it did so by SDNodes, and so we
could confuse ourselves if the base pointer was produced by a node that had
multiple results (because we would not immediately exclude uses of the other
node results). This was reported as PR22755. Unfortunately, we don't have a
test case (and I've also been unable to produce one thus far), but at least the
mistake is clear. The right way to fix this problem is to make use of the information
contained in the use iterators to filter out any uses of other results of the
node producing the base pointer.

This should be mostly NFC, but should also fix PR22755 (for which,
unfortunately, we have no in-tree test case).

llvm-svn: 237576
2015-05-18 15:46:02 +00:00
Nick Lewycky 37a175007b Revert r237046. See the testcase on the thread where r237046 was committed.
llvm-svn: 237317
2015-05-13 23:41:47 +00:00
Sanjay Patel 5b202966f5 propagate IR-level fast-math-flags to DAG nodes; 2nd try; NFC
This is a less ambitious version of:
http://reviews.llvm.org/rL236546

because that was reverted in:
http://reviews.llvm.org/rL236600

because it caused memory corruption that wasn't related to FMF
but was actually due to making nodes with 2 operands derive from a
plain SDNode rather than a BinarySDNode. 

This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997

...which split the existing nsw / nuw / exact flags and FMF
into their own struct.
 

llvm-svn: 237046
2015-05-11 21:07:09 +00:00
James Y Knight fca02be3c1 Fix MergeConsecutiveStore for non-byte-sized memory accesses.
The bug showed up as a compile-time assertion failure:
  Assertion `NumBits >= MIN_INT_BITS && "bitwidth too small"' failed
when building msan tests on x86-64.

Prior to r236850, this bug was masked due to a bogus alignment check,
which also accidentally rejected non-byte-sized accesses. Afterwards,
an invalid ElementSizeBytes == 0 got further into the function, and
triggered the assertion failure.

It would probably be a good idea to allow it to handle merging stores
of unusual widths as well, but for now, to un-break it, I'm just
making the minimal fix.

Differential Revision: http://reviews.llvm.org/D9626

llvm-svn: 236927
2015-05-09 03:13:37 +00:00
James Y Knight 284e7b3d6c Fix alignment checks in MergeConsecutiveStores.
1) check whether the alignment of the memory is sufficient for the
*merged* store or load to be efficient.

Not doing so can result in some ridiculously poor code generation, if
merging creates a vector operation which must be aligned but isn't.

2) DON'T check that the alignment of each load/store is equal. If
you're merging 2 4-byte stores, the first *might* have 8-byte
alignment, but the second certainly will have 4-byte alignment. We do
want to allow those to be merged.

llvm-svn: 236850
2015-05-08 13:47:01 +00:00
NAKAMURA Takumi d7c0be9c42 Revert r236546, "propagate IR-level fast-math-flags to DAG nodes (NFC)"
It caused undefined behavior.

llvm-svn: 236600
2015-05-06 14:03:12 +00:00
Sanjay Patel 801caff64d propagate IR-level fast-math-flags to DAG nodes (NFC)
This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997

...which split the existing nsw / nuw / exact flags and FMF
into their own struct.

There are 2 structural changes here:

1. The main diff is that we're preparing to extend the optimization
flags to affect more than just binary SDNodes. Eg, IR intrinsics 
( https://llvm.org/bugs/show_bug.cgi?id=21290 ) or non-binop nodes
that don't even exist in IR such as FMA, FNEG, etc.

2. The other change is that we're actually copying the FP fast-math-flags
from the IR instructions to SDNodes. 

Differential Revision: http://reviews.llvm.org/D8900

llvm-svn: 236546
2015-05-05 21:40:38 +00:00
Ulrich Weigand 9958c489bb [DAGCombiner] Account for getVectorIdxTy() when narrowing vector load
This patch makes ReplaceExtractVectorEltOfLoadWithNarrowedLoad convert
the element number from getVectorIdxTy() to PtrTy before doing pointer
arithmetic on it.  This is needed on z, where element numbers are i32
but pointers are i64.

Original patch by Richard Sandiford.

llvm-svn: 236530
2015-05-05 19:34:10 +00:00
Ulrich Weigand af2c618e2b [DAGCombiner] Fix ReplaceExtractVectorEltOfLoadWithNarrowedLoad for BE
For little-endian, the function would convert (extract_vector_elt (load X), Y)
to X + Y*sizeof(elt).  For big-endian it would instead use
X + sizeof(vec) - Y*sizeof(elt).  The big-endian case wasn't right since
vector index order always follows memory/array order, even for big-endian.
(Note that the current handling has to be wrong for Y==0 since it would
access beyond the end of the vector.)

Original patch by Richard Sandiford.

llvm-svn: 236529
2015-05-05 19:33:37 +00:00
Simon Pilgrim 017ca19384 [DAGCombiner] Enabled vector float/double -> int constant folding
llvm-svn: 236387
2015-05-02 13:04:07 +00:00
Elena Demikhovsky e1eda8a9e6 Masked gather and scatter - added DAGCombine visitors
and AVX-512 instruction selection patterns.
All other patches, including tests will follow.

http://reviews.llvm.org/D7665

llvm-svn: 236211
2015-04-30 08:38:48 +00:00
Owen Anderson d8a029c81b Semantically revert r236031, which is not a good idea for in-order targets.
At the least it should be guarded by some kind of target hook.
It also introduced catastrophic compile time and code quality
regressions on some out of tree targets (test case still being
reduced/sanitized).

Sanjay agreed with reverting this patch until these issues can be
resolved.

llvm-svn: 236199
2015-04-30 04:06:32 +00:00
Sanjay Patel 04b0e92766 generalize binop reassociation; NFC
Move the fold introduced in r236031:
http://reviews.llvm.org/rL236031

to its own helper function, so we can use it for other binops.

This is a preliminary step before partially solving:
https://llvm.org/bugs/show_bug.cgi?id=21768
https://llvm.org/bugs/show_bug.cgi?id=23116

llvm-svn: 236171
2015-04-29 22:30:02 +00:00
Sanjay Patel caf5180ff7 tidy up; NFC
llvm-svn: 236156
2015-04-29 21:01:41 +00:00
Sanjay Patel ee6678119d too much space again; NFC
llvm-svn: 236150
2015-04-29 20:38:02 +00:00
Sanjay Patel 435efaadff too much space; NFC
llvm-svn: 236147
2015-04-29 20:32:57 +00:00
Sanjay Patel 2fbc4e5c49 transform fadd chains to increase parallelism
This is a compromise: with this simple patch, we should always handle a chain of exactly 3
operations optimally, but we're not generating the optimal balanced binary tree for a longer
sequence.

In general, this transform will reduce the dependency chain for a sequence of instructions
using N operands from a worst case N-1 dependent operations to N/2 dependent operations. 
The optimal balanced binary tree would reduce the chain to log2(N).

The trade-off for not dealing with longer sequences is: (1) we have less complexity in the
compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we
don't need to worry about the increased register pressure required to parallelize longer
sequences. It also seems unlikely that we would ever encounter really long strings of
dependent ops like that in the wild, but I'm not sure how to verify that speculation.
FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math
and this patch.

We can extend this patch to cover other associative operations such as fmul, fmax, fmin, 
integer add, integer mul.

This is a partial fix for:
https://llvm.org/bugs/show_bug.cgi?id=17305

and if extended:
https://llvm.org/bugs/show_bug.cgi?id=21768
https://llvm.org/bugs/show_bug.cgi?id=23116

The issue also came up in:
http://reviews.llvm.org/D8941

Differential Revision: http://reviews.llvm.org/D9232

llvm-svn: 236031
2015-04-28 21:03:22 +00:00
Sanjay Patel ba55804ea3 move IR-level optimization flags into their own struct
This is a preliminary step to using the IR-level floating-point fast-math-flags in the SDAG (D8900).

In this patch, we introduce the optimization flags as their own struct. As noted in the TODO comment, 
we should eventually share this data between the IR passes and the backend.

We also switch the existing nsw / nuw / exact bit functionality of the BinaryWithFlagsSDNode class to
use the new struct.

The tradeoff is that instead of using the free but limited space of SDNode's SubclassData, we add a
data member to the subclass. This means we don't have to repeat all of the get/set methods per flag,
but we're potentially adding size to all nodes of this subclassi type.

In practice on 64-bit systems (measured on Linux and MacOS X), there is no size difference between an
SDNode and BinaryWithFlagsSDNode after this change: they're both 80 bytes. This means that we had at
least one free byte to play with due to struct alignment.

Differential Revision: http://reviews.llvm.org/D9325

llvm-svn: 235997
2015-04-28 16:39:12 +00:00
Sergey Dmitrouk 842a51bad8 Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"
[DebugInfo] Add debug locations to constant SD nodes

This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235989
2015-04-28 14:05:47 +00:00
Daniel Jasper 48e93f7181 Revert "[DebugInfo] Add debug locations to constant SD nodes"
This breaks a test:
http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870

llvm-svn: 235987
2015-04-28 13:38:35 +00:00
Sergey Dmitrouk adb4c69d5c [DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235977
2015-04-28 11:56:37 +00:00
Quentin Colombet 8229145961 [DAGCombiner] Fix the type used in canFoldInAddressingMode to account for the
right scaling.

In the function canFoldInAddressingMode, VT is computed as the type of the
destination/source of a LOAD/STORE operations, instead of the memory type of the
operation.
On targets with a scaling factor on the offset of the LOAD/STORE operations, the
function may return false for actually valid cases. This may then prevent the
selection of profitable pre or post indexed load/store operations, and instead
select pre or post indexed load/store for unprofitable cases.

Patch by Francois de Ferriere <francois.de-ferriere@st.com>!

Differential Revision: http://reviews.llvm.org/D9146

llvm-svn: 235780
2015-04-24 21:28:00 +00:00
Simon Pilgrim 86b034bae9 [DAGCombiner] Remove extra bitcasts surrounding vector shuffles
Patch to remove extra bitcasts from shuffles, this is often a legacy of XformToShuffleWithZero being used to combine bitmaskings (of float vectors bitcast to integer vectors) into shuffles: bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1)

Differential Revision: http://reviews.llvm.org/D9097

llvm-svn: 235578
2015-04-23 08:43:13 +00:00
Olivier Sallenave c587bee405 Fixed logic to enable complex FMA formation.
llvm-svn: 235508
2015-04-22 14:07:26 +00:00
Hal Finkel 0d49cf2645 [DAGCombine] Disable select(c, load,load) for indexed loads
This turned up after r235333, but was a pre-existing bug. The optimization
which transforms select(c, load, load) into a load of a select of the addresses
does not handle indexed loads (pre/post inc/dec). However, it did not check for
them either, leading to a crash if it tried to transform one of them.

llvm-svn: 235497
2015-04-22 11:32:25 +00:00
Olivier Sallenave b99c2eb0f0 Refactoring and enhancement to FMA combine.
llvm-svn: 235344
2015-04-20 20:29:40 +00:00
Tom Stellard 69a7b91e95 DAGCombine: Remove redundant NaN checks around ISD::FSQRT
This folds:

(select (setcc x, -0.0, *lt), NaN, (fsqrt x)) -> ( fsqrt x)

llvm-svn: 235333
2015-04-20 19:38:27 +00:00
Pirama Arumuga Nainar db7c07e2bf Add support to promote f16 to f32
Summary:
This patch adds legalization support to operate on FP16 as a load/store type
and do operations on it as floats.

Tests for ARM are added to test/CodeGen/ARM/fp16-promote.ll

Reviewers: srhines, t.p.northover

Differential Revision: http://reviews.llvm.org/D8755

llvm-svn: 235215
2015-04-17 18:36:25 +00:00
Ahmed Bougacha c984b90c86 [CodeGen] Re-apply r234809 (concat of scalars), with an x86_mmx fix.
The only type that isn't an integer, isn't floating point, and isn't
a vector; ladies and gentlemen, the gift that keeps on giving: x86_mmx!

Fixes PR23246.

Original message (reverted in r235062):
[CodeGen] Combine concat_vectors of scalars into build_vector.

Combine something like:
  (v8i8 concat_vectors (v2i8 bitcast (i16)) x4)
into:
  (v8i8 (bitcast (v4i16 BUILD_VECTOR (i16) x4)))

If any of the scalars are floating point, use that throughout.

Differential Revision: http://reviews.llvm.org/D8948

llvm-svn: 235072
2015-04-16 02:39:14 +00:00
Nick Lewycky b8557a972f Revert r234809 because it caused PR23246.
llvm-svn: 235062
2015-04-16 00:56:20 +00:00
Ahmed Bougacha 8ebcdb3bc3 [CodeGen] Combine concat_vectors of scalars into build_vector.
Combine something like:
  (v8i8 concat_vectors (v2i8 bitcast (i16)) x4)
into:
  (v8i8 (bitcast (v4i16 BUILD_VECTOR (i16) x4)))

If any of the scalars are floating point, use that throughout.

Differential Revision: http://reviews.llvm.org/D8948

llvm-svn: 234809
2015-04-13 22:57:21 +00:00
Matthias Braun a283cb3265 DAGCombiner: Fix crash in select(select) opt.
In case of different types used for the condition of the selects the
select(select) -> select(and) normalisation cannot be performed.

See also: http://reviews.llvm.org/D7622

llvm-svn: 234763
2015-04-13 17:16:33 +00:00
Benjamin Kramer 619c4e57ba Reduce dyn_cast<> to isa<> or cast<> where possible.
No functional change intended.

llvm-svn: 234586
2015-04-10 11:24:51 +00:00
Ahmed Bougacha df43737782 [CodeGen] Combine concat_vector of trunc'd scalar to scalar_to_vector.
We already do:
  concat_vectors(scalar, undef) -> scalar_to_vector(scalar)
When the scalar is legal.
When it's not, but is a truncated legal scalar, we can also do:
  concat_vectors(trunc(scalar), undef) -> scalar_to_vector(scalar)
Which is equivalent, since the upper lanes are undef anyway.
While there, teach the combine to look at more than 2 operands.

Differential Revision: http://reviews.llvm.org/D8883

llvm-svn: 234530
2015-04-09 20:04:47 +00:00
Rafael Espindola 1c84271694 Revert "Refactoring and enhancement to FMA combine."
This reverts commit r234513. It was failing on the bots.

llvm-svn: 234518
2015-04-09 18:29:32 +00:00
Olivier Sallenave 53703d0862 Refactoring and enhancement to FMA combine.
llvm-svn: 234513
2015-04-09 17:55:26 +00:00
Akira Hatanaka c6fab80536 [DAGCombine] Fix a bug in MergeConsecutiveStores.
The bug manifests when there are two loads and two stores chained as follows in
a DAG,

(ld v3f32) -> (st f32) -> (ld v3f32) -> (st f32)

and the stores' values are extracted from the preceding vector loads.

MergeConsecutiveStores would replace the first store in the chain with the
merged vector store, which would create a cycle between the merged store node
and the last load node that appears in the chain.

This commits fixes the bug by replacing the last store in the chain instead.

rdar://problem/20275084

Differential Revision: http://reviews.llvm.org/D8849

llvm-svn: 234430
2015-04-08 20:34:53 +00:00
Simon Pilgrim 07e063e44c [DAGCombiner] Add support for FCEIL, FFLOOR and FTRUNC vector constant folding
Differential Revision: http://reviews.llvm.org/D8715

llvm-svn: 234179
2015-04-06 17:15:41 +00:00
Simon Pilgrim bcf3bc2757 [DAGCombiner] Merge FMUL Scalar and Vector constant canonicalization to RHS. NFCI.
llvm-svn: 234118
2015-04-05 14:30:37 +00:00
Simon Pilgrim 20b7aba04a [DAGCombiner] Canonicalize vector constants for ADD/MUL/AND/OR/XOR re-association
Scalar integers are commuted to move constants to the RHS for re-association - this ensures vectors do the same.

llvm-svn: 234092
2015-04-04 10:20:31 +00:00
Simon Pilgrim ed2ba33ba0 [DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR
This patch attempts to fold the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes - if the shuffle node is the only user. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed.

Differential Revision: http://reviews.llvm.org/D8516

llvm-svn: 234004
2015-04-03 10:02:21 +00:00
Jiangning Liu b0f076910b Fix PR23065. Avoid optimizing bitcast of build_vector with constant input to scalar_to_vector.
llvm-svn: 233778
2015-04-01 01:52:38 +00:00
Sanjay Patel d399d94837 typos; NFC
llvm-svn: 233701
2015-03-31 16:17:51 +00:00
Simon Pilgrim dcbe1213c8 Use SDValue bool check to tidyup some possible vector folding ops. NFC.
llvm-svn: 233498
2015-03-29 19:13:40 +00:00
Simon Pilgrim d15c2805ab Use SDValue bool check to tidyup some possible ReassociateOps. NFC.
llvm-svn: 233495
2015-03-29 16:49:51 +00:00
Simon Pilgrim 7fdcc30e93 [DAGCombiner] Fixed incorrect test for buildvector of constant integers.
DAGCombiner::ReassociateOps was correctly testing for an constant integer scalar but failed to correctly test for constant integer vectors (it was testing for any constant vector).

llvm-svn: 233482
2015-03-28 18:31:31 +00:00
Sanjay Patel 5b305d2d66 revert inadvertent change
llvm-svn: 233294
2015-03-26 17:19:24 +00:00
Sanjay Patel 4fa4a886d7 comment cleanup; NFC
llvm-svn: 233293
2015-03-26 17:18:17 +00:00
Sanjay Patel d95dd9e5fb fix indent; NFC
llvm-svn: 233288
2015-03-26 16:55:17 +00:00
Simon Pilgrim 09f3ff9a0a [DAGCombiner] Add support for TRUNCATE + FP_EXTEND vector constant folding
This patch adds supports for the vector constant folding of TRUNCATE and FP_EXTEND instructions and tidies up the SINT_TO_FP and UINT_TO_FP instructions to match.

It also moves the vector constant folding for the FNEG and FABS instructions to use the DAG.getNode() functionality like the other unary instructions.

Differential Revision: http://reviews.llvm.org/D8593

llvm-svn: 233224
2015-03-25 22:30:31 +00:00
Paul Robinson 284f0451cf 'optnone' should not disable DAG combiner.
Reverts the code change from r221168 and the relevant test.
It was a mistake to disable the combiner, and based on the ultimate
definition of 'optnone' we shouldn't have considered the test case
as failing in the first place.

llvm-svn: 233153
2015-03-25 00:10:24 +00:00
Benjamin Kramer 51f6096cf8 Move private classes into anonymous namespaces
NFC.

llvm-svn: 232944
2015-03-23 12:30:58 +00:00
Owen Anderson db4201235b Fix a nasty bug in DAGCombine of STORE nodes.
This is very related to the bug fixed in r174431.  The problem is that
SelectionDAG does not include alignment in the uniquing of loads and
stores.  When an otherwise no-op DAGCombine would increase the alignment
of a load or store, the original node would be returned (with the
alignment increased), which would cause the node not to be processed by
any further DAGCombines.

I don't have a direct testcase for this that manifests on an in-tree
target, but I did see some noise in the tests for other targets and have
updated them for it.

llvm-svn: 232780
2015-03-19 22:48:57 +00:00
David Majnemer e48237df95 DAGCombiner: fold (xor (shl 1, x), -1) -> (rotl ~1, x)
Targets which provide a rotate make it possible to replace a sequence of
(XOR (SHL 1, x), -1) with (ROTL ~1, x).  This saves an instruction on
architectures like X86 and POWER(64).

Differential Revision: http://reviews.llvm.org/D8350

llvm-svn: 232572
2015-03-18 00:03:36 +00:00
Simon Pilgrim 257849f11a XformToShuffleWithZero - Added clearer early outs and general tidy up. NFCI
llvm-svn: 232557
2015-03-17 22:19:08 +00:00
Simon Pilgrim 8c58c066b7 [DAGCombiner] Add a shuffle mask commutation helper function. NFCI.
We have an increasing number of cases where we are creating commuted shuffle masks - all implementing nearly the same code.

This patch adds a static helper function - ShuffleVectorSDNode::commuteMask() and replaces a number of cases to use it.

Differential Revision: http://reviews.llvm.org/D8139

llvm-svn: 231581
2015-03-07 22:33:11 +00:00
Simon Pilgrim 2dcbe74dfd Use SDValue bool check to tidyup some possible combines. NFC.
llvm-svn: 231569
2015-03-07 16:34:55 +00:00
Andrea Di Biagio c9d79e8103 [DAGCombiner] Fix wrong folding of AND dag nodes.
This patch fixes the logic in the DAGCombiner that folds an AND node according
to rule: (and (X (load V)), C) -> (X (load V))

An AND between a vector load 'X' and a constant build_vector 'C' can be folded
into the load itself only if we can prove that the AND operation is redundant.
The algorithm implemented by 'visitAND' firstly computes the splat value 'S'
from C, and then checks if S has the lower 'B' bits set (where B is the size in
bits of the vector element type). The algorithm takes into account also the
'undef' bits in the splat mask.

Unfortunately, the algorithm only worked under the assumption that the size of S
is a multiple of the vector element type. With this patch, we conservatively
avoid folding the AND if the splat bits are not compatible with the vector
element type.

Added X86 test and-load-fold.ll

Differential Revision: http://reviews.llvm.org/D8085

llvm-svn: 231563
2015-03-07 12:24:55 +00:00
Simon Pilgrim bede80a440 [DAGCombiner] SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(V,C)) -> VECTOR_SHUFFLE
This patch attempts to convert a SCALAR_TO_VECTOR using an operand from an EXTRACT_VECTOR_ELT into a VECTOR_SHUFFLE.

This prevents many cases of spilling scalar data between the gpr + simd registers. 

At present the optimization only accepts cases where there is no TRUNC of the scalar type (i.e. all types must match).

Differential Revision: http://reviews.llvm.org/D8132

llvm-svn: 231554
2015-03-07 05:52:42 +00:00
Matthias Braun 898d11e864 DAGCombiner: Canonicalize select(and/or,x,y) depending on target.
This is based on the following equivalences:
select(C0 & C1, X, Y) <=> select(C0, select(C1, X, Y), Y)
select(C0 | C1, X, Y) <=> select(C0, X, select(C1, X, Y))

Many target cannot perform and/or on the CPU flags and therefore the
right side should be choosen to avoid materializign the i1 flags in an
integer register. If the target can perform this operation efficiently
we normalize to the left form.

Differential Revision: http://reviews.llvm.org/D7622

llvm-svn: 231507
2015-03-06 19:49:10 +00:00
Matthias Braun 3ecb557739 DAGCombiner: Factor out some and/or combines.
This is in preparation for changing visitSELECT to normalize towards
select(Cond0, select(Cond1, X, Y), Y);
select(Cond0, X, select(Cond1, X, Y)) which perfom an implicit and/or of
the conditions.

The factored function contains all DAGCombine rules which reduce two values
combined by an And/Or operation to a single value. This does not include rules
involving constants as visitSELECT already handles that case.

Differential Revision: http://reviews.llvm.org/D8026

llvm-svn: 231506
2015-03-06 19:49:06 +00:00
Simon Pilgrim 7189084bef [DagCombiner] Allow shuffles to merge through bitcasts
Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening).

This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead.

Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow.

Differential Revision: http://reviews.llvm.org/D7939

llvm-svn: 231380
2015-03-05 17:14:04 +00:00
Michael Kuperstein fb95697c88 [DAGCombine] Fix a bug in a BUILD_VECTOR combine
When trying to convert a BUILD_VECTOR into a shuffle, we try to split a single source vector that is twice as wide as the destination vector. 
We can not do this when we also need the zero vector to create a blend.
This fixes PR22774.

Differential Revision: http://reviews.llvm.org/D8040

llvm-svn: 231219
2015-03-04 07:27:39 +00:00
David Blaikie 49cfb81665 DAGCombiner::LoadedSlice: Remove explicit copy ctor in favor of the Rule of Zero
This way, the copy assignment operator can be used without hitting the
deprecated case in C++11.

llvm-svn: 231144
2015-03-03 21:50:47 +00:00
David Blaikie 7f1e0565b3 Revert "Remove the explicit SDNodeIterator::operator= in favor of the implicit default"
Accidentally committed a few more of these cleanup changes than
intended. Still breaking these out & tidying them up.

This reverts commit r231135.

llvm-svn: 231136
2015-03-03 21:18:16 +00:00
David Blaikie bb8da4c08f Remove the explicit SDNodeIterator::operator= in favor of the implicit default
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.

llvm-svn: 231135
2015-03-03 21:17:08 +00:00
Sanjay Patel b8c907e2a7 avoid infinite looping when folding vector multiplies of constants (PR22698)
We were missing a check for the following fold in DAGCombiner:

// fold (fmul (fmul x, c1), c2) -> (fmul x, (fmul c1, c2))

If 'x' is also a constant, then we shouldn't do anything. Otherwise, we could end up swapping the operands back and forth forever.

This should fix:
http://llvm.org/bugs/show_bug.cgi?id=22698

Differential Revision: http://reviews.llvm.org/D7917

llvm-svn: 230884
2015-03-01 00:09:35 +00:00
Benjamin Kramer 5fbfe2ffdc Convert push_back loops into append calls.
No functionality change intended.

llvm-svn: 230849
2015-02-28 13:20:15 +00:00
Paul Robinson 093d6e1a70 When the source has a series of assignments, users reasonably want to
have the debugger step through each one individually. Turn off the
combine for adjacent stores at -O0 so we get this behavior.

Possibly, DAGCombine shouldn't run at all at -O0, but that's for
another day; see PR22346.

Differential Revision: http://reviews.llvm.org/D7181

llvm-svn: 230659
2015-02-26 18:47:57 +00:00
Simon Pilgrim d8820ae70c Reapplied D7816 & rL230177 & rL230278 - with an additional fix toensure that the smallest build vector input scalar type is always used. Additional (crash) test cases already committed.
llvm-svn: 230388
2015-02-24 22:08:56 +00:00
Eric Christopher af48495130 Revert:
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Mon Feb 23 23:04:28 2015 +0000

    Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the the BV's output scalar type instead of the input type.

and

Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Sun Feb 22 18:17:28 2015 +0000

    [DagCombiner] Generalized BuildVector Vector Concatenation

    The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.

    This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.

    This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.

    Differential Revision: http://reviews.llvm.org/D7816

as the root cause of PR22678 which is causing an assertion inside the DAG combiner.

I'll follow up to the main thread as well.

llvm-svn: 230358
2015-02-24 19:11:00 +00:00
Matthias Braun 00a4076e94 DAGCombiner: Move variable definitions closer to use; NFC
llvm-svn: 230354
2015-02-24 18:52:01 +00:00
Matthias Braun a8558ca2ed DAGCombiner: Move variable declaration closer to definiion; NFC
llvm-svn: 230353
2015-02-24 18:51:59 +00:00
Simon Pilgrim 662c1d2770 Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the the BV's output scalar type instead of the input type.
llvm-svn: 230278
2015-02-23 23:04:28 +00:00
Simon Pilgrim 4e30d9b6d8 [DagCombiner] Generalized BuildVector Vector Concatenation
The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.

This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.

This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.

Differential Revision: http://reviews.llvm.org/D7816

llvm-svn: 230177
2015-02-22 18:17:28 +00:00
Hal Finkel e2dd84e42f [DAGCombine] Don't assume integer-type legailty in reduceBuildVecConvertToConvertBuildVec
DAGCombine will rewrite an BUILD_VECTOR where all non-undef inputs some from
[US]INT_TO_FP, as a BUILD_VECTOR of integers with the conversion applied as a
vector operation. We check operation legality of the conversion, but fail to
check legality of the integer vector type itself. Because targets don't
normally override operation legality defaults for illegal types, we need to
check this also.

This came up in the context of the QPX vector entensions for PowerPC (which can
have legal floating-point vector types without corresponding legal integer
vector types). No in-tree test case for this yes, but one can be added once
the QPX support has been committed.

llvm-svn: 230176
2015-02-22 16:10:22 +00:00
Matt Arsenault 0dc54c4dee Add generic fmad DAG node.
This allows sharing of FMA forming combines to work
with instructions that have the same semantics as a separate
multiply and add.

This is expand by default, and only formed post legalization
so it shouldn't have much impact on targets that do not want it.

llvm-svn: 230070
2015-02-20 22:10:33 +00:00
Ahmed Bougacha 4c2b0781a5 [CodeGen] Use ArrayRef instead of std::vector&. NFC.
The former lets us use SmallVectors.  Do so in ARM and AArch64.

llvm-svn: 229925
2015-02-19 23:13:10 +00:00
Chandler Carruth b89464a9b6 [x86,sdag] Two interrelated changes to the x86 and sdag code.
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.

However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.

Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
all of the vector types we support lowering blends as shuffles as custom
VSELECT lowering. We still mark the forms which actually support
variable blends as *legal* so that the custom lowering is bypassed, and
the legal lowering can even be used by the vector shuffle legalization
(yes, i know, this is confusing. but that's how the patterns are
written).

This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!

There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.

llvm-svn: 229835
2015-02-19 10:36:19 +00:00
Sanjay Patel ab7e86e5be Canonicalize splats as build_vectors (PR22283)
This is a follow-on patch to:
http://reviews.llvm.org/D7093

That patch canonicalized constant splats as build_vectors, 
and this patch removes the constant check so we can canonicalize
all splats as build_vectors.

This fixes the 2nd test case in PR22283:
http://llvm.org/bugs/show_bug.cgi?id=22283

The unfortunate code duplication between SelectionDAG and DAGCombiner
is discussed in the earlier patch review. At least this patch is just
removing code...

This improves an existing x86 AVX test and changes codegen in an ARM test.

Differential Revision: http://reviews.llvm.org/D7389

llvm-svn: 229511
2015-02-17 16:54:32 +00:00
Benjamin Kramer 6cd780ff21 Prefer SmallVector::append/insert over push_back loops.
Same functionality, but hoists the vector growth out of the loop.

llvm-svn: 229500
2015-02-17 15:29:18 +00:00
Mehdi Amini 3e0023b8f6 SelectionDAG: fold (fp_to_u/sint (s/uint_to_fp)) here too
Update SPARC tests to match.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229438
2015-02-16 21:47:58 +00:00
Chandler Carruth 499d7332c5 [x86] Fix PR22377, a regression with the new vector shuffle legality
test.

This was just a matter of the DAG combine for vector shuffles being too
aggressive. This is a bit of a grey area, but I think generally if we
can re-use intermediate shuffles, we should. Certainly, given the test
cases I have available, this seems like the right call.

llvm-svn: 229285
2015-02-15 07:01:10 +00:00
Duncan P. N. Exon Smith 70eb9c5ae5 CodeGen: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)

Also, add `Function::getFnStackAlignment()`, and canonicalize:

getAttributes().getStackAlignment(AttributeSet::FunctionIndex)
  => getFnStackAlignment()

llvm-svn: 229208
2015-02-14 01:44:41 +00:00
Benjamin Kramer 5f6a907288 MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line with countTrailingZeros
Update all callers.

llvm-svn: 228930
2015-02-12 15:35:40 +00:00
Ahmed Bougacha 24433a7005 [CodeGen] Don't blindly combine (fp_round (fp_round x)) to (fp_round x).
We used to do this DAG combine, but it's not always correct:
If the first fp_round isn't a value preserving truncation, it might
introduce a tie in the second fp_round, that wouldn't occur in the
single-step fp_round we want to fold to.
In other words, double rounding isn't the same as rounding.

Differential Revision: http://reviews.llvm.org/D7571

llvm-svn: 228911
2015-02-12 06:15:29 +00:00
Jonas Paulsson bf8d0cc699 Fix SelectionDAG compile time issue with alias analysis.
Add new token factor node and its users to worklist if alias analysis is
turned on, in DAGCombiner::visitTokenFactor(). Alias analysis may cause
a lot of new token factors to be inserted into the DAG, and they need to
be optimized to avoid significant slow-downs.

Reviewed by Hal Finkel.

llvm-svn: 228841
2015-02-11 16:10:31 +00:00
Jonas Paulsson a25a3f4fea Two comment typo fixes in lib/CodeGen/SelectionDAG/DAGCombiner.cpp.
llvm-svn: 228700
2015-02-10 15:34:29 +00:00
Chandler Carruth b65d61a2e8 [x86] Fix PR22524: the DAG combiner was incorrectly handling illegal
nodes when folding bitcasts of constants.

We can't fold things and then check after-the-fact whether it was legal.
Once we have formed the DAG node, arbitrary other nodes may have been
collapsed to it. There is no easy way to go back. Instead, we need to
test for the specific folding cases we're interested in and ensure those
are legal first.

This could in theory make this less powerful for bitcasting from an
integer to some vector type, but AFAICT, that can't actually happen in
the SDAG so its fine. Now, we *only* whitelist specific int->fp and
fp->int bitcasts for post-legalization folding. I've added the test case
from the PR.

(Also as a note, this does not appear to be in 3.6, no backport needed)

llvm-svn: 228656
2015-02-10 02:25:56 +00:00
Ahmed Bougacha e892d13d90 [CodeGen] Add hook/combine to form vector extloads, enabled on X86.
The combine that forms extloads used to be disabled on vector types,
because "None of the supported targets knows how to perform load and
sign extend on vectors in one instruction."

That's not entirely true, since at least SSE4.1 X86 knows how to do
those sextloads/zextloads (with PMOVS/ZX).
But there are several aspects to getting this right.
First, vector extloads are controlled by a profitability callback.
For instance, on ARM, several instructions have folded extload forms,
so it's not always beneficial to create an extload node (and trying to
match extloads is a whole 'nother can of worms).

The interesting optimization enables folding of s/zextloads to illegal
(splittable) vector types, expanding them into smaller legal extloads.

It's not ideal (it introduces some legalization-like behavior in the
combine) but it's better than the obvious alternative: form illegal
extloads, and later try to split them up.  If you do that, you might
generate extloads that can't be split up, but have a valid ext+load
expansion.  At vector-op legalization time, it's too late to generate
this kind of code, so you end up forced to scalarize. It's better to
just avoid creating egregiously illegal nodes.

This optimization is enabled unconditionally on X86.

Note that the splitting combine is happy with "custom" extloads. As
is, this bypasses the actual custom lowering, and just unrolls the
extload. But from what I've seen, this is still much better than the
current custom lowering, which does some kind of unrolling at the end
anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
FIXME).

Also note that the existing combine that forms extloads is now also
enabled on legal vectors.  This doesn't have a big effect on X86
(because sext+load is usually combined to sext_inreg+aextload).
On ARM it fires on some rare occasions; that's for a separate commit.

Differential Revision: http://reviews.llvm.org/D6904

llvm-svn: 228325
2015-02-05 18:31:02 +00:00
Quentin Colombet 308b171318 Revert r227242 - Merge vector stores into wider vector stores (PR21711).
This commit creates infinite loop in DAG combine for in the LLVM test-suite
for aarch64 with mcpu=cylcone (just having neon may be enough to expose this).

llvm-svn: 227272
2015-01-27 23:58:01 +00:00
Sanjay Patel bcf62f2fa2 Merge vector stores into wider vector stores (PR21711)
This patch resolves part of PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).

The 'f3' test case in that report presents a situation where we have two 128-bit
stores extracted from a 256-bit source vector. 

Instead of producing this:

vmovaps %xmm0, (%rdi)
vextractf128    $1, %ymm0, 16(%rdi)

This patch merges the 128-bit stores into a single 256-bit store:

vmovups %ymm0, (%rdi)

Differential Revision: http://reviews.llvm.org/D7208

llvm-svn: 227242
2015-01-27 20:50:27 +00:00
Sanjay Patel 37c41c1d2c merge consecutive stores of extracted vector elements (PR21711)
This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698. 
That patch was checked in at r224611, but reverted at r225031 because it
caused a failure outside of the regression tests.

The cause of the crash was not recognizing consecutive stores that have mixed
source values (loads and vector element extracts), so this patch adds a check
to bail out if any store value is not coming from a vector element extract.

This patch also refactors the shared logic of the constant source and vector
extracted elements source cases into a helper function.

Differential Revision: http://reviews.llvm.org/D6850
 

llvm-svn: 226845
2015-01-22 18:21:26 +00:00
Michael Kuperstein 25e34d11f3 [DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.

Differential Revision: http://reviews.llvm.org/D7093

Fixed recommit of r226811.

llvm-svn: 226816
2015-01-22 13:07:28 +00:00
Michael Kuperstein ff74032018 Revert r226811, MSVC accepts code sane compilers don't.
llvm-svn: 226814
2015-01-22 12:48:07 +00:00
Michael Kuperstein 84fad3e5c9 [DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.

Differential Revision: http://reviews.llvm.org/D7093

llvm-svn: 226811
2015-01-22 12:37:23 +00:00
Elena Demikhovsky 150d9f3187 Fixed a bug in type legalizer for masked load/store intrinsics.
The problem occurs when after vectorization we have type
<2 x i32>. This type is promoted to <2 x i64> and then requires
additional efforts for expanding loads and truncating stores.
I added EXPAND / TRUNCATE attributes to the masked load/store
SDNodes. The code now contains additional shuffles.
I've prepared changes in the cost estimation for masked memory
operations, it will be submitted separately.

llvm-svn: 226808
2015-01-22 12:07:59 +00:00
Elena Demikhovsky 94cfbbab33 Fixed a comment
llvm-svn: 226806
2015-01-22 10:01:36 +00:00
Elena Demikhovsky 9c26462a27 Fixed a bug in narrowing store operation.
Type MVT::i1 became legal in KNL, but store operation can't be narrowed to this type,
since the size of VT (1 bit) is not equal to its actual store size(8 bits).

Added a test provided by David (dag@cray.com)

llvm-svn: 226805
2015-01-22 09:39:08 +00:00
Tim Northover 3007ba0ab3 DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
It can help with argument juggling on some targets, and is generally a good
idea.

llvm-svn: 226740
2015-01-21 23:17:19 +00:00
Tim Northover cf3d80fedb Revert "DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))"
It hadn't gone through review yet, but was still on my local copy.

This reverts commit r226663

llvm-svn: 226665
2015-01-21 15:48:52 +00:00
Tim Northover 85cd2791c9 DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
llvm-svn: 226663
2015-01-21 15:43:28 +00:00
Mehdi Amini 37f316afaf Improve DAG combine pass on certain IR vector patterns
Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector
produced suboptimal code in AVX2 mode with certain IR combinations.

In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32
(undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical,
but then mysteriously generated rather bad code; the movq/movhpd combination
didn't match.

The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
would get promoted to 4f32 by the type legalizer, eventually resulting
in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing
these were both half the output size, concatted them and then produced
a shuffle. However, the resulting concat + shuffle was more complex than
it should be; in the case where the upper half of the output is undef, we
probably want to generate shuffle + concat instead.

This enhancement causes the vector_shuffle combine step to recognize this
suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR
in case the same suboptimal pattern occurs for other reasons.

This results in the optimizer correctly producing the optimal movq + movhpd
sequence for all three variations on this IR, even with AVX2.

I've included a test case.

Radar link: rdar://problem/19287012
Fix for PR 21943.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 226360
2015-01-17 01:35:56 +00:00
Chandler Carruth d9903888d9 [cleanup] Re-sort all the #include lines in LLVM using
utils/sort_includes.py.

I clearly haven't done this in a while, so more changed than usual. This
even uncovered a missing include from the InstrProf library that I've
added. No functionality changed here, just mechanical cleanup of the
include order.

llvm-svn: 225974
2015-01-14 11:23:27 +00:00
Mehdi Amini 648eff1695 DAG Combiner: Fold SelectCC When Cond is UNDEF
In case folding a node end up with a NaN as operand for the select, 
the folding of the condition of the selectcc node returns "UNDEF".

Differential Revision: http://reviews.llvm.org/D6889

llvm-svn: 225952
2015-01-14 05:45:24 +00:00
Matthias Braun f50ab43214 DAGCombiner: simplify by using condition variables; NFC
llvm-svn: 225836
2015-01-13 22:17:46 +00:00
Matt Arsenault bf0db918b2 R600: Implement getRecipEstimate
This requires a new hook to prevent expanding sqrt in terms
of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are
all the same rate, so this expansion would just double the number
of instructions and cycles.

llvm-svn: 225828
2015-01-13 20:53:23 +00:00
Olivier Sallenave 325096980b Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now guarded with that hook.
llvm-svn: 225795
2015-01-13 15:06:36 +00:00
Matt Arsenault a982e4f82b Combine fcmp + select to fminnum / fmaxnum if no nans and legal
Also require unsafe FP math for no since there isn't a way to
test for signed zeros.

llvm-svn: 225744
2015-01-13 00:43:00 +00:00
Hal Finkel 0ce7f372e5 [DAGCombine] Remainder of fix to r225380 (More FMA folding opportunities)
As pointed out by Aditya (and Owen), when we elide an FP extend to form an FMA,
we need to extend the incoming operands so that the resulting node will really
be legal. This is currently enabled only for PowerPC, and it happens to work
there regardless, but this should fix the functionality for everyone else
should anyone else wish to use it.

llvm-svn: 225492
2015-01-09 01:29:29 +00:00
Hal Finkel 33ead6f901 Partial fix to r225380 (More FMA folding opportunities)
As pointed out by Aditya (and Owen), there are two things wrong with this code.
First, it adds patterns which elide FP extends when forming FMAs, and that might
not be profitable on all targets (it belongs behind the pre-existing
aggressive-FMA-formation flag). This is fixed by this change.

Second, the resulting nodes might have operands of different types (the
extensions need to be re-added). That will be fixed in the follow-up commit.

llvm-svn: 225485
2015-01-09 00:45:54 +00:00
Ahmed Bougacha 2b6917b020 [SelectionDAG] Allow targets to specify legality of extloads' result
type (in addition to the memory type).

The *LoadExt* legalization handling used to only have one type, the
memory type.  This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.

However, this isn't always the case.  For instance, on X86, with AVX,
this is legal:
    v4i32 load, zext from v4i8
but this isn't:
    v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.

Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.

Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.

Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior.  The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)

No functional change intended.

Differential Revision: http://reviews.llvm.org/D6532

llvm-svn: 225421
2015-01-08 00:51:32 +00:00
Olivier Sallenave 0451532996 More FMA folding opportunities.
llvm-svn: 225380
2015-01-07 20:54:17 +00:00
Olivier Sallenave e64ad7cedd Test commit
llvm-svn: 225368
2015-01-07 19:45:17 +00:00
Craig Topper d3c02f177a Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert.
llvm-svn: 225160
2015-01-05 10:15:49 +00:00
Alexey Samsonov 553185ee4b Revert "merge consecutive stores of extracted vector elements"
This reverts commit r224611. This change causes crashes
in X86 DAG->DAG Instruction Selection.

llvm-svn: 225031
2014-12-31 00:40:28 +00:00
Mehdi Amini d38920891e Always assert in DAGCombine and not only when -debug is enabled
Right now in DAG Combine check the validity of the returned type 
only when -debug is given on the command line. However usually 
the test cases in the validation does not use -debug. 
An Assert build should always check this.

llvm-svn: 224779
2014-12-23 18:59:02 +00:00
Michael Kuperstein f4536ea6e8 [DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements
This partially fixes PR21943.

For AVX, we go from:

vmovq   (%rsi), %xmm0
vmovq   (%rdi), %xmm1
vpermilps       $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
vinsertps       $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
vinsertps       $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
vpermilps       $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
vinsertps       $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]

To the expected:

vmovq   (%rdi), %xmm0
vmovhpd (%rsi), %xmm0, %xmm0
retq

Fixing this for AVX2 is still open.

Differential Revision: http://reviews.llvm.org/D6749

llvm-svn: 224759
2014-12-23 08:59:45 +00:00
Sanjay Patel 0428a5786e merge consecutive stores of extracted vector elements
Add a path to DAGCombiner::MergeConsecutiveStores() 
to combine multiple scalar stores when the store operands
are extracted vector elements. This is a partial fix for
PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).

For the new test case, codegen improves from:

   vmovss  %xmm0, (%rdi)
   vextractps      $1, %xmm0, 4(%rdi)
   vextractps      $2, %xmm0, 8(%rdi)
   vextractps      $3, %xmm0, 12(%rdi)
   vextractf128    $1, %ymm0, %xmm0
   vmovss  %xmm0, 16(%rdi)
   vextractps      $1, %xmm0, 20(%rdi)
   vextractps      $2, %xmm0, 24(%rdi)
   vextractps      $3, %xmm0, 28(%rdi)
   vzeroupper
   retq

To:

   vmovups	%ymm0, (%rdi)
   vzeroupper
   retq

Patch reviewed by Nadav Rotem.

Differential Revision: http://reviews.llvm.org/D6698

llvm-svn: 224611
2014-12-19 20:23:41 +00:00
Michael Kuperstein 047b1a0400 [DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle.
This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors.

This fixes PR15872 and provides a partial fix for PR21711.

Differential Revision: http://reviews.llvm.org/D6678

llvm-svn: 224429
2014-12-17 12:32:17 +00:00
Matt Arsenault 810cb62962 Add target hook for whether it is profitable to reduce load widths
Add an option to disable optimization to shrink truncated larger type
loads to smaller type loads. On SI this prevents using scalar load
instructions in some cases, since there are no scalar extloads.

llvm-svn: 224084
2014-12-12 00:00:24 +00:00
Owen Anderson 558012a3fc Fix a few instances found in SelectionDAG where we were not handling F16 at parity with F32 and F64.
llvm-svn: 223760
2014-12-09 06:50:39 +00:00
Simon Pilgrim be24ab367b [InstCombine] Minor optimization for bswap with binary ops
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:

OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )

Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well:

fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))

Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.

Differential Revision: http://reviews.llvm.org/D6407

llvm-svn: 223349
2014-12-04 09:44:01 +00:00
Elena Demikhovsky f1de34b84d Masked Load / Store Intrinsics - the CodeGen part.
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.

Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 223348
2014-12-04 09:40:44 +00:00
Duncan P. N. Exon Smith 9bc81fbe92 Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

llvm-svn: 222936
2014-11-28 21:29:14 +00:00
Elena Demikhovsky 9e5089a938 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 222632
2014-11-23 08:07:43 +00:00
Andrea Di Biagio 0225b5bf6f [DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero.
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.

This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.

llvm-svn: 222536
2014-11-21 14:32:06 +00:00
Andrea Di Biagio 26e8f4d166 [DAG] Refactor the shuffle combining logic in DAGCombiner. NFC.
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.

llvm-svn: 222522
2014-11-21 11:33:07 +00:00
Hao Liu 44e5d7a131 DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

A hook is added to allow the target to control whether it needs to do such combine.

Reviewed in http://reviews.llvm.org/D6334

llvm-svn: 222510
2014-11-21 06:39:58 +00:00
David Blaikie 70573dcd9f Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Oliver Stannard d29db9b949 Fix optimisations of SELECT_CC which assumed result is boolean
Some optimisations in DAGCombiner cause miscompilations for targets that use
TargetLowering::UndefinedBooleanContent, because they assume that the results
of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and
XORed. These optimisations are only valid for targets that use
ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.

This is a follow-up to D6210/r221693.

llvm-svn: 222123
2014-11-17 10:49:31 +00:00
Andrea Di Biagio e13a0b81f4 [DAG] Improved target independent vector shuffle folding logic.
This patch teaches the DAGCombiner how to combine shuffles according to rules:
   shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2)

llvm-svn: 222090
2014-11-15 22:56:25 +00:00
Oliver Stannard 8c2c67e63c LLVM incorrectly folds xor into select
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.

llvm-svn: 221693
2014-11-11 17:36:01 +00:00
Andrea Di Biagio ce46b97b48 [X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks.
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.

This allows for example to simplify the following code:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
  %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
  %3 = or <4 x i32> %1, %2
  ret <4 x i32> %3
}

Before this patch llc (-mcpu=corei7) generated:
        andps  LCPI1_0(%rip), %xmm0, %xmm0
        andps  LCPI1_1(%rip), %xmm1, %xmm1
        orps   %xmm1, %xmm0, %xmm0
        retq

With this patch we generate a single 'vpblendw'.

llvm-svn: 221343
2014-11-05 13:04:14 +00:00
Paul Robinson ad06e430ce Normally an 'optnone' function goes through fast-isel, which does not
call DAGCombiner. But we ran into a case (on Windows) where the
calling convention causes argument lowering to bail out of fast-isel,
and we end up in CodeGenAndEmitDAG() which does run DAGCombiner.
So, we need to make DAGCombiner check for 'optnone' after all.

Commit includes the test that found this, plus another one that got
missed in the original optnone work.

llvm-svn: 221168
2014-11-03 18:19:26 +00:00
Louis Gerbarg e8f9c78247 Fix incorrect invariant check in DAG Combine
Earlier this summer I fixed an issue where we were incorrectly combining
multiple loads that had different constraints such alignment, invariance,
temporality, etc. Apparently in one case I made copt paste error and swapped
alignment and invariance.

Tests included.

rdar://18816719

llvm-svn: 220933
2014-10-30 22:21:03 +00:00
NAKAMURA Takumi f51a34ec1f Whitespace.
llvm-svn: 220857
2014-10-29 15:23:11 +00:00
Sanjay Patel 957efc23bb Use rsqrt (X86) to speed up reciprocal square root calcs
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.

For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).

This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..

See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900

Differential Revision: http://reviews.llvm.org/D5658

llvm-svn: 220570
2014-10-24 17:02:16 +00:00
Benjamin Kramer 7ad22403fb Strength reduce constant-sized vectors into arrays. No functionality change.
llvm-svn: 220412
2014-10-22 19:55:26 +00:00
Matt Arsenault 7c93690be0 Add minnum / maxnum codegen
llvm-svn: 220342
2014-10-21 23:01:01 +00:00
Jan Vesely af62cf4db0 SelectionDAG: Add sext_inreg optimizations
v2: use dyn_cast
    fixup comments
v3: use cast

Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220044
2014-10-17 14:45:25 +00:00
Sanjay Patel 3d497cd778 Improve sqrt estimate algorithm (fast-math)
This patch changes the fast-math implementation for calculating sqrt(x) from:
y = 1 / (1 / sqrt(x))
to:
y = x * (1 / sqrt(x))

This has 2 benefits: less code / faster code and one less estimate instruction 
that may lose precision.

The only target that will be affected (until http://reviews.llvm.org/D5658 is approved)
is PPC. The difference in codegen for PPC is 2 less flops for a single-precision sqrtf
or vector sqrtf and 4 less flops for a double-precision sqrt. 
We also eliminate a constant load and extra register usage.

Differential Revision: http://reviews.llvm.org/D5682

llvm-svn: 219445
2014-10-09 21:26:35 +00:00
Eric Christopher 40cba91ad1 Remove unnecessary include.
llvm-svn: 219368
2014-10-08 23:38:40 +00:00
Eric Christopher f55d4714d2 Use both the cached TLI and the subtarget off of the DAG in
the DAG combiner.

llvm-svn: 219367
2014-10-08 23:38:39 +00:00
Hal Finkel 9808595319 [DAGCombine] Remove SIGN_EXTEND-related inf-loop
The patch's author points out that, despite the function's documentation,
getSetCCResultType is only used to get the SETCC result type (with one
here-removed problematic exception). In one case, getSetCCResultType was being
used to get the predicate type to use for a SELECT node, and then
SIGN_EXTENDing (or truncating) to get the input predicate to match that type.
Unfortunately, this was happening inside visitSIGN_EXTEND, and creating new
SIGN_EXTEND nodes was causing an infinite loop. In addition, this behavior was
wrong if a target was not using ZeroOrNegativeOneBooleanContent. Lastly, the
extension/truncation seems unnecessary here: SELECT is defined as:

  Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not i1
  then the high bits must conform to getBooleanContents.

So here we remove this use of getSetCCResultType and update
getSetCCResultType's documentation to reflect its actual uses.

Patch by deadal nix!

llvm-svn: 219141
2014-10-06 20:19:47 +00:00
Sanjay Patel 7bc9185ab5 Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y)
The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c:

float distance = sqrt(dx * dx + dy * dy + dz * dz);
float mag = dt / (distance * distance * distance);

Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces:

   addis 3, 2, .LCPI4_2@toc@ha
   lfs 4, .LCPI4_2@toc@l(3)
   addis 3, 2, .LCPI4_1@toc@ha
   lfs 0, .LCPI4_1@toc@l(3)
   fcmpu 0, 1, 4
   beq 0, .LBB4_2
# BB#1:
   frsqrtes 4, 1
   addis 3, 2, .LCPI4_0@toc@ha
   lfs 5, .LCPI4_0@toc@l(3)
   fnmsubs 13, 1, 5, 1
   fmuls 6, 4, 4
   fmadds 1, 13, 6, 5
   fmuls 1, 4, 1
   fres 4, 1                <--- reciprocal of reciprocal square root
   fnmsubs 1, 1, 4, 0
   fmadds 4, 4, 1, 4
.LBB4_2:
   fmuls 1, 4, 2
   fres 2, 1
   fnmsubs 0, 1, 2, 0
   fmadds 0, 2, 0, 2
   fmuls 1, 3, 0
   blr

After the patch, this simplifies to:

frsqrtes 0, 1
addis 3, 2, .LCPI4_1@toc@ha
fres 5, 2
lfs 4, .LCPI4_1@toc@l(3)
addis 3, 2, .LCPI4_0@toc@ha
lfs 7, .LCPI4_0@toc@l(3)
fnmsubs 13, 1, 4, 1
fmuls 6, 0, 0
fnmsubs 2, 2, 5, 7
fmadds 1, 13, 6, 4
fmadds 2, 5, 2, 5
fmuls 0, 0, 1
fmuls 0, 0, 2
fmuls 1, 3, 0
blr

Differential Revision: http://reviews.llvm.org/D5628

llvm-svn: 219139
2014-10-06 19:31:18 +00:00
Chandler Carruth daa1ff985c [x86, dag] Teach the DAG combiner to prune inputs toa vector_shuffle
that are unused.

This allows the combiner to delete math feeding shuffles where the math
isn't actually necessary. This improves some of the vperm2x128 tests
that regressed when the vector shuffle lowering started actually
generating vperm instructions rather than forcibly decomposing them.

Sadly, this isn't enough to get this *really* right because we still
form a completely unnecessary permutation. To fix that, we also need to
fold shuffles which just rearrange concatenated or inserted subvectors.

llvm-svn: 219086
2014-10-05 19:14:34 +00:00
Sanjay Patel ab7f460bca Use the target-specified iteration count to opt out of any further refinement of an estimate. NFC.
llvm-svn: 218700
2014-09-30 20:44:23 +00:00
Sanjay Patel 8fde95cb2b Split the estimate() interface into separate functions for each type. NFC.
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use it's 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...

llvm-svn: 218698
2014-09-30 20:28:48 +00:00
Andrea Di Biagio c7c524129b [DAG] Check in advance if a build_vector has a legal type before attempting to convert it into a shuffle.
Currently, the DAG Combiner only tries to convert type-legal build_vector nodes
into shuffles. This patch simply moves the logic that checks if a
build_vector has a legal value type up before we even start analyzing the
operands. This allows to early exit immediately from method
'visitBUILD_VECTOR' if the node type is known to be illegal.

No functional change intended.

llvm-svn: 218677
2014-09-30 15:30:22 +00:00
James Molloy 463db9a77c [AArch64] Redundant store instructions should be removed as dead code
If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed.

This problem is found in spec2006-197.parser.

For example,
  stur    w10, [x11, #-4]
  stur    w10, [x11, #-4]
Then one of the two stur instructions can be removed.

Patch by David Xu!

llvm-svn: 218569
2014-09-27 17:02:54 +00:00
Sanjay Patel bdf1e38856 Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

And:

z = y / x

into:

z = y * rcpe(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.

Differential Revision: http://reviews.llvm.org/D5484

llvm-svn: 218553
2014-09-26 23:01:47 +00:00
David Xu 418da223dd Revert patch ofr218493
llvm-svn: 218494
2014-09-26 02:28:03 +00:00
David Xu 64f661ee0b Redundant store instructions should be removed as dead code
llvm-svn: 218493
2014-09-26 02:02:09 +00:00
Sanjay Patel 6a42292795 Use SDValue bool operator to reduce code. No functional change.
llvm-svn: 218314
2014-09-23 16:24:20 +00:00
Sanjay Patel b67bd262ea Refactor reciprocal square root estimate into target-independent function; NFC.
This is purely a plumbing patch. No functional changes intended.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner.

Next steps:

    The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added.
    Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner.
    X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added.

Differential Revision: http://reviews.llvm.org/D5425

llvm-svn: 218219
2014-09-21 15:19:15 +00:00
Hal Finkel 62ac736faa Optionally enable more-aggressive FMA formation in DAGCombine
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.

Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.

Patch by Olivier Sallenave, thanks!

llvm-svn: 218120
2014-09-19 11:42:56 +00:00
Sanjay Patel bb29221129 Replace dead links to "Hacker's Delight" with general references. NFC.
llvm-svn: 217814
2014-09-15 19:47:44 +00:00
Matt Arsenault 8239eaab99 Add DAG combine for shl + add of constants.
Do
 (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)

This is already done for multiplies, but since multiplies
by powers of two are turned into shifts, we also need
to handle it here.

This might want checks for isLegalAddImmediate to avoid
transforming an add of a legal immediate with one that isn't.

llvm-svn: 217610
2014-09-11 17:34:19 +00:00
Sanjay Patel 7bd228a82e Combine fmul vector FP constants when unsafe math is allowed.
This is an extension of the change made with r215820:
http://llvm.org/viewvc/llvm-project?view=revision&revision=215820

That patch allowed combining of splatted vector FP constants that are multiplied.

This patch allows combining non-uniform vector FP constants too by relaxing the
check on the type of vector. Also, canonicalize a vector fmul in the
same way that we already do for scalars - if only one operand of the fmul is a
constant, make it operand 1. Otherwise, we miss potential folds.

This fold is also done by -instcombine, but it's possible that extra
fmuls may have been generated during lowering.

Differential Revision: http://reviews.llvm.org/D5254

llvm-svn: 217599
2014-09-11 15:45:27 +00:00
David Xu f7aff68fe3 Build correct vector filled with undef nodes
llvm-svn: 217570
2014-09-11 05:10:28 +00:00
Sanjay Patel 394c333e3e Group unsafe fmul math folds together for easier reading. No functional change.
llvm-svn: 217399
2014-09-08 20:16:42 +00:00
Sanjay Patel f4b7a6b030 Fix the FIXME that was just added in r217390 - remove a bunch of redundant fold permutations.
The testcases for these folds already exist in test/CodeGen/X86/fp-fast.ll.

llvm-svn: 217393
2014-09-08 18:22:51 +00:00
Sanjay Patel 8170dea280 group unsafe math folds together for easier reading
Also added a FIXME regarding redundant folds for non-canonicalized constants.

llvm-svn: 217390
2014-09-08 17:32:19 +00:00
Sanjay Patel 75cc90eddc Allow vector fsub ops with constants to get the same optimizations as scalars.
This problem is bigger than just fsub, but this is the minimum fix to solve
fneg for PR20556 ( http://llvm.org/bugs/show_bug.cgi?id=20556 ), and we solve
zero subtraction with the same change.

llvm-svn: 217286
2014-09-05 22:26:22 +00:00
Sanjay Patel f4b8debc1c clean up; NFC
llvm-svn: 217278
2014-09-05 20:55:46 +00:00
Matt Arsenault c1a71217b3 Fix interference caused by fmul 2, x -> fadd x, x
If an fmul was introduced by lowering, it wouldn't be folded
into a multiply by a constant since the earlier combine would
have replaced the fmul with the fadd.

llvm-svn: 216932
2014-09-02 19:02:53 +00:00
Matt Arsenault 965de3050f Fix comment and unnecessary check for FP build_vectors.
This was copy-paste from the integer version, but
FP build_vectors don't truncate.

llvm-svn: 216928
2014-09-02 18:33:51 +00:00
Hal Finkel e19006ea22 Enable splitting indexing from loads with TargetConstants
When I recommitted r208640 (in r216898) I added an exclusion for TargetConstant
offsets, as there is no guarantee that a backend can handle them on generic
ADDs (even if it generates them during address-mode matching) -- and,
specifically, applying this transformation directly with TargetConstants caused
a self-hosting failure on PPC64. Ignoring all TargetConstants, however, is less
than ideal. Instead, for non-opaque constants, we can convert them into regular
constants for use with the generated ADD (or SUB).

llvm-svn: 216908
2014-09-02 16:05:23 +00:00
Hal Finkel 51e6fa2201 Revert "Revert '[DAGCombiner] Split up an indexed load if only the base pointer value is live'"
I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The
underlying cause of the failure is that pre-inc loads with increments
represented by ISD::TargetConstants were being transformed into ISD:::ADDs with
ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they
were selected as invalid r+r adds.

This recommits r208640, rebased and with an exclusion for ISD::TargetConstant
increments. This behavior seems correct, although in the future we might want
to ask the target to split out the indexing that uses ISD::TargetConstants.

Unfortunately, I don't yet have small test case where the relevant invalid
'add' instruction is not itself dead (and thus eliminated by
DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things)

Original commit message (by Adam Nemet):

Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.

This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part).  See the testcase.

In order to spot the dead load we need to revisit it when SimplifyDemandedBits
decided that the value of the load is masked off.  This is the
CommitTargetLoweringOpt piece.

I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.

Fixes <rdar://problem/16031651>

llvm-svn: 216898
2014-09-02 06:24:04 +00:00
Sanjay Patel ccd267683b Move FNEG next to FABS and make them more similar, so it's easier that they can be refactored. NFC.
llvm-svn: 216688
2014-08-28 21:51:37 +00:00
Owen Anderson 3eb910b404 Do not introduce new shuffle patterns after operation legalization if SHUFFLE_VECTOR
was marked custom.  The target independent DAG combine has no way to know if
the shuffles it is introducing are ones that the target could support or not.

llvm-svn: 216678
2014-08-28 17:49:58 +00:00
Sanjay Patel 50cbfc5138 Janitorial services: "Don’t duplicate function or class name at the beginning of the comment."
llvm-svn: 216674
2014-08-28 16:29:51 +00:00
Sanjay Patel e28d57d9d8 Remove local TLI vars that are just duplicates of the class var. No functional change.
llvm-svn: 216673
2014-08-28 16:01:50 +00:00
Sanjay Patel 78614bf0e8 Use local vars to improve readability. No functional change.
Completes what was started in r216611 and r216623. 
Used const refs instead of pointers; not sure if one is preferable to the other.

llvm-svn: 216672
2014-08-28 15:53:16 +00:00
Sanjay Patel 159f127f63 Use local variable in visitFADD. No functional change.
llvm-svn: 216623
2014-08-27 21:42:42 +00:00
Sanjay Patel ae402a35a0 Group unsafe-math optimizations for fsub into one block. No functional change.
llvm-svn: 216616
2014-08-27 20:57:52 +00:00
Sanjay Patel a828f2ba46 Use local variable to improve readability.
No functional change intended.

llvm-svn: 216611
2014-08-27 20:40:31 +00:00
Craig Topper e1d1294853 Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created.
llvm-svn: 216525
2014-08-27 05:25:25 +00:00
Craig Topper 4627679cec Use range based for loops to avoid needing to re-mention SmallPtrSet size.
llvm-svn: 216351
2014-08-24 23:23:06 +00:00
Sanjay Patel 2cdea4c41e name change: isPow2DivCheap -> isPow2SDivCheap
isPow2DivCheap

That name doesn't specify signed or unsigned.

Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong:

   srl/add/sra

is not the general sequence for signed integer division by power-of-2. We need one more 'sra':

   sra/srl/add/sra

That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case.

This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods.

No functional change intended.

Differential Revision: http://reviews.llvm.org/D5010

llvm-svn: 216237
2014-08-21 22:31:48 +00:00
Benjamin Kramer ff8b883772 DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors with many arguments.
PR20677

llvm-svn: 216175
2014-08-21 13:28:02 +00:00
Matt Arsenault 6cc00429ff Fix fmul combines with constant splat vectors
Fixes things like fmul x, 2 -> fadd x, x

llvm-svn: 215820
2014-08-16 10:14:19 +00:00
Andrea Di Biagio b23bad11e7 [DAGCombiner] Improve the folding of target independet shuffles to Undef.
When combining a pair of shuffle nodes, check if the combined shuffle mask is
trivially Undef. In case, immediately fold that pair of shuffles to Undef.

The lack of checks for undef masks was the root-cause of a poor-codegen bug
in the dag combiner.

Example:
  %1 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 4, i32 1, i32 1, i32 6>
  %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 6>
  %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 3>

Before this patch, on x86 (with -mcpu=corei7) we failed to fold the entire
sequence to Undef value and therefore we generated:
  shufps $-123, %xmm1, $xmm0
  pshufd $-46, %xmm0, %xmm0

With this patch, the entire shuffle sequence is folded to Undef and no
shuffles are generated in the output assembly.

Added new test cases to test 'combine-vec-shuffle-5.ll'.

llvm-svn: 215797
2014-08-16 00:29:44 +00:00
Sanjay Patel 35d3133650 optimize vector fneg of bitcasted integer value
This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.

This patch is very similar to a fabs patch committed at r214892.

Differential Revision: http://reviews.llvm.org/D4852

llvm-svn: 215646
2014-08-14 15:15:28 +00:00
Chandler Carruth 7cd15be784 [SDAG] Fix a bug in the DAG combiner where we would fail to return the
input node after manually adding it to the worklist and using CombineTo.

Once we use CombineTo the input node may have been deleted. Despite this
being *completely confusing* and somewhat broken, the only way to
"correctly" return from a DAG combine after potentially deleting the
input node is to return *that exact node*....

But really, this code should just never have used CombineTo. It won't do
what it wants (returning the node as mentioned above just causes the
combine to infloop). The correct way to combine away a casted load to
a load of the correct type is to RAUW the chain directly and then return
the loaded value to replace the actual value node.

I managed to find this with the vector shuffle fuzzer even though it
clearly has nothing at all to do with vector shuffles and rather those
happen to trigger a load of a constant pool that hits this combine *just
right*. I've included the test as it is small and a nice stress test
that the infrastructure isn't asserting.

llvm-svn: 215622
2014-08-14 08:18:34 +00:00
Andrea Di Biagio ace8e1e3d4 [DAGCombiner] Improved target independent vector shuffle combine rule.
This patch improves the existing algorithm in DAGCombiner that
attempts to fold shuffles according to rule:
  shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3)

Before this change, there were cases where the DAGCombiner conservatively
avoided folding shuffles even if the resulting mask would have been legal.
That is because the algorithm wrongly assumed that commuting
an illegal shuffle mask would always produce an illegal mask.

With this change, we now correctly compute the commuted shuffle mask before
calling method 'isShuffleMaskLegal' on it.
On X86, this improves for example the codegen for the following function:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7>
  %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3>
  ret <4 x i32> %2
}

Before this change the X86 backend (-mcpu=corei7) generated
the following assembly code for function @test:
  shufps $-23, %xmm0, %xmm1  # xmm1 = xmm1[1,2],xmm0[2,3]
  movhlps %xmm1, %xmm1       # xmm1 = xmm1[1,1]
  movaps %xmm1, %xmm0

Now we produce:
  movhlps %xmm0, %xmm0       # xmm0 = xmm0[1,1]

Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly
fold according to the above-mentioned rule.

llvm-svn: 215555
2014-08-13 16:09:40 +00:00
Michael J. Spencer 6b2f5b47d2 [x86] Fold extract_vector_elt of a load into the Load's address computation.
llvm-svn: 215409
2014-08-11 23:49:33 +00:00
Sanjay Patel 8e5beb6edb Optimize vector fabs of bitcasted constant integer values.
Allow vector fabs operations on bitcasted constant integer values to be optimized
in the same way that we already optimize scalar fabs.

So for code like this:
%bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
%ret = bitcast <2 x float> %fabs to i64

Instead of generating something like this:

movabsq (constant pool loadi of mask for sign bits)
vmovq   (move from integer register to vector/fp register)
vandps  (mask off sign bits)
vmovq   (move vector/fp register back to integer return register)

We should generate:

mov     (put constant value in return register)

I have also removed a redundant clause in the first 'if' statement:
N0.getOperand(0).getValueType().isInteger()

is the same thing as:
IntVT.isInteger()

Testcases for x86 and ARM added to existing files that deal with vector fabs.
One existing testcase for x86 removed because it is no longer ideal.

For more background, please see:
http://reviews.llvm.org/D4770

And:
http://llvm.org/bugs/show_bug.cgi?id=20354

Differential Revision: http://reviews.llvm.org/D4785

llvm-svn: 214892
2014-08-05 17:35:22 +00:00
Chandler Carruth 40dbd382ad [SDAG] Fix a really, really terrible bug in the DAG combiner.
This code is completely wrong. It is also dead, as if it were to *ever*
run, it would crash. Fortunately, after my work to the combiner, it is
at least *possible* to reach the code, and llvm-stress has found a test
case. Thanks to Patrick for reporting.

It would be really good if anyone who remembers how this code works and
what it was intended to do could add some more obvious test coverage
instead of my completely contrived and reduced test case. My test case
was so brittle I left a bread crumb comment in it to help the next
person to stumble on it and not know what it was actually testing for.

llvm-svn: 214785
2014-08-04 21:29:59 +00:00
Eric Christopher d913448b38 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781
2014-08-04 21:25:23 +00:00
Chandler Carruth cde4eb56fe [x86] Don't add nodes to the combined set (and prune subsequent
combines) until they are legal.

Doing it the old way could, when the stars align *just* right, cause
a node to get into the combine set prior to being legalized. Then, when
the same node showed up as an operand to another node later on (but not
so much later on that it had been deleted as dead) we would fail to add
it back to the worklist thinking it had already been combined. This
would in turn cause it to not be legalized. Fortunately, we can also
walk the operands looking for uncombined (and thus potentially
un-legalized) nodes late. It will still ensure that we walk all operands
of all nodes and send all of them through both the legalizer without
changes and the combiner at least once. (Which was the original goal of
this).

I have a test case for this bug, but it is terribly brittle. For
example, it will stop finding the bug the moment I enable the new
shuffle lowering. I don't yet have any test case that reliably exercises
this bug, and it isn't clear that it will be possible to craft one. It
is entirely possible that with the new shuffle lowering the two forms of
doing this are precisely equivalent. That doesn't mean we shouldn't take
the more conservative approach of insisting on things in the combined
set having survived the legalizer.

llvm-svn: 214673
2014-08-03 23:10:59 +00:00
Sanjay Patel 2ef67440fc fix for PR20354 - Miscompile of fabs due to vectorization
This is intended to be the minimal change needed to fix PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation.

This patch will not generate the optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon.

There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too.

llvm-svn: 214670
2014-08-03 22:48:23 +00:00
James Molloy ce45be0465 [AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available.
The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers!

llvm-svn: 214634
2014-08-02 14:51:24 +00:00
Chandler Carruth 18066974d4 [SDAG] Refactor the code which deletes nodes in the DAG combiner to do
so using a single helper which adds operands back onto the worklist.

Several places didn't rigorously do this but a couple already did.
Factoring them together and doing it rigorously is important to delete
things recursively early on in the combiner and get a chance to see
accurate hasOneUse values. While no existing test cases change, an
upcoming patch to add DAG combining logic for PSHUFB requires this to
work correctly.

llvm-svn: 214623
2014-08-02 10:02:07 +00:00
Owen Anderson 9d5a8c2813 Fix issues with ISD::FNEG and ISD::FMA SDNodes where they would not be constant-folded
during DAGCombine in certain circumstances.  Unfortunately, the circumstances required
to trigger the issue seem to require a pretty specific interaction of DAGCombines,
and I haven't been able to find a testcase that reproduces on X86, ARM, or AArch64.
The functionality added here is replicated in essentially every other DAG combine,
so it seems pretty obviously correct.

llvm-svn: 214622
2014-08-02 08:45:33 +00:00
Louis Gerbarg 09b8cdee12 White space fix.
llvm-svn: 214455
2014-07-31 22:57:46 +00:00
Louis Gerbarg 67474e3755 Make sure no loads resulting from load->switch DAGCombine are marked invariant
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.

This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.

llvm-svn: 214449
2014-07-31 21:45:05 +00:00
Louis Gerbarg 4fc09b36de Retain alignment requirements for load->selects modified by DAGCombine
DAGCombine may choose to rewrite graphs where two loads feed a select into
graphs where a select of two addresses feed a load. While it sanity checks the
loads to make sure they are broadly equivalent it currently just uses the
alignment restriction of the left node. In cases where the right node has
stronger alignment requiresment this may lead to bad codegen, such as generating
an aligned load where an unaligned load is required. This patch makes the
combine generate a load with an alignment that is the same as whichever is more
restrictive of the two alignments.

Tests included.

rdar://17762530

llvm-svn: 214322
2014-07-30 18:24:41 +00:00
Chandler Carruth b143274ad0 [SDAG] Add DEBUG logging to the legalizer, fixing a "bug" found by
inspection in the proccess, and shuffle the logging in the DAG combiner
around a bit.

With this it is much easier to follow what the legalizer is doing. It
should even accurately present most of the strange legalization
operations where a single node is replaced by multiple nodes, etc. There
is still some information lost (we log SDNodes not SDValues so we don't
log which result is used for which thing), but I think this is much
closer to a usable system. Notably, this will make it *much* more
apparant when legalization is actually happening inside the combiner, or
when there is a cycle caused by interactions of the legalizer and the
combiner.

The "bug" I fixed here I'm not sure is remotely possible to trigger. We
were only adding one of the nodes in a replacement to the updated set
rather than all of the nodes in the replacement. Realistically, the
worst result of this are nodes not getting back onto the worklist in the
DAG combiner. I doubt it is possible to trigger this today, and
I certainly don't have any ideas about how, but this at least brings the
code into alignment with the principled operation of the routine.

llvm-svn: 214105
2014-07-28 17:55:07 +00:00
Chandler Carruth 411fb407f8 [SDAG] When performing post-legalize DAG combining, run the legalizer
over each node in the worklist prior to combining.

This allows the combiner to produce new nodes which need to go back
through legalization. This is particularly useful when generating
operands to target specific nodes in a post-legalize DAG combine where
the operands are significantly easier to express as pre-legalized
operations. My immediate use case will be PSHUFB formation where we need
to build a constant shuffle mask with a build_vector node.

This also refactors the relevant functionality in the legalizer to
support this, and updates relevant tests. I've spoken to the R600 folks
and these changes look like improvements to them. The avx512 change
needs to be investigated, I suspect there is a disagreement between the
legalizer and the DAG combiner there, but it seems a minor issue so
leaving it to be re-evaluated after this patch.

Differential Revision: http://reviews.llvm.org/D4564

llvm-svn: 214020
2014-07-26 05:49:40 +00:00
Matt Arsenault 197a1e26e3 Store nodes only have 1 result.
llvm-svn: 213928
2014-07-25 07:56:42 +00:00
Chandler Carruth 94bd553eb8 [SDAG] Start plumbing an assert into SDValues that we don't form one
with a result number outside the range of results for the node.

I don't know how we managed to not really check this very basic
invariant for so long, but the code is *very* broken at this point.
I have over 270 test failures with the assert enabled. I'm committing it
disabled so that others can join in the cleanup effort and reproduce the
issues. I've also included one of the obvious fixes that I already
found. More fixes to come.

llvm-svn: 213926
2014-07-25 07:23:23 +00:00
Chandler Carruth 9f4530b95d [SDAG] Introduce a combined set to the DAG combiner which tracks nodes
which have successfully round-tripped through the combine phase, and use
this to ensure all operands to DAG nodes are visited by the combiner,
even if they are only added during the combine phase.

This is critical to have the combiner reach nodes that are *introduced*
during combining. Previously these would sometimes be visited and
sometimes not be visited based on whether they happened to end up on the
worklist or not. Now we always run them through the combiner.

This fixes quite a few bad codegen test cases lurking in the suite while
also being more principled. Among these, the TLS codegeneration is
particularly exciting for programs that have this in the critical path
like TSan-instrumented binaries (although I think they engineer to use
a different TLS that is faster anyways).

I've tried to check for compile-time regressions here by running llc
over a merged (but not LTO-ed) clang bitcode file and observed at most
a 3% slowdown in llc. Given that this is essentially a worst case (none
of opt or clang are running at this phase) I think this is tolerable.
The actual LTO case should be even less costly, and the cost in normal
compilation should be negligible.

With this combining logic, it is possible to re-legalize as we combine
which is necessary to implement PSHUFB formation on x86 as
a post-legalize DAG combine (my ultimate goal).

Differential Revision: http://reviews.llvm.org/D4638

llvm-svn: 213898
2014-07-24 22:15:28 +00:00
Hal Finkel cc39b67530 AA metadata refactoring (introduce AAMDNodes)
In order to enable the preservation of noalias function parameter information
after inlining, and the representation of block-level __restrict__ pointer
information (etc.), additional kinds of aliasing metadata will be introduced.
This metadata needs to be carried around in AliasAnalysis::Location objects
(and MMOs at the SDAG level), and so we need to generalize the current scheme
(which is hard-coded to just one TBAA MDNode*).

This commit introduces only the necessary refactoring to allow for the
introduction of other aliasing metadata types, but does not actually introduce
any (that will come in a follow-up commit). What it does introduce is a new
AAMDNodes structure to hold all of the aliasing metadata nodes associated with
a particular memory-accessing instruction, and uses that structure instead of
the raw MDNode* in AliasAnalysis::Location, etc.

No functionality change intended.

llvm-svn: 213859
2014-07-24 12:16:19 +00:00
Chad Rosier 17020f96c7 [AArch64] Lower sdiv x, pow2 using add + select + shift.
The target-independent DAGcombiner will generate:
asr w1, X, #31 w1 = splat sign bit.
add X, X, w1, lsr #28 X = X + 0 or pow2-1
asr w0, X, asr #4 w0 = X/pow2

However, the add + shifts is expensive, so generate:
add w0, X, 15 w0 = X + pow2-1
cmp X, wzr X - 0
csel X, w0, X, lt X = (X < 0) ? X + pow2-1 : X;
asr w0, X, asr 4 w0 = X/pow2

llvm-svn: 213758
2014-07-23 14:57:52 +00:00
Chandler Carruth 9a0051cd59 [SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate
insertions.

The old behavior could cause arbitrarily bad memory usage in the DAG
combiner if there was heavy traffic of adding nodes already on the
worklist to it. This commit switches the DAG combine worklist to work
the same way as the instcombine worklist where we null-out removed
entries and only add new entries to the worklist. My measurements of
codegen time shows slight improvement. The memory utilization is
unsurprisingly dominated by other factors (the IR and DAG itself
I suspect).

This change results in subtle, frustrating churn in the particular order
in which DAG combines are applied which causes a number of minor
regressions where we fail to match a pattern previously matched by
accident. AFAICT, all of these should be using AddToWorklist to directly
or should be written in a less brittle way. None of the changes seem
drastically bad, and a few of the changes seem distinctly better.

A major change required to make this work is to significantly harden the
way in which the DAG combiner handle nodes which become dead
(zero-uses). Previously, we relied on the ability to "priority-bump"
them on the combine worklist to achieve recursive deletion of these
nodes and ensure that the frontier of remaining live nodes all were
added to the worklist. Instead, I've introduced a routine to just
implement that precise logic with no indirection. It is a significantly
simpler operation than that of the combiner worklist proper. I suspect
this will also fix some other problems with the combiner.

I think the x86 changes are really minor and uninteresting, but the
avx512 change at least is hiding a "regression" (despite the test case
being just noise, not testing some performance invariant) that might be
looked into. Not sure if any of the others impact specific "important"
code paths, but they didn't look terribly interesting to me, or the
changes were really minor. The consensus in review is to fix any
regressions that show up after the fact here.

Thanks to the other reviewers for checking the output on other
architectures. There is a specific regression on ARM that Tim already
has a fix prepped to commit.

Differential Revision: http://reviews.llvm.org/D4616

llvm-svn: 213727
2014-07-23 07:08:53 +00:00
Chandler Carruth 3c0012beb6 [SDAG,cleanup] Switch the DAG combiner over to use the spelling
'Worklist' consistently rather than a deeply confusing mixture of
'WorkList' and 'Worklist'.

Notably, the very 'WorkList' of the DAG combiner was exposed to target
specific DAG combines under an interface 'AddToWorklist' which was
implemented by in turn calling 'AddToWorkList' in the combiner. This has
sent me circling with the wrong case in grep one too many times.

I chose to normalize on 'Worklist' because that one won the grep-vote
for llvm/lib/... by a hundered hits or so, and it is used in places
relatively "canonical" such as InstCombine's Worklist. Let's all jsut
pick this casing, whether "correct", "good", or "bad" and be
consistent...

llvm-svn: 213506
2014-07-21 08:56:44 +00:00
Chandler Carruth 24ceb0ce66 [SDAG] Rather than using a narrow test against the one dummy node on the
stack, filter all handle nodes from the DAG combiner worklist.

This will also handle cases where other handle nodes might be
(erroneously) added to the worklist and then cause bugs and explosions
when deleted. For example, when running the legalizer within the DAG
combiner, there are times when other handle nodes are used and can end
up here.

llvm-svn: 213505
2014-07-21 08:32:31 +00:00
Andrea Di Biagio 0fb2013192 [DAGCombiner] Improve the shuffle-vector folding logic.
Canonicalize shuffles according to rules:
 *  shuffle(A, shuffle(A, B)) -> shuffle(shuffle(A,B), A)
 *  shuffle(B, shuffle(A, B)) -> shuffle(shuffle(A,B), B)
 *  shuffle(B, shuffle(A, Undef)) -> shuffle(shuffle(A, Undef), B)

This patch helps identifying more shuffle pairs that could be combined reusing
the already existing rules in the DAGCombiner.

Added new test 'combine-vec-shuffle-5.ll' to verify that the canonicalized
shuffles are now folded into a single shuffle node by the DAGCombiner.
Added more test cases to 'combine-vec-shuffle-4.ll'.

llvm-svn: 213504
2014-07-21 07:30:54 +00:00
Michael J. Spencer 1eb023013e Revert "[x86] Fold extract_vector_elt of a load into the Load's address computation."
There's a bug where this can create cycles in the DAG. It will take a bit
to fix, so I'm backing it out for now.

llvm-svn: 213339
2014-07-18 00:15:50 +00:00
Tim Northover 7f3e11e7c0 CodeGen: don't form illegail EXTLOAD operations.
It turns out that in most cases (the main exception being i1-related
types) once these operations are formed we cannot separate them and
the targets end up having to deal with them whether they want to or
not.

This is not a good situation, and a more reasonable default can be
formed by ackowledging this and having targets leave them as Legal.
Only x86 seems to be affected (other targets don't even try marking
the operation Expand).

Mostly there's no visible change here yet, but it will be useful to
have truly expanded EXTLOADS for MVT::f16 softening support.

llvm-svn: 213162
2014-07-16 15:37:24 +00:00
Andrea Di Biagio bd5555cc3f [DAGCombiner] Add more rules to fold shuffles.
This patch adds two new rules to the DAGCombiner:
 1.  shuffle (shuffle A, Undef, M0), B, M1 -> shuffle A, B, M2
 2.  shuffle (shuffle A, Undef, M0), A, M1 -> shuffle A, Undef, M2

We only do this if the combined shuffle is legal for the target.

Example:
;;
define <4 x float> @test(<4 x float> %a, <4 x float> %b) {
  %1 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32><i32 6, i32 0, i32 1, i32 7>
  %2 = shufflevector <4 x float> %1, <4 x float> %b, <4 x i32><i32 1, i32 2, i32 4, i32 5>
  ret <4 x i32> %2
}
;;

(using llc -mcpu=corei7 -march=x86-64)
Before, the x86 backend generated:
  pshufd $120, %xmm0, %xmm0
  shufps $-108, %xmm0, %xmm1
  movaps %xmm1, %xmm0

Now the x86 backend generates:
  movsd %xmm1, %xmm0

llvm-svn: 213069
2014-07-15 13:26:28 +00:00
Andrea Di Biagio 2152a6c78b [DAGCombiner] Avoid calling method 'isShuffleMaskLegal' on illegal vector types.
This patch fixes a crasher in method 'DAGCombiner::visitOR' due to an invalid
call to method 'isShuffleMaskLegal'. On x86, method 'isShuffleMaskLegal'
always expects a legal vector value type in input.

With this patch, we immediately check if the input OR dag node has a legal
vector type; we only try to fold a OR dag node into a single shufflevector
if we know that the resulting shuffle will have a legal type.
This is to avoid calling method 'isShuffleMaskLegal' on a potentially
illegal vector value type.

Added a new test-case to file 'CodeGen/X86/combine-or.ll' to verify that
DAGCombiner doesn't crash in the attempt to check/combine an OR between shuffles
with illegal types.

llvm-svn: 213020
2014-07-15 00:02:32 +00:00
Andrea Di Biagio 3960a9571f [DAGCombiner] Add more rules to combine shuffle vector dag nodes.
This patch teaches the DAGCombiner how to fold a pair of shuffles
according to rules:
  1.  shuffle(shuffle A, B, M0), B, M1) -> shuffle(A, B, M2)
  2.  shuffle(shuffle A, B, M0), A, M1) -> shuffle(A, B, M3)

The new rules would only trigger if the resulting shuffle has legal type and
legal mask.

Added test 'combine-vec-shuffle-3.ll' to verify that DAGCombiner correctly
folds shuffles on x86 when the resulting mask is legal. Also added some negative
cases to verify that we avoid introducing illegal shuffles.

llvm-svn: 213001
2014-07-14 22:46:26 +00:00
Andrea Di Biagio 67d8b2e2b0 [DAGCombiner] Fix a crash caused by a missing check for legal type when trying to fold shuffles.
Verify that DAGCombiner does not crash when trying to fold a pair of shuffles
according to rule (added at r212539):
  (shuffle (shuffle A, Undef, M0), Undef, M1) -> (shuffle A, Undef, M2)

The DAGCombiner avoids folding shuffles if the resulting shuffle dag node
is not legal for the target. That means, the resulting shuffle must have
legal type and legal mask.

Before, the DAGCombiner only called method
'TargetLowering::isShuffleMaskLegal' to check if it was "safe" to fold according
to the above-mentioned rule. However, this caused a crash in the x86 backend
since method 'isShuffleMaskLegal' always expects to be called on a
legal vector type.

llvm-svn: 212915
2014-07-13 21:02:14 +00:00
Matt Arsenault 3332b70627 Revert "Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine.""
Don't try to convert the select condition type.

llvm-svn: 212750
2014-07-10 18:21:04 +00:00
Andrea Di Biagio b2921c7ca0 [DAG] Further improve the logic in DAGCombiner that folds a pair of shuffles into a single shuffle if the resulting mask is legal.
This patch teaches the DAGCombiner how to fold shuffles according to the
following new rules:
  1. shuffle(shuffle(x, y), undef) -> x
  2. shuffle(shuffle(x, y), undef) -> y
  3. shuffle(shuffle(x, y), undef) -> shuffle(x, undef)
  4. shuffle(shuffle(x, y), undef) -> shuffle(y, undef)

The backend avoids to combine shuffles according to rules 3. and 4. if
the resulting shuffle does not have a legal mask. This is to avoid introducing
illegal shuffles that are potentially expanded into a sub-optimal sequence of
target specific dag nodes during vector legalization.

Added test case combine-vec-shuffle-2.ll to verify that we correctly triggers
the new rules when combining shuffles.

llvm-svn: 212748
2014-07-10 18:04:55 +00:00
NAKAMURA Takumi f862ce8908 Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine."
This caused miscompilation on, at least, x86-64. SExt(i1 cond) confused other optimizations.

llvm-svn: 212708
2014-07-10 11:37:28 +00:00
Daniel Sanders cbd44c591d Make it possible for ints/floats to return different values from getBooleanContents()
Summary:
On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer
comparisons return 0 or 1.

Updated the various uses of getBooleanContents. Two simplifications had to be
disabled when float and int boolean contents differ:
- ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially
  discoverable (i.e. when the condition of the VSELECT is a SETCC node).
- visitVSELECT (select C, 0, 1) -> (xor C, 1).
  Come to think of it, this one could test for the common case of 'C'
  being a SETCC too.

Preserved existing behaviour for all other targets and updated the affected
MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low'
variable was counting in the wrong direction because it thought it could simply
add the result of the comparison.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D4389

llvm-svn: 212697
2014-07-10 10:18:12 +00:00
Hao Liu 71224b02fb [AArch64]Fix an assertion failure in DAG Combiner about concating 2 build_vector.
llvm-svn: 212677
2014-07-10 03:41:50 +00:00
Matt Arsenault 658c5576d1 Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine.
Do this if the truncate is free and the select is legal.

llvm-svn: 212640
2014-07-09 19:12:07 +00:00
Chandler Carruth f0a33b71e9 [SDAG] At the suggestion of Hal, switch to an output parameter that
tracks which elements of the build vector are in fact undef.

This should make actually inpsecting them (likely in my next patch)
reasonably pretty. Also makes the output parameter optional as it is
clear now that *most* users are happy with undefs in their splats.

llvm-svn: 212581
2014-07-09 00:41:34 +00:00
Andrea Di Biagio d261e98f3d [DAG] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.
This patch teaches how to fold a shuffle according to rule:
  shuffle (shuffle (x, undef, M0), undef, M1) -> shuffle(x, undef, M2)

We do this only if the resulting mask M2 is legal; this is to avoid introducing
illegal shuffles that are potentially expanded into a sub-optimal sequence
of target specific dag nodes.

This patch has the advantage of being target independent, since it works on ISD
nodes. Therefore, all targets (not only x86) can take advantage of this rule.
The idea behind this patch is that most shuffle pairs can be safely combined
before we run the legalizer on vector operations. This allows us to
combine/simplify dag nodes earlier in the process and not only immediately
before instruction selection stage.

That said. This patch is not meant to replace any existing target specific
combine rules; backends might still introduce new shuffles during legalization
stage. Also, this rule is very simple and avoids to aggressively optimize
shuffles.

llvm-svn: 212539
2014-07-08 15:22:29 +00:00
Chandler Carruth b844e72e85 [SDAG] Build up a more rich set of APIs for querying build-vector SDAG
nodes about whether they are splats. This is factored out and improved
from r212324 which got reverted as it was far too aggressive. The new
API should help more conservatively handle buildvectors that are
a mixture of splatted and undef values.

No functionality change at this point. The hope is to slowly
re-introduce the undef-tolerant optimization of splats, but each time
being forced to make a concious decision about how to handle the undefs
in a way that doesn't lead to contradicting assumptions about the
collapsed value.

Hal has pointed out in discussions that this may not end up being the
desired API and instead it may be more convenient to get a mask of the
undef elements or something similar. I'm starting simple and will expand
the API as I adapt actual callers and see exactly what they need.

llvm-svn: 212514
2014-07-08 07:19:55 +00:00
Chandler Carruth beeacac0b3 [x86] Revert r212324 which was too aggressive w.r.t. allowing undef
lanes in vector splats.

The core problem here is that undef lanes can't *unilaterally* be
considered to contribute to splats. Their handling needs to be more
cautious. There is also a reported failure of the nightly testers
(thanks Tobias!) that may well stem from the same core issue. I'm going
to fix this theoretical issue, factor the APIs a bit better, and then
verify that I don't see anything bad with Tobias's reduction from the
test suite before recommitting.

Original commit message for r212324:
  [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
  any constant, constant FP, or undef splat and to tolerate any undef
  lanes in a splat, then replace all uses of isSplatVector in X86's
  lowering with it.

  This fixes issues where undef lanes in an otherwise splat vector would
  prevent the splat logic from firing. It is a touch more awkward to use
  this interface, but it is much more accurate. Suggestions for better
  interface structuring welcome.

  With this fix, the code generated with the widening legalization
  strategy for widen_cast-4.ll is *dramatically* improved as the special
  lowering strategies for a v16i8 SRA kick in even though the high lanes
  are undef.

  We also get a slightly different choice for broadcasting an aligned
  memory location, and use vpshufd instead of vbroadcastss. This looks
  like a minor win for pipelining and domain crossing, but a minor loss
  for the number of micro-ops. I suspect its a wash, but folks can
  easily tweak the lowering if they want.

llvm-svn: 212475
2014-07-07 19:03:32 +00:00
Chandler Carruth 5d79bb5d32 [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
any constant, constant FP, or undef splat and to tolerate any undef
lanes in a splat, then replace all uses of isSplatVector in X86's
lowering with it.

This fixes issues where undef lanes in an otherwise splat vector would
prevent the splat logic from firing. It is a touch more awkward to use
this interface, but it is much more accurate. Suggestions for better
interface structuring welcome.

With this fix, the code generated with the widening legalization
strategy for widen_cast-4.ll is *dramatically* improved as the special
lowering strategies for a v16i8 SRA kick in even though the high lanes
are undef.

We also get a slightly different choice for broadcasting an aligned
memory location, and use vpshufd instead of vbroadcastss. This looks
like a minor win for pipelining and domain crossing, but a minor loss
for the number of micro-ops. I suspect its a wash, but folks can easily
tweak the lowering if they want.

llvm-svn: 212324
2014-07-04 08:11:49 +00:00
Ulrich Weigand f236bb1b5b Fix ppcf128 component access on little-endian systems
The PowerPC 128-bit long double data type (ppcf128 in LLVM) is in fact a
pair of two doubles, where one is considered the "high" or
more-significant part, and the other is considered the "low" or
less-significant part.  When a ppcf128 value is stored in memory or a
register pair, the high part always comes first, i.e. at the lower
memory address or in the lower-numbered register, and the low part
always comes second.  This is true both on big-endian and little-endian
PowerPC systems.  (Similar to how with a complex number, the real part
always comes first and the imaginary part second, no matter the byte
order of the system.)

This was implemented incorrectly for little-endian systems in LLVM.
This commit fixes three related issues:

- When printing an immediate ppcf128 constant to assembler output
  in emitGlobalConstantFP, emit the high part first on both big-
  and little-endian systems.

- When lowering a ppcf128 type to a pair of f64 types in SelectionDAG
  (which is used e.g. when generating code to load an argument into a
  register pair), use correct low/high part ordering on little-endian
  systems.

- In a related issue, because lowering ppcf128 into a pair of f64 must
  operate differently from lowering an int128 into a pair of i64,
  bitcasts between ppcf128 and int128 must not be optimized away by the
  DAG combiner on little-endian systems, but must effect a word-swap.

Reviewed by Hal Finkel.

llvm-svn: 212274
2014-07-03 15:06:47 +00:00
Tom Stellard 7783b0adf4 Revert "SelectionDAG: Enable (and (setcc x), (setcc y)) -> (setcc (and x, y)) for vectors"
This reverts commit r210540, adds a testcase for the regression it
caused, and marks the R600 test it was supposed to fix as XFAIL.

llvm-svn: 210792
2014-06-12 16:04:47 +00:00
Tom Stellard 3787b12255 SelectionDAG: Don't use MVT::Other to determine legality of ISD::SELECT_CC
The SelectionDAG bad a special case for ISD::SELECT_CC, where it would
allow targets to specify:

setOperationAction(ISD::SELECT_CC, MVT::Other, Expand);

to indicate that they wanted to expand ISD::SELECT_CC for all types.
This wasn't applied correctly everywhere, and it makes writing new
DAG patterns with ISD::SELECT_CC difficult.

llvm-svn: 210541
2014-06-10 16:01:29 +00:00
Tom Stellard b9a023383e SelectionDAG: Enable (and (setcc x), (setcc y)) -> (setcc (and x, y)) for vectors
This prevents a future commit from regressing:

test/CodeGen/R600/setcc-equivalent.ll

llvm-svn: 210540
2014-06-10 16:01:25 +00:00
Andrea Di Biagio f99dd64f0a [X86] Add target combine rules for horizontal add/sub.
This patch adds new target specific combine rules to identify horizontal
add/sub idioms from BUILD_VECTOR dag nodes.

This patch also teaches the DAGCombiner how to canonicalize sequences of
insert_vector_elt dag nodes according to the following rule:

  (insert_vector_elt (insert_vector_elt A, I0), I1) ->
    (insert_vecto_elt (insert_vector_elt A, I1), I0)

This new canonicalization rule only triggers if the inner insert_vector
dag node has exactly one use; also, both indices must be known constants,
and I1 < I0.
This last rule made it possible to write a simpler algorithm to identify
horizontal add/sub patterns because now we don't have to worry about the
ordering of insert_vector_elt dag nodes.

llvm-svn: 210477
2014-06-09 16:54:41 +00:00
Andrea Di Biagio 4db1abea15 [DAG] Expose NoSignedWrap, NoUnsignedWrap and Exact flags to SelectionDAG.
This patch modifies SelectionDAGBuilder to construct SDNodes with associated
NoSignedWrap, NoUnsignedWrap and Exact flags coming from IR BinaryOperator
instructions.

Added a new SDNode type called 'BinaryWithFlagsSDNode' to allow accessing
nsw/nuw/exact flags during codegen.

Patch by Marcello Maggioni.

llvm-svn: 210467
2014-06-09 12:32:53 +00:00
Andrea Di Biagio 446a527905 [X86] Add two combine rules to simplify dag nodes introduced during type legalization when promoting nodes with illegal vector type.
This patch teaches the backend how to simplify/canonicalize dag node
sequences normally introduced by the backend when promoting certain dag nodes
with illegal vector type.

This patch adds two new combine rules:
1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) ->
        (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)

2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) ->
        (shuffle (BINOP A, B), Undef, <Mask>).

Both rules are only triggered on the type-legalized DAG.
In particular, rule 1. is a target specific combine rule that attempts
to sink a bitconvert into the operands of a binary operation.
Rule 2. is a target independet rule that attempts to move a shuffle
immediately after a binary operation.

llvm-svn: 209930
2014-05-30 23:17:53 +00:00
Filipe Cabecinhas 82111f12fb Convert a vselect into a concat_vector if possible
Summary:
If both vector args to vselect are concat_vectors and the condition is
constant and picks half a vector from each argument, convert the vselect
into a concat_vectors.

Added a test.

The ConvertSelectToConcatVector is assuming it doesn't get vselects with
arguments of, for example, <undef, undef, true, true>. Those get taken
care of in the checks above its call.

Reviewers: nadav, delena, grosbach, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3916

llvm-svn: 209929
2014-05-30 23:03:11 +00:00
Michael J. Spencer f375d80635 [x86] Fold extract_vector_elt of a load into the Load's address computation.
An address only use of an extract element of a load can be simplified to a
load. Without this the result of the extract element is spilled to the
stack so that an address is available.

llvm-svn: 209788
2014-05-29 01:42:45 +00:00
Hal Finkel 2c77fe59d9 Revert "[DAGCombiner] Split up an indexed load if only the base pointer value is live"
This reverts r208640 (I've just XFAILed the test) because it broke ppc64/Linux
self-hosting. Because nearly every regression test triggers a segfault, I hope
this will be easy to fix.

llvm-svn: 209747
2014-05-28 15:33:19 +00:00
Jay Foad a0653a3e6c Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

llvm-svn: 208811
2014-05-14 21:14:37 +00:00
Adam Nemet 5d78558c2b [DAGCombiner] Split up an indexed load if only the base pointer value is live
Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.

This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part).  See the testcase.

In order to spot the dead load we need to revisit it when SimplifyDemandedBits
decided that the value of the load is masked off.  This is the
CommitTargetLoweringOpt piece.

I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.

Fixes <rdar://problem/16031651>

llvm-svn: 208640
2014-05-12 23:00:03 +00:00
Tim Northover 820e041a3c DAGCombine: prevent formation of illegal ConstantFP nodes.
llvm-svn: 207850
2014-05-02 17:25:02 +00:00
Weiming Zhao 7f6daf1799 [ARM64] Prevent bit extraction to be adjusted by following shift
For pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it
into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx
more difficult.
For example:
Given
  %shr = lshr i64 %x, 4
  %and = and i64 %shr, 15
  %arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, %i64 2, i64 %and
  %0 = load i64* %arrayidx
With current shift folding, it takes 3 instrs to compute base address:
  lsr x8, x0, #1
  and x8, x8, #0x78
  add x8, x9, x8

If using ubfx, it only needs 2 instrs:
  ubfx  x8, x0, #4, #4
  add x8, x9, x8, lsl #3

This fixes bug 19589

llvm-svn: 207702
2014-04-30 21:07:24 +00:00
Jim Grosbach 2eb60fdc85 Tidy up whitespace.
llvm-svn: 207583
2014-04-29 22:41:50 +00:00
Craig Topper 633d99b62d Convert AddNodeIDNode and SelectionDAG::getNodeIfExiists to use ArrayRef<SDValue>
llvm-svn: 207383
2014-04-27 23:22:43 +00:00
Craig Topper dd5e16dd34 Convert one last signature of getNode to take an ArrayRef of SDUse.
llvm-svn: 207376
2014-04-27 19:21:06 +00:00
Benjamin Kramer da4841b3a9 DAGCombiner: Simplify code a bit, make more transforms work with vectors.
llvm-svn: 207338
2014-04-26 23:09:49 +00:00
Craig Topper 48d114bed1 Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>.
llvm-svn: 207327
2014-04-26 18:35:24 +00:00
Benjamin Kramer ad0168702a Rip out X86-specific vector SDIV lowering, make the corresponding DAGCombiner transform work on vectors.
llvm-svn: 207316
2014-04-26 13:00:53 +00:00
Benjamin Kramer 4dae598bc8 DAGCombiner: Turn divs of vector splats into vectorized multiplications.
Otherwise the legalizer would just scalarize everything. Support for
mulhi in the targets isn't that great yet so on most targets we get
exactly the same scalarized output. Add a test for x86 vector udiv.

I had to disable the mulhi nodes on ARM because there aren't any patterns
for it. As far as I know ARM has instructions for getting the high part of
a multiply so this should be fixed.

llvm-svn: 207315
2014-04-26 12:06:28 +00:00
Hao Liu c636d15284 Fix an infinite loop bug in DAG Combine about keeping transfering between ANY_EXTEND and SIGN_EXTEND.
llvm-svn: 206873
2014-04-22 09:57:06 +00:00
Chandler Carruth 1b9dde087e [Modules] Remove potential ODR violations by sinking the DEBUG_TYPE
define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.

Other sub-trees will follow.

llvm-svn: 206837
2014-04-22 02:02:50 +00:00
Tim Northover 863a789a99 DAGCombiner: don't optimise non-existant litpool load
This particular DAG combine is designed to kick in when both ConstantFPs will
end up being loaded via a litpool, however those nodes have a semi-legal
status, dictated by isFPImmLegal so in some cases there wouldn't have been a
litpool in the first place. Don't try to be clever in those circumstances.

Picked up while merging some AArch64 tests.

llvm-svn: 206365
2014-04-16 09:03:09 +00:00
Robert Lougher a9bf2463b9 Revert r191049/r191059 as it can produce wrong code (see PR17975).
It has already been reverted on the 3.4 branch in r196521.

llvm-svn: 206311
2014-04-15 18:34:24 +00:00
Nick Lewycky aad475b324 Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead.
llvm-svn: 206255
2014-04-15 07:22:52 +00:00
Craig Topper c0196b1b40 [C++11] More 'nullptr' conversion. In some cases just using a boolean check instead of comparing to nullptr.
llvm-svn: 206142
2014-04-14 00:51:57 +00:00
Hal Finkel 3b48d08f54 Reenable use of TBAA during CodeGen
We had disabled use of TBAA during CodeGen (even when otherwise using AA)
because the ptrtoint/inttoptr used by CGP for address sinking caused BasicAA to
miss basic type punning that it should catch (and, thus, we'd fail to override
TBAA when we should).

However, when AA is in use during CodeGen, CGP now uses normal GEPs and
bitcasts, instead of ptrtoint/inttoptr, when doing address sinking. As a
result, BasicAA should be able to make us do the right thing in the face of
type-punning, and it seems safe to enable use of TBAA again. self-hosting seems
fine on PPC64/Linux on the P7, with TBAA enabled and -misched=shuffle.

Note: We still don't update TBAA when merging stack slots, although because
BasicAA should now catch all such cases, this is no longer a blocking issue.
Nevertheless, I plan to commit code to deal with this properly in the near
future.

llvm-svn: 206093
2014-04-12 01:26:00 +00:00
Jim Grosbach e816003d3f [c++11] Range'ify use list loops in DAGCombiner.
llvm-svn: 206014
2014-04-11 01:13:13 +00:00
Quentin Colombet 0b1a5584d6 [DAGCombiner] DAG combine does not know how to combine indexed loads with
sign/zero/any extensions. However a few places were not checking properly the
property of the load and were turning an indexed load into a regular extended
load. Therefore the indexed value was lost during the process and this was
triggering an assertion.

<rdar://problem/16389332>

llvm-svn: 205923
2014-04-09 20:03:05 +00:00
Matt Arsenault aaf9623d55 Bug 19348: Check for legal ExtLoad operation before folding
(aext (zextload x)) -> (aext (truncate (*extload x)))

Patch by Stanislav Mekhanoshin!

llvm-svn: 205805
2014-04-08 21:40:37 +00:00
Matt Arsenault e407ae9846 Make isSetCCEquivalent respect the TargetBooleanContents
llvm-svn: 205336
2014-04-01 18:13:26 +00:00
Hal Finkel 02807595fb Look at shuffles of build_vectors in DAGCombiner::visitEXTRACT_VECTOR_ELT
When the loop vectorizer vectorizes code that uses the loop induction variable,
we often end up with IR like this:

  %b1 = insertelement <2 x i32> undef, i32 %v, i32 0
  %b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer
  %i = add <2 x i32> %b2, <i32 2, i32 3>

If the add in this example is not legal (as is the case on PPC with VSX), it
will be scalarized, and we'll end up with a number of extract_vector_elt nodes
with the vector shuffle as the input operand, and that vector shuffle is fed by
one or more build_vector nodes. By the time that vector operations are
expanded, visitEXTRACT_VECTOR_ELT will not create new extract_vector_elt by
looking through the vector shuffle (to make sure that no illegal operations are
created), and so the extract_vector_elt -> vector shuffle -> build_vector is
never simplified to an operand of the build vector.

By looking at build_vectors through a shuffle we fix this particular situation,
preventing a vector from being built, only to be deconstructed again (for the
scalarized add) -- an expensive proposition when this all needs to be done via
the stack. We probably want a more comprehensive fix here where we look back
recursively through any shuffles to any build_vectors or scalar_to_vectors,
etc. but that can come later.

llvm-svn: 205179
2014-03-31 11:43:19 +00:00
Andrea Di Biagio 5b0aacf1c7 [DAG] Fix an assertion failure caused by an invalid cast in method 'BuildVectorSDNode::isConstantSplat'
This patch renames method 'isConstantSplat' as 'getConstantSplatValue'
(mainly for consistency reasons), and rewrites its logic to ensure
that we always perform a legal 'cast<ConstantSDNode>'.

Added test shift-combine-crash.ll to verify that DAGCombiner no longer crashes with an assertion failure in the attempt to simplify a vector shift by a vector of all undef counts.

llvm-svn: 204536
2014-03-22 01:47:22 +00:00
Andrea Di Biagio 28f46d9f39 [DAGCombiner] teach how to simplify xor/and/or nodes according to the following rules:
1)  (AND (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (AND (A, B), C, Mask)
 2)  (OR  (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (OR  (A, B), C, Mask)
 3)  (XOR (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (XOR (A, B), V_0, Mask)

 4)  (AND (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, AND (A, B), Mask)
 5)  (OR  (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, OR  (A, B), Mask)
 6)  (XOR (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (V_0, XOR (A, B), Mask)

llvm-svn: 204160
2014-03-18 17:12:59 +00:00
Matt Arsenault 985b9de485 Make DAGCombiner work on vector bitshifts with constant splat vectors.
llvm-svn: 204071
2014-03-17 18:58:01 +00:00
Craig Topper 7b883b314c [C++11] Add 'override' keyword to virtual methods that override their base class.
llvm-svn: 203339
2014-03-08 06:31:39 +00:00
Adam Nemet 7f928f14f4 [DAGCombiner] Distribute TRUNC through AND in rotation amount
This is already done for shifts.  Allow it for rotations as well. E.g.:

   (rotl:i32 x, (trunc (and y, 31))) -> (rotl:i32 x, (and (trunc y), 31))

Use the newly factored-out distributeTruncateThroughAnd.

With this patch and some X86.td tweaks we should be able to remove redundant
masking of the rotation amount like in the example above.  HW implicitly
performs this masking.

The testcase will be added as part of the X86 patch.

llvm-svn: 203316
2014-03-07 23:56:30 +00:00
Adam Nemet 5117f5dffc [DAGCombiner] Recognize another rotation idiom
This is the new idiom:

  x<<(y&31) | x>>((0-y)&31)

which is recognized as:

  x ROTL (y&31)

The change refines matchRotateSub.  In
Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is
Pos' & (OpSize - 1) we can just use Pos' instead of Pos.

llvm-svn: 203315
2014-03-07 23:56:28 +00:00
Adam Nemet c6553a8354 [DAGCombiner] Slightly improve readability of matchRotateSub
Slightly change the wording in the function comment. Originally, it can be
misunderstood as we turned the input into two subsequent rotates.

Better connect the comment which talks about Mask and the code which used
LoBits.  Renamed variable to MaskLoBits.

llvm-svn: 203314
2014-03-07 23:56:24 +00:00
Andrea Di Biagio 6292a140ee [X86] Teach the DAGCombiner how to fold a OR of two shufflevector nodes.
This patch teaches the DAGCombiner how to fold a binary OR between two
shufflevector into a single shuffle vector when possible.

The rules are:
  1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1)
  2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2)

The DAGCombiner can take advantage of the fact that OR is commutative and
compute two possible shuffle masks (Mask1 and Mask2) for the resulting
shuffle node.

Before folding a dag according to either rule 1 or 2, DAGCombiner verifies
that the resulting shuffle mask is legal for the target.
DAGCombiner would firstly try to fold according to 1.; If not possible
then it will try to fold according to 2.
If both Mask1 and Mask2 are illegal then we conservatively don't fold
the OR instruction.

llvm-svn: 203156
2014-03-06 20:19:52 +00:00
Adam Nemet 67483897a5 [DAGCombiner] Factor out distributeTruncateThroughAnd
Currently this code is duplicated across visitSHL, visitSRA and visitSRL.  The
plan is to add rotates as clients to this new function.

There is no functional change intended here.

llvm-svn: 202908
2014-03-04 23:28:31 +00:00
Benjamin Kramer d6f1f84f51 [C++11] Replace llvm::tie with std::tie.
The old implementation is no longer needed in C++11.

llvm-svn: 202644
2014-03-02 13:30:33 +00:00
Benjamin Kramer 3a377bce4e Now that we have C++11, turn simple functors into lambdas and remove a ton of boilerplate.
No intended functionality change.

llvm-svn: 202588
2014-03-01 11:47:00 +00:00
Hal Finkel ab51ecd4fc Fix visitTRUNCATE for legal i1 values
This extract-and-trunc vector optimization cannot work for i1 values as
currently implemented, and so I'm disabling this for now for i1 values. In the
future, this can be fixed properly.

Soon I'll commit support for i1 CR bit tracking in the PowerPC backend, and
this will be covered by one of the existing regression tests.

llvm-svn: 202449
2014-02-28 00:26:45 +00:00
Matt Arsenault 58a7639698 Trivial code simplification
llvm-svn: 202073
2014-02-24 21:01:15 +00:00
Quentin Colombet 4db08df18e [DAGCombiner] PCMP* sets its result to all ones or zeros so we can AND with the
shifted mask rather than masking and shifting separately.

The patch adds this transformation to the DAGCombiner:

  (shl (and (setcc:i8v16 ...) N01C) N1C) -> (and (setcc:i8v16 ...) N01C<<N1C)

<rdar://problem/16054492>

Patch by Adam Nemet <anemet@apple.com>

llvm-svn: 201906
2014-02-21 23:42:41 +00:00
Robert Lougher 7d9084ffa1 Teach the DAGCombiner how to fold concat_vector nodes when the input is two
BUILD_VECTOR nodes, e.g.:

(concat_vectors (BUILD_VECTOR a1, a2, a3, a4), (BUILD_VECTOR b1, b2, b3, b4))
->
(BUILD_VECTOR a1, a2, a3, a4, b1, b2, b3, b4)

This fixes an issue with AVX, where a sequence was not recognized as a 256-bit
vbroadcast due to the concat_vectors.

llvm-svn: 201158
2014-02-11 15:42:46 +00:00
Juergen Ributzka fa0eba6c8b [DAG] Don't pull the binary operation though the shift if the operands have opaque constants.
During DAGCombine visitShiftByConstant assumes that certain binary operations
with only constant operands can always be folded successfully. This is no longer
true when the constant is opaque. This commit fixes visitShiftByConstant by not
performing the optimization for opaque constants. Otherwise we would end up in
an infinite DAGCombine loop.

llvm-svn: 200900
2014-02-06 04:09:06 +00:00
Manman Ren 413a6cb42b This patch teaches the DAGCombiner how to fold insert_subvector nodes
when the input is a concat_vectors and the insert replaces one of the
concat halves:

Lower half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors Z, Y)
Upper half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors X, Z)

This can be seen with the following IR:

define <8 x float> @lower_half(<4 x float> %v1, <4 x float> %v2, <4 x
float> %v3) {
  %1 = shufflevector <4 x float> %v1, <4 x float> %v2, <8 x i32> <i32
0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %2 = tail call <8 x float> @llvm.x86.avx.vinsertf128.ps.256(<8 x
float> %1, <4 x float> %v3, i8 0)

The vinsertf128 intrinsic is converted into an insert_subvector node
in SelectionDAGBuilder.cpp.

Using AVX, without the patch this generates two vinsertf128 instructions:

vinsertf128 $1, %xmm1, %ymm0, %ymm0
vinsertf128 $0, %xmm2, %ymm0, %ymm0

With the patch this is optimized into:

vinsertf128 $1, %xmm1, %ymm2, %ymm0

Patch by Robert Lougher.

llvm-svn: 200506
2014-01-31 01:10:35 +00:00
Owen Anderson 60a4678c42 DAGCombine should not produce ISD::OR nodes after operation legalization if they're not legal.
llvm-svn: 200503
2014-01-31 00:51:43 +00:00
Andrea Di Biagio b6d39afbda [DAGCombiner] Avoid introducing an illegal build_vector when folding a sign_extend.
Make sure that we don't introduce illegal build_vector dag nodes
when trying to fold a sign_extend of a build_vector.

This fixes a regression introduced by r200234.
Added test CodeGen/X86/fold-vector-sext-crash.ll
to verify that llc no longer crashes with an assertion failure
due to an illegal build_vector of type MVT::v4i64.

Thanks to Ilia Filippov for spotting this regression and for
providing a reproducible test case.

llvm-svn: 200313
2014-01-28 12:53:56 +00:00
Matt Arsenault 5f2a92a26c Fix sext(setcc) -> select_cc using wrong type for setcc.
Also update the comment, since it actually produces a
select (setcc) instead of select_cc.

It was checking and using the setcc result type for the
type of the sext, instead of the type of the compared items.

In my problem case, the sext was to i32 and was used as the setcc type,
but the expected type was i64.

No test since I haven't been able to hit the problem with
this on any in-tree targets.

llvm-svn: 200249
2014-01-27 21:41:54 +00:00
Andrea Di Biagio f09a357765 [DAGCombiner] Teach how to fold sext/aext/zext of constant build vectors.
This patch teaches the DAGCombiner how to fold a sext/aext/zext dag node when
the operand in input is a build vector of constants (or UNDEFs).

The inability to fold a sext/zext of a constant build_vector was the root
cause of some pcg bugs affecting vselect expansion on x86-64 with AVX support.

Before this change, the DAGCombiner only knew how to fold a sext/zext/aext of a
ConstantSDNode.

llvm-svn: 200234
2014-01-27 18:45:30 +00:00
Stepan Dyatkovskiy 157bb42e27 Fix for PR18102.
Issue outcomes from DAGCombiner::MergeConsequtiveStores, more precisely from
mem-ops sequence sorting.

Consider, how MergeConsequtiveStores works for next example:

store i8 1, a[0]
store i8 2, a[1]
store i8 3, a[1]   ; a[1] again.
return   ; DAG starts here

1. Method will collect all the 3 stores.
2. It sorts them by distance from the base pointer (farthest with highest
index).
3. It takes first consecutive non-overlapping stores and (if possible) replaces
them with a single store instruction.

The point is, we can't determine here which 'store' instruction
would be the second after sorting ('store 2' or 'store 3').
It happens that 'store 3' would be the second, and 'store 2' would be the third.

So after merging we have the next result:

store i16 (1 | 3 << 8), base   ; is a[0] but bit-casted to i16
store i8 2, a[1]

So actually we swapped 'store 3' and 'store 2' and got wrong contents in a[1].

Fix: In sort routine just also take into account mem-op sequence number. 
llvm-svn: 200201
2014-01-27 09:18:31 +00:00
Hal Finkel dbebb52a2f Disable the use of TBAA when using AA in CodeGen
There are currently two issues, of which I currently know, that prevent TBAA
from being correctly usable in CodeGen:

  1. Stack coloring does not update TBAA when merging allocas. This is easy
     enough to fix, but is not the largest problem.

  2. CGP inserts ptrtoint/inttoptr pairs when sinking address computations.
     Because BasicAA does not handle inttoptr, we'll often miss basic type punning
     idioms that we need to catch so we don't miscompile real-world code (like LLVM).

I don't yet have a small test case for this, but this fixes self hosting a
non-asserts build of LLVM on PPC64 when using -enable-aa-sched-mi and -misched=shuffle.

llvm-svn: 200093
2014-01-25 19:24:54 +00:00
Hal Finkel 9b2617a5a8 Add combiner-aa-only-func (debug only)
This option (which is !NDEBUG only) allows restricting the use of alias
analysis in DAGCombiner to a specific function. This has proved extremely
valuable to isolating bugs related to this feature, and mirrors the
misched-only-func option provided by the new instruction scheduler.

llvm-svn: 200088
2014-01-25 17:32:39 +00:00
Hal Finkel 5fb07341f1 Improve descriptions of combiner-alias-analysis and combiner-global-alias-analysis
llvm-svn: 200087
2014-01-25 17:32:37 +00:00
Juergen Ributzka f26beda7c7 Revert "Revert "Add Constant Hoisting Pass" (r200034)"
This reverts commit r200058 and adds the using directive for
ARMTargetTransformInfo to silence two g++ overload warnings.

llvm-svn: 200062
2014-01-25 02:02:55 +00:00
Hans Wennborg 4d67a2e85a Revert "Add Constant Hoisting Pass" (r200034)
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.

We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemnted in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.

llvm-svn: 200058
2014-01-25 01:18:18 +00:00
Juergen Ributzka 4f3df4ad64 Add Constant Hoisting Pass
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.

llvm-svn: 200034
2014-01-24 20:18:00 +00:00
Hal Finkel 51a9838049 Fix DAGCombiner::GatherAllAliases to account for non-chain dependencies
DAGCombiner::GatherAllAliases, which is only used when AA used is enabled
during DAGCombine, had a fundamentally incorrect assumption for which this
change compensates. GatherAllAliases, which is used to find aliasing
predecessor chain nodes (so that a better chain can be selected for a load or
store to enable subsequent optimizations) assumed that walking up the chain
would always catch all possibly-aliasing loads and stores. This is not true: To
really find all aliases, we also need to search for aliases through the value
operand of a store, etc.  Consider the following situation:

  Token1 = ...
  L1 = load Token1, %52
  S1 = store Token1, L1, %51
  L2 = load Token1, %52+8
  S2 = store Token1, L2, %51+8
  Token2 = Token(S1, S2)
  L3 = load Token2, %53
  S3 = store Token2, L3, %52
  L4 = load Token2, %53+8
  S4 = store Token2, L4, %52+8

If we search for aliases of S3 (which loads address %52), and we look only
through the chain, then we'll miss the trivial dependence on L1 (which loads
from %52). We then might change all loads and stores to use Token1 as their
chain operand, which could result in copying %53 into %52 before copying
%52 into %51 (which should happen first).

The problem is, however, that searching for such data dependencies can become
expensive, and the cost is not directly related to the chain depth. Instead,
we'll rule out such configurations by insisting that we've visited all chain
users (except for users of the original chain, which is not necessary).  When
doing this, we need to look through nodes we don't care about (otherwise,
things like register copies will interfere with trivial use cases).

Unfortunately, I don't have a small test case for this problem. Creating the
underlying situation is not hard (a pair of memcpys will do it), but arranging
for the default instruction schedule to be incorrect is very fragile.

This unbreaks self hosting on PPC64 when using
-mllvm -combiner-global-alias-analysis -mllvm -combiner-alias-analysis.

llvm-svn: 200033
2014-01-24 20:12:02 +00:00
Juergen Ributzka 50e7e80d00 Revert "Add Constant Hoisting Pass"
This reverts commit r200022 to unbreak the build bots.

llvm-svn: 200024
2014-01-24 18:40:30 +00:00
Hal Finkel ccc18e1330 Restrict FindBetterChain DAG combines to unindexed nodes
These transformations obviously won't work for indexed (pre/post-inc) loads and
stores. In practice, I'm not sure there is any benefit to enabling them for
indexed nodes because other transformations that these might enable likely also
won't handle indexed nodes.

I don't have an in-tree test case that hits this problem, but an upcoming bug
fix will make it much more likely.

llvm-svn: 200023
2014-01-24 18:25:26 +00:00
Juergen Ributzka 38b67d0caf Add Constant Hoisting Pass
This pass identifies expensive constants to hoist and coalesces them to
better prepare it for SelectionDAG-based code generation. This works around the
limitations of the basic-block-at-a-time approach.

First it scans all instructions for integer constants and calculates its
cost. If the constant can be folded into the instruction (the cost is
TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't
consider it expensive and leave it alone. This is the default behavior and
the default implementation of getIntImmCost will always return TCC_Free.

If the cost is more than TCC_BASIC, then the integer constant can't be folded
into the instruction and it might be beneficial to hoist the constant.
Similar constants are coalesced to reduce register pressure and
materialization code.

When a constant is hoisted, it is also hidden behind a bitcast to force it to
be live-out of the basic block. Otherwise the constant would be just
duplicated and each basic block would have its own copy in the SelectionDAG.
The SelectionDAG recognizes such constants as opaque and doesn't perform
certain transformations on them, which would create a new expensive constant.

This optimization is only applied to integer constants in instructions and
simple (this means not nested) constant cast experessions. For example:
%0 = load i64* inttoptr (i64 big_constant to i64*)

Reviewed by Eric

llvm-svn: 200022
2014-01-24 18:23:08 +00:00
Alp Toker cb40291100 Fix known typos
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.

llvm-svn: 200018
2014-01-24 17:20:08 +00:00
Elena Demikhovsky 9d56f1e0e5 AVX512: combining setcc and zext is wrong on AVX512
because vector compare instruction puts result in mask register.

llvm-svn: 199798
2014-01-22 12:26:19 +00:00
Owen Anderson fb00d5bc7c Allow SMUL_LOHI and UMUL_LOHI to be narrow to MUL on targets where MUL is Custom rather than Legal. Even if the target is doing some kind of expansion for MUL, it's pretty much guaranteed to be more efficent than whatever it does for SMUL_LOHI or UMUL_LOHI!
llvm-svn: 199678
2014-01-20 18:41:34 +00:00
Andrea Di Biagio d7c03ec348 [DAGCombiner] Fix a wrong check in method SimplifyVBinOp.
This fixes a regression intruced by r199135.

Revision 199135 tried to simplify part of the logic in method
DAGCombiner::SimplifyVBinOp introducing calls to method BuildVectorSDNode::isConstant().

However, that revision wrongly changed the check performed by method
SimplifyVBinOp to identify dag nodes that can be folded.
Before revision 199135, that method only tried to simplify vector binary operations
if both operands were build_vector of Constant/ConstantFP/Undef only.

After revision 199135, method SimplifyVBinop tried to
simplify also vector binary operations with only one constant operand.

This fixes the problem restoring the old behavior of SimplifyVBinOp.

llvm-svn: 199328
2014-01-15 19:51:32 +00:00
Juergen Ributzka 6840282c99 [DAG] Refactor ReassociateOps - no functional change intended.
llvm-svn: 199146
2014-01-13 21:49:25 +00:00
Juergen Ributzka 7384405f23 [DAG] Teach DAG to also reassociate vector operations
This commit teaches DAG to reassociate vector ops, which in turn enables
constant folding of vector op chains that appear later on during custom lowering
and DAG combine.

Reviewed by Andrea Di Biagio

llvm-svn: 199135
2014-01-13 20:51:35 +00:00
Richard Sandiford 15cfc1c33c Handle masked rotate amounts
At the moment we expect rotates to have the form:

   (or (shl X, Y), (shr X, Z))

where Y == bitsize(X) - Z or Z == bitsize(X) - Y.  This form means that
the (or ...) is undefined for Y == 0 or Z == 0.  This undefinedness can
be avoided by using Y == (C * bitsize(X) - Z) & (bitsize(X) - 1) or
Z == (C * bitsize(X) - Y) & (bitsize(X) - 1) for any integer C
(including 0, the most natural choice).

llvm-svn: 198861
2014-01-09 10:56:42 +00:00
Richard Sandiford 0f264db3c6 Match the InstCombine form of rotates by X+C
InstCombine converts (sub 32, (add X, C)) into (sub 32-C, X),
so a rotate left of a 32-bit Y by X+C could appear as either:

   (or (shl Y, (add X, C)), (shr Y, (sub 32, (add X, C))))

without InstCombine or:

   (or (shl Y, (add X, C)), (shr Y, (sub 32-C, X)))

with it.

We already matched the first form.  This patch handles the second too.

llvm-svn: 198860
2014-01-09 10:49:40 +00:00
Andrea Di Biagio 23df4e4a2d Teach the DAGCombiner how to fold 'vselect' dag nodes according
to the following two rules:
  1) fold (vselect (build_vector AllOnes), A, B) -> A
  2) fold (vselect (build_vector AllZeros), A, B) -> B

llvm-svn: 198777
2014-01-08 18:33:04 +00:00
Richard Sandiford 95c864d9bd [DAGCombiner] Factor duplicated rotate code into a separate function
No functional change intended.

llvm-svn: 198768
2014-01-08 15:40:47 +00:00
Kevin Qin 5cd73c9e0a [AArch64 NEON] Fix invalid constant used in vselect condition.
There is a wrong assumption that the vector element type and the
type of each ConstantSDNode in the build_vector were the same.
However, when promoting the integer operand of a legally typed
build_vector, the operand type and the vector element type do not
need to be the same
(See method 'DAGTypeLegalizer::PromoteIntOp_BUILD_VECTOR' in
LegalizeIntegerTypes.cpp).

  in AArch64 backend, the following dag sequence:

  C0: i1 = Constant<0>
  C1: i1 = Constant<-1>
  V: v8i1 = BUILD_VECTOR C1, C1, C0, C0, C0, C0, C0, C0

  is type-legalized into:

  NewC0: i32 = Constant<0>
  NewC1: i32 = Constant<1>
  V: v8i8 = BUILD_VECTOR NewC1, NewC1, NewC0, NewC0, NewC0, NewC0, NewC0, NewC0

Forcing a getZeroExtend to VTBits to ensure that the new constant
is correctly.

llvm-svn: 198582
2014-01-06 02:26:10 +00:00
Kevin Qin ede9ce1933 Fix a bug in DAGcombiner about zero-extend after setcc.
For AArch64 backend, if DAGCombiner see "sext(setcc)", it will
combine them together to a single setcc with extended value type.
Then if it see "zext(setcc)", it assumes setcc is Vxi1, and try to
create "(and (vsetcc), (1, 1, ...)". While setcc isn't Vxi1,
DAGcombiner will create wrong node and get wrong code emitted.

llvm-svn: 198190
2013-12-30 02:05:13 +00:00
Andrea Di Biagio 46dcddb350 Teach DAGCombiner how to fold a SIGN_EXTEND_INREG of a BUILD_VECTOR of
ConstantSDNodes (or UNDEFs) into a simple BUILD_VECTOR.

For example, given the following sequence of dag nodes:

  i32 C = Constant<1>
  v4i32 V = BUILD_VECTOR C, C, C, C
  v4i32 Result = SIGN_EXTEND_INREG V, ValueType:v4i1

The SIGN_EXTEND_INREG node can be folded into a build_vector since
the vector in input is a BUILD_VECTOR of constants.

The optimized sequence is:

  i32 C = Constant<-1>
  v4i32 Result = BUILD_VECTOR C, C, C, C

llvm-svn: 198084
2013-12-27 20:20:28 +00:00
Richard Sandiford d1093636cc Extend (truncate (load)) folding
DAGCombiner could fold (truncate (load)) -> smaller load if the original
load was the width of the truncation result or wider.  This patch extends
it to handle cases where the original load was narrower (and so the
extension type stays the same).

llvm-svn: 197030
2013-12-11 11:37:27 +00:00
Nadav Rotem 6eee080450 Fix PR18162 - Incorrect assertion assumed that the SDValue resno is zero.
llvm-svn: 196858
2013-12-10 01:13:59 +00:00
Alp Toker f907b891da Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00
Bill Wendling 9200bb08f9 Unrevert r195599 with testcase fix.
I'm not sure how it was checking for the wrong values...
PR18023.

llvm-svn: 195670
2013-11-25 18:05:22 +00:00
Amara Emerson f59125f5bb Revert r195599 as it broke the builds.
llvm-svn: 195636
2013-11-25 11:24:18 +00:00
Daniel Sanders b021c6fdbd Fixed tryFoldToZero() for vector types that need expansion.
Summary:
Moved the requirement for SelectionDAG::getConstant() to return legally
typed nodes slightly earlier. There were two optional DAGCombine passes
that were missed out and were required to produce type-legal DAGs.

Simplified a code-path in tryFoldToZero() to use SelectionDAG::getConstant().
This provides support for both promoted and expanded vector types whereas the
previous code only supported promoted vector types.

Fixes a "Type for zero vector elements is not legal" assertion detected by
an llvm-stress generated test.

Reviewers: resistor

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2251

llvm-svn: 195635
2013-11-25 11:14:43 +00:00
Bill Wendling e3c48709ed Don't look past volatile loads.
A volatile load should block us from trying to coalesce stores.
PR18023

llvm-svn: 195599
2013-11-25 05:01:21 +00:00
Tom Stellard 9cbd2c5581 Split SETCC if VSELECT requires splitting too.
This patch is a rewrite of the original patch commited in r194542. Instead of
relying on the type legalizer to do the splitting for us, we now peform the
splitting ourselves in the DAG combiner. This is necessary for the case where
the vector mask is a legal type after promotion and still wouldn't require
splitting.

Patch by: Juergen Ributzka

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195397
2013-11-22 00:39:23 +00:00
Benjamin Kramer bb1dd73d3e DAGCombiner: Partially revert r192795, getNOT was fixed not to create illegal constants.
llvm-svn: 194959
2013-11-17 10:40:03 +00:00
Matt Arsenault c5559bb14b Add target hook to prevent folding some bitcasted loads.
This is to avoid this transformation in some cases:
fold (conv (load x)) -> (load (conv*)x)

On architectures that don't natively support some vector
loads efficiently casting the load to a smaller vector of
larger types and loading is more efficient.

Patch by Micah Villmow.

llvm-svn: 194783
2013-11-15 04:42:23 +00:00
Juergen Ributzka 34c652d34d SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
This patch reapplies r193676 with an additional fix for the Hexagon backend. The
SystemZ backend has already been fixed by r194148.

The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.

This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. Now the type
legalizer will split both VSELECT and SETCC.

This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.

Reviewed by Nadav

llvm-svn: 194542
2013-11-13 01:57:54 +00:00
Daniel Sanders a1840d2f88 Vector forms of SHL, SRA, and SRL can be constant folded using SimplifyVBinOp too
Reviewers: dsanders

Reviewed By: dsanders

CC: llvm-commits, nadav

Differential Revision: http://llvm-reviews.chandlerc.com/D1958

llvm-svn: 194393
2013-11-11 17:23:41 +00:00
Juergen Ributzka 3bd686d493 Revert "SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too."
Now Hexagon and SystemZ are not happy with it :-(

llvm-svn: 193677
2013-10-30 06:36:19 +00:00
Juergen Ributzka 6ad05d6b95 SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.

This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. This mask has
usually the same size as the VSELECT return type (except for Intel KNL). Now the
type legalizer will split both VSELECT and SETCC.

This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.

Reviewed by Nadav

llvm-svn: 193676
2013-10-30 05:48:18 +00:00
Richard Sandiford 981fdeb477 [DAGCombiner] Respect volatility when checking for aliases
Making useAA() default to true for SystemZ showed that the combiner alias
analysis wasn't handling volatile accesses.  This hit many of the SystemZ
tests, but I arbitrarily picked one for the purpose of this patch.

llvm-svn: 193518
2013-10-28 12:00:00 +00:00
Richard Sandiford 39c1ce4dc1 Keep TBAA info when rewriting SelectionDAG loads and stores
Most SelectionDAG code drops the TBAA info when creating a new form of a
load and store (e.g. during legalization, or when converting a plain
load to an extending one).  This patch tries to catch all cases where
the TBAA information can legitimately be carried over.

The patch adds alternative forms of getLoad() and getExtLoad() that take
a MachineMemOperand instead of individual fields.  (The corresponding
getTruncStore() already exists.)  The idea is to use the MachineMemOperand
forms when all fields are carried over (size, pointer info, isVolatile,
isNonTemporal, alignment and TBAA info).  If some adjustment is being
made, e.g. to narrow the load, then we still pass the individual fields
but also pass the TBAA info.

llvm-svn: 193517
2013-10-28 11:17:59 +00:00
Nadav Rotem d369d4bdf9 Optimize concat_vectors(X, undef) -> scalar_to_vector(X).
This optimization is not SSE specific so I am moving it to DAGco.
The new scalar_to_vector dag node exposed a missing pattern in the AArch64 target that I needed to add.

llvm-svn: 193393
2013-10-25 06:41:18 +00:00
Andrea Di Biagio 561badf717 Fix edge condition in DAGCombiner to improve codegen of shift sequences.
When canonicalizing dags according to the rule
(shl (zext (shr X, c1) ), c1) ==> (zext (shl (shr X, c1), c1))

remember to add the new shl dag to the DAGCombiner worklist of nodes.
If we don't explicitly add it to the worklist of nodes to visit, we
may not trigger later on the rule that folds the shift left + logical
shift right into a AND instruction with bitmask.

llvm-svn: 192883
2013-10-17 11:02:58 +00:00
Jack Carter d4e9615d1c [projects/test-suite] White space and long line fixes.
No functionality changes.

llvm-svn: 192863
2013-10-17 01:34:33 +00:00
Benjamin Kramer 00eb07b791 DAGCombiner: Don't fold xor into not if getNOT would introduce an illegal constant.
This happens e.g. with <2 x i64> -1 on x86_32. It cannot be generated directly
because i64 is illegal. It would be nice if getNOT would handle this
transparently, but I don't see a way to generate a legal constant there right
now. Fixes PR17487.

llvm-svn: 192795
2013-10-16 14:16:19 +00:00
Quentin Colombet de0e06234c [DAGCombiner] Reapply load slicing (192471) with a test that explicitly set sse4.2 support.
This should fix the buildbots.

Original commit message:
[DAGCombiner] Slice a big load in two loads when the element are next to each
other in memory and the target has paired load and performs post-isel loads
combining.

E.g., this optimization will transform something like this:
a = load i64* addr
b = trunc i64 a to i32
c = lshr i64 a, 32
d = trunc i64 c to i32

into:
b = load i32* addr1
d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.

One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.

<rdar://problem/14477220>

llvm-svn: 192476
2013-10-11 18:29:42 +00:00
Quentin Colombet 5aee63d9e3 [DAGCombiner] Revert load slicing (r192471), until I figure out why it fails on ubuntu.
llvm-svn: 192474
2013-10-11 18:17:17 +00:00
Quentin Colombet 41dc258f71 [DAGCombiner] Slice a big load in two loads when the element are next to each
other in memory and the target has paired load and performs post-isel loads
combining.

E.g., this optimization will transform something like this:
 a = load i64* addr
 b = trunc i64 a to i32
 c = lshr i64 a, 32
 d = trunc i64 c to i32

into:
 b = load i32* addr1
 d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.

One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.

<rdar://problem/14477220>

llvm-svn: 192471
2013-10-11 18:01:14 +00:00
Hal Finkel dbc7a8a8a3 Fix DAGCombiner::visitFP_EXTEND to ignore indexed loads
DAGCombiner::visitFP_EXTEND will apply the following transformation:

  fold (fpext (load x)) -> (fpext (fptrunc (extload x)))

but the implementation does not handle indexed loads (pre/post inc.), but did
not specifically ignore them either (unlike for extending loads, which it
already ignored), causing an assert when the transformation was applied to an
indexed load. This is the minimal fix for correctness (causing the
transformation to be skipped for indexed loads).

Unfortunately, I don't have an in-tree test case.

llvm-svn: 191989
2013-10-04 22:18:12 +00:00
Jin-Gu Kang 0bf8241d4b Added checking code whehter target supports specific dag combining about rotate
or not. The corresponding dag patterns are as following:

"DAGCombier::MatchRotate" function in DAGCombiner.cpp
Pattern1
// fold (or (shl (*ext x), (*ext y)),
//          (srl (*ext x), (*ext (sub 32, y)))) ->
//   (*ext (rotl x, y))
// fold (or (shl (*ext x), (*ext y)),
//          (srl (*ext x), (*ext (sub 32, y)))) ->
//   (*ext (rotr x, (sub 32, y)))

pattern2
// fold (or (shl (*ext x), (*ext (sub 32, y))),
//          (srl (*ext x), (*ext y))) ->
//   (*ext (rotl x, y))
// fold (or (shl (*ext x), (*ext (sub 32, y))),
//          (srl (*ext x), (*ext y))) ->
//   (*ext (rotr x, (sub 32, y)))

llvm-svn: 191905
2013-10-03 15:58:48 +00:00
Andrea Di Biagio 56ce9c4e78 Re-apply the change from r191393 with fix for pr17380.
This change fixes the problem reported in pr17380 and re-add the dagcombine 
transformation ensuring that the value types are always legal if the 
transformation is triggered after Legalization took place.

Added the test case from pr17380.

llvm-svn: 191509
2013-09-27 11:37:05 +00:00
Andrea Di Biagio 549d6605a0 Revert r191393 since it caused pr17380.
llvm-svn: 191438
2013-09-26 16:54:01 +00:00
Andrea Di Biagio 9f3313109f Teach DAGCombiner how to canonicalize dags according to the rule
(shl (zext (shr A, X)), X) => (zext (shl (shr A, X), X)).

The rule only triggers when there are no other uses of the
zext to avoid materializing more instructions.

This helps the DAGCombiner understand that the shl/shr
sequence can then be converted into an and instruction.

llvm-svn: 191393
2013-09-25 19:01:01 +00:00
Benjamin Kramer 64bdb29a83 DAGCombiner: Unify rotate matching for extended and unextended amounts.
No functionality change, lots of indentation changes.

llvm-svn: 191303
2013-09-24 14:21:28 +00:00
Kay Tiong Khoo 9195a5b081 fix typo: than -> then
llvm-svn: 191214
2013-09-23 18:43:51 +00:00
Juergen Ributzka f043a65327 Revert "SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too."
This reverts commit r191130.

llvm-svn: 191138
2013-09-21 15:09:46 +00:00
Juergen Ributzka e9a80fc912 SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.

This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask for the given target. This mask has usually
te same size as the VSELECT return type (except for Intel KNL). Now the type
legalizer will split both VSELECT and SETCC.

This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.

llvm-svn: 191130
2013-09-21 04:55:18 +00:00
David Blaikie 9d117ab7ef Add braces to suppress Clang's dangling-else warning.
These violations were introduced in r191049

llvm-svn: 191059
2013-09-20 00:33:11 +00:00
Kai Nacke d09bb4614b PR16726: extend rol/ror matching
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.

This commit extends the DAGCombiner in the way that the pattern

(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))

is folded into

([az]ext (rotl x, y))

The matching is restricted to aext and zext because in this cases the upper
bits are either undefined or known. Test case is included.

This fixes PR16726.

llvm-svn: 191049
2013-09-19 23:00:28 +00:00
Kai Nacke 2d967b2751 Revert PR16726: extend rol/ror matching
There is a buildbot failure. Need to investigate this.

llvm-svn: 191048
2013-09-19 22:53:36 +00:00
Kai Nacke 4eaf6444fa PR16726: extend rol/ror matching
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.

This commit extends the DAGCombiner in the way that the pattern

(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))

is folded into

([az]ext (rotl x, y))

The matching is restricted to aext and zext because in this cases the upper
bits are either undefined or known. Test case is included.

This fixes PR16726.

llvm-svn: 191045
2013-09-19 22:36:39 +00:00
Benjamin Kramer d443e4a080 DAGCombiner: Don't fold vector muls with constants that look like a splat of a power of 2 but differ in bit width.
PR17283.

llvm-svn: 191000
2013-09-19 13:28:20 +00:00
Hal Finkel 31658834e6 Prevent assert in CombinerGlobalAA with null values
DAGCombiner::isAlias can be called with SrcValue1 or SrcValue2 null, and we
can't use AA in this case (if we try, then the casting code in AA will assert).

llvm-svn: 190763
2013-09-15 02:19:49 +00:00
Hal Finkel 5ef4dccdce Use TargetSubtargetInfo::useAA() in DAGCombine
This uses the TargetSubtargetInfo::useAA() function to control the defaults of
the -combiner-alias-analysis and -combiner-global-alias-analysis options.

llvm-svn: 189564
2013-08-29 03:29:55 +00:00
Juergen Ributzka 11c52c601a Fix a typo and coding style of a previous commit. No functional change.
llvm-svn: 189526
2013-08-28 22:33:58 +00:00
Tim Northover 819bfb5a25 DAGCombiner: make sure or/shl/srl really has zero high bits before forming bswap
We want to convert code like (or (srl N, 8), (shl N, 8)) into (srl (bswap N),
const), but this is only valid if the bits above 16 on the source pattern are
0, the checks we were doing on this were slightly wrong before.

llvm-svn: 189348
2013-08-27 13:46:45 +00:00
Tom Stellard 838e2344ec SelectionDAG: Remove unnecessary uses of TargetLowering::getPointerTy()
If we have a binary operation like ISD:ADD, we can set the result type
equal to the result type of one of its operands rather than using
TargetLowering::getPointerTy().

Also, any use of DAG.getIntPtrConstant(C) as an operand for a binary
operation can be replaced with:
DAG.getConstant(C, OtherOperand.getValueType());

llvm-svn: 189227
2013-08-26 15:06:10 +00:00
Juergen Ributzka 3db39dc1ae Teach BaseIndexOffset::match to identify base pointers in loops.
The small utility function that pattern matches Base + Index +
Offset patterns for loads and stores fails to recognize the base
pointer for loads/stores from/into an array at offset 0 inside a
loop. As a result DAGCombiner::MergeConsecutiveStores was not able
to merge all stores.

This commit fixes the issue by adding an additional pattern match
and also a test case.

Reviewer: Nadav
llvm-svn: 188936
2013-08-21 21:53:38 +00:00
Craig Topper d9c2783d8f Replace getValueType().getSimpleVT() with getSimpleValueType().
llvm-svn: 188442
2013-08-15 02:44:19 +00:00
Jim Grosbach 327ccc787e DAG: Combine (and (setne X, 0), (setne X, -1)) -> (setuge (add X, 1), 2)
A common idiom is to use zero and all-ones as sentinal values and to
check for both in a single conditional ("x != 0 && x != (unsigned)-1").
That generates code, for i32, like:
  testl %edi, %edi
  setne %al
  cmpl  $-1, %edi
  setne %cl
  andb  %al, %cl

With this transform, we generate the simpler:
  incl  %edi
  cmpl  $1, %edi
  seta  %al

Similar improvements for other integer sizes and on other platforms. In
general, combining the two setcc instructions into one is better.

rdar://14689217

llvm-svn: 188315
2013-08-13 21:30:58 +00:00
Craig Topper 309dfefb6f Optimize mask generation for one of the DAG combiner shufflevector cases.
llvm-svn: 187961
2013-08-08 07:38:55 +00:00
Tom Stellard d42c594960 TargetLowering: Add getVectorIdxTy() function v2
This virtual function can be implemented by targets to specify the type
to use for the index operand of INSERT_VECTOR_ELT, EXTRACT_VECTOR_ELT,
INSERT_SUBVECTOR, EXTRACT_SUBVECTOR.  The default implementation returns
the result from TargetLowering::getPointerTy()

The previous code was using TargetLowering::getPointerTy() for vector
indices, because this is guaranteed to be legal on all targets.  However,
using TargetLowering::getPointerTy() can be a problem for targets with
pointer sizes that differ across address spaces.  On such targets,
when vectors need to be loaded or stored to an address space other than the
default 'zero' address space (which is the address space assumed by
TargetLowering::getPointerTy()), having an index that
is a different size than the pointer can lead to inefficient
pointer calculations, (e.g. 64-bit adds for a 32-bit address space).

There is no intended functionality change with this patch.

llvm-svn: 187748
2013-08-05 22:22:01 +00:00
Quentin Colombet 6bf4baa408 [DAGCombiner] insert_vector_elt: Avoid building a vector twice.
This patch prevents the following combine when the input vector is used more
than once.
insert_vector_elt (build_vector elt0, ..., eltN), NewEltIdx, idx
=>
build_vector elt0, ..., NewEltIdx, ..., eltN 

The reasons are:
- Building a vector may be expensive, so try to reuse the existing part of a
  vector instead of creating a new one (think big vectors).
- elt0 to eltN now have two users instead of one. This may prevent some other
  optimizations.

llvm-svn: 187396
2013-07-30 00:24:09 +00:00
Tom Stellard c54731aa9d DAGCombiner: Pass the correct type to TargetLowering::isF(Abs|Neg)Free
This commit also implements these functions for R600 and removes a test
case that was relying on the buggy behavior.

llvm-svn: 187007
2013-07-23 23:55:03 +00:00
Craig Topper b94011fd28 Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size.
llvm-svn: 186274
2013-07-14 04:42:23 +00:00
Craig Topper e0b711864c Pass SmallVector by const reference instead of by value.
llvm-svn: 186243
2013-07-13 07:43:40 +00:00
Stephen Lin 10947502e5 Remove trailing whitespac
llvm-svn: 186032
2013-07-10 20:47:39 +00:00
Stephen Lin 73de7bf5de AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all
in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:

1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.

2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.

3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.

The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.

llvm-svn: 185956
2013-07-09 18:16:56 +00:00
Hal Finkel 6c29bd9088 DAGCombine tryFoldToZero cannot create illegal types after type legalization
When folding sub x, x (and other similar constructs), where x is a vector, the
result is a vector of zeros. After type legalization, make sure that the input
zero elements have a legal type. This type may be larger than the result's
vector element type.

This was another bug found by llvm-stress.

llvm-svn: 185949
2013-07-09 17:02:45 +00:00
Stephen Lin 8e8424eb17 Style fixes: remove unnecessary braces for one-statement if blocks, no else after return, etc. No funcionality change.
llvm-svn: 185893
2013-07-09 00:44:49 +00:00
Stephen Lin cfe7f352c7 Remove trailing whitespace from SelectionDAG/*.cpp
llvm-svn: 185780
2013-07-08 00:37:03 +00:00
Benjamin Kramer c7332b2796 DAGCombiner: Don't drop extension behavior when shrinking a load when unsafe.
ReduceLoadWidth unconditionally drops extensions from loads. Limit it to the
case when all of the bits the extension would otherwise produce are dropped by
the shrink. It would be possible to shrink the load in more cases by merging
the extensions, but this isn't trivial and a very rare case. I left a TODO for
that case.

Fixes PR16551.

llvm-svn: 185755
2013-07-06 14:05:09 +00:00
Tim Northover 6823900e55 DAGCombiner: fix use-counting issue when forming zextload
DAGCombiner was counting all uses of a load node  when considering whether it's
worth combining into a zextload. Really, it wants to ignore the chain and just
count real uses.

rdar://problem/13896307

llvm-svn: 185419
2013-07-02 09:58:53 +00:00
Elena Demikhovsky fed077be03 Fixed a comment.
llvm-svn: 184933
2013-06-26 12:15:53 +00:00
Elena Demikhovsky 6769c50d9e Optimized integer vector multiplication operation by replacing it with shift/xor/sub when it is possible. Fixed a bug in SDIV, where the const operand is not a splat constant vector.
llvm-svn: 184931
2013-06-26 10:55:03 +00:00
Michael Liao 62ebfd8786 Fix PR16360
When (srl (anyextend x), c) is folded into (anyextend (srl x, c)), the
high bits are not cleared. Add 'and' to clear off them.

llvm-svn: 184575
2013-06-21 18:45:27 +00:00
Stephen Lin 605207fe75 SelectionDAG: slightly refactor DAGCombiner::visitSELECT_CC to avoid redudant checks...
This doesn't really effect performance due to all the relevant calls being transparent but is clearer. 

llvm-svn: 184027
2013-06-15 04:03:33 +00:00
Matt Arsenault d2f0332a29 Introduce getSelect usage and use more getSelectCC
llvm-svn: 184012
2013-06-14 22:04:37 +00:00
Stephen Lin 4e69d01b67 SelectionDAG: minor fix to order of operands in comments to match the code
llvm-svn: 184008
2013-06-14 21:33:58 +00:00
Stephen Lin e31f2d2d54 SelectionDAG: Fix incorrect condition checks in some cases of folding FADD/FMUL combinations; also improve accuracy of comments
llvm-svn: 183993
2013-06-14 18:17:35 +00:00
Andrew Trick ef9de2a739 Track IR ordering of SelectionDAG nodes 2/4.
Change SelectionDAG::getXXXNode() interfaces as well as call sites of
these functions to pass in SDLoc instead of DebugLoc.

llvm-svn: 182703
2013-05-25 02:42:55 +00:00
Michael J. Spencer df1ecbd734 Replace Count{Leading,Trailing}Zeros_{32,64} with count{Leading,Trailing}Zeros.
llvm-svn: 182680
2013-05-24 22:23:49 +00:00
Matt Arsenault 75865923c9 Add LLVMContext argument to getSetCCResultType
llvm-svn: 182180
2013-05-18 00:21:46 +00:00
Matt Arsenault 04126234e5 Replace redundant code
Use EVT::changeExtendedVectorElementTypeToInteger instead of doing the
same thing that it does

llvm-svn: 182165
2013-05-17 21:43:43 +00:00
Bob Wilson c5c0823724 Remove redundant variable introduced by r181682.
llvm-svn: 181721
2013-05-13 19:02:31 +00:00
Hao Liu bc60196951 Fix PR15950 A bug in DAG Combiner about undef mask
llvm-svn: 181682
2013-05-13 02:07:05 +00:00
Benjamin Kramer a5d59333b3 DAGCombiner: Generate a correct constant for vector types when folding (xor (and)) into (and (not)).
PR15948.

llvm-svn: 181597
2013-05-10 14:09:52 +00:00
David Majnemer 386ab7f872 DAGCombiner: Simplify inverted bit tests
Fold (xor (and x, y), y) -> (and (not x), y)

This removes an opportunity for a constant to appear twice.

llvm-svn: 181395
2013-05-08 06:44:42 +00:00
Michael Kuperstein ac868757d0 Fix slightly too aggressive conact_vector optimization.
(Would sometimes optimize away conacts used to extend a vector with undef values)

llvm-svn: 181186
2013-05-06 08:06:13 +00:00
Nadav Rotem e5a2dda372 Optimize away nop CONCAT_VECTOR nodes.
Optimize CONCAT_VECTOR nodes that merge EXTRACT_SUBVECTOR values that extract from the same vector.

rdar://13402653
PR15866

llvm-svn: 180871
2013-05-01 19:18:51 +00:00
Silviu Baranga af7e8c367f Re-write the address propagation code for pre-indexed loads/stores to take into account some previously misssed cases (PRE_DEC addressing mode, the offset and base address are swapped, etc). This should fix PR15581.
llvm-svn: 180609
2013-04-26 15:52:24 +00:00
Benjamin Kramer d56ffc709d DAGCombiner: Canonicalize vector integer abs in the same way we do it for scalars.
This already helps SSE2 x86 a lot because it lacks an efficient way to
represent a vector select. The long term goal is to enable the backend to match
a canonicalized pattern into a single instruction (e.g. vabs or pabs).

llvm-svn: 180597
2013-04-26 09:19:19 +00:00
Owen Anderson 2d4cca35c3 DAGCombine should not aggressively fold SEXT(VSETCC(...)) into a wider VSETCC without first checking the target's vector boolean contents.
This exposed an issue with PowerPC AltiVec where it appears it was setting the wrong vector boolean contents.  The included change
fixes the PowerPC tests, and was OK'd by Hal.

llvm-svn: 180129
2013-04-23 18:09:28 +00:00
Tim Northover a2b533906a Remove unused MEMBARRIER DAG node; it's been replaced by ATOMIC_FENCE.
llvm-svn: 179939
2013-04-20 12:32:17 +00:00
Benjamin Kramer bbae991db6 DAGCombiner: Fold a shuffle on CONCAT_VECTORS into a new CONCAT_VECTORS if possible.
This pattern occurs in SROA output due to the way vector arguments are lowered
on ARM.

The testcase from PR15525 now compiles into this, which is better than the code
we got with the old scalarrepl:
_Store:
	ldr.w	r9, [sp]
	vmov	d17, r3, r9
	vmov	d16, r1, r2
	vst1.8	{d16, d17}, [r0]
	bx	lr

Differential Revision: http://llvm-reviews.chandlerc.com/D647

llvm-svn: 179106
2013-04-09 17:41:43 +00:00
Arnold Schwaighofer d6c6e868b2 DAGCombiner: Merge store/loads when we have extload/truncstores
This is helps on architectures where i8,i16 are not legal but we have byte, and
short loads/stores. Allowing us to merge copies like the one below on ARM.

copy(char *a, char *b, int n) {
 do {
   int t0 = a[0];
   int t1 = a[1];
   b[0] = t0;
   b[1] = t1;

radar://13536387

llvm-svn: 178546
2013-04-02 15:58:51 +00:00
Arnold Schwaighofer 6752366ed7 Merge load/store sequences with adresses: base + index + offset
We would also like to merge sequences that involve a variable index like in the
example below.

    int index = *idx++
    int i0 = c[index+0];
    int i1 = c[index+1];
    b[0] = i0;
    b[1] = i1;

By extending the parsing of the base pointer to handle dags that contain a
base, index, and offset we can handle examples like the one above.

The dag for the code above will look something like:

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i8 load %index))))

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                         (i32 1)))))

The code that parses the tree ignores the intermediate sign extensions. However,
if there is a sign extension it needs to be on all indexes.

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (add (i8 load %index)
                                     (i8 1))))
 vs

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                         (i32 1)))))
radar://13536387

llvm-svn: 178483
2013-04-01 18:12:58 +00:00
Benjamin Kramer 9335443236 DAGCombine: visitXOR can replace a node without returning it, bail out in that case.
Fixes the crash reported in PR15608.

llvm-svn: 178429
2013-03-30 21:28:18 +00:00
Michael Liao bb05a1d7b5 Enhance folding of (extract_subvec (insert_subvec V1, V2, IIdx), EIdx)
- Handle the case where the result of 'insert_subvect' is bitcasted
  before 'extract_subvec'. This removes the redundant insertf128/extractf128
  pair on unaligned 256-bit vector load/store on vectors of non 64-bit integer.

llvm-svn: 177945
2013-03-25 23:47:35 +00:00
Shuxin Yang 93b1f12ac1 Disable some unsafe-fp-math DAG-combine transformation after legalization.
For instance, following transformation will be disabled:
    x + x + x => 3.0f * x;

The problem of these transformations is that it introduces a FP constant, which
following Instruction-Selection pass cannot handle.

Reviewed by Nadav, thanks a lot!

rdar://13445387

llvm-svn: 177933
2013-03-25 22:52:29 +00:00
Richard Relph 61046a9727 Avoid generating ISD::SELECT for vector operands to SIGN_EXTEND
llvm-svn: 176881
2013-03-12 18:17:18 +00:00
Tom Stellard b1588fc057 DAGCombiner: Use correct value type for checking legality of BR_CC v3
LegalizeDAG.cpp uses the value of the comparison operands when checking
the legality of BR_CC, so DAGCombiner should do the same.

v2:
  - Expand more BR_CC value types for NVPTX

v3:
  - Expand correct BR_CC value types for Hexagon, Mips, and XCore.

llvm-svn: 176694
2013-03-08 15:36:57 +00:00
Benjamin Kramer 3238dc0c61 DAGCombiner: Make the post-legalize vector op optimization more aggressive.
A legal BUILD_VECTOR goes in and gets constant folded into another legal
BUILD_VECTOR so we don't lose any legality here. The problematic PPC
optimization that made this check necessary was fixed recently.

llvm-svn: 175759
2013-02-21 15:24:35 +00:00
Arnold Schwaighofer 3f9568e921 DAGCombiner: Fold pointless truncate, bitcast, buildvector series
(2xi32) (truncate ((2xi64) bitcast (buildvector i32 a, i32 x, i32 b, i32 y)))
can be folded into a (2xi32) (buildvector i32 a, i32 b).

Such a DAG would cause uneccessary vdup instructions followed by vmovn
instructions.

We generate this code on ARM NEON for a setcc olt, 2xf64, 2xf64. For example, in
the vectorized version of the code below.

double A[N];
double B[N];

void test_double_compare_to_double() {
  int i;
  for(i=0;i<N;i++)
    A[i] = (double)(A[i] < B[i]);
}

radar://13191881

Fixes bug 15283.

llvm-svn: 175670
2013-02-20 21:33:32 +00:00
Nadav Rotem 495b1a43c1 Dont merge consecutive loads/stores into vectors when noimplicitfloat is used.
llvm-svn: 175190
2013-02-14 18:28:52 +00:00
Owen Anderson cc068993ee Add some legality checks for SETCC before introducing it in the DAG combiner post-operand legalization.
llvm-svn: 175149
2013-02-14 09:07:33 +00:00
Paul Redmond 288604ed0c PR14562 - Truncation of left shift became undef
DAGCombiner::ReduceLoadWidth was converting (trunc i32 (shl i64 v, 32))
into (shl i32 v, 32) into undef. To prevent this, check the shift count
against the final result size.

Patch by: Kevin Schoedel
Reviewed by: Nadav Rotem

llvm-svn: 174972
2013-02-12 15:21:21 +00:00
Pete Cooper 10a3ae7039 Check type for legality before forming a select from loads.
Sorry for the lack of a test case.  I tried writing one for i386 as i know selects are illegal on this target, but they are actually considered legal by isel and expanded later.

I can't see any targets to trigger this, but checking for the legality of a node before forming it is general goodness.

llvm-svn: 174934
2013-02-12 03:14:50 +00:00
Hal Finkel 2581905f81 DAGCombiner: Constant folding around pre-increment loads/stores
Previously, even when a pre-increment load or store was generated,
we often needed to keep a copy of the original base register for use
with other offsets. If all of these offsets are constants (including
the offset which was combined into the addressing mode), then this is
clearly unnecessary. This change adjusts these other offsets to use the
new incremented address.

llvm-svn: 174746
2013-02-08 21:35:47 +00:00
Owen Anderson de89ecf1fc Reapply r174343, with a fix for a scary DAG combine bug where it failed to differentiate between the alignment of the
base point of a load, and the overall alignment of the load.  This caused infinite loops in DAG combine with the
original application of this patch.

ORIGINAL COMMIT LOG:
When the target-independent DAGCombiner inferred a higher alignment for a load,
it would replace the load with one with the higher alignment.  However, it did
not place the new load in the worklist, which prevented later DAG combines in
the same phase (for example, target-specific combines) from ever seeing it.

This patch corrects that oversight, and updates some tests whose output changed
due to slightly different DAGCombine outputs.

llvm-svn: 174431
2013-02-05 19:24:39 +00:00
NAKAMURA Takumi 3753b28cd2 Revert r174343, "When the target-independent DAGCombiner inferred a higher alignment for a load,"
It caused hangups in compiling clang/lib/Parse/ParseDecl.cpp and clang/lib/Driver/Tools.cpp in stage2 on some hosts.

llvm-svn: 174374
2013-02-05 14:44:16 +00:00
Owen Anderson a47fdbb032 When the target-independent DAGCombiner inferred a higher alignment for a load,
it would replace the load with one with the higher alignment.  However, it did
not place the new load in the worklist, which prevented later DAG combines in
the same phase (for example, target-specific combines) from ever seeing it.

This patch corrects that oversight, and updates some tests whose output changed
due to slightly different DAGCombine outputs.

llvm-svn: 174343
2013-02-05 06:25:30 +00:00
Shuxin Yang cadd8a068e rdar://13126763
Fix a bug in DAGCombine. The symptom is mistakenly optimizing expression
"x + x*x" into "x * 3.0".

llvm-svn: 174239
2013-02-02 00:22:03 +00:00
Nadav Rotem 9450fcfff1 Revert 172708.
The optimization handles esoteric cases but adds a lot of complexity both to the X86 backend and to other backends.
This optimization disables an important canonicalization of chains of SEXT nodes and makes SEXT and ZEXT asymmetrical.
Disabling the canonicalization of consecutive SEXT nodes into a single node disables other DAG optimizations that assume
that there is only one SEXT node. The AVX mask optimizations is one example. Additionally this optimization does not update the cost model.

llvm-svn: 172968
2013-01-20 08:35:56 +00:00
Elena Demikhovsky f6a30e05d5 Optimization for the following SIGN_EXTEND pairs:
v8i8  -> v8i64, 
v8i8  -> v8i32, 
v4i8  -> v4i64, 
v4i16 -> v4i64 
for AVX and AVX2.

Bug 14865.

llvm-svn: 172708
2013-01-17 09:59:53 +00:00
Bill Schmidt d006c6938b This patch addresses an incorrect transformation in the DAG combiner.
The included test case is derived from one of the GCC compatibility tests.
The problem arises after the selection DAG has been converted to type-legalized
form.  The combiner first sees a 64-bit load that can be converted into a
pre-increment form.  The original load feeds into a SRL that isolates the
upper 32 bits of the loaded doubleword.  This looks like an opportunity for
DAGCombiner::ReduceLoadWidth() to replace the 64-bit load with a 32-bit load.

However, this transformation is not valid, as the replacement load is not
a pre-increment load.  The pre-increment load produces an extra result,
which feeds a subsequent add instruction.  The replacement load only has
one result value, and this value is propagated to all uses of the pre-
increment load, including the add.  Because the add is looking for the
second result value as its operand, it ends up attempting to add a constant
to a token chain, resulting in a crash.

So the patch simply disables this transformation for any load with more than
two result values.

llvm-svn: 172480
2013-01-14 22:04:38 +00:00
Evan Cheng 5652a8df32 Fix a DAG combine bug visitBRCOND() is transforming br(xor(x, y)) to br(x != y).
It cahced XOR's operands before calling visitXOR() but failed to update the
operands when visitXOR changed the XOR node.

rdar://12968664

llvm-svn: 171999
2013-01-09 20:56:40 +00:00
Chandler Carruth 95f83e0155 Sink AddrMode back into TargetLowering, removing one of the most
peculiar headers under include/llvm.

This struct still doesn't make a lot of sense, but it makes more sense
down in TargetLowering than it did before.

llvm-svn: 171739
2013-01-07 15:14:13 +00:00
Tom Stellard 567f886eb0 DAGCombiner: Avoid generating illegal vector INT_TO_FP nodes
DAGCombiner::reduceBuildVecConvertToConvertBuildVec() was making two
mistakes:

1. It was checking the legality of scalar INT_TO_FP nodes and then generating
vector nodes.

2. It was passing the result value type to
TargetLoweringInfo::getOperationAction() when it should have been
passing the value type of the first operand.

llvm-svn: 171420
2013-01-02 22:13:01 +00:00
Chandler Carruth 9fb823bbd4 Move all of the header files which are involved in modelling the LLVM IR
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.

There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.

The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.

I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).

I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.

llvm-svn: 171366
2013-01-02 11:36:10 +00:00
Bill Wendling 698e84fc4f Remove the Function::getFnAttributes method in favor of using the AttributeSet
directly.

This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.

llvm-svn: 171253
2012-12-30 10:32:01 +00:00
Nadav Rotem b1dd52450e Refactor DAGCombinerInfo. Change the different booleans that indicate if we are before or after different runs of DAGCo, with the CombineLevel enum.
Also, added a new API for checking if we are running before or after the LegalizeVectorOps phase. 

llvm-svn: 171142
2012-12-27 06:47:41 +00:00
Bob Wilson 3365b80290 Do not introduce vector operations in functions marked with noimplicitfloat.
<rdar://problem/12879313>

llvm-svn: 170630
2012-12-20 01:36:20 +00:00
Patrik Hagglund ffd057a3e1 Change TargetLowering::isCondCodeLegal to take an MVT, instead of EVT.
llvm-svn: 170524
2012-12-19 10:19:55 +00:00
Elena Demikhovsky 14a4af0e66 Optimized load + SIGN_EXTEND patterns in the X86 backend.
llvm-svn: 170506
2012-12-19 07:50:20 +00:00
Evan Cheng bf0baa9de7 Fix a bug in DAGCombiner::MatchBSwapHWord. Make sure the node has operands before referencing them. rdar://12868039
llvm-svn: 170078
2012-12-13 01:34:32 +00:00
Manman Ren 82751a105c DAGCombine: clamp hi bit in APInt::getBitsSet to avoid assertion
rdar://12838504

llvm-svn: 169951
2012-12-12 01:13:50 +00:00
Patrik Hagglund e98b7a0389 Revert EVT->MVT changes, r169836-169851, due to buildbot failures.
llvm-svn: 169854
2012-12-11 11:14:33 +00:00
Patrik Hagglund a970281106 Change TargetLowering::isCondCodeLegal to take an MVT, instead of EVT.
llvm-svn: 169843
2012-12-11 09:51:27 +00:00
Chandler Carruth b27041c50b Fix a miscompile in the DAG combiner. Previously, we would incorrectly
try to reduce the width of this load, and would end up transforming:

  (truncate (lshr (sextload i48 <ptr> as i64), 32) to i32)
to
  (truncate (zextload i32 <ptr+4> as i64) to i32)

We lost the sext attached to the load while building the narrower i32
load, and replaced it with a zext because lshr always zext's the
results. Instead, bail out of this combine when there is a conflict
between a sextload and a zext narrowing. The rest of the DAG combiner
still optimize the code down to the proper single instruction:

  movswl 6(...),%eax

Which is exactly what we wanted. Previously we read past the end *and*
missed the sign extension:

  movl 6(...), %eax

llvm-svn: 169802
2012-12-11 00:36:57 +00:00
Craig Topper d8005db486 Teach DAG combine to handle vector add/sub with vectors of all 0s.
llvm-svn: 169727
2012-12-10 08:12:29 +00:00
Craig Topper 5ea3bdd75b Remove extra blank line.
llvm-svn: 169692
2012-12-09 08:20:52 +00:00
Craig Topper a183ddb0fe Teach DAG combine to handle vector logical operations with vectors of all 1s or all 0s. These cases can show up when vectors are split for legalizing. Fix some tests that were dependent on these cases not being combined.
llvm-svn: 169684
2012-12-08 22:49:19 +00:00
Nadav Rotem ac450eb59e Fix a bug in the code that merges consecutive stores. Previously we did not
check if loads that happen in between stores alias with the first store in the
chain, only with the second store onwards.

llvm-svn: 169516
2012-12-06 17:34:13 +00:00
Chandler Carruth ed0881b2a6 Use the new script to sort the includes of every file under lib.
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.

Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]

llvm-svn: 169131
2012-12-03 16:50:05 +00:00
Nadav Rotem 1157e1410c Allow merging multiple store sequences on the same chain.
llvm-svn: 169111
2012-12-02 17:14:09 +00:00
Nadav Rotem 307d767177 When combining consecutive stores allow loads in between the stores, if the loads do not alias.
llvm-svn: 168832
2012-11-29 00:00:08 +00:00
Rafael Espindola c79532d101 Handle DAG CSE adding new uses during ReplaceAllUsesWith. Fixes PR14333.
llvm-svn: 167912
2012-11-14 05:08:56 +00:00
Owen Anderson 15fd6ac4ba Be careful not to optimize a SELECT_CC into a SETCC post-legalization if the SETCC node would be illegal.
llvm-svn: 167344
2012-11-03 00:17:26 +00:00
Owen Anderson b351c8d692 Add a few more simple fast-math constant propagations and cancellations.
llvm-svn: 167200
2012-11-01 02:00:53 +00:00
Ulrich Weigand 3abb34389d In various places throughout the code generator, there were special
checks to avoid performing compile-time arithmetic on PPCDoubleDouble.

Now that APFloat supports arithmetic on PPCDoubleDouble, those checks
are no longer needed, and we can treat the type like any other.

llvm-svn: 166958
2012-10-29 18:35:49 +00:00
Michael Liao 5922979e49 Teach DAG combine to fold (buildvec (Xint2fp x)) to (Xint2fp (buildvec x))
- If more than 1 elemennts are defined and target supports the vectorized
  conversion, use the vectorized one instead to reduce the strength on
  conversion operation.

llvm-svn: 166546
2012-10-24 04:14:18 +00:00
Jakub Staszak a6addc2741 Keep coding standard. Don't evaluate getNumOperands() every time.
llvm-svn: 166531
2012-10-24 00:38:25 +00:00
Michael Liao 6d106b7bfd Clean up code and put transformation on (build_vec (ext x)) into a helper func
llvm-svn: 166519
2012-10-23 23:06:52 +00:00
Michael Liao 2c2358036d Simplify condition checking as CONCAT assume all inputs of the same type.
llvm-svn: 166260
2012-10-19 03:17:00 +00:00
Nadav Rotem d5f8859672 In SimplifySelectOps we pulled two loads through a select node despite the fact that one was dependent on the other.
rdar://12513091

llvm-svn: 166196
2012-10-18 18:06:48 +00:00
Michael Liao 3ac8201ea4 Revert part of r166049 back and enable test case in r166125.
- Folding (trunc (concat ... X )) to (concat ... (trunc X) ...) is valid
  when '...' are all 'undef's.
- r166125 relies on this transformation.

llvm-svn: 166155
2012-10-17 23:45:54 +00:00
Michael Liao c87d98dbc8 Revert r166049
- In general, it's unsafe for this transformation.

llvm-svn: 166135
2012-10-17 22:41:15 +00:00
Michael Liao 7a442c8031 Teach DAG combine to fold (extract_subvec (concat v1, ..) i) to v_i
- If the extracted vector has the same type of all vectored being concatenated
  together, it should be simplified directly into v_i, where i is the index of
  the element being extracted.

llvm-svn: 166125
2012-10-17 20:48:33 +00:00
Michael Liao 19006206a1 Teach DAG combine to fold (trunc (fptoXi x)) to (fptoXi x)
llvm-svn: 166049
2012-10-16 19:38:35 +00:00
Nadav Rotem 35315fea70 Refactor the AddrMode class out of TLI to its own header file.
This class is used by LSR and a number of places in the codegen.
This is the first step in de-coupling LSR from TLI, and creating
a new interface in between them.

llvm-svn: 165455
2012-10-08 23:06:34 +00:00
Micah Villmow cdfe20b97f Move TargetData to DataLayout.
llvm-svn: 165402
2012-10-08 16:38:25 +00:00
Benjamin Kramer db5fb3bfe8 Remove unused but set variable flagged by GCC.
llvm-svn: 165331
2012-10-05 20:08:45 +00:00
Benjamin Kramer 62f7fb977c Simplify code, don't or a bool with an uint64_t.
No functionality change.

llvm-svn: 165321
2012-10-05 18:19:44 +00:00
Nadav Rotem b27777ff02 When merging connsecutive stores, use vectors to store the constant zero.
llvm-svn: 165267
2012-10-04 22:35:15 +00:00
Nadav Rotem ac92066b0c Fix a cycle in the DAG. In this code we replace multiple loads with a single load and
multiple stores with a single load. We create the wide loads and stores (and their chains)
before we remove the scalar loads and stores and fix the DAG chain. We attempted to merge
loads with a different chain. When that happened, the assumption that it is safe to RAUW
broke and a cycle was introduced.

llvm-svn: 165148
2012-10-03 19:30:31 +00:00
Nadav Rotem 7cbc12a41d A DAGCombine optimization for mergeing consecutive stores to memory. The optimization
is not profitable in many cases because modern processors perform multiple stores
in parallel and merging stores prior to merging requires extra work. We handle two main cases:

1. Store of multiple consecutive constants:
  q->a = 3;
  q->4 = 5;
In this case we store a single legal wide integer.

2. Store of multiple consecutive loads:
  int a = p->a;
  int b = p->b;
  q->a = a;
  q->b = b;
In this case we load/store either ilegal vector registers or legal wide integer registers.

llvm-svn: 165125
2012-10-03 16:11:15 +00:00
Nadav Rotem abbe665154 Revert r164910 because it causes failures to several phase2 builds.
llvm-svn: 164911
2012-09-30 07:17:56 +00:00
Nadav Rotem 45715b25f7 A DAGCombine optimization for merging consecutive stores. This optimization is not profitable in many cases
because moden processos can store multiple values in parallel, and preparing the consecutive store requires
some work.  We only handle these cases:

1. Consecutive stores where the values and consecutive loads. For example:
 int a = p->a;
 int b = p->b;
 q->a = a;
 q->b = b;

2. Consecutive stores where the values are constants. Foe example:
 q->a = 4;
 q->b = 5;

llvm-svn: 164910
2012-09-30 06:24:14 +00:00
Duncan Sands fb9d30dd64 Speculatively revert commit 164885 (nadav) in the hope of ressurecting a pile of
buildbots.  Original commit message:

A DAGCombine optimization for merging consecutive stores. This optimization is not profitable in many cases
because moden processos can store multiple values in parallel, and preparing the consecutive store requires
some work.  We only handle these cases:

1. Consecutive stores where the values and consecutive loads. For example:
  int a = p->a;
  int b = p->b;
  q->a = a;
  q->b = b;

2. Consecutive stores where the values are constants. Foe example:
  q->a = 4;
  q->b = 5;

llvm-svn: 164890
2012-09-29 10:25:35 +00:00
Craig Topper 5f9791fd2f Tidy up to match coding standards. Remove 'else' after 'return' and moving operators to end of preceding line. No functional change intended.
llvm-svn: 164887
2012-09-29 07:18:53 +00:00
Craig Topper 65161fa493 Replace a couple if/elses around similar calls with conditional operators on the varying arguments. No functional change.
llvm-svn: 164886
2012-09-29 06:54:22 +00:00
Nadav Rotem a2e7ea2f18 A DAGCombine optimization for merging consecutive stores. This optimization is not profitable in many cases
because moden processos can store multiple values in parallel, and preparing the consecutive store requires
some work.  We only handle these cases:

1. Consecutive stores where the values and consecutive loads. For example:
  int a = p->a;
  int b = p->b;
  q->a = a;
  q->b = b;

2. Consecutive stores where the values are constants. Foe example:
  q->a = 4;
  q->b = 5;

llvm-svn: 164885
2012-09-29 06:33:25 +00:00
Sylvestre Ledru 91ce36c986 Revert 'Fix a typo 'iff' => 'if''. iff is an abreviation of if and only if. See: http://en.wikipedia.org/wiki/If_and_only_if Commit 164767
llvm-svn: 164768
2012-09-27 10:14:43 +00:00
Sylvestre Ledru 721cffd53a Fix a typo 'iff' => 'if'
llvm-svn: 164767
2012-09-27 09:59:43 +00:00
Nadav Rotem 841c9a84d0 Fix 80-col violations.
llvm-svn: 164297
2012-09-20 08:53:31 +00:00
Nadav Rotem 24a822a5cb Fix a dagcombine optimization. The optimization attempts to optimize a bitcast of fneg to integers
by xoring the high-bit. This fails if the source operand is a vector because we need to negate
each of the elements in the vector.

Fix rdar://12281066 PR13813.

llvm-svn: 163802
2012-09-13 14:54:28 +00:00
Craig Topper 8238461211 Teach DAG combiner to constant fold FABS of a BUILD_VECTOR of ConstantFPs. Factor similar code out of FNEG DAG combiner.
llvm-svn: 163587
2012-09-11 01:45:21 +00:00
James Molloy 1e5c611815 Fix an assertion failure when optimising a shufflevector incorrectly into concat_vectors, and a followup bug with SelectionDAG::getNode() creating nodes with invalid types.
llvm-svn: 163511
2012-09-10 14:01:21 +00:00
Craig Topper 03f39773e0 Teach DAG combiner to constant fold fneg of a BUILD_VECTOR of constants.
llvm-svn: 163483
2012-09-09 22:58:45 +00:00
Roman Divacky 9338344acb Constify this properly. Found by gcc48 -Wcast-qual.
llvm-svn: 163256
2012-09-05 22:15:49 +00:00
Silviu Baranga 3f40d87207 Fixed the DAG combiner to better handle the folding of AND nodes for vector types. The previous code was making the assumption that the length of the bitmask returned by isConstantSplat was equal to the size of the vector type. Now we first make sure that the splat value has at least the length of the vector lane type, then we only use as many fields as we have available in the splat value.
llvm-svn: 163203
2012-09-05 08:57:21 +00:00
Owen Anderson 90e0eaffa8 Teach DAG combine a number of tricks to simplify FMA expressions in fast-math mode.
llvm-svn: 163051
2012-09-01 06:04:27 +00:00
Michael Liao ec385012ae Fix typo
llvm-svn: 163049
2012-09-01 04:09:16 +00:00
Owen Anderson cc61f87cf7 Teach the DAG combiner to turn chains of FADDs (x+x+x+x+...) into FMULs by constants. This is only enabled in unsafe FP math mode, since it does not preserve rounding effects for all such constants.
llvm-svn: 162956
2012-08-30 23:35:16 +00:00
Stepan Dyatkovskiy 99120e04be Rejected 169195. As Duncan commented, bitcasting to proper type is wrong approach. We need to insert some valid TRANCATE node here.
llvm-svn: 162354
2012-08-22 09:33:55 +00:00
Stepan Dyatkovskiy 6a638ec521 Fixed DAGCombiner bug (found and localized by James Malloy):
The DAGCombiner tries to optimise a BUILD_VECTOR by checking if it
consists purely of get_vector_elts from one or two source vectors. If
so, it either makes a concat_vectors node or a shufflevector node.

However, it doesn't check the element type width of the underlying
vector, so if you have this sequence:

Node0: v4i16 = ...
Node1: i32 = extract_vector_elt Node0
Node2: i32 = extract_vector_elt Node0
Node3: v16i8 = BUILD_VECTOR Node1, Node2, ...

It will attempt to:

Node0:    v4i16 = ...
NewNode1: v16i8 = concat_vectors Node0, ...

Where this is actually invalid because the element width is completely
different. This causes an assertion failure on DAG legalization stage.

Fix:
If output item type of BUILD_VECTOR differs from input item type.
Make concat_vectors based on input element type and then bitcast it to the output vector type. So the case described above will transformed to:
Node0:    v4i16 = ...
NewNode1: v8i16 = concat_vectors Node0, ...
NewNode2: v16i8 = bitcast NewNode1

llvm-svn: 162195
2012-08-20 07:57:06 +00:00
Owen Anderson a40319b7f1 Add a roundToIntegral method to APFloat, which can be parameterized over various rounding modes. Use this to implement SelectionDAG constant folding of FFLOOR, FCEIL, and FTRUNC.
llvm-svn: 161807
2012-08-13 23:32:49 +00:00
Elena Demikhovsky 3cb3b0045c Added FMA functionality to X86 target.
llvm-svn: 161110
2012-08-01 12:06:00 +00:00
Nadav Rotem 9056076cab Fixed DAGCombine optimizations which generate select_cc for targets
that do not support it (X86 does not lower select_cc).

PR: 13428

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160619
2012-07-23 07:59:50 +00:00
Bill Wendling d163405df8 Remove tabs.
llvm-svn: 160475
2012-07-19 00:04:14 +00:00
Evan Cheng e6a3b03ee0 Back out r160101 and instead implement a dag combine to recover from instcombine transformation.
llvm-svn: 160387
2012-07-17 18:54:11 +00:00
Nadav Rotem a62368c965 Refactor the code that checks that all operands of a node are UNDEFs.
Add a micro-optimization to getNode of CONCAT_VECTORS when both operands are undefs.
Can't find a testcase for this because VECTOR_SHUFFLE already handles undef operands, but Duncan suggested that we add this.

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160229
2012-07-15 08:38:23 +00:00
Nadav Rotem 018921002e Add a dagcombine optimization to convert concat_vectors of undefs into a single undef.
The unoptimized concat_vectors isd prevented the canonicalization of the vector_shuffle node.

llvm-svn: 160221
2012-07-14 21:30:27 +00:00
Owen Anderson b8844d6744 Only apply the SETCC+SITOFP -> SELECTCC optimization when the SETCC returns an MVT::i1, i.e. before type legalization.
This is a speculative fix for a problem on Mips reported by Akira Hatanaka.

llvm-svn: 160036
2012-07-11 06:38:55 +00:00
Nadav Rotem d908ddc186 Improve the loading of load-anyext vectors by allowing the codegen to load
multiple scalars and insert them into a vector. Next, we shuffle the elements
into the correct places, as before.
Also fix a small dagcombine bug in SimplifyBinOpWithSameOpcodeHands, when the
migration of bitcasts happened too late in the SelectionDAG process.

llvm-svn: 159991
2012-07-10 13:25:08 +00:00
Owen Anderson d4b841f8f9 Teach the DAG combiner to turn sitofp/uitofp from i1 into a conditional move, since there are only two possible values.
Previously, this would become an integer extension operation, followed by a real integer->float conversion.

llvm-svn: 159957
2012-07-09 20:31:12 +00:00
Evan Cheng 4c6f917d34 Make sure type is not extended or untyped before create a constant of the type. No test case. Found by inspection.
llvm-svn: 159179
2012-06-26 01:19:33 +00:00
Lang Hames b8650f106a Rename -allow-excess-fp-precision flag to -fuse-fp-ops, and switch from a
boolean flag to an enum: { Fast, Standard, Strict } (default = Standard).

This option controls the creation by optimizations of fused FP ops that store
intermediate results in higher precision than IEEE allows (E.g. FMAs). The
behavior of this option is intended to match the behaviour specified by a
soon-to-be-introduced frontend flag: '-ffuse-fp-ops'.

Fast mode - allows formation of fused FP ops whenever they're profitable.

Standard mode - allow fusion only for 'blessed' FP ops. At present the only
blessed op is the fmuladd intrinsic. In the future more blessed ops may be
added.

Strict mode - allow fusion only if/when it can be proven that the excess
precision won't effect the result.

Note: This option only controls formation of fused ops by the optimizers.  Fused
operations that are explicitly requested (e.g. FMA via the llvm.fma.* intrinsic)
will always be honored, regardless of the value of this option.

Internally TargetOptions::AllowExcessFPPrecision has been replaced by
TargetOptions::AllowFPOpFusion.

llvm-svn: 158956
2012-06-22 01:09:09 +00:00
Pete Cooper 5b61422d80 Fix potential crash if DAGCombine on stores sees a half type
llvm-svn: 158927
2012-06-21 18:00:39 +00:00
Pete Cooper fe5b84b404 Add users of a MERGE_VALUE node to the worklist to process again when the node is removed. Sorry, no test case. Foudn it by inspection of the code
llvm-svn: 158839
2012-06-20 19:35:43 +00:00
Hal Finkel 8a31138521 Fix DAGCombine to deal with ext-conversion of pre/post_inc loads.
The test case for this will come with the PPC indexed preinc loads commit.

llvm-svn: 158822
2012-06-20 15:42:48 +00:00
Lang Hames 39fb1d08dc Add DAG-combines for aggressive FMA formation.
This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or
FSUB + FMUL. The combines are performed when:
(a) Either
      AllowExcessFPPrecision option (-enable-excess-fp-precision for llc)
        OR
      UnsafeFPMath option (-enable-unsafe-fp-math)
    are set, and
(b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of
    the FADD/FSUB, and
(c) The FMUL only has one user (the FADD/FSUB).

If your target has fast FMA instructions you can make use of these combines by
overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for
types supported by your FMA instruction, and adding patterns to match ISD::FMA
to your FMA instructions.

llvm-svn: 158757
2012-06-19 22:51:23 +00:00
Lang Hames a33db65bd9 Make comment slightly more helpful.
llvm-svn: 158467
2012-06-14 20:37:15 +00:00
Owen Anderson 0eda3e1de6 Switch the canonical FMA term operand order to match both the comment I wrote and the usual LLVM convention.
llvm-svn: 157708
2012-05-30 18:54:50 +00:00
Owen Anderson c7aaf523e1 Teach DAGCombine to canonicalize the position of a constant in the term operands of an FMA node.
llvm-svn: 157707
2012-05-30 18:50:39 +00:00
Jim Grosbach 92f6adc8be DAGCombiner should not change the type of an extract_vector index.
When a combine twiddles an extract_vector, care should be take to preserve
the type of the index operand. No luck extracting a reasonable testcase,
unfortunately.

rdar://11391009

llvm-svn: 156419
2012-05-08 20:56:07 +00:00
Owen Anderson ab63d84252 Teach DAG combine to fold x-x to 0.0 when unsafe FP math is enabled.
llvm-svn: 156324
2012-05-07 20:51:25 +00:00
Owen Anderson 41b0665b5b Teach DAGCombine the same multiply-by-1.0 folding trick when doing FMAs, just like it now knows for FMULs.
llvm-svn: 156029
2012-05-02 22:17:40 +00:00
Owen Anderson b5f167c660 Teach DAG combine that multiplication by 1.0 can always be constant folded.
llvm-svn: 156023
2012-05-02 21:32:35 +00:00
Elena Demikhovsky 8d7e56c409 ZERO_EXTEND/SIGN_EXTEND/TRUNCATE optimization for AVX2
llvm-svn: 155309
2012-04-22 09:39:03 +00:00
Jakob Stoklund Olesen beb9469d5c Register DAGUpdateListeners with SelectionDAG.
Instead of passing listener pointers to RAUW, let SelectionDAG itself
keep a linked list of interested listeners.

This makes it possible to have multiple listeners active at once, like
RAUWUpdateListener was already doing. It also makes it possible to
register listeners up the call stack without controlling all RAUW calls
below.

DAGUpdateListener uses an RAII pattern to add itself to the SelectionDAG
list of active listeners.

llvm-svn: 155248
2012-04-20 22:08:46 +00:00
Hal Finkel e0cf6397fd Remove dead SD nodes after the combining pass. Fixes PR12201.
llvm-svn: 154786
2012-04-16 03:33:22 +00:00
Nadav Rotem 9d376b6578 Reapply 154397. Original message:
Fix a dagcombine optimization which assumes that the vsetcc result type is always
of the same size as the compared values. This is ture for SSE/AVX/NEON but not
for all targets.

llvm-svn: 154490
2012-04-11 08:26:11 +00:00
Duncan Sands 4f53074cca Add a comment noting that the fdiv -> fmul conversion won't generate
multiplication by a denormal, and some tests checking that.

llvm-svn: 154431
2012-04-10 20:35:27 +00:00
Owen Anderson 3efc8f22bd Revert r154397, which was causing make check failures on the buildbots.
llvm-svn: 154414
2012-04-10 18:02:12 +00:00
Nadav Rotem 065564d85a Fix a dagcombine optimization which assumes that the vsetcc result type is always
of the same size as the compared values. This is ture for SSE/AVX/NEON but not
for all targets.

llvm-svn: 154397
2012-04-10 14:58:31 +00:00
Anton Korobeynikov 4d1220de34 Transform div to mul with reciprocal only when fp imm is legal.
This fixes PR12516 and uncovers one weird problem in legalize (workarounded)

llvm-svn: 154394
2012-04-10 13:22:49 +00:00
Rafael Espindola 1d9672bdce Don't try to zExt just to check if an integer constant is zero, it might
not fit in a i64.

llvm-svn: 154364
2012-04-10 00:16:22 +00:00
Rafael Espindola 8f62b3248e Pattern match a setcc of boolean value with 0 as a truncate.
llvm-svn: 154322
2012-04-09 16:06:03 +00:00
Craig Topper 9c3da316ec Remove unnecessary type check when combining and/or/xor of swizzles. Move some checks to allow better early out.
llvm-svn: 154309
2012-04-09 07:19:09 +00:00
Craig Topper e5893f64e8 Remove unnecessary 'else' on an 'if' that always returns
llvm-svn: 154308
2012-04-09 05:59:53 +00:00
Craig Topper e3ad4834ae Optimize code slightly. No functionality change.
llvm-svn: 154307
2012-04-09 05:55:33 +00:00
Craig Topper 5894fe430a Replace some explicit checks with asserts for conditions that should never happen.
llvm-svn: 154305
2012-04-09 05:16:56 +00:00
Benjamin Kramer bb6ff08766 Silence sign-compare warning.
llvm-svn: 154297
2012-04-08 19:04:45 +00:00
Duncan Sands 2f1dc3814b Only have codegen turn fdiv by a constant into fmul by the reciprocal
when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly.  There is already an IR level transform that does
that, and it does it more carefully.

llvm-svn: 154296
2012-04-08 18:08:12 +00:00
Nadav Rotem 71d07ae5cb 1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new
shuffle node because it could introduce new shuffle nodes that were not
   supported efficiently by the target.

2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
   second shuffle reverses the transformation of the first shuffle.

llvm-svn: 154266
2012-04-07 21:19:08 +00:00
Duncan Sands 5f8397a934 Convert floating point division by a constant into multiplication by the
reciprocal if converting to the reciprocal is exact.  Do it even if inexact
if -ffast-math.  This substantially speeds up ac.f90 from the polyhedron
benchmarks.

llvm-svn: 154265
2012-04-07 20:04:00 +00:00
Rafael Espindola ba0a6cabb8 Always compute all the bits in ComputeMaskedBits.
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.

llvm-svn: 154011
2012-04-04 12:51:34 +00:00
Owen Anderson 98f2c0c384 Add predicates for checking whether targets have free FNEG and FABS operations, and prevent the DAGCombiner from turning them into bitwise operations if they do.
llvm-svn: 153901
2012-04-02 22:10:29 +00:00
Nadav Rotem 702f080767 Optimizing swizzles of complex shuffles may generate additional complex shuffles.
Do not try to optimize swizzles of shuffles if the source shuffle has more than
a single user, except when the source shuffle is also a swizzle.

llvm-svn: 153864
2012-04-02 07:11:12 +00:00
Nadav Rotem b078350872 This commit contains a few changes that had to go in together.
1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B))
   (and also scalar_to_vector).

2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src).
   Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B))

3. Optimize swizzles of shuffles:  shuff(shuff(x, y), undef) -> shuff(x, y).

4. Fix an X86ISelLowering optimization which was very bitcast-sensitive.

Code which was previously compiled to this:

movd    (%rsi), %xmm0
movdqa  .LCPI0_0(%rip), %xmm2
pshufb  %xmm2, %xmm0
movd    (%rdi), %xmm1
pshufb  %xmm2, %xmm1
pxor    %xmm0, %xmm1
pshufb  .LCPI0_1(%rip), %xmm1
movd    %xmm1, (%rdi)
ret

Now compiles to this:

movl    (%rsi), %eax
xorl    %eax, (%rdi)
ret

llvm-svn: 153848
2012-04-01 19:31:22 +00:00
Chris Lattner 1cc25e8a40 fix what looks like a real logic bug, found by PVS-Studio (part of PR12357)
llvm-svn: 153513
2012-03-27 16:27:21 +00:00
Craig Topper aaeae98936 When combining (vextract shuffle (load ), <1,u,u,u>), 0) -> (load ), add users of the final load to the worklist too. Needed by changes I'm preparing to make to X86 backend.
llvm-svn: 153078
2012-03-20 05:28:39 +00:00
Duncan Sands 3fb2fc6edb Fix DAG combine which creates illegal vector shuffles. Patch by Heikki Kultala.
llvm-svn: 153035
2012-03-19 15:35:44 +00:00
Nadav Rotem 6fd1d32c63 When optimizing certain BUILD_VECTOR nodes into other BUILD_VECTOR nodes, add the new node into the work list because there is a potential for further optimizations.
llvm-svn: 152784
2012-03-15 08:49:06 +00:00
Bill Wendling df170db2f6 Add a xform to the DAG combiner.
Transform:

        (fsub x, (fadd x, y)) -> (fneg y) and
        (fsub x, (fadd y, x)) -> (fneg y)

if 'unsafe math' is specified.
<rdar://problem/7540295>

llvm-svn: 152777
2012-03-15 05:12:00 +00:00
Evan Cheng d5f8e5766c Fortify r152675 a bit. Although I'm not able to come up with a test case that would trigger the truncation case.
llvm-svn: 152678
2012-03-13 22:16:11 +00:00
Evan Cheng 7bf83096df DAG combine incorrectly optimize (i32 vextract (v4i16 load $addr), c) to
(i16 load $addr+c*sizeof(i16)) and replace uses of (i32 vextract) with the
i16 load. It should issue an extload instead: (i32 extload $addr+c*sizeof(i16)).

rdar://11035895

llvm-svn: 152675
2012-03-13 22:00:52 +00:00
Benjamin Kramer e1e549d617 Give dagcombiner's worklist some inline capacity.
llvm-svn: 152454
2012-03-10 00:23:58 +00:00
Evan Cheng 80893ce5f5 Extend r148086 to check for [r +/- reg] address mode. This fixes queens performance regression (due to increased register pressure from overly aggressive pre-inc formation).
llvm-svn: 152162
2012-03-06 23:33:32 +00:00
Owen Anderson 2ee7c4dfc5 Make it possible for a target to mark FSUB as Expand. This requires providing a default expansion (FADD+FNEG), and teaching DAGCombine not to form FSUBs post-legalize if they are not legal.
llvm-svn: 152079
2012-03-06 00:29:31 +00:00
James Molloy 862fe49c55 Teach the DAGCombiner that certain loadext nodes followed by ANDs can be converted to zeroexts.
llvm-svn: 150957
2012-02-20 12:02:38 +00:00
James Molloy 920ae8c642 Remove extraneous #include and spelling mistake introduced in r150669.
llvm-svn: 150670
2012-02-16 09:48:07 +00:00
James Molloy 67b6b11b52 Modify the algorithm when traversing the DAGCombiner's worklist to be O(log N) for all operations. This fixes a horrible worst case with lots of nodes where 99% of the time was being spent in std::remove.
llvm-svn: 150669
2012-02-16 09:17:04 +00:00
Nadav Rotem 0c65064dbe Fix a bug in DAGCombine for the optimization of BUILD_VECTOR. We cant generate a shuffle node from two vectors of different types.
llvm-svn: 150383
2012-02-13 12:42:26 +00:00
Nadav Rotem 34ca89afa8 This patch addresses the problem of poor code generation for the zext
v8i8 -> v8i32 on AVX machines. The codegen often scalarizes ANY_EXTEND nodes.
The DAGCombiner has two optimizations that can mitigate the problem. First,
if all of the operands of a BUILD_VECTOR node are extracted from an ZEXT/ANYEXT
nodes, then it is possible to create a new simplified BUILD_VECTOR which uses
UNDEFS/ZERO values to eliminate the scalar ZEXT/ANYEXT nodes.
Second, another dag combine optimization lowers BUILD_VECTOR into a shuffle
vector instruction.

In the case of zext v8i8->v8i32 on AVX, a value in an XMM register is to be
shuffled into a wide YMM register.

This patch modifes the second optimization and allows the creation of
shuffle vectors even when the newly generated vector and the original vector
from which we extract the values are of different types.

llvm-svn: 150340
2012-02-12 15:05:31 +00:00
Nadav Rotem 4f4546b73a Add additional documentation to the extract-and-trunc dagcombine optimization.
llvm-svn: 149823
2012-02-05 11:39:23 +00:00
Nadav Rotem 5399f4d6bf The type-legalizer often scalarizes code. One of the common patterns is extract-and-truncate.
In this patch we optimize this pattern and convert the sequence into extract op of a narrow type.
This allows the BUILD_VECTOR dag optimizations to construct efficient shuffle operations in many cases.

llvm-svn: 149692
2012-02-03 13:18:25 +00:00
Nadav Rotem fb6ddee0e9 Transform: (EXTRACT_VECTOR_ELT( VECTOR_SHUFFLE )) -> EXTRACT_VECTOR_ELT.
llvm-svn: 148337
2012-01-17 21:44:01 +00:00
Craig Topper 02cb0fb136 Teach DAG combiner to turn a BUILD_VECTOR of UNDEFs into an UNDEF of vector type.
llvm-svn: 148297
2012-01-17 09:09:48 +00:00
Benjamin Kramer 5a377e28da DAGCombiner: Deduplicate code.
llvm-svn: 148217
2012-01-15 11:50:43 +00:00
Evan Cheng fa8326334b DAGCombine's logic for forming pre- and post- indexed loads / stores were being
overly conservative. It was concerned about cases where it would prohibit
folding simple [r, c] addressing modes. e.g.
  ldr r0, [r2]
  ldr r1, [r2, #4]
=>
  ldr r0, [r2], #4
  ldr r1, [r2]
Change the logic to look for such cases which allows it to form indexed memory
ops more aggressively.

rdar://10674430

llvm-svn: 148086
2012-01-13 01:37:24 +00:00
Chandler Carruth 55b2cdee26 Teach the X86 instruction selection to do some heroic transforms to
detect a pattern which can be implemented with a small 'shl' embedded in
the addressing mode scale. This happens in real code as follows:

  unsigned x = my_accelerator_table[input >> 11];

Here we have some lookup table that we look into using the high bits of
'input'. Each entity in the table is 4-bytes, which means this
implicitly gets turned into (once lowered out of a GEP):

  *(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));

The shift right followed by a shift left is canonicalized to a smaller
shift right and masking off the low bits. That hides the shift right
which x86 has an addressing mode designed to support. We now detect
masks of this form, and produce the longer shift right followed by the
proper addressing mode. In addition to saving a (rather large)
instruction, this also reduces stalls in Intel chips on benchmarks I've
measured.

In order for all of this to work, one part of the DAG needs to be
canonicalized *still further* than it currently is. This involves
removing pointless 'trunc' nodes between a zextload and a zext. Without
that, we end up generating spurious masks and hiding the pattern.

llvm-svn: 147936
2012-01-11 08:41:08 +00:00
Craig Topper 0515cd41e4 Replace some uses of hasNUsesOfValue(0, X) with !hasAnyUseOfValue(X)
llvm-svn: 147733
2012-01-07 18:31:09 +00:00
Craig Topper 43a1bd6ac7 Add some DAG combines for SUBC/SUBE. If nothing uses the carry/borrow out of subc, turn it into a sub. Turn (subc x, x) into 0 with no borrow. Turn (subc x, 0) into x with no borrow. Turn (subc -1, x) into (xor x, -1) with no borrow. Turn sube with no borrow in into subc.
llvm-svn: 147728
2012-01-07 09:06:39 +00:00
Chandler Carruth e041a30bb9 Prevent a DAGCombine from firing where there are two uses of
a combined-away node and the result of the combine isn't substantially
smaller than the input, it's just canonicalized. This is the first part
of a significant (7%) performance gain for Snappy's hot decompression
loop.

llvm-svn: 147604
2012-01-05 11:05:55 +00:00
Craig Topper 279c77b677 Implement VECTOR_SHUFFLE canonicalizations during DAG combine.
llvm-svn: 147525
2012-01-04 08:07:43 +00:00
Eli Friedman e96286cdf2 Make sure DAGCombiner doesn't introduce multiple loads from the same memory location. PR10747, part 2.
llvm-svn: 147283
2011-12-26 22:49:32 +00:00
Chandler Carruth 637cc6a8aa Initial CodeGen support for CTTZ/CTLZ where a zero input produces an
undefined result. This adds new ISD nodes for the new semantics,
selecting them when the LLVM intrinsic indicates that the undef behavior
is desired. The new nodes expand trivially to the old nodes, so targets
don't actually need to do anything to support these new nodes besides
indicating that they should be expanded. I've done this for all the
operand types that I could figure out for all the targets. Owners of
various targets, please review and let me know if any of these are
incorrect.

Note that the expand behavior is *conservatively correct*, and exactly
matches LLVM's current behavior with these operations. Ideally this
patch will not change behavior in any way. For example the regtest suite
finds the exact same instruction sequences coming out of the code
generator. That's why there are no new tests here -- all of this is
being exercised by the existing test suite.

Thanks to Duncan Sands for reviewing the various bits of this patch and
helping me get the wrinkles ironed out with expanding for each target.
Also thanks to Chris for clarifying through all the discussions that
this is indeed the approach he was looking for. That said, there are
likely still rough spots. Further review much appreciated.

llvm-svn: 146466
2011-12-13 01:56:10 +00:00
Eli Friedman f9081a8afe Zap unnecessary isIntDivCheap() check. PR11485. No testcase because this doesn't affect any in-tree target.
llvm-svn: 146015
2011-12-07 03:55:52 +00:00
Eli Friedman 0e58cba286 Fix an optimization involving EXTRACT_SUBVECTOR in DAGCombine so it behaves correctly. PR11494.
llvm-svn: 145996
2011-12-07 00:11:56 +00:00
Nick Lewycky 50f02cb21b Move global variables in TargetMachine into new TargetOptions class. As an API
change, now you need a TargetOptions object to create a TargetMachine. Clang
patch to follow.

One small functionality change in PTX. PTX had commented out the machine
verifier parts in their copy of printAndVerify. That now calls the version in
LLVMTargetMachine. Users of PTX who need verification disabled should rely on
not passing the command-line flag to enable it.

llvm-svn: 145714
2011-12-02 22:16:29 +00:00
Evan Cheng 4a5b2040e2 Revert r145273 and fix in SelectionDAG::InferPtrAlignment() instead.
Conservatively returns zero when the GV does not specify an alignment nor is it
initialized. Previously it returns ABI alignment for type of the GV. However, if
the type is a "packed" type, then the under-specified alignments is attached to
the load / store instructions. In that case, the alignment of the type cannot be
trusted.
rdar://10464621

llvm-svn: 145300
2011-11-28 22:37:34 +00:00
Evan Cheng a4b6404cf0 DAG combine should not increase alignment of loads / stores with alignment less
than ABI alignment. These are loads / stores from / to "packed" data structures.
Their alignments are intentionally under-specified.

rdar://10301431

llvm-svn: 145273
2011-11-28 20:42:56 +00:00
Eli Friedman ff1eaa7578 Make sure to replace the chain properly when DAGCombining a LOAD+EXTRACT_VECTOR_ELT into a single LOAD. Fixes PR10747/PR11393.
llvm-svn: 144863
2011-11-16 23:50:22 +00:00
Jay Foad 70679df664 Remove some unnecessary includes of PseudoSourceValue.h.
llvm-svn: 144634
2011-11-15 07:50:46 +00:00
Eli Friedman 9d448e4a42 Don't try to form pre/post-indexed loads/stores until after LegalizeDAG runs. Fixes PR11029.
llvm-svn: 144438
2011-11-12 00:35:34 +00:00
Lang Hames b85fcd07df Lower mem-ops to unaligned i32/i16 load/stores on ARM where supported.
Add support for trimming constants to GetDemandedBits. This fixes some funky
constant generation that occurs when stores are expanded for targets that don't
support unaligned stores natively.

llvm-svn: 144102
2011-11-08 18:56:23 +00:00
Pete Cooper 82cd9e81fc Added invariant field to the DAG.getLoad method and changed all calls.
When this field is true it means that the load is from constant (runt-time or compile-time) and so can be hoisted from loops or moved around other memory accesses

llvm-svn: 144100
2011-11-08 18:42:53 +00:00
Richard Osborne 561fac4d4e Don't introduce custom nodes after legalization in TargetLowering::BuildSDIV()
and TargetLowering::BuildUDIV(). Fixes PR11283

llvm-svn: 143964
2011-11-07 17:09:05 +00:00
Nadav Rotem f310361a7d Cleanup. Document. Make sure that this build_vector optimization only runs before the op legalizer and that the used type is legal.
llvm-svn: 143358
2011-10-31 20:08:25 +00:00
Benjamin Kramer a4eba41b7a Silence compiler warning.
llvm-svn: 143308
2011-10-30 08:39:55 +00:00
Nadav Rotem bf6568b5d6 Add a new DAGCombine optimization for BUILD_VECTOR.
If all of the inputs are zero/any_extended, create a new simple BV
which can be further optimized by other BV optimizations.

llvm-svn: 143297
2011-10-29 21:23:04 +00:00
Eli Friedman e9e356ad6b Don't crash on 128-bit sdiv by constant. Found by inspection.
llvm-svn: 143095
2011-10-27 02:06:39 +00:00
Eli Friedman 3e9ef907e0 Remove a couple redundant checks.
llvm-svn: 142959
2011-10-25 20:34:22 +00:00
Bob Wilson 681561901d Fix a DAG combiner assertion failure when constant folding BUILD_VECTORS.
svn r139159 caused SelectionDAG::getConstant() to promote BUILD_VECTOR operands
with illegal types, even before type legalization.  For this testcase, that led
to one BUILD_VECTOR with i16 operands and another with promoted i32 operands,
which triggered the assertion.

llvm-svn: 142370
2011-10-18 17:34:47 +00:00
Dan Gohman e83e1b2d2c Fix SimplifySelectCC to add newly created nodes to the DAGCombiner
worklist, as it may be possible to perform further optimization on them.

llvm-svn: 140349
2011-09-22 23:01:29 +00:00
Bruno Cardoso Lopes 6cb23f6e7f Add a DAGCombine for subvector extracts to remove useless chains of
subvector inserts and extracts. Initial patch by Rackover, Zvi with
some tweak done by me.

llvm-svn: 140204
2011-09-20 23:19:33 +00:00
Eli Friedman b7910b79f5 Make the SelectionDAG verify that all the operands of BUILD_VECTOR have the same type. Teach DAGCombiner::visitINSERT_VECTOR_ELT not to make invalid BUILD_VECTORs. Fixes PR10897.
llvm-svn: 139407
2011-09-09 21:04:06 +00:00
Duncan Sands f2641e1bc1 Add codegen support for vector select (in the IR this means a select
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons.  Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all").  Patch mostly by
Nadav Rotem.

llvm-svn: 139159
2011-09-06 19:07:46 +00:00
Benjamin Kramer 68ed46ce9a Roll back the rest of r126557. It's a hack that will break in some obscure cases.
llvm-svn: 138130
2011-08-19 22:39:31 +00:00
Nadav Rotem 62da15a330 Revert r137310 because it does not optimize any code on ToT
llvm-svn: 137466
2011-08-12 17:15:04 +00:00
Nadav Rotem 61140e1028 [AVX] When joining two XMM registers into a YMM register, make sure that the
lower XMM register gets in first. This will allow the SUBREG pattern to
elliminate the first vector insertion. 

llvm-svn: 137310
2011-08-11 16:49:36 +00:00
Eli Friedman cbd3ba91b7 Make sure this DAGCombine actually returns an UNDEF of the correct type; PR10476.
llvm-svn: 135993
2011-07-25 22:25:42 +00:00
Chris Lattner 229907cd11 land David Blaikie's patch to de-constify Type, with a few tweaks.
llvm-svn: 135375
2011-07-18 04:54:35 +00:00
Eric Christopher d6300d2956 Add a dag combine pattern for folding C2-(A+C1) -> (C2-C1)-A
Fixes rdar://9761830

llvm-svn: 135123
2011-07-14 01:12:15 +00:00
Lang Hames 5a00499e87 Add functions 'hasPredecessor' and 'hasPredecessorHelper' to SDNode. The
hasPredecessorHelper function allows predecessors to be cached to speed up
repeated invocations. This fixes PR10186.

X.isPredecessorOf(Y) now just calls Y.hasPredecessor(X)

Y.hasPredecessor(X) calls Y.hasPredecessorHelper(X, Visited, Worklist) with
empty Visited and Worklist sets (i.e. no caching over invocations).

Y.hasPredecessorHelper(X, Visited, Worklist) caches search state in Visited
and Worklist to speed up repeated calls. The Visited set is searched for X
before going to the worklist to further search the DAG if necessary.

llvm-svn: 134592
2011-07-07 04:31:51 +00:00
Benjamin Kramer 8665f8d916 Revert a part of r126557 which could create unschedulable DAGs.
llvm-svn: 134067
2011-06-29 13:47:25 +00:00
Jay Foad 83be361b8a Replace the existing forms of ConstantArray::get() with a single form
that takes an ArrayRef.

llvm-svn: 133615
2011-06-22 09:24:39 +00:00
Evan Cheng 4c0bd9629d Teach dag combine to match halfword byteswap patterns.
1. (((x) & 0xFF00) >> 8) | (((x) & 0x00FF) << 8)
   => (bswap x) >> 16
2. ((x&0xff)<<8)|((x&0xff00)>>8)|((x&0xff000000)>>8)|((x&0x00ff0000)<<8))
   => (rotl (bswap x) 16)

This allows us to eliminate most of the def : Pat patterns for ARM rev16
revsh instructions. It catches many more cases for ARM and x86.

rdar://9609108

llvm-svn: 133503
2011-06-21 06:01:08 +00:00
Nick Lewycky 6d677cfdd8 Add a DAGCombine for (ext (binop (load x), cst)).
llvm-svn: 133124
2011-06-16 01:15:49 +00:00
Nadav Rotem d2d9bdb2b0 Enable the simplification of truncating-store after fixing the usage of
GetDemandBits (which must operate on the vector element type).

Fix the a usage of getZeroExtendInReg which must also be done on scalar types.

llvm-svn: 133052
2011-06-15 11:19:12 +00:00
Chad Rosier 818e116723 When pattern matching during instruction selection make sure shl x,1 is not
converted to add x,x if x is a undef.  add undef, undef does not guarantee
that the resulting low order bit is zero.
Fixes <rdar://problem/9453156> and <rdar://problem/9487392>.

llvm-svn: 133022
2011-06-14 22:29:10 +00:00
Nadav Rotem 571ae19af7 Disable trunc-store simplification on vectors.
llvm-svn: 132984
2011-06-14 07:18:26 +00:00
Eli Friedman 1877ac9937 Change this DAGCombine to build AND of SHR instead of SHR of AND; this matches the ordering we prefer in instcombine. Part of rdar://9562809.
The potential DAGCombine which enforces this more generally messes up some other very fragile patterns, so I'm leaving that alone, at least for now.

llvm-svn: 132809
2011-06-09 22:14:44 +00:00
Devang Patel efec7715ec Revert 121907 (it causes llc crash) and apply original patch from PR9817.
llvm-svn: 131926
2011-05-23 22:04:42 +00:00
Benjamin Kramer 2fd48f2730 Implement mulo x, 2 -> addo x, x in DAGCombiner.
llvm-svn: 131800
2011-05-21 18:31:55 +00:00
Dan Gohman 4298df6d86 Misc. code cleanups.
llvm-svn: 131495
2011-05-17 22:20:36 +00:00
Nadav Rotem 8a7beb80f0 Fixes a bug in the DAGCombiner. LoadSDNodes have two values (data, chain).
If there is a store after the load node, then there is a chain, which means
that there is another user. Thus, asking hasOneUser would fail. Instead we
ask hasNUsesOfValue on the 'data' value.

llvm-svn: 131183
2011-05-11 14:40:50 +00:00
Duncan Sands 6be291a2cd Indent properly, no functionality change.
llvm-svn: 131082
2011-05-09 08:03:33 +00:00
Eli Friedman 55b0acd624 PR9055: extend the fix to PR4050 (r70179) to apply to zext and anyext.
Returning a new node makes the code try to replace the old node, which
in the included testcase is killed by CSE.

llvm-svn: 129650
2011-04-16 23:25:34 +00:00
Owen Anderson a519284fec Fix another instance of the DAG combiner not using the correct type for the RHS of a shift.
llvm-svn: 129522
2011-04-14 17:30:49 +00:00
Chris Lattner 41c80e89f3 have dag combine zap "store undef", which can be formed during call lowering
with undef arguments.

llvm-svn: 129185
2011-04-09 02:32:02 +00:00
Cameron Zwarich 8c7bbc09e2 Add a RemoveFromWorklist method to DCI. This is needed to do some complicated
transformations in target-specific DAG combines without causing DAGCombiner to
delete the same node twice. If you know of a better way to avoid this (see my
next patch for an example), please let me know.

llvm-svn: 128758
2011-04-02 02:40:26 +00:00
Evan Cheng adb9c03e41 Avoid replacing the value of a directly stored load with the stored value if the load is indexed. rdar://9117613.
llvm-svn: 127440
2011-03-11 00:48:56 +00:00
Stuart Hastings 6b4007dec6 Can't introduce floating-point immediate constants after legalization.
Radar 9056407.

llvm-svn: 126864
2011-03-02 19:36:30 +00:00
Nadav Rotem b00913028f Fix typos in the comments.
llvm-svn: 126565
2011-02-27 07:40:43 +00:00
Benjamin Kramer 26691d9660 Add some DAGCombines for (adde 0, 0, glue), which are useful to optimize legalized code for large integer arithmetic.
1. Inform users of ADDEs with two 0 operands that it never sets carry
2. Fold other ADDs or ADDCs into the ADDE if possible

It would be neat if we could do the same thing for SETCC+ADD eventually, but we can't do that in target independent code.

llvm-svn: 126557
2011-02-26 22:48:07 +00:00
Owen Anderson b2c80da4ae Allow targets to specify a the type of the RHS of a shift parameterized on the type of the LHS.
llvm-svn: 126518
2011-02-25 21:41:48 +00:00
Nadav Rotem 502f1b943f Enable support for vector sext and trunc:
Limit the folding of any_ext and sext  into the load operation to scalars.
Limit the active-bits trunc optimization to scalars.
Document vector trunc and vector sext in LangRef.

Similar to commit 126080 (for enabling zext).

llvm-svn: 126424
2011-02-24 21:01:34 +00:00
Nadav Rotem 25f2ac948b Fix 9267; Add vector zext support.
The DAGCombiner folds the zext into complex load instructions. This patch
prevents this optimization on vectors since none of the supported targets
knows how to perform load+vector_zext in one instruction.

llvm-svn: 126080
2011-02-20 12:37:50 +00:00
Stuart Hastings 81c4306005 Swap VT and DebugLoc operands of getExtLoad() for consistency with
other getNode() methods.  Radar 9002173.

llvm-svn: 125665
2011-02-16 16:23:55 +00:00
Eric Christopher e5ca1e0506 Refactor zero folding slightly. Clean up todo.
llvm-svn: 125651
2011-02-16 04:50:12 +00:00
Eric Christopher ef72141a75 The change for PR9190 wasn't quite right. We need to avoid making the
transformation if we can't legally create a build vector of the correct
type. Check that we can make the transformation first, and add a TODO to
refactor this code with similar cases.

Fixes: PR9223 and rdar://9000350
llvm-svn: 125631
2011-02-16 01:10:03 +00:00
Chris Lattner e95d195014 Revisit my fix for PR9028: the issue is that DAGCombine was
generating i8 shift amounts for things like i1024 types.  Add
an assert in getNode to prevent this from occuring in the future,
fix the buggy transformation, revert my previous patch, and
document this gotcha in ISDOpcodes.h

llvm-svn: 125465
2011-02-13 19:09:16 +00:00
Nadav Rotem db2f54811d A fix for 9165.
The DAGCombiner created illegal BUILD_VECTOR operations.
The patch added a check that either illegal operations are
allowed or that the created operation is legal.

llvm-svn: 125435
2011-02-12 14:40:33 +00:00
Nadav Rotem a49a02a04f SimplifySelectOps can only handle selects with a scalar condition. Add a check
that the condition is not a vector.

llvm-svn: 125398
2011-02-11 19:57:47 +00:00
Nadav Rotem 18f6a33457 Fix #9190
The bug happens when the DAGCombiner attempts to optimize one of the patterns
of the SUB opcode. It tries to create a zero of type v2i64. This type is legal
on 32bit machines, but the initializer of this vector (i64) is target dependent.
Currently, the initializer attempts to create an i64 zero constant, which fails.
Added a flag to tell the DAGCombiner to create a legal zero, if we require that
the pass would generate legal types.

llvm-svn: 125391
2011-02-11 19:20:37 +00:00
Evan Cheng d42641c6b5 Given a pair of floating point load and store, if there are no other uses of
the load, then it may be legal to transform the load and store to integer
load and store of the same width.

This is done if the target specified the transformation as profitable. e.g.
On arm, this can transform:
vldr.32 s0, []
vstr.32 s0, []

to

ldr r12, []
str r12, []

rdar://8944252

llvm-svn: 124708
2011-02-02 01:06:55 +00:00
Richard Osborne 272e084bca Fix bug where ReduceLoadWidth was creating illegal ZEXTLOAD instructions.
llvm-svn: 124587
2011-01-31 17:41:44 +00:00
Benjamin Kramer 946e1522b6 Teach DAGCombine to fold fold (sra (trunc (sr x, c1)), c2) -> (trunc (sra x, c1+c2) when c1 equals the amount of bits that are truncated off.
This happens all the time when a smul is promoted to a larger type.

On x86-64 we now compile "int test(int x) { return x/10; }" into
  movslq  %edi, %rax
  imulq $1717986919, %rax, %rax
  movq  %rax, %rcx
  shrq  $63, %rcx
  sarq  $34, %rax <- used to be "shrq $32, %rax; sarl $2, %eax"
  addl  %ecx, %eax

This fires 96 times in gcc.c on x86-64.

llvm-svn: 124559
2011-01-30 16:38:43 +00:00
Benjamin Kramer 65bb14d368 Add the missing sub identity "A-(A-B) -> B" to DAGCombine.
This happens e.g. for code like "X - X%10" where we lower the modulo operation
to a series of multiplies and shifts that are then subtracted from X, leading to
this missed optimization.

llvm-svn: 124532
2011-01-29 12:34:05 +00:00
Anton Korobeynikov 2f93128109 Rename TargetFrameInfo into TargetFrameLowering. Also, put couple of FIXMEs and fixes here and there.
llvm-svn: 123170
2011-01-10 12:39:04 +00:00
Benjamin Kramer 1f4dfbbcb0 DAGCombine add (sext i1), X into sub X, (zext i1) if sext from i1 is illegal. The latter usually compiles into smaller code.
example code:
unsigned foo(unsigned x, unsigned y) {
  if (x != 0) y--;
  return y;
}

before:
  _foo:                           ## @foo
    cmpl  $1, 4(%esp)             ## encoding: [0x83,0x7c,0x24,0x04,0x01]
    sbbl  %eax, %eax              ## encoding: [0x19,0xc0]
    notl  %eax                    ## encoding: [0xf7,0xd0]
    addl  8(%esp), %eax           ## encoding: [0x03,0x44,0x24,0x08]
    ret                           ## encoding: [0xc3]

after:
  _foo:                           ## @foo
    cmpl  $1, 4(%esp)             ## encoding: [0x83,0x7c,0x24,0x04,0x01]
    movl  8(%esp), %eax           ## encoding: [0x8b,0x44,0x24,0x08]
    adcl  $-1, %eax               ## encoding: [0x83,0xd0,0xff]
    ret                           ## encoding: [0xc3]

llvm-svn: 122455
2010-12-22 23:17:45 +00:00
Chris Lattner cafc1e60bb Fix a bug in ReduceLoadWidth that wasn't handling extending
loads properly.  We miscompiled the testcase into:

_test:                                  ## @test
	movl	$128, (%rdi)
	movzbl	1(%rdi), %eax
	ret

Now we get a proper:

_test:                                  ## @test
	movl	$128, (%rdi)
	movsbl	(%rdi), %eax
	movzbl	%ah, %eax
	ret

This fixes PR8757.

llvm-svn: 122392
2010-12-22 08:02:57 +00:00
Chris Lattner 9a499e96eb more cleanups, move a check for "roundedness" earlier to reject
unhanded cases faster and simplify code.

llvm-svn: 122391
2010-12-22 08:01:44 +00:00
Chris Lattner 222374d886 reduce indentation and improve comments, no functionality change.
llvm-svn: 122389
2010-12-22 07:36:50 +00:00
Dale Johannesen a94e36bbee Reapply 122353-122355 with fixes. 122354 was wrong;
the shift type was needed one place, the shift count
type another.  The transform in 123555 had the same
problem.

llvm-svn: 122366
2010-12-21 21:55:50 +00:00
Dale Johannesen 87c47499c6 Revert 122353-122355 for the moment, they broke stuff.
llvm-svn: 122360
2010-12-21 21:22:27 +00:00
Dale Johannesen caf42aa6a4 Add a new transform to DAGCombiner.
llvm-svn: 122355
2010-12-21 20:10:51 +00:00
Dale Johannesen fa5dc82fda Get the type of a shift from the shift, not from its shift
count operand.  These should be the same but apparently are
not always, and this is cleaner anyway.  This improves the
code in an existing test.

llvm-svn: 122354
2010-12-21 20:06:19 +00:00
Dale Johannesen d64931df77 Shift by the word size is invalid IR; don't create it.
llvm-svn: 122353
2010-12-21 20:00:06 +00:00
Chris Lattner 2a7ff99979 fix some typos
llvm-svn: 122349
2010-12-21 18:05:22 +00:00
Chris Lattner 3e5fbd74ed rename MVT::Flag to MVT::Glue. "Flag" is a terrible name for
something that just glues two nodes together, even if it is
sometimes used for flags.

llvm-svn: 122310
2010-12-21 02:38:05 +00:00
Dale Johannesen 0a291a36f2 Cosmetic changes.
llvm-svn: 122259
2010-12-20 20:10:50 +00:00
Bob Wilson 5408144add Fix a DAGCombiner crash when folding binary vector operations with constant
BUILD_VECTOR operands where the element type is not legal.  I had previously
changed this code to insert TRUNCATE operations, but that was just wrong.

llvm-svn: 122102
2010-12-17 23:06:49 +00:00
Dale Johannesen cd538afa52 Add a transform to DAG Combiner. This improves the
code for the case where 32-bit divide by constant is
turned into 64-bit multiply by constant.  8771012.

llvm-svn: 122090
2010-12-17 21:45:49 +00:00
Chris Lattner 15090e1eb0 take care of some todos, transforming [us]mul_lohi into
a wider mul if the wider mul is legal.

llvm-svn: 121848
2010-12-15 06:04:19 +00:00
Chris Lattner b86dceea1b when transforming a MULHS into a wider MUL, there is no need to SRA the
result, the top bits are truncated off anyway, just use SRL.

llvm-svn: 121846
2010-12-15 05:51:39 +00:00
Chris Lattner 10bd29f1d4 Add a couple dag combines to transform mulhi/mullo into a wider multiply
when the wider type is legal.  This allows us to compile:

define zeroext i16 @test1(i16 zeroext %x) nounwind {
entry:
	%div = udiv i16 %x, 33
	ret i16 %div
}

into:

test1:                                  # @test1
	movzwl	4(%esp), %eax
	imull	$63551, %eax, %eax      # imm = 0xF83F
	shrl	$21, %eax
	ret

instead of:

test1:                                  # @test1
        movw    $-1985, %ax             # imm = 0xFFFFFFFFFFFFF83F
        mulw    4(%esp)
        andl    $65504, %edx            # imm = 0xFFE0
        movl    %edx, %eax
        shrl    $5, %eax
        ret

Implementing rdar://8760399 and example #4 from:
http://blog.regehr.org/archives/320

We should implement the same thing for [su]mul_hilo, but I don't
have immediate plans to do this.

llvm-svn: 121696
2010-12-13 08:39:01 +00:00
Eric Christopher d9e8eac235 80-col fixups.
llvm-svn: 121356
2010-12-09 04:48:06 +00:00
Jay Foad 583abbc4df PR5207: Change APInt methods trunc(), sext(), zext(), sextOrTrunc() and
zextOrTrunc(), and APSInt methods extend(), extOrTrunc() and new method
trunc(), to be const and to return a new value instead of modifying the
object in place.

llvm-svn: 121120
2010-12-07 08:25:19 +00:00
Bob Wilson f9b96c474f Fix a comment typo.
llvm-svn: 120235
2010-11-28 06:51:19 +00:00
Wesley Peck 527da1b6e2 Renaming ISD::BIT_CONVERT to ISD::BITCAST to better reflect the LLVM IR concept.
llvm-svn: 119990
2010-11-23 03:31:01 +00:00
Duncan Sands c92331b984 Fix thinko: we must turn select(anyext, sext) into sext(select)
not anyext(select).  Spotted by Frits van Bommel.

llvm-svn: 119739
2010-11-18 21:16:28 +00:00
Duncan Sands 12f3b3b44f The DAGCombiner was threading select over pairs of extending loads even
if the extension types were not the same.  The result was that if you
fed a select with sext and zext loads, as in the testcase, then it
would get turned into a zext (or sext) of the select, which is wrong
in the cases when it should have been an sext (resp. zext).  Reported
and diagnosed by Sebastien Deldon.

llvm-svn: 119728
2010-11-18 20:05:18 +00:00
Dan Gohman 5db8921422 Fix DAGCombiner to avoid folding a sext-in-reg or similar through a shl
in order to fold it into a load.

llvm-svn: 118471
2010-11-09 01:54:35 +00:00
Eric Christopher c6418b105a Just return undef for invalid masks or elts, and since we're doing that,
just do it earlier too.

llvm-svn: 118195
2010-11-03 20:44:42 +00:00
Eric Christopher fcc9e6848a If we have an undef mask our Elt will be -1 for our access, handle
this by using an undef as a pointer.

Fixes rdar://8625016

llvm-svn: 118164
2010-11-03 09:36:40 +00:00
Dan Gohman 68fb004616 Fix DAGCombiner to avoid going into an infinite loop when it
encounters (and:i64 (shl:i64 (load:i64), 1), 0xffffffff).
This fixes rdar://8606584.

llvm-svn: 118143
2010-11-03 01:47:46 +00:00
Bob Wilson 08882be86c Remove DAG combiner patch to fold vector splats. Instcombiner does it now.
llvm-svn: 117720
2010-10-29 22:03:02 +00:00
Bob Wilson f63da12be9 Teach the DAG combiner to fold a splat of a splat. Radar 8597790.
Also do some minor refactoring to reduce indentation.

llvm-svn: 117558
2010-10-28 17:06:14 +00:00
Dan Gohman a94cc6dfe8 Make CodeGen TBAA-aware.
llvm-svn: 116890
2010-10-20 00:31:05 +00:00
Evan Cheng c8d6cfd730 This DAG combine BRCOND transformation can look pass truncate of the operand:
//   %a = ...                                                                                                                                                                                  
    //   %b = and i32 %a, 2                                                                                                                                                                        
    //   %c = srl i32 %b, 1                                                                                                                                                                        
    //   brcond i32 %c ...                                                                                                                                                                         
    //                                                                                                                                                                                             
    // into                                                                                                                                                                                        
    //                                                                                                                                                                                             
    //   %a = ...                                                                                                                                                                                  
    //   %b = and i32 %a, 2                                                                                                                                                                        
    //   %c = setcc eq %b, 0                                                                                                                                                                       
    //   brcond %c ...

Make sure it restores local variable N1, which corresponds to the condition operand if it fails to match.

This apparently breaks TCE but since that backend isn't in the tree I don't have a test for it.

llvm-svn: 115571
2010-10-04 22:41:01 +00:00
Chris Lattner a205055857 fix rdar://8494845 + PR8244 - a miscompile exposed by my patch in r101350
llvm-svn: 115294
2010-10-01 05:36:09 +00:00
Owen Anderson 3231d13ddd A select between a constant and zero, when fed by a bit test, can be efficiently
lowered using a series of shifts.
Fixes <rdar://problem/8285015>.

llvm-svn: 114599
2010-09-22 22:58:22 +00:00
Owen Anderson 5e65dfbb97 Reimplement r114460 in target-independent DAGCombine rather than target-dependent, by using
the predicate to discover the number of sign bits.  Enhance X86's target lowering to provide
a useful response to this query.

llvm-svn: 114473
2010-09-21 20:42:50 +00:00
Chris Lattner 676c61db0e update a bunch of code to use the MachinePointerInfo version of getStore.
llvm-svn: 114461
2010-09-21 18:41:36 +00:00
Chris Lattner 6963c1f789 eliminate an old SelectionDAG::getTruncStore method, propagating
MachinePointerInfo around more.

llvm-svn: 114452
2010-09-21 17:42:31 +00:00
Chris Lattner 3d178ed4d4 propagate MachinePointerInfo through various uses of the old
SelectionDAG::getExtLoad overload, and eliminate it.

llvm-svn: 114446
2010-09-21 17:04:51 +00:00
Chris Lattner f72c3c08a4 convert dagcombine off the old form of getLoad. This fixes several bugs
with SVOffset computation.

llvm-svn: 114442
2010-09-21 16:08:50 +00:00
Chris Lattner e32675253f simplify DAGCombiner::SimplifySelectOps step #2/2.
llvm-svn: 114437
2010-09-21 15:58:55 +00:00
Chris Lattner 254c445e63 substantially reduce indentation and simplify DAGCombiner::SimplifySelectOps.
no functionality change (step #1)

llvm-svn: 114436
2010-09-21 15:46:59 +00:00
Chris Lattner a35499e2af a few more trivial updates. This fixes PerformInsertVectorEltInMemory to not
pass a completely incorrect SrcValue, which would result in a miscompile with
combiner-aa.

llvm-svn: 114411
2010-09-21 07:32:19 +00:00
Owen Anderson 272ff94916 When TCO is turned on, it is possible to end up with aliasing FrameIndex's. Therefore,
CombinerAA cannot assume that different FrameIndex's never alias, but can instead use
MachineFrameInfo to get the actual offsets of these slots and check for actual aliasing.

This fixes CodeGen/X86/2010-02-19-TailCallRetAddrBug.ll and CodeGen/X86/tailcallstack64.ll
when CombinerAA is enabled, modulo a different register allocation sequence.

llvm-svn: 114348
2010-09-20 20:39:59 +00:00
Owen Anderson 7b8d2ae912 Revert r114312 while I sort out some issues.
llvm-svn: 114313
2010-09-19 21:01:26 +00:00
Owen Anderson ff82f8a35b Tentatively enabled DAGCombiner Alias Analysis by default. As far as I know,
r114268 fixed the last of the blockers to enabling it.  I will be monitoring
for failures.

llvm-svn: 114312
2010-09-19 19:51:55 +00:00
Dan Gohman 3c9b5f394b Don't narrow the load and store in a load+twiddle+store sequence unless
there are clearly no stores between the load and the store. This fixes
this miscompile reported as PR7833.

This breaks the test/CodeGen/X86/narrow_op-2.ll optimization, which is
safe, but awkward to prove safe. Move it to X86's README.txt.

llvm-svn: 112861
2010-09-02 21:18:42 +00:00
Nate Begeman 317b969ac5 Fix a crash in the dag combiner caused by ConstantFoldBIT_CONVERTofBUILD_VECTOR calling itself
recursively and returning a SCALAR_TO_VECTOR node, but assuming the input was always a BUILD_VECTOR.

llvm-svn: 109519
2010-07-27 18:02:18 +00:00
Owen Anderson 9c271e2835 Remove r108639 now that it is handled by InstCombine instead.
llvm-svn: 108688
2010-07-19 08:10:24 +00:00
Owen Anderson f7f9c8a2f7 Add a DAGCombine xform to fold away redundant float->double->float conversions around sqrt instructions.
I am assured by people more knowledgeable than me that there are no rounding issues in eliminating this.

This fixed <rdar://problem/8197504>.

llvm-svn: 108639
2010-07-18 08:47:54 +00:00
Duncan Sands 41b4a6b36a Convert some tab stops into spaces.
llvm-svn: 108130
2010-07-12 08:16:59 +00:00
Bob Wilson 21eed476e8 Reenable DAG combining for vector shuffles. It looks like it was temporarily
disabled and then never turned back on again.  Adjust some tests, one because
this change avoids an unnecessary instruction, and the other to make it
continue testing what it was intended to test.

llvm-svn: 107941
2010-07-09 00:38:12 +00:00
Benjamin Kramer 0ae3f08c0d Merge the duplicated iabs optimization in DAGCombiner and let it detected a few more idioms.
llvm-svn: 107868
2010-07-08 12:09:56 +00:00
Evan Cheng 1c349f18f8 Move getExtLoad() and (some) getLoad() DebugLoc argument after EVT argument for consistency sake.
llvm-svn: 107820
2010-07-07 22:15:37 +00:00
Devang Patel a3ca21b228 Propagate debug loc.
llvm-svn: 107710
2010-07-06 22:08:15 +00:00
Bob Wilson 269a89fd3a Unlike other targets, ARM now uses BUILD_VECTORs post-legalization so they
can't be changed arbitrarily by the DAGCombiner without checking if it is
running after legalization.

llvm-svn: 107097
2010-06-28 23:40:25 +00:00
Duncan Sands 2dc70bea54 Remove variables which are assigned to but for which the value
is not used.  Spotted by gcc-4.6.

llvm-svn: 106854
2010-06-25 14:48:39 +00:00
Dan Gohman 600f62b3ba Reapply r106634, now that the bug it exposed is fixed.
llvm-svn: 106746
2010-06-24 14:30:44 +00:00
Daniel Dunbar 4df321b7ad Revert r106263, "Fold the ShrinkDemandedOps pass into the regular DAGCombiner pass,"... it was causing both 'file' (with clang) and 176.gcc (with llvm-gcc) to be miscompiled.
llvm-svn: 106634
2010-06-23 17:09:26 +00:00
Jim Grosbach b58c08b0ba Some targets don't require the fencing MEMBARRIER instructions surrounding
atomic intrinsics, either because the use locking instructions for the
atomics, or because they perform the locking directly. Add support in the
DAG combiner to fold away the fences.

llvm-svn: 106630
2010-06-23 16:07:42 +00:00
Dan Gohman b92156d5e4 Fold the ShrinkDemandedOps pass into the regular DAGCombiner pass,
which is faster, simpler, and less surprising.

llvm-svn: 106263
2010-06-18 01:05:21 +00:00
Dale Johannesen 60fe2cdc4f Fix another variant of PR 7191. Also add a testcase
Mon Ping provided; unfortunately bugpoint failed to
reduce it, but I think it's important to have a test for
this in the suite.  8023512.

llvm-svn: 104624
2010-05-25 18:47:23 +00:00
Dale Johannesen ff384ad981 Fix PR 7191. I have been unable to create a .ll file that fails, sorry.
(oye, a word which should be better known to people writing tree
traversals, means grandchild.)

llvm-svn: 104619
2010-05-25 17:50:03 +00:00
Bob Wilson 61438fe064 Clean up extra whitespace.
llvm-svn: 104410
2010-05-21 23:53:55 +00:00
Bob Wilson 51d9ee3ff6 Change CodeGen/ARM/2009-11-02-NegativeLane.ll to use 16-bit vector elements
so that it will continue to test what it was meant to test when I commit a
separate change for better support of BUILD_VECTOR and VECTOR_SHUFFLE for Neon.
Fix a DAG combiner crash exposed by this test change.

llvm-svn: 104380
2010-05-21 21:05:32 +00:00
Bob Wilson 42603958fb Optimize away insertelement of an undef value. This shows up in
test/Codegen/ARM/reg_sequence.ll but it doesn't affect the generated code
because the coalescer cleans it up.  Radar 7998853.

llvm-svn: 104185
2010-05-19 23:42:58 +00:00
Evan Cheng abd0ad54a4 Intrinsics which do a vector compare (results are all zero or all ones) are modeled as icmp / fcmp + sext. This is turned into a vsetcc by dag combine (yes, not a good long term solution). The targets can then isel the vsetcc to the appropriate instruction.
The trouble arises when the result of a vector cmp + sext is then and'ed with all ones. Instcombine will turn it into a vector cmp + zext, dag combiner will miss turning it into a vsetcc and hell breaks loose after that.

Teach dag combine to turn a vector cpm + zest into a vsetcc + and 1. This fixes rdar://7923010.

llvm-svn: 104094
2010-05-19 01:08:17 +00:00
Evan Cheng f19384d54a Sink dag combine's post index load / store code that swap base ptr and index into the target hook. Only the target knows whether the swap is safe. In Thumb2 mode, the offset must be an immediate. rdar://7998649
llvm-svn: 104060
2010-05-18 21:31:17 +00:00
Evan Cheng 48f0de96d6 FIX PR7158. SimplifyVBinOp was asserting when it fails to constant fold (op (build_vector), (build_vector)).
llvm-svn: 104004
2010-05-18 00:03:40 +00:00
Evan Cheng 02947a4551 Be careful with operand promotion. For a binary operation, the source operands may be the same. PR7018. rdar://7939869.
llvm-svn: 103419
2010-05-10 19:03:57 +00:00
Dan Gohman e82c25e878 Apply a patch from Jan Sjodin to fix a compiler abort on vector
comparisons sign-extended to a different bitwidth than the
comparison operands.

llvm-svn: 102721
2010-04-30 17:19:19 +00:00
Evan Cheng f100557c9a Try operation promotion only if regular dag combine and target-specific ones failed to do anything.
llvm-svn: 102492
2010-04-28 07:10:39 +00:00
Evan Cheng e813690b7a - When legal, promote a load to zextload rather than ext load.
- Catch more further dag combine opportunities as result of operand promotion, e.g. (i32 anyext (i16 trunc (i32 x))) -> (i32 x)

llvm-svn: 102455
2010-04-27 19:48:13 +00:00
Evan Cheng 0abb54d631 When a load operand is promoted to an extload, replace other uses with uses of extload result truncated.
llvm-svn: 102236
2010-04-24 04:43:44 +00:00
Dan Gohman 5544b0c588 Apply a fix for a vector setcc dagcombine from Jan Sjodin. No
testcase yet, as the testcase now fails downstream.

llvm-svn: 102228
2010-04-24 01:17:30 +00:00
Evan Cheng b9ff130d47 Code refactoring.
llvm-svn: 102202
2010-04-23 19:10:30 +00:00
Evan Cheng f1223bdec0 - It's not safe to promote rotates (at least not trivially).
- Some code refactoring.

llvm-svn: 102111
2010-04-22 20:19:46 +00:00
Bill Wendling 467e6c2deb The visitXOR method can return the same SDNode. If so, we don't want to delete
it as it's not dead.

llvm-svn: 101855
2010-04-20 01:25:01 +00:00
Evan Cheng e19aa5cc52 More progress on promoting i16 operations to i32 for x86. Work in progress.
llvm-svn: 101808
2010-04-19 19:29:22 +00:00
Evan Cheng f1bd5fcdb4 More work to allow dag combiner to promote 16-bit ops to 32-bit.
llvm-svn: 101621
2010-04-17 06:13:15 +00:00
Evan Cheng f037f87bde (i32 sext_in_reg (i32 aext (i16 x)), i16) -> (i32 sext x). No known test case until -promote-16bit is enabled.
llvm-svn: 101551
2010-04-16 22:26:19 +00:00
Evan Cheng af56facacd Adding support for dag combiner to promote operations for profit. This requires target specific queries. For example, x86 should promote i16 to i32 when it does not impact load folding.
x86 support is off by default. It can be enabled with -promote-16bit.

Work in progress.

llvm-svn: 101448
2010-04-16 06:14:10 +00:00
Chris Lattner 3245afdf05 enhance the load/store narrowing optimization to handle a
tokenfactor in between the load/store.  This allows us to 
optimize test7 into:

_test7:                                 ## @test7
## BB#0:                                ## %entry
	movl	(%rdx), %eax
                                        ## kill: SIL<def> ESI<kill>
	movb	%sil, 5(%rdi)
	ret

instead of:

_test7:                                 ## @test7
## BB#0:                                ## %entry
	movl	4(%esp), %ecx
	movl	$-65281, %eax           ## imm = 0xFFFFFFFFFFFF00FF
	andl	4(%ecx), %eax
	movzbl	8(%esp), %edx
	shll	$8, %edx
	addl	%eax, %edx
	movl	12(%esp), %eax
	movl	(%eax), %eax
	movl	%edx, 4(%ecx)
	ret

llvm-svn: 101355
2010-04-15 06:10:49 +00:00
Chris Lattner 6ebd8674eb teach codegen to turn trunc(zextload) into load when possible.
This doesn't occur much at all, it only seems to formed in the case
when the trunc optimization kicks in due to phase ordering.  In that
case it is saves a few bytes on x86-32.

llvm-svn: 101350
2010-04-15 05:40:59 +00:00
Chris Lattner f9b2e3c68a add a simple dag combine to replace trivial shl+lshr with
and.  This happens with the store->load narrowing stuff.

llvm-svn: 101348
2010-04-15 05:28:43 +00:00
Chris Lattner 4041ab6e00 Implement rdar://7860110 (also in target/readme.txt) narrowing
a load/or/and/store sequence into a narrower store when it is
safe.  Daniel tells me that clang will start producing this sort
of thing with bitfields, and this does  trigger a few dozen times
on 176.gcc produced by llvm-gcc even now.

This compiles code like CodeGen/X86/2009-05-28-DAGCombineCrash.ll 
into:

        movl    %eax, 36(%rdi)

instead of:

        movl    $4294967295, %eax       ## imm = 0xFFFFFFFF
        andq    32(%rdi), %rax
        shlq    $32, %rcx
        addq    %rax, %rcx
        movq    %rcx, 32(%rdi)

and each of the testcases into a single store.  Each of them used
to compile into craziness like this:

_test4:
	movl	$65535, %eax            ## imm = 0xFFFF
	andl	(%rdi), %eax
	shll	$16, %esi
	addl	%eax, %esi
	movl	%esi, (%rdi)
	ret

llvm-svn: 101343
2010-04-15 04:48:01 +00:00
Dan Gohman bcaf681cde Add const qualifiers to CodeGen's use of LLVM IR constructs.
llvm-svn: 101334
2010-04-15 01:51:59 +00:00
Dan Gohman ecd40a34e2 Remove unnecessary parens.
llvm-svn: 101010
2010-04-12 02:24:01 +00:00
Ted Kremenek d87bd77586 Fix -Wsign-compare warning (issued by clang++).
llvm-svn: 100799
2010-04-08 18:49:30 +00:00
Chris Lattner 6855d62768 fix 80 col violation, patch by Alastair Lynn
llvm-svn: 100639
2010-04-07 18:13:33 +00:00
Evan Cheng 43cd9e3845 Fix sdisel memcpy, memset, memmove lowering:
1. Makes it possible to lower with floating point loads and stores.
2. Avoid unaligned loads / stores unless it's fast.
3. Fix some memcpy lowering logic bug related to when to optimize a
   load from constant string into a constant.
4. Adjust x86 memcpy lowering threshold to make it more sane.
5. Fix x86 target hook so it uses vector and floating point memory
   ops more effectively.
rdar://7774704

llvm-svn: 100090
2010-04-01 06:04:33 +00:00
Chris Lattner 4ec0b670d5 fix PR6533 by updating the br(xor) code to remember the case
when it looked past a trunc.

llvm-svn: 98203
2010-03-10 23:46:44 +00:00
Dan Gohman 703b12d62f Fix another bitwidth calculation to handle vector types; based on a
patch by Micah Villmow for PR6572.

llvm-svn: 98188
2010-03-10 21:04:53 +00:00
Dan Gohman e14c4087a3 Fix more code to work properly with vector operands. Based on
a patch my Micah Villmow for PR6465.

llvm-svn: 97692
2010-03-04 00:23:16 +00:00
Bill Wendling c8d3add052 Use APInt instead of zext value.
llvm-svn: 97631
2010-03-03 01:58:01 +00:00
Bill Wendling af13d82945 This test case:
long test(long x) { return (x & 123124) | 3; }

Currently compiles to:

_test:
        orl     $3, %edi
        movq    %rdi, %rax
        andq    $123127, %rax
        ret

This is because instruction and DAG combiners canonicalize

  (or (and x, C), D) -> (and (or, D), (C | D))

However, this is only profitable if (C & D) != 0. It gets in the way of the
3-addressification because the input bits are known to be zero.

llvm-svn: 97616
2010-03-03 00:35:56 +00:00
Dan Gohman 4cec543952 Fix several places to handle vector operands properly.
Based on a patch by Micah Villmow for PR6438.

llvm-svn: 97538
2010-03-02 02:14:38 +00:00
Evan Cheng 228c31f045 Re-apply 97040 with fix. This survives a ppc self-host llvm-gcc bootstrap.
llvm-svn: 97310
2010-02-27 07:36:59 +00:00
Daniel Dunbar 4811d004be Speculatively revert r97011, "Re-apply 96540 and 96556 with fixes.", again in
the hopes of fixing PPC bootstrap.

llvm-svn: 97040
2010-02-24 17:05:47 +00:00
Evan Cheng 328a607490 Re-apply 96540 and 96556 with fixes.
llvm-svn: 97011
2010-02-24 01:42:31 +00:00
Duncan Sands d0bf6f640f Revert commits 96556 and 96640, because commit 96556 breaks the
dragonegg self-host build.  I reverted 96640 in order to revert
96556 (96640 goes on top of 96556), but it also looks like with
both of them applied the breakage happens even earlier.  The
symptom of the 96556 miscompile is the following crash:

  llvm[3]: Compiling AlphaISelLowering.cpp for Release build
  cc1plus: /home/duncan/tmp/tmp/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:4982: void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*, llvm::SDNode*, llvm::SelectionDAG::DAGUpdateListener*): Assertion `(!From->hasAnyUseOfValue(i) || From->getValueType(i) == To->getValueType(i)) && "Cannot use this version of ReplaceAllUsesWith!"' failed.
  Stack dump:
  0.	Running pass 'X86 DAG->DAG Instruction Selection' on function '@_ZN4llvm19AlphaTargetLowering14LowerOperationENS_7SDValueERNS_12SelectionDAGE'
  g++: Internal error: Aborted (program cc1plus)

This occurs when building LLVM using LLVM built by LLVM (via
dragonegg).  Probably LLVM has miscompiled itself, though it
may have miscompiled GCC and/or dragonegg itself: at this point
of the self-host build, all of GCC, LLVM and dragonegg were built
using LLVM.  Unfortunately this kind of thing is extremely hard
to debug, and while I did rummage around a bit I didn't find any
smoking guns, aka obviously miscompiled code.

Found by bisection.

r96556 | evancheng | 2010-02-18 03:13:50 +0100 (Thu, 18 Feb 2010) | 5 lines

Some dag combiner goodness:
Transform br (xor (x, y)) -> br (x != y)
Transform br (xor (xor (x,y), 1)) -> br (x == y)
Also normalize (and (X, 1) == / != 1 -> (and (X, 1)) != / == 0 to match to "test on x86" and "tst on arm"

r96640 | evancheng | 2010-02-19 01:34:39 +0100 (Fri, 19 Feb 2010) | 16 lines

Transform (xor (setcc), (setcc)) == / != 1 to
(xor (setcc), (setcc)) != / == 1.

e.g. On x86_64
  %0 = icmp eq i32 %x, 0
  %1 = icmp eq i32 %y, 0
  %2 = xor i1 %1, %0
  br i1 %2, label %bb, label %return
=>
	testl   %edi, %edi
	sete    %al
	testl   %esi, %esi
	sete    %cl
	cmpb    %al, %cl
	je      LBB1_2

llvm-svn: 96672
2010-02-19 11:30:41 +00:00
Evan Cheng 0ceb68a552 Some dag combiner goodness:
Transform br (xor (x, y)) -> br (x != y)
Transform br (xor (xor (x,y), 1)) -> br (x == y)
Also normalize (and (X, 1) == / != 1 -> (and (X, 1)) != / == 0 to match to "test on x86" and "tst on arm"

llvm-svn: 96556
2010-02-18 02:13:50 +00:00
David Greene 39c6d01879 Add non-temporal flags and remove an assumption of default arguments.
llvm-svn: 96240
2010-02-15 17:00:31 +00:00
Dan Gohman 4a618827de Fix "the the" and similar typos.
llvm-svn: 95781
2010-02-10 16:03:48 +00:00
Mon P Wang d74e0023c5 Improve EXTRACT_VECTOR_ELT patch based on comments from Duncan
llvm-svn: 95012
2010-02-01 22:15:09 +00:00
Mon P Wang 72c60c73af Fixed a couple of optimization with EXTRACT_VECTOR_ELT that assumes the result
type is the same as the element type of the vector.  EXTRACT_VECTOR_ELT can
be used to extended the width of an integer type.  This fixes a bug for
Generic/vector-casts.ll on a ppc750.

llvm-svn: 94990
2010-02-01 19:03:18 +00:00
Evan Cheng 555f61bf58 Implement cond ? -1 : 0 with sbb.
llvm-svn: 94490
2010-01-26 02:00:44 +00:00
Dan Gohman 954f49014d Fold (add x, shl(0 - y, n)) -> sub(x, shl(y, n)), to simplify some code
that SCEVExpander can produce when running on behalf of LSR.

llvm-svn: 93949
2010-01-19 23:30:49 +00:00
Evan Cheng 88b65bc835 Canonicalize -1 - x to ~x.
Instcombine does this but apparently there are situations where this pattern will escape the optimizer and / or created by isel. Here is a case that's seen in JavaScriptCore:
  %t1 = sub i32 0, %a
  %t2 = add i32 %t1, -1
The dag combiner pattern: ((c1-A)+c2) -> (c1+c2)-A
will fold it to -1 - %a.

llvm-svn: 93773
2010-01-18 21:38:44 +00:00
Dan Gohman dd5286dc63 Fix a codegen abort seen in 483.xalancbmk.
llvm-svn: 93417
2010-01-14 03:08:49 +00:00
Mon P Wang ec57c81e64 Disable transformation of select of two loads to a select of address and then a load if the
loads are not in the default address space because the transformation discards src value info.

llvm-svn: 93180
2010-01-11 20:12:49 +00:00
Dan Gohman 6bd3ef82ff Revert an earlier change to SIGN_EXTEND_INREG for vectors. The VTSDNode
really does need to be a vector type, because
TargetLowering::getOperationAction for SIGN_EXTEND_INREG uses that type,
and it needs to be able to distinguish between vectors and scalars.

Also, fix some more issues with legalization of vector casts.

llvm-svn: 93043
2010-01-09 02:13:55 +00:00
Chris Lattner dab2cd543f Fix rdar://7517201, a regression introduced by r92849.
When folding a and(any_ext(load)) both the any_ext and the
load have to have only a single use.

This removes the anyext-uses.ll testcase which started failing
because it is unreduced and unclear what it is testing.

llvm-svn: 92950
2010-01-07 21:59:23 +00:00
Chris Lattner 88de38453f factor this code better and reduce nesting at the same
time, no functionality change.

llvm-svn: 92948
2010-01-07 21:53:27 +00:00
Evan Cheng 166a4e6caa Teach dag combine to fold the following transformation more aggressively:
(OP (trunc x), (trunc y)) -> (trunc (OP x, y))

Unfortunately this simple change causes dag combine to infinite looping. The problem is the shrink demanded ops optimization tend to canonicalize expressions in the opposite manner. That is badness. This patch disable those optimizations in dag combine but instead it is done as a late pass in sdisel.

This also exposes some deficiencies in dag combine and x86 setcc / brcond lowering. Teach them to look pass ISD::TRUNCATE in various places.

llvm-svn: 92849
2010-01-06 19:38:29 +00:00
Bill Wendling 03f0af372c Don't assign the shift the same type as the variable being shifted. This could
result in illegal types for the SHL operator.

llvm-svn: 92797
2010-01-05 22:39:10 +00:00
David Greene fe5c3524c7 Change errs() to dbgs().
llvm-svn: 92578
2010-01-05 01:25:00 +00:00
Evan Cheng b175de6356 Increase opportunities to optimize (brcond (srl (and c1), c2)).
llvm-svn: 91717
2009-12-18 21:31:31 +00:00
Evan Cheng aadf060b92 Revert this dag combine change:
Fold (zext (and x, cst)) -> (and (zext x), cst)

DAG combiner likes to optimize expression in the other way so this would end up cause an infinite looping.

llvm-svn: 91574
2009-12-17 00:40:05 +00:00
Evan Cheng 852c486946 Make 91378 more conservative.
1. Only perform (zext (shl (zext x), y)) -> (shl (zext x), y) when y is a constant. This makes sure it remove at least one zest.
2. If the shift is a left shift, make sure the original shift cannot shift out bits.

llvm-svn: 91399
2009-12-15 03:00:32 +00:00
Evan Cheng d1521ef40c Fold (zext (and x, cst)) -> (and (zext x), cst).
llvm-svn: 91380
2009-12-15 00:52:11 +00:00
Evan Cheng ca7c690d3b Propagate zest through logical shift.
llvm-svn: 91378
2009-12-15 00:41:36 +00:00
Dan Gohman cecad35728 Fix integer cast code to handle vector types.
llvm-svn: 91362
2009-12-14 23:40:38 +00:00
Dan Gohman 1d459e4937 Implement vector widening, splitting, and scalarizing for SIGN_EXTEND_INREG.
llvm-svn: 91158
2009-12-11 21:31:27 +00:00
Evan Cheng f5938d5d27 Move isConsecutiveLoad to SelectionDAG. It's not target dependent and it's primary used by selectdag passes.
llvm-svn: 90922
2009-12-09 01:36:00 +00:00
Evan Cheng 34a23ea371 Refactor InferAlignment out of DAGCombine.
llvm-svn: 90917
2009-12-09 01:04:59 +00:00
Nate Begeman 9655f84662 Don't pull vector sext through both hands of a logical operation, since doing so prevents the fusion of vector sext and setcc into vsetcc.
Add a testcase for the above transformation.
Fix a bogus use of APInt noticed while tracking this down.

llvm-svn: 90423
2009-12-03 07:11:29 +00:00
Jakob Stoklund Olesen 32042f9475 Don't call getValueType() on a null SDValue
llvm-svn: 90415
2009-12-03 05:15:35 +00:00
Dan Gohman 82e80019a5 Remove the optimizations that convert BRCOND and BR_CC into
unconditional branches or fallthroghes. Instcombine/SimplifyCFG
should be simplifying branches with known conditions.

This fixes some problems caused by these transformations not
updating the MachineBasicBlock CFG.

llvm-svn: 89017
2009-11-17 00:47:23 +00:00
Dan Gohman a951526510 Remove an unneeded #include.
llvm-svn: 86601
2009-11-09 22:28:30 +00:00
Dan Gohman ba8735d25a When discarding SrcValue information, discard all of it so that code
that uses this information knows to behave conservatively.

llvm-svn: 85654
2009-10-31 14:14:04 +00:00
Dan Gohman 14ca753e28 Don't call SDNode::isPredecessorOf when it isn't necessary. If the load's
chains have no users, they can't be predecessors of the condition.

llvm-svn: 85394
2009-10-28 15:28:02 +00:00
Nick Lewycky 974e12b2d3 Remove includes of Support/Compiler.h that are no longer needed after the
VISIBILITY_HIDDEN removal.

llvm-svn: 85043
2009-10-25 06:57:41 +00:00
Nick Lewycky 02d5f77d26 Remove VISIBILITY_HIDDEN from class/struct found inside anonymous namespaces.
Chris claims we should never have visibility_hidden inside any .cpp file but
that's still not true even after this commit.

llvm-svn: 85042
2009-10-25 06:33:48 +00:00
Anton Korobeynikov a6faf60831 Fix invalid for vector types fneg(bitconvert(x)) => bitconvert(x ^ sign)
transform.

llvm-svn: 84683
2009-10-20 21:37:45 +00:00
Nate Begeman a3ed9edd40 More heuristics for Combiner-AA. Still catches all important cases, but
compile time penalty on gnugo, the worst case in MultiSource, is down to
about 2.5% from 30%

llvm-svn: 83824
2009-10-12 05:53:58 +00:00
Nate Begeman 18150d5abc Fix combiner-aa issue with bases which are different, but can alias.
Previously, it treated GV+28 GV+0 as different bases, and assumed they could
not alias.

llvm-svn: 82753
2009-09-25 06:05:26 +00:00
Dan Gohman 203d53ed79 Use getStoreSize() instead of getStoreSizeInBits()/8.
llvm-svn: 82656
2009-09-23 21:07:02 +00:00
Dan Gohman 08c0a95ac6 Rename several variables from EVT to more descriptive names, now that EVT
is also the name of their type, as declarations like "EVT EVT" look
really odd.

llvm-svn: 82654
2009-09-23 21:02:20 +00:00
Nate Begeman 879d8f1c3e Substantially speed up combiner-aa in the following ways:
1. Switch from an std::set to a SmallPtrSet for visited chain nodes.
2. Do not force the recursive flattening of token factor nodes, regardless of
   use count.
3. Immediately process newly created TokenFactor nodes.

Also, improve combiner-aa by teaching it that loads to non-overlapping offsets
of relatively aligned objects cannot alias.

These changes result in a >5x speedup for combiner-aa on most testcases.

llvm-svn: 81816
2009-09-15 00:18:30 +00:00
Bob Wilson 39f51320ca Don't swap the operands of a subtraction when trying to create a
post-decrement load/store.

llvm-svn: 81464
2009-09-10 22:09:31 +00:00
Duncan Sands 2fbeaf084f Remove some unused variables and methods warned about by
icc (#177, partial).  Patch by Erick Tryzelaar.

llvm-svn: 81106
2009-09-06 08:33:48 +00:00
Chris Lattner 4dc3edde9f remove a few DOUTs here and there.
llvm-svn: 79832
2009-08-23 06:35:02 +00:00
Eli Friedman 79ba8f2edc Add check for completeness. Note that this doesn't actually have any
effect with the way the current code is structured.

llvm-svn: 79792
2009-08-23 00:14:19 +00:00
Eli Friedman 1e008c173a PR4737: Fix a nasty bug in load narrowing with non-power-of-two types.
llvm-svn: 79415
2009-08-19 08:46:10 +00:00
Owen Anderson 117c9e8497 Add contexts to some of the MVT APIs. No functionality change yet, just the infrastructure work needed to get the contexts to where they need to be first.
llvm-svn: 78759
2009-08-12 00:36:31 +00:00
Owen Anderson 9f94459d24 Split EVT into MVT and EVT, the former representing _just_ a primitive type, while
the latter is capable of representing either a primitive or an extended type.

llvm-svn: 78713
2009-08-11 20:47:22 +00:00
Dan Gohman 9d26c85bdc Fix a bug in the DAGCombiner's handling of multiple linked
MERGE_VALUES nodes. Replacing the result values with the
operands in one MERGE_VALUES node may cause another
MERGE_VALUES node be CSE'd with the first one, and bring
its uses along, so that the first one isn't dead, as this
code expects. Fix this by iterating until the node is
really dead. This fixes PR4699.

llvm-svn: 78619
2009-08-10 23:43:19 +00:00
Dan Gohman 733a64db57 Fix a bug where DAGCombine was producing an illegal ConstantFP
node after legalize, and remove the workaround code from the
ARM backend.

llvm-svn: 78615
2009-08-10 23:15:10 +00:00
Owen Anderson 53aa7a960c Rename MVT to EVT, in preparation for splitting SimpleValueType out into its own struct type.
llvm-svn: 78610
2009-08-10 22:56:29 +00:00
Dan Gohman b717091e69 Make this comment more closely reflect the code.
llvm-svn: 78569
2009-08-10 16:50:32 +00:00
Jakob Stoklund Olesen dc6bccbaa6 Don't build illegal ops in DAGCombiner::SimplifyBinOpWithSameOpcodeHands().
Blackfin supports and/or/xor on i32 but not on i16. Teach
DAGCombiner::SimplifyBinOpWithSameOpcodeHands to not produce illegal nodes
after legalize ops.

llvm-svn: 78497
2009-08-08 20:42:17 +00:00
Dan Gohman 5758e1e92a Fix a few places in DAGCombiner that were creating all-ones-bits
and high-bits values in ways that weren't correct for integer
types wider than 64 bits. This fixes a miscompile in
PPMacroExpansion.cpp in clang on x86-64.

llvm-svn: 78295
2009-08-06 09:18:59 +00:00
Dan Gohman 3f323847bc Avoid forming a SELECT_CC in a type that the target doesn't
support. This isn't immediately interesting, because Legalize
ends up lowering SELECT_CC if the target doesn't support it,
but this simplifies the process.

Also, if the SELECT_CC would be expanded in Legalize, it
can potentially end up with two copies of the condition
expression. By leaving it as SELECT+SETCC, the SELECT can be
expanded into two SELECTs that use a single SETCC.

The two comparisons are usually CSE'd, but depending on
when various expressions get legalized, the comparison
expression could involve calls to library functions, such
that the comparison expression may not be able to be CSE'd.
This will be needed by a future patch.

llvm-svn: 77896
2009-08-02 16:19:38 +00:00
Owen Anderson 4056ca9568 Move types back to the 2.5 API.
llvm-svn: 77516
2009-07-29 22:17:13 +00:00
Owen Anderson c2c7932c64 Change ConstantArray to 2.5 API.
llvm-svn: 77347
2009-07-28 18:32:17 +00:00
Jakob Stoklund Olesen 1ae0736830 Add support for promoting SETCC operations.
llvm-svn: 76987
2009-07-24 18:22:59 +00:00
Evan Cheng a7bb55ebb6 Fix a dagga combiner bug: avoid creating illegal constant.
Is this really a winning transformation?
fold (shl (srl x, c1), c2) -> (shl (and x, (shl -1, c1)), (sub c2, c1)) or                                                                              
                              (srl (and x, (shl -1, c1)), (sub c1, c2))

llvm-svn: 76535
2009-07-21 05:40:15 +00:00
Owen Anderson f945a9ed07 Move a few more convenience factory functions from Constant to LLVMContext.
llvm-svn: 75840
2009-07-15 21:51:10 +00:00
Torok Edwin fbcc663cbf llvm_unreachable->llvm_unreachable(0), LLVM_UNREACHABLE->llvm_unreachable.
This adds location info for all llvm_unreachable calls (which is a macro now) in
!NDEBUG builds.
In NDEBUG builds location info and the message is off (it only prints
"UREACHABLE executed").

llvm-svn: 75640
2009-07-14 16:55:14 +00:00
Torok Edwin 56d0659726 assert(0) -> LLVM_UNREACHABLE.
Make llvm_unreachable take an optional string, thus moving the cerr<< out of
line.
LLVM_UNREACHABLE is now a simple wrapper that makes the message go away for
NDEBUG builds.

llvm-svn: 75379
2009-07-11 20:10:48 +00:00
Torok Edwin ccb29cd290 Convert more assert(0)+abort() -> LLVM_UNREACHABLE,
and abort()/exit() -> llvm_report_error().

llvm-svn: 75363
2009-07-11 13:10:19 +00:00
Owen Anderson 0504e0a222 Thread LLVMContext through MVT and related parts of SDISel.
llvm-svn: 75153
2009-07-09 17:57:24 +00:00
Chris Lattner 4ac607332d dag combine sext(setcc) -> vsetcc before legalize. To make this safe,
VSETCC must define all bits, which is different than it was documented
to before.  Since all targets that implement VSETCC already have this
behavior, and we don't optimize based on this, just change the 
documentation.  We now get nice code for vec_compare.ll

llvm-svn: 74978
2009-07-08 00:31:33 +00:00
Nate Begeman 624690c6b2 Adapt the x86 build_vector dagcombine to the current state of the legalizer.
build vectors with i64 elements will only appear on 32b x86 before legalize.
Since vector widening occurs during legalize, and produces i64 build_vector 
elements, the dag combiner is never run on these before legalize splits them
into 32b elements.

Teach the build_vector dag combine in x86 back end to recognize consecutive 
loads producing the low part of the vector.

Convert the two uses of TLI's consecutive load recognizer to pass LoadSDNodes
since that was required implicitly.

Add a testcase for the transform.

Old:
	subl	$28, %esp
	movl	32(%esp), %eax
	movl	4(%eax), %ecx
	movl	%ecx, 4(%esp)
	movl	(%eax), %eax
	movl	%eax, (%esp)
	movaps	(%esp), %xmm0
	pmovzxwd	%xmm0, %xmm0
	movl	36(%esp), %eax
	movaps	%xmm0, (%eax)
	addl	$28, %esp
	ret

New:
	movl	4(%esp), %eax
	pmovzxwd	(%eax), %xmm0
	movl	8(%esp), %eax
	movaps	%xmm0, (%eax)
	ret

llvm-svn: 72957
2009-06-05 21:37:30 +00:00
Dan Gohman 7b6b5dd954 Don't do the X * 0.0 -> 0.0 transformation in instcombine, because
instcombine doesn't know when it's safe. To partially compensate
for this, introduce new code to do this transformation in
dagcombine, which can use UnsafeFPMath.

llvm-svn: 72872
2009-06-04 17:12:12 +00:00
Dale Johannesen 5234d3795f Revert 72707 and 72709, for the moment.
llvm-svn: 72712
2009-06-02 03:12:52 +00:00
Dale Johannesen 0b8ca79253 Make the implicit inputs and outputs of target-independent
ADDC/ADDE use MVT::i1 (later, whatever it gets legalized to)
instead of MVT::Flag.  Remove CARRY_FALSE in favor of 0; adjust
all target-independent code to use this format.

Most targets will still produce a Flag-setting target-dependent
version when selection is done.  X86 is converted to use i32
instead, which means TableGen needs to produce different code
in xxxGenDAGISel.inc.  This keys off the new supportsHasI1 bit
in xxxInstrInfo, currently set only for X86; in principle this
is temporary and should go away when all other targets have
been converted.  All relevant X86 instruction patterns are
modified to represent setting and using EFLAGS explicitly.  The
same can be done on other targets.

The immediate behavior change is that an ADC/ADD pair are no
longer tightly coupled in the X86 scheduler; they can be
separated by instructions that don't clobber the flags (MOV).
I will soon add some peephole optimizations based on using
other instructions that set the flags to feed into ADC.

llvm-svn: 72707
2009-06-01 23:27:20 +00:00
Evan Cheng 86cdb4b345 Do not try to create a MVT type of width 0.
llvm-svn: 72557
2009-05-28 23:52:18 +00:00
Evan Cheng 6673ff08fe Incorporate patch feedbacks.
llvm-svn: 72533
2009-05-28 18:41:02 +00:00
Evan Cheng a9cda8abf2 Added optimization that narrow load / op / store and the 'op' is a bit twiddling instruction and its second operand is an immediate. If bits that are touched by 'op' can be done with a narrower instruction, reduce the width of the load and store as well. This happens a lot with bitfield manipulation code.
e.g.
orl     $65536, 8(%rax)
=>
orb     $1, 10(%rax)

Since narrowing is not always a win, e.g. i32 -> i16 is a loss on x86, dag combiner consults with the target before performing the optimization.

llvm-svn: 72507
2009-05-28 00:35:15 +00:00
Torok Edwin be6a9a151a Fix PR4254.
The DAGCombiner created a negative shiftamount, stored in an
unsigned variable. Later the optimizer eliminated the shift entirely as being
undefined.
Example: (srl (shl X, 56) 48). ShiftAmt is 4294967288.
Fix it by checking that the shiftamount is positive, and storing in a signed
variable.

llvm-svn: 72331
2009-05-23 17:29:48 +00:00
Daniel Dunbar a8c1658619 Silence Release-Asserts warnings.
llvm-svn: 72011
2009-05-18 16:43:04 +00:00
Duncan Sands af9eaa830a Rename PaddedSize to AllocSize, in the hope that this
will make it more obvious what it represents, and stop
it being confused with the StoreSize.

llvm-svn: 71349
2009-05-09 07:06:46 +00:00
Evan Cheng cfc0513080 Do not use register as base ptr of pre- and post- inc/dec load / store nodes.
llvm-svn: 71098
2009-05-06 18:25:01 +00:00
Bill Wendling 026e5d7667 Instead of passing in an unsigned value for the optimization level, use an enum,
which better identifies what the optimization is doing. And is more flexible for
future uses.

llvm-svn: 70440
2009-04-29 23:29:43 +00:00
Nate Begeman 5f829d896d Implement review feedback for vector shuffle work.
llvm-svn: 70372
2009-04-29 05:20:52 +00:00
Bill Wendling 084669a1c9 Second attempt:
Massive check in. This changes the "-fast" flag to "-O#" in llc. If you want to
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.

Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'll change the JIT with a follow-up patch.

llvm-svn: 70343
2009-04-29 00:15:41 +00:00
Bill Wendling 56f2987a87 r70270 isn't ready yet. Back this out. Sorry for the noise.
llvm-svn: 70275
2009-04-28 01:04:53 +00:00
Bill Wendling d0ae15946c Massive check in. This changes the "-fast" flag to "-O#" in llc. If you want to
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.

Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'm not 100% sure if it's necessary to change it there...

llvm-svn: 70270
2009-04-28 00:21:31 +00:00
Nate Begeman 8d6d4b9289 2nd attempt, fixing SSE4.1 issues and implementing feedback from duncan.
PR2957

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

llvm-svn: 70225
2009-04-27 18:41:29 +00:00
Dan Gohman be36f5ccda When transforming sext(trunc(load(x))) into sext(smaller load(x)),
the trunc is directly replaced with the smaller load, so don't
try to create a new sext node. This fixes PR4050.

llvm-svn: 70179
2009-04-27 02:00:55 +00:00
Dan Gohman 4539987920 Add a top-level comment about DAGCombiner's role in the compiler.
llvm-svn: 70052
2009-04-25 17:09:45 +00:00
Rafael Espindola b93db668b3 Revert 69952. Causes testsuite failures on linux x86-64.
llvm-svn: 69967
2009-04-24 12:40:33 +00:00
Nate Begeman bb881d66f4 PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.

llvm-svn: 69952
2009-04-24 03:42:54 +00:00
Bob Wilson da188ebbbd Revise my previous change 68996 as suggested by Duncan.
llvm-svn: 69607
2009-04-20 17:27:09 +00:00
Duncan Sands e4ff21ba4b Don't try to make BUILD_VECTOR operands have the same
type as the vector element type: allow them to be of
a wider integer type than the element type all the way
through the system, and not just as far as LegalizeDAG.
This should be safe because it used to be this way
(the old type legalizer would produce such nodes), so
backends should be able to handle it.  In fact only
targets which have legal vector types with an illegal
promoted element type will ever see this (eg: <4 x i16>
on ppc).  This fixes a regression with the new type
legalizer (vec_splat.ll).  Also, treat SCALAR_TO_VECTOR
the same as BUILD_VECTOR.  After all, it is just a
special case of BUILD_VECTOR.

llvm-svn: 69467
2009-04-18 20:16:54 +00:00
Bob Wilson 59dbbb2bb4 Change SelectionDAG type legalization to allow BUILD_VECTOR operands to be
promoted to legal types without changing the type of the vector.  This is
following a suggestion from Duncan
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-February/019923.html).
The transformation that used to be done during type legalization is now
postponed to DAG legalization.  This allows the BUILD_VECTORs to be optimized
and potentially handled specially by target-specific code.

It turns out that this is also consistent with an optimization done by the
DAG combiner: a BUILD_VECTOR and INSERT_VECTOR_ELT may be combined by
replacing one of the BUILD_VECTOR operands with the newly inserted element;
but INSERT_VECTOR_ELT allows its scalar operand to be larger than the
element type, with any extra high bits being implicitly truncated.  The
result is a BUILD_VECTOR where one of the operands has a type larger the
the vector element type.

Any code that operates on BUILD_VECTORs may now need to be aware of the
potential type discrepancy between the vector element type and the
BUILD_VECTOR operands.  This patch updates all of the places that I could
find to handle that case.

llvm-svn: 68996
2009-04-13 22:05:19 +00:00
Dan Gohman 0e8d199f91 Generalize ExtendUsesToFormExtLoad to be usable for ANY_EXTEND,
in addition to ZERO_EXTEND and SIGN_EXTEND. Fix a bug in the
way it checked for live-out values, and simplify the way it
find users by using SDNode::use_iterator's (relatively) new
features. Also, make it slightly more permissive on targets
with free truncates.

In SelectionDAGBuild, avoid creating ANY_EXTEND nodes that are
larger than necessary. If the target's SwitchAmountTy has
enough bits, use it. This exposes the truncate to optimization
early, enabling more optimizations.

llvm-svn: 68670
2009-04-09 03:51:29 +00:00
Dan Gohman ad3e549a53 Implement support for using modeling implicit-zero-extension on x86-64
with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
instructions), and teach the DAGCombiner to take advantage of this on
targets which support it. This eliminates many redundant
zero-extension operations on x86-64.

This adds a new TargetLowering hook, isZExtFree. It's similar to
isTruncateFree, except it only applies to actual definitions, and not
no-op truncates which may not zero the high bits.

Also, this adds a new optimization to SimplifyDemandedBits: transform
operations like x+y into (zext (add (trunc x), (trunc y))) on targets
where all the casts are no-ops. In contexts where the high part of the
add is explicitly masked off, this allows the mask operation to be
eliminated. Fix the DAGCombiner to avoid undoing these transformations
to eliminate casts on targets where the casts are no-ops.

Also, this adds a new two-address lowering heuristic. Since
two-address lowering runs before coalescing, it helps to be able to
look through copies when deciding whether commuting and/or
three-address conversion are profitable.

Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
the case that a clobber range extended both before and beyond an
existing live range. In that case, multiple live ranges need to be
added. This was exposed by the new subreg coalescing code.

Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
spiller behavior it was looking for no longer occurrs with the new
instruction selection.

llvm-svn: 68576
2009-04-08 00:15:30 +00:00
Evan Cheng fd81c73cde Optimize some 64-bit multiplication by constants into two lea's or one lea + shl since imulq is slow (latency 5). e.g.
x * 40
=>
shlq    $3, %rdi
leaq    (%rdi,%rdi,4), %rax

This has the added benefit of allowing more multiply to be folded into addressing mode. e.g.
a * 24 + b
=>
leaq    (%rdi,%rdi,2), %rax
leaq    (%rsi,%rax,8), %rax

llvm-svn: 67917
2009-03-28 05:57:29 +00:00
Bill Wendling aa28be652c Pull transform from target-dependent code into target-independent code.
llvm-svn: 67742
2009-03-26 06:14:09 +00:00
Mon P Wang 523c0852c6 Fix a problem with DAGCombine where we were building an illegal build
vector shuffle mask. Forced the mask to be built using i32.  Note: this will
be irrelevant once vector_shuffle no longer takes a build vector for the
shuffle mask.

llvm-svn: 67076
2009-03-17 06:33:10 +00:00
Mon P Wang c86715631c Avoid doing the transformation c ? 1.0 : 2.0 as load { 2.0, 1.0 } + c*4
if FPConstant is legal because if the FPConstant doesn't need to be stored
in a constant pool, the transformation is unlikely to be profitable.

llvm-svn: 66994
2009-03-14 00:25:19 +00:00
Evan Cheng 1fb8aedd1e Fix some significant problems with constant pools that resulted in unnecessary paddings between constant pool entries, larger than necessary alignments (e.g. 8 byte alignment for .literal4 sections), and potentially other issues.
1. ConstantPoolSDNode alignment field is log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. MachineConstantPool alignment field is also a log2 value.
3. However, some places are creating ConstantPoolSDNode with alignment value rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when they are created. However, asm printer group them by sections. That means the offsets are no longer valid. However, asm printer uses them to determine size of padding between entries.
5. Asm printer uses expensive data structure multimap to track constant pool entries by sections.
6. Asm printer iterate over SmallPtrSet when it's emitting constant pool entries. This is non-deterministic.


Solutions:
1. ConstantPoolSDNode alignment field is changed to keep non-log2 value.
2. MachineConstantPool alignment field is also changed to keep non-log2 value.
3. Functions that create ConstantPool nodes are passing in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in asm printer and JIT.
5. Asm printer uses cheaper data structure to group constant pool entries.
6. Asm printer compute entry offsets after grouping is done.
7. Change JIT code to compute entry offsets on the fly.

llvm-svn: 66875
2009-03-13 07:51:59 +00:00
Chris Lattner 4147f08e44 Move 3 "(add (select cc, 0, c), x) -> (select cc, x, (add, x, c))"
related transformations out of target-specific dag combine into the
ARM backend.  These were added by Evan in r37685 with no testcases
and only seems to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).

Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0  -> (zext(cond) << 3).  This happens frequently
with the recently added cp constant select optimization, but is a
very general xform.  For example, we now compile the second example
in const-select.ll to:

_test:
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        seta    %al
        movzbl  %al, %eax
        movl    4(%esp), %ecx
        movsbl  (%ecx,%eax,4), %eax
        ret

instead of:

_test:
        movl    4(%esp), %eax
        leal    4(%eax), %ecx
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        cmovbe  %eax, %ecx
        movsbl  (%ecx), %eax
        ret

This passes multisource and dejagnu.

llvm-svn: 66779
2009-03-12 06:52:53 +00:00
Chris Lattner 43d6377f89 reapply my previous patch (r66358) with a tweak to set the
alignment of the generated constant pool entry to the
desired alignment of a type.  If we don't do this, we end up
trying to do movsd from 4-byte alignment memory.  This fixes
450.soplex and 456.hmmer.

llvm-svn: 66641
2009-03-11 05:08:08 +00:00
Evan Cheng aa887653f4 Revert 66358 for now. It's breaking povray, 450.soplex, and 456.hmmer on x86 / Darwin.
llvm-svn: 66574
2009-03-10 20:47:18 +00:00
Chris Lattner 4249b9a698 Fix PR3763 by using proper APInt methods instead of uint64_t's.
llvm-svn: 66434
2009-03-09 20:22:18 +00:00
Chris Lattner ab5a443144 implement an optimization to codegen c ? 1.0 : 2.0 as load { 2.0, 1.0 } + c*4.
For 2009-03-07-FPConstSelect.ll we now produce:

_f:
	xorl	%eax, %eax
	testl	%edi, %edi
	movl	$4, %ecx
	cmovne	%rax, %rcx
	leaq	LCPI1_0(%rip), %rax
	movss	(%rcx,%rax), %xmm0
	ret

previously we produced:

_f:
	subl	$4, %esp
	cmpl	$0, 8(%esp)
	movss	LCPI1_0, %xmm0
	je	LBB1_2	## entry
LBB1_1:	## entry
	movss	LCPI1_1, %xmm0
LBB1_2:	## entry
	movss	%xmm0, (%esp)
	flds	(%esp)
	addl	$4, %esp
	ret

on PPC the code also improves to:

_f:
	cntlzw r2, r3
	srwi r2, r2, 5
	li r3, lo16(LCPI1_0)
	slwi r2, r2, 2
	addis r3, r3, ha16(LCPI1_0)
	lfsx f1, r3, r2
	blr 

from:

_f:
	li r2, lo16(LCPI1_1)
	cmplwi cr0, r3, 0
	addis r2, r2, ha16(LCPI1_1)
	beq cr0, LBB1_2	; entry
LBB1_1:	; entry
	li r2, lo16(LCPI1_0)
	addis r2, r2, ha16(LCPI1_0)
LBB1_2:	; entry
	lfs f1, 0(r2)
	blr 

This also improves the existing pic-cpool case from:

foo:
	subl	$12, %esp
	call	.Lllvm$1.$piclabel
.Lllvm$1.$piclabel:
	popl	%eax
	addl	$_GLOBAL_OFFSET_TABLE_ + [.-.Lllvm$1.$piclabel], %eax
	cmpl	$0, 16(%esp)
	movsd	.LCPI1_0@GOTOFF(%eax), %xmm0
	je	.LBB1_2	# entry
.LBB1_1:	# entry
	movsd	.LCPI1_1@GOTOFF(%eax), %xmm0
.LBB1_2:	# entry
	movsd	%xmm0, (%esp)
	fldl	(%esp)
	addl	$12, %esp
	ret

to:

foo:
	call	.Lllvm$1.$piclabel
.Lllvm$1.$piclabel:
	popl	%eax
	addl	$_GLOBAL_OFFSET_TABLE_ + [.-.Lllvm$1.$piclabel], %eax
	xorl	%ecx, %ecx
	cmpl	$0, 4(%esp)
	movl	$8, %edx
	cmovne	%ecx, %edx
	fldl	.LCPI1_0@GOTOFF(%eax,%edx)
	ret

This triggers a few dozen times in spec FP 2000.

llvm-svn: 66358
2009-03-08 01:51:30 +00:00
Nate Begeman a9e981225e Fix a problem with DAGCombine on 64b targets where folding
extracts + build_vector into a shuffle would fail, because the
type of the new build_vector would not be legal.  Try harder to
create a legal build_vector type.  Note: this will be totally 
irrelevant once vector_shuffle no longer takes a build_vector for
shuffle mask.

New:
_foo:
	xorps	%xmm0, %xmm0
	xorps	%xmm1, %xmm1
	subps	%xmm1, %xmm1
	mulps	%xmm0, %xmm1
	addps	%xmm0, %xmm1
	movaps	%xmm1, 0

Old:
_foo:
	xorps	%xmm0, %xmm0
	movss	%xmm0, %xmm1
	xorps	%xmm2, %xmm2
	unpcklps	%xmm1, %xmm2
	pshufd	$80, %xmm1, %xmm1
	unpcklps	%xmm1, %xmm2
	pslldq	$16, %xmm2
	pshufd	$57, %xmm2, %xmm1
	subps	%xmm0, %xmm1
	mulps	%xmm0, %xmm1
	addps	%xmm0, %xmm1
	movaps	%xmm1, 0

llvm-svn: 65791
2009-03-01 23:44:07 +00:00
Evan Cheng a49de9de2e Revert BuildVectorSDNode related patches: 65426, 65427, and 65296.
llvm-svn: 65482
2009-02-25 22:49:59 +00:00
Scott Michel 9d31aca679 Introduce the BuildVectorSDNode class that encapsulates the ISD::BUILD_VECTOR
instruction. The class also consolidates the code for detecting constant
splats that's shared across PowerPC and the CellSPU backends (and might be
useful for other backends.) Also introduces SelectionDAG::getBUID_VECTOR() for
generating new BUILD_VECTOR nodes.

llvm-svn: 65296
2009-02-22 23:36:09 +00:00
Dan Gohman e7fe80fcf9 Fix a bug that David Greene found in the DAGCombiner's logic
that checks whether it's safe to transform a store of a bitcast
value into a store of the original value.

llvm-svn: 65201
2009-02-20 23:29:13 +00:00
Scott Michel cf0da6c597 Remove trailing whitespace to reduce later commit patch noise.
(Note: Eventually, commits like this will be handled via a pre-commit hook that
 does this automagically, as well as expand tabs to spaces and look for 80-col
 violations.)

llvm-svn: 64827
2009-02-17 22:15:04 +00:00
Dale Johannesen 84935759d5 Remove more non-DebugLoc getNode variants. Use
getCALLSEQ_{END,START} to permit passing no DebugLoc
there.  UNDEF doesn't logically have DebugLoc; add
getUNDEF to encapsulate this.

llvm-svn: 63978
2009-02-06 23:05:02 +00:00
Dale Johannesen 400dc2e2e4 Remove more non-DebugLoc versions of getNode.
llvm-svn: 63969
2009-02-06 21:50:26 +00:00
Dale Johannesen f1163e9a4d Propagation in TargetLowering. Includes passing a DL
into SimplifySetCC which gets called elsewhere.

llvm-svn: 63583
2009-02-03 00:47:48 +00:00
Duncan Sands 3ed768868d Fix PR3453 and probably a bunch of other potential
crashes or wrong code with codegen of large integers:
eliminate the legacy getIntegerVTBitMask and
getIntegerVTSignBit methods, which returned their
value as a uint64_t, so couldn't handle huge types.

llvm-svn: 63494
2009-02-01 18:06:53 +00:00
Bill Wendling a6c75ffd73 Forgot some more DebugLoc propagations.
llvm-svn: 63493
2009-02-01 11:19:36 +00:00
Duncan Sands 41826036b1 Fix PR3401: when using large integers, the type
returned by getShiftAmountTy may be too small
to hold shift values (it is an i8 on x86-32).
Before and during type legalization, use a large
but legal type for shift amounts: getPointerTy;
afterwards use getShiftAmountTy, fixing up any
shift amounts with a big type during operation
legalization.  Thanks to Dan for writing the
original patch (which I shamelessly pillaged).

llvm-svn: 63482
2009-01-31 15:50:11 +00:00
Bill Wendling 3b585af0ec Don't use DebugLoc::getUnknownLoc(). Default to something hopefully sensible.
llvm-svn: 63473
2009-01-31 03:12:48 +00:00
Bill Wendling 31b50991cb More DebugLoc propagation.
llvm-svn: 63454
2009-01-30 23:59:18 +00:00
Bill Wendling 27d9dd4b57 More DebugLoc propagation.
llvm-svn: 63452
2009-01-30 23:36:47 +00:00
Bill Wendling 306bfc2213 More DebugLoc propagation in LOAD etc. methods.
llvm-svn: 63451
2009-01-30 23:27:35 +00:00
Bill Wendling 0bd29743e3 More DebugLoc propagation in floating-point methods.
llvm-svn: 63446
2009-01-30 23:15:49 +00:00
Bill Wendling 6fbf5495f8 Standardize comments about folding xforms.
llvm-svn: 63443
2009-01-30 23:10:18 +00:00
Bill Wendling 8fb81f1b3d Get rid of the non-DebugLoc-ified getNOT() method.
llvm-svn: 63442
2009-01-30 23:03:19 +00:00
Bill Wendling 3dc5d2454e Propagate debug loc info for some FP arithmetic methods.
llvm-svn: 63441
2009-01-30 22:57:07 +00:00
Bill Wendling cb9be5d174 Propagate debug loc info for some FP arithmetic methods.
llvm-svn: 63440
2009-01-30 22:53:48 +00:00
Bill Wendling 4e0a61514b Propagate debug loc info for BIT_CONVERT.
llvm-svn: 63439
2009-01-30 22:44:24 +00:00
Bill Wendling 7bfa43b022 Propagate debug loc info for more *_EXTEND methods.
llvm-svn: 63437
2009-01-30 22:33:24 +00:00
Bill Wendling 9b3dc8d848 Propagate debug loc info for ANY_EXTEND.
llvm-svn: 63436
2009-01-30 22:27:33 +00:00
Bill Wendling c409318562 Propagate debug loc info for some of the *_EXTEND functions.
llvm-svn: 63434
2009-01-30 22:23:15 +00:00
Bill Wendling b6b6f46fe4 - Propagate debug loc info for SELECT.
- Added xform for (select X, 1, Y) and (select X, Y, 0), which was commented on,
  but missing.

llvm-svn: 63428
2009-01-30 22:02:18 +00:00
Bill Wendling d51e3ff540 Propagate debug loc info for Shifts.
llvm-svn: 63424
2009-01-30 21:37:17 +00:00
Bill Wendling 35972a9460 Propagate debug loc info for XOR and MatchRotate.
llvm-svn: 63420
2009-01-30 21:14:50 +00:00
Bill Wendling f29b6e1318 Propagate debug loc info for OR. Also clean up some comments.
llvm-svn: 63419
2009-01-30 20:59:34 +00:00
Bill Wendling ff8acd684f Perform obvious constant arithmetic folding.
llvm-svn: 63417
2009-01-30 20:50:00 +00:00
Bill Wendling 8617191302 Propagate debug loc info for AND. Also clean up some comments.
llvm-svn: 63416
2009-01-30 20:43:18 +00:00
Bill Wendling 781db7a1ad Propagate debug loc info in SimplifyBinOpWithSameOpcodeHands.
llvm-svn: 63411
2009-01-30 19:25:47 +00:00
Bill Wendling 9b3407e5bb Propagate debug loc info in SimplifyNodeWithTwoResults.
llvm-svn: 63376
2009-01-30 03:08:40 +00:00
Bill Wendling faed065e5c Propagate debug loc info for MULHS.
llvm-svn: 63375
2009-01-30 03:00:18 +00:00
Bill Wendling d033af09fd Propagate debug loc info for SREM and UREM.
llvm-svn: 63374
2009-01-30 02:57:00 +00:00
Bill Wendling aff3e03765 Propagate debug loc info for UDIV.
llvm-svn: 63373
2009-01-30 02:55:25 +00:00
Bill Wendling 5b663e7b53 Propagate debug loc info for SDIV.
llvm-svn: 63372
2009-01-30 02:52:17 +00:00
Bill Wendling b48dcf67e5 Forgot to propagate debug loc info here.
llvm-svn: 63371
2009-01-30 02:49:26 +00:00
Bill Wendling 091f92f568 Propagate debug loc info for MUL.
llvm-svn: 63369
2009-01-30 02:45:56 +00:00
Bill Wendling 48ff08ef3e Propagate debug loc info in SUB.
llvm-svn: 63368
2009-01-30 02:42:10 +00:00
Bill Wendling 6127757920 Propagate debug loc info in ADDC and ADDE.
llvm-svn: 63367
2009-01-30 02:38:00 +00:00