Commit Graph

6600 Commits

Author SHA1 Message Date
Tim Northover 07a8ff4892 ARM64: handle v1i1 types arising from setcc properly.
There were several overlapping problems here, and this solution is
closely inspired by the one adopted in AArch64 in r201381.

Firstly, scalarisation of v1i1 setcc operations simply fails if the
input types are legal. This is fixed in LegalizeVectorTypes.cpp this
time, and allows AArch64 code to be simplified slightly.

Second, vselect with such a setcc feeding into it ends up in
ScalarizeVectorOperand, where it's not handled. I experimented with an
implementation, but found that whatever DAG came out was rather
horrific. I think Hao's DAG combine approach is a good one for
quality, though there are edge cases it won't catch (to be fixed
separately).

Should fix PR19335.
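
For illustration, a minimal IR sketch (function and value names hypothetical) of
the kind of input that produces a v1i1 setcc feeding a vselect:

  define <1 x double> @sel(<1 x double> %a, <1 x double> %b, <1 x double> %x, <1 x double> %y) {
    %c = fcmp oeq <1 x double> %a, %b
    %r = select <1 x i1> %c, <1 x double> %x, <1 x double> %y
    ret <1 x double> %r
  }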

llvm-svn: 205625
2014-04-04 14:49:21 +00:00
Craig Topper 840beec2d0 Make consistent use of MCPhysReg instead of uint16_t throughout the tree.
llvm-svn: 205610
2014-04-04 05:16:06 +00:00
Eric Christopher bfb38badc1 Fix for PR 19261:
In SelectionDAGBuilder::visitBr(), llc doesn't generate nodes for unconditional
fall-through branches on targets without a FastISel implementation (X86 has one,
but it can be disabled with "-fast-isel=false").

So for line 4 in the following testcase

1: void foo(int i){
2:   switch(i){
3:   default:
4:     break;
5:   }
6:   return;
7: }

there is no corresponding line in .debug_line section, and a debugger
cannot set a breakpoint at line 4.

Fix this by always emitting a branch when we're not optimizing and add a
testcase to ensure that there's code on every line we'd want to break.

Patch by Daniil Fukalov.

llvm-svn: 205529
2014-04-03 12:11:51 +00:00
Juergen Ributzka fcd2e94ecc Add comments and test case for [DAG] Keep the opaque constant flag when performing unary constant folding operations (r204737).
llvm-svn: 205474
2014-04-02 22:21:01 +00:00
Matt Arsenault e407ae9846 Make isSetCCEquivalent respect the TargetBooleanContents
llvm-svn: 205336
2014-04-01 18:13:26 +00:00
Matt Arsenault 6310c3f667 Add helpers for checking if a value is a target boolean constant.
llvm-svn: 205335
2014-04-01 18:13:22 +00:00
Hal Finkel b811b6d0d1 Add an optional ability to expand larger BUILD_VECTORs with shuffles
This adds the ability to expand large (meaning with more than two unique
defined values) BUILD_VECTOR nodes in terms of SCALAR_TO_VECTOR and (legal)
vector shuffles. There is now no limit of the size we are capable of expanding
this way, although we don't currently do this for vectors with many unique
values because of the default implementation of TLI's
shouldExpandBuildVectorWithShuffles function.

There is currently no functional change to any existing targets because the new
capabilities are not used unless some target overrides the TLI
shouldExpandBuildVectorWithShuffles function. As a result, I've not included a
test case for the new functionality in this commit, but regression tests will
(at least) be added soon when I commit support for the PPC QPX vector
instruction set.

The benefit of committing this now is that it makes the
shouldExpandBuildVectorWithShuffles callback, which had to be added for other
reasons regardless, fully functional. I suspect that other targets will
also benefit from tuning the heuristic.

llvm-svn: 205243
2014-03-31 19:42:55 +00:00
Hal Finkel 1977514287 Add a TLI hook to control when BUILD_VECTOR might be expanded using shuffles
There are two general methods for expanding a BUILD_VECTOR node:
  1. Use SCALAR_TO_VECTOR on the defined scalar values and then shuffle
     them together.
  2. Build the vector on the stack and then load it.

Currently, we use a fixed heuristic: If there are only one or two unique
defined values, then we attempt an expansion in terms of SCALAR_TO_VECTOR and
vector shuffles (provided that the required shuffle mask is legal). Otherwise,
always expand via the stack. Even when SCALAR_TO_VECTOR is not legal, this
can still be a good idea depending on what tricks the target can play when
lowering the resulting shuffle. If the target can't do anything special,
however, and if SCALAR_TO_VECTOR is expanded via the stack, this heuristic
leads to sub-optimal code (two stack loads instead of one).

Because only the target knows whether the SCALAR_TO_VECTORs and shuffles for a
build vector of a particular type are likely to be optimal, this adds a new
TLI function: shouldExpandBuildVectorWithShuffles which takes the vector type
and the count of unique defined values. If this function returns true, then
method (1) will be used, subject to the constraint that all of the necessary
shuffles are legal (as determined by isShuffleMaskLegal). If this function
returns false, then method (2) is always used.

This commit does not enhance the current code to support expanding a
build_vector with more than two unique values using shuffles, but I'll commit
an implementation of the more-general case shortly.

llvm-svn: 205230
2014-03-31 17:48:10 +00:00
Hal Finkel 02807595fb Look at shuffles of build_vectors in DAGCombiner::visitEXTRACT_VECTOR_ELT
When the loop vectorizer vectorizes code that uses the loop induction variable,
we often end up with IR like this:

  %b1 = insertelement <2 x i32> undef, i32 %v, i32 0
  %b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer
  %i = add <2 x i32> %b2, <i32 2, i32 3>

If the add in this example is not legal (as is the case on PPC with VSX), it
will be scalarized, and we'll end up with a number of extract_vector_elt nodes
with the vector shuffle as the input operand, and that vector shuffle is fed by
one or more build_vector nodes. By the time that vector operations are
expanded, visitEXTRACT_VECTOR_ELT will not create new extract_vector_elt by
looking through the vector shuffle (to make sure that no illegal operations are
created), and so the extract_vector_elt -> vector shuffle -> build_vector is
never simplified to an operand of the build vector.

By looking at build_vectors through a shuffle we fix this particular situation,
preventing a vector from being built, only to be deconstructed again (for the
scalarized add) -- an expensive proposition when this all needs to be done via
the stack. We probably want a more comprehensive fix here where we look back
recursively through any shuffles to any build_vectors or scalar_to_vectors,
etc. but that can come later.

llvm-svn: 205179
2014-03-31 11:43:19 +00:00
Hal Finkel 90adf0fe06 Make use of previously generated stores in SelectionDAGLegalize::ExpandExtractFromVectorThroughStack
When expanding EXTRACT_VECTOR_ELT and EXTRACT_SUBVECTOR using
SelectionDAGLegalize::ExpandExtractFromVectorThroughStack, we store the entire
vector and then load the piece we want. This is fine in isolation, but
generating a new store (and corresponding stack slot) for each extraction ends
up producing code of poor quality. When we scalarize a vector operation (using
SelectionDAG::UnrollVectorOp for example) we generate one EXTRACT_VECTOR_ELT
for each element in the vector. This used to generate one stored copy of the
vector for each element in the vector. Now we search the uses of the vector for
a suitable store before generating a new one, which results in much more
efficient scalarization code.

llvm-svn: 205153
2014-03-30 15:10:18 +00:00
Benjamin Kramer fd719b9551 Avoid storing Twines.
While there, move the nested ifs into a helper function. No functionality change.

llvm-svn: 205108
2014-03-29 16:54:29 +00:00
Rafael Espindola 24a669d225 Prevent alias from pointing to weak aliases.
This adds back r204781.

Original message:

Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_alias2 and my_func are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204934
2014-03-27 15:26:56 +00:00
Renato Golin c0a3c1d66b Add @llvm.clear_cache builtin
Implementing the LLVM part of the call to __builtin___clear_cache,
which translates into an intrinsic @llvm.clear_cache and is lowered
by each target, either to a call to __clear_cache or to nothing at all
in case the caches are unified.

Updating LangRef and adding some tests for the implemented architectures.
Other architectures will have to implement the method if this builtin
has to be compiled for them, since the default behaviour is to bail
out as unimplemented.

A Clang patch is required for the builtin to be lowered into the
llvm intrinsic. This will be done next.
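
For reference, a minimal IR sketch of how the intrinsic is expected to appear
(function and pointer names hypothetical):

  declare void @llvm.clear_cache(i8*, i8*)

  define void @flush(i8* %begin, i8* %end) {
    call void @llvm.clear_cache(i8* %begin, i8* %end)
    ret void
  }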

llvm-svn: 204802
2014-03-26 12:52:28 +00:00
Rafael Espindola 65481d7b97 Revert "Prevent alias from pointing to weak aliases."
This reverts commit r204781.

I will follow up with the msan folks to see what they
were trying to do with aliases to weak aliases.

llvm-svn: 204784
2014-03-26 06:14:40 +00:00
Rafael Espindola 3b712a84a9 Prevent alias from pointing to weak aliases.
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_alias2 and my_func are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204781
2014-03-26 04:48:47 +00:00
Juergen Ributzka e2e16844f5 [DAG] Keep the opaque constant flag when performing unary constant folding operations.
Usually opaque constants shouldn't be folded, except through simple unary
operations that don't create new constants. Even then, the folding shouldn't
drop the opaque constant flag. This commit fixes that.

Related to <rdar://problem/14774662>

llvm-svn: 204737
2014-03-25 18:01:20 +00:00
Matt Arsenault b22426c510 Fix creating illegal setcc cond codes.
If GT/UGT or LT/ULT were set to expand, a comparison
with a constant would replace it with the illegal
cond code.

There are several more places later in this function that
will have the same basic problem.

Theoretically R600 should hit this problem for a test,
but for some reason it doesn't.

llvm-svn: 204727
2014-03-25 16:09:21 +00:00
Tom Stellard c9a67a2b6d SelectionDAG: Allow promotion of SELECT nodes from float to int types
And vice-versa, as long as the types are the same width.

There are a few R600 tests that will cover this.
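
As a rough illustration (names hypothetical), a float select of this kind can be
promoted to an integer select of bitcasts of the same width:

  %r  = select i1 %c, float %a, float %b
  ; promoted form, same 32-bit width:
  %ai = bitcast float %a to i32
  %bi = bitcast float %b to i32
  %ri = select i1 %c, i32 %ai, i32 %bi
  %rf = bitcast i32 %ri to float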

llvm-svn: 204616
2014-03-24 16:07:28 +00:00
Nuno Lopes 31617266ea remove a bunch of unused private methods
Found with a smarter version of -Wunused-member-function that I'm playing with.
Apologies in advance if I removed someone's WIP code.

 include/llvm/CodeGen/MachineSSAUpdater.h            |    1 
 include/llvm/IR/DebugInfo.h                         |    3 
 lib/CodeGen/MachineSSAUpdater.cpp                   |   10 --
 lib/CodeGen/PostRASchedulerList.cpp                 |    1 
 lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp    |   10 --
 lib/IR/DebugInfo.cpp                                |   12 --
 lib/MC/MCAsmStreamer.cpp                            |    2 
 lib/Support/YAMLParser.cpp                          |   39 ---------
 lib/TableGen/TGParser.cpp                           |   16 ---
 lib/TableGen/TGParser.h                             |    1 
 lib/Target/AArch64/AArch64TargetTransformInfo.cpp   |    9 --
 lib/Target/ARM/ARMCodeEmitter.cpp                   |   12 --
 lib/Target/ARM/ARMFastISel.cpp                      |   84 --------------------
 lib/Target/Mips/MipsCodeEmitter.cpp                 |   11 --
 lib/Target/Mips/MipsConstantIslandPass.cpp          |   12 --
 lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp              |   21 -----
 lib/Target/NVPTX/NVPTXISelDAGToDAG.h                |    2 
 lib/Target/PowerPC/PPCFastISel.cpp                  |    1 
 lib/Transforms/Instrumentation/AddressSanitizer.cpp |    2 
 lib/Transforms/Instrumentation/BoundsChecking.cpp   |    2 
 lib/Transforms/Instrumentation/MemorySanitizer.cpp  |    1 
 lib/Transforms/Scalar/LoopIdiomRecognize.cpp        |    8 -
 lib/Transforms/Scalar/SCCP.cpp                      |    1 
 utils/TableGen/CodeEmitterGen.cpp                   |    2 
 24 files changed, 2 insertions(+), 261 deletions(-)

llvm-svn: 204560
2014-03-23 17:09:26 +00:00
Andrea Di Biagio 5b0aacf1c7 [DAG] Fix an assertion failure caused by an invalid cast in method 'BuildVectorSDNode::isConstantSplat'
This patch renames method 'isConstantSplat' as 'getConstantSplatValue'
(mainly for consistency reasons), and rewrites its logic to ensure
that we always perform a legal 'cast<ConstantSDNode>'.

Added test shift-combine-crash.ll to verify that DAGCombiner no longer crashes with an assertion failure in the attempt to simplify a vector shift by a vector of all undef counts.

llvm-svn: 204536
2014-03-22 01:47:22 +00:00
Kevin Qin 275ce91243 Fix an assertion caused by using inline asm with indirect register inputs.
llvm-svn: 204425
2014-03-21 02:14:50 +00:00
Raul E. Silvera a9dafe6793 Add support for scalarizing/splitting vector bswap.
Summary:
  SLP Vectorization of intrinsics (r203707) has exposed cases where the
  expansion of vector bswap is failing (PR19151).

Reviewers: hfinkel

CC: chandlerc

Differential Revision: http://llvm-reviews.chandlerc.com/D3104

llvm-svn: 204163
2014-03-18 17:49:12 +00:00
Andrea Di Biagio 28f46d9f39 [DAGCombiner] teach how to simplify xor/and/or nodes according to the following rules:
 1)  (AND (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (AND (A, B), C, Mask)
 2)  (OR  (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (OR  (A, B), C, Mask)
 3)  (XOR (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (XOR (A, B), V_0, Mask)

 4)  (AND (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, AND (A, B), Mask)
 5)  (OR  (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, OR  (A, B), Mask)
 6)  (XOR (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (V_0, XOR (A, B), Mask)

llvm-svn: 204160
2014-03-18 17:12:59 +00:00
Matt Arsenault 985b9de485 Make DAGCombiner work on vector bitshifts with constant splat vectors.
llvm-svn: 204071
2014-03-17 18:58:01 +00:00
Adam Nemet 24381f1cb7 [VectorLegalizer/X86] Don't unvectorize fp_to_uint for v8f32->v8i16
Rather than LegalizeAction::Expand, this needs LegalizeAction::Promote to get
promoted to fp_to_sint v8f32->v8i32.  This is a legal operation on AVX.

For that to work properly, we also need to teach the legalizer about the
specific promotion required here.  The default vector promotion uses
bitcasting to a vector type of the same total size.  We want to promote the
vector element type, effectively widening the operation and then truncating
the result.  This is analogous to the current logic of how int_to_fp is
promoted.
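
For illustration, a minimal IR sketch of the affected conversion (function name
hypothetical):

  define <8 x i16> @cvt(<8 x float> %x) {
    %r = fptoui <8 x float> %x to <8 x i16>
    ret <8 x i16> %r
  }

With this change the operation is widened to a v8i32 result and then truncated,
rather than being scalarized.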

The change also factors out some code from the int_to_fp promotion code to
ValueType::widenIntegerVectorElementType.  This is now shared between
int_to_fp and fp_to_int.

There is no longer a need for the custom lowering of fp_to_sint v8f32->v8i16 in
X86.  It can now go through the new target-independent fp_to_*int promotion
logic.

I also checked that no other target uses Promote for these ops yet, so there
shouldn't be any unexpected change in behavior.

Fixes <rdar://problem/16202247>

llvm-svn: 204058
2014-03-17 17:06:14 +00:00
Owen Anderson 16c6bf49b7 Phase 2 of the great MachineRegisterInfo cleanup. This time, we're changing
operator* on the by-operand iterators to return a MachineOperand& rather than
a MachineInstr&.  At this point they almost behave like normal iterators!

Again, this requires making some existing loops more verbose, but should pave
the way for the big range-based for-loop cleanups in the future.

llvm-svn: 203865
2014-03-13 23:12:04 +00:00
Owen Anderson abb90c9ddb Phase 1 of refactoring the MachineRegisterInfo iterators to make them suitable
for use with C++11 range-based for-loops.

The gist of phase 1 is to remove the skipInstruction() and skipBundle()
methods from these iterators, instead splitting each iterator into a version
that walks operands, a version that walks instructions, and a version that
walks bundles.  This has the result of making some "clever" loops in lib/CodeGen
more verbose, but also makes their iterator invalidation characteristics much
more obvious to the casual reader. (Making them concise again in the future is a
good motivating case for a pre-incrementing range adapter!)

Phase 2 of this undertaking will consist of removing the getOperand() method,
and changing operator*() of the operand-walker to return a MachineOperand&.  At
that point, it should be possible to add range views for them that work as one
might expect.

llvm-svn: 203757
2014-03-13 06:02:25 +00:00
Patrik Hagglund 1da3512166 Replace '#include ValueTypes.h' with forward declarations.
In some cases the include is pushed "downstream" (or removed if
unused).

llvm-svn: 203644
2014-03-12 08:00:24 +00:00
Benjamin Kramer f8502272ef Remove copy ctors that did the same thing as the default one.
The code added nothing but potentially disabled move semantics and made
types non-trivially copyable.

llvm-svn: 203563
2014-03-11 11:32:49 +00:00
Tim Northover e94a518a22 IR: add a second ordering operand to cmpxchg for failure
The syntax for "cmpxchg" should now look something like:

	cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic

where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).

rdar://problem/15996804

llvm-svn: 203559
2014-03-11 10:48:52 +00:00
Matt Arsenault 95b714c749 Fix non 2-space indentation.
llvm-svn: 203514
2014-03-11 00:01:25 +00:00
Chandler Carruth cdf4788401 [C++11] Add range based accessors for the Use-Def chain of a Value.
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
   detail
2) Change it to actually be a *Use* iterator rather than a *User*
   iterator.
3) Add an adaptor which is a User iterator that always looks through the
   Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
   they wanted a use_iterator (and to explicitly dig out the User when
   needed), or a user_iterator which makes the Use itself totally
   opaque.

Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lines of code.

The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.

However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]

llvm-svn: 203364
2014-03-09 03:16:01 +00:00
Craig Topper 7b883b314c [C++11] Add 'override' keyword to virtual methods that override their base class.
llvm-svn: 203339
2014-03-08 06:31:39 +00:00
Adam Nemet 7f928f14f4 [DAGCombiner] Distribute TRUNC through AND in rotation amount
This is already done for shifts.  Allow it for rotations as well. E.g.:

   (rotl:i32 x, (trunc (and y, 31))) -> (rotl:i32 x, (and (trunc y), 31))

Use the newly factored-out distributeTruncateThroughAnd.

With this patch and some X86.td tweaks we should be able to remove redundant
masking of the rotation amount like in the example above.  HW implicitly
performs this masking.

The testcase will be added as part of the X86 patch.

llvm-svn: 203316
2014-03-07 23:56:30 +00:00
Adam Nemet 5117f5dffc [DAGCombiner] Recognize another rotation idiom
This is the new idiom:

  x<<(y&31) | x>>((0-y)&31)

which is recognized as:

  x ROTL (y&31)

The change refines matchRotateSub.  In
Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is
Pos' & (OpSize - 1) we can just use Pos' instead of Pos.

llvm-svn: 203315
2014-03-07 23:56:28 +00:00
Adam Nemet c6553a8354 [DAGCombiner] Slightly improve readability of matchRotateSub
Slightly change the wording in the function comment. Originally, it could be
misunderstood as saying we turned the input into two subsequent rotates.

Better connect the comment which talks about Mask and the code which used
LoBits.  Renamed variable to MaskLoBits.

llvm-svn: 203314
2014-03-07 23:56:24 +00:00
Arnold Schwaighofer d33e942958 ISel: Make VSELECT selection terminate in cases where the condition type has to
be split and the result type widened.

When the condition of a vselect has to be split, it makes no sense to widen the
vselect and thereby widen the condition. Doing this, we end up in an endless loop
of widening (the vselect result type) and splitting (the condition mask type).
Instead, split both the condition and the vselect and widen the result.

I ran this over the test suite with i686 and mattr=+sse and saw no regressions.

Fixes PR18036.

llvm-svn: 203311
2014-03-07 23:25:55 +00:00
Andrea Di Biagio 6292a140ee [X86] Teach the DAGCombiner how to fold a OR of two shufflevector nodes.
This patch teaches the DAGCombiner how to fold a binary OR between two
shufflevector nodes into a single shufflevector when possible.

The rules are:
  1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1)
  2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2)

The DAGCombiner can take advantage of the fact that OR is commutative and
compute two possible shuffle masks (Mask1 and Mask2) for the resulting
shuffle node.

Before folding a dag according to either rule 1 or 2, DAGCombiner verifies
that the resulting shuffle mask is legal for the target.
DAGCombiner first tries to fold according to rule 1; if that is not possible,
it then tries to fold according to rule 2.
If both Mask1 and Mask2 are illegal then we conservatively don't fold
the OR instruction.

llvm-svn: 203156
2014-03-06 20:19:52 +00:00
Matt Arsenault f9a995d68c R600: Fix extloads from i8 / i16 to i64.
This appears to only be working for global loads. Private
and local break for other reasons.

llvm-svn: 203135
2014-03-06 17:34:12 +00:00
Chandler Carruth 9a4c9e597b [Layering] Move DebugInfo.h into the IR library where its implementation
already lives.

llvm-svn: 203046
2014-03-06 00:46:21 +00:00
Chandler Carruth 9205140772 [Layering] Move DebugLoc.h into the IR library. The implementation
already lived there and it is where it belongs -- this is the in-memory
debug location representation.

This is just cleanup -- Modules can actually cope with this, but that
doesn't make it right. After chatting with folks that have out-of-tree
stuff, going ahead and moving the rest of the headers seems preferable.

llvm-svn: 202960
2014-03-05 10:30:38 +00:00
Andrew Trick fbb278c541 Make stackmap machineinstrs clobber the scratch regs too.
Patchpoints already did this. Doing it for stackmaps is a convenience
for the runtime in the event that it needs a scratch register to
patch or perform a runtime call thunk.

Unlike patchpoints, we just assume the AnyRegCC calling
convention. This is the only language- and target-independent calling
convention specific to stackmaps, so it makes sense, although the calling
convention is not currently used to select the scratch registers.

llvm-svn: 202943
2014-03-05 07:08:16 +00:00
Hans Wennborg 0c72fd2b4e Fix unused variable in FunctionLoweringInfo.cpp
llvm-svn: 202932
2014-03-05 03:21:23 +00:00
Hans Wennborg acb842d523 Check for dynamic allocas and inline asm that clobbers sp before building
selection dag (PR19012)

In X86SelectionDagInfo::EmitTargetCodeForMemcpy we check with MachineFrameInfo
to make sure that ESI isn't used as a base pointer register before we choose to
emit rep movs (which clobbers esi).

The problem is that MachineFrameInfo wouldn't know about dynamic allocas or
inline asm that clobbers the stack pointer until SelectionDAGBuilder has
encountered them.

This patch fixes the problem by checking for such things when building the
FunctionLoweringInfo.

Differential Revision: http://llvm-reviews.chandlerc.com/D2954

llvm-svn: 202930
2014-03-05 02:43:26 +00:00
Adam Nemet 67483897a5 [DAGCombiner] Factor out distributeTruncateThroughAnd
Currently this code is duplicated across visitSHL, visitSRA and visitSRL.  The
plan is to add rotates as clients to this new function.

There is no functional change intended here.

llvm-svn: 202908
2014-03-04 23:28:31 +00:00
Chandler Carruth 219b89b987 [Modules] Move CallSite into the IR library where it belongs. It is
abstracting between a CallInst and an InvokeInst, both of which are IR
concepts.

llvm-svn: 202816
2014-03-04 11:01:28 +00:00
Benjamin Kramer d6f1f84f51 [C++11] Replace llvm::tie with std::tie.
The old implementation is no longer needed in C++11.

llvm-svn: 202644
2014-03-02 13:30:33 +00:00
Benjamin Kramer b6d0bd48bd [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Benjamin Kramer 3a377bce4e Now that we have C++11, turn simple functors into lambdas and remove a ton of boilerplate.
No intended functionality change.

llvm-svn: 202588
2014-03-01 11:47:00 +00:00
Hal Finkel ab51ecd4fc Fix visitTRUNCATE for legal i1 values
This extract-and-trunc vector optimization cannot work for i1 values as
currently implemented, and so I'm disabling this for now for i1 values. In the
future, this can be fixed properly.
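
For reference, a minimal IR sketch (types hypothetical) of the extract-and-trunc
pattern whose combine is now disabled when the result type is i1:

  %e = extractelement <4 x i32> %v, i32 0
  %t = trunc i32 %e to i1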

Soon I'll commit support for i1 CR bit tracking in the PowerPC backend, and
this will be covered by one of the existing regression tests.

llvm-svn: 202449
2014-02-28 00:26:45 +00:00
Matt Arsenault b598f7b869 Add missing const
llvm-svn: 202074
2014-02-24 21:01:18 +00:00
Matt Arsenault 58a7639698 Trivial code simplification
llvm-svn: 202073
2014-02-24 21:01:15 +00:00
Quentin Colombet 4db08df18e [DAGCombiner] PCMP* sets its result to all ones or zeros so we can AND with the
shifted mask rather than masking and shifting separately.

The patch adds this transformation to the DAGCombiner:

  (shl (and (setcc:i8v16 ...) N01C) N1C) -> (and (setcc:i8v16 ...) N01C<<N1C)

<rdar://problem/16054492>

Patch by Adam Nemet <anemet@apple.com>

llvm-svn: 201906
2014-02-21 23:42:41 +00:00
Rafael Espindola 5f57f462a8 Rename a few more DataLayout variables from TD to DL.
llvm-svn: 201870
2014-02-21 18:34:28 +00:00
Rafael Espindola ea09c595a6 Rename a DebugLoc variable to DbgLoc and a DataLayout to DL.
This is quite a bit less confusing now that TargetData was renamed DataLayout.

llvm-svn: 201606
2014-02-18 22:05:46 +00:00
Rafael Espindola 7c68bebb9c Rename some member variables from TD to DL.
TargetData was renamed DataLayout back in r165242.

llvm-svn: 201581
2014-02-18 15:33:12 +00:00
Juergen Ributzka 2b97f9b211 [DAG] Fix the recognition of opaque constants in the SelectionDAGBuilder.
This fix checks the original LLVM IR node to identify opaque constants by
looking for the bitcast-constant pattern. Originally we looked at the generated
SDNode, but this might lead to incorrect results. The SDNode could have been
generated by a constant expression that was folded to a constant.

This fixes <rdar://problem/16050719>

llvm-svn: 201291
2014-02-13 04:19:26 +00:00
Juergen Ributzka d1777cc344 [Stackmaps] Improve the stackmap lowering code in the SelectionDAGBuilder.
We no longer rely on the target-specific call lowering implementation
to lower a stackmap intrinsic call. Instead we perform the call lowering in a
target-independent way directly in the stackmap lowering code. This simplifies
the code and removes the need to fix up the code after the target-specific call
lowering.

llvm-svn: 201263
2014-02-12 22:17:13 +00:00
Juergen Ributzka aa30da30bb [Stackmaps] Fix the ID type to be i64 also for stackmaps (as we claim in the documentation)
The ID type for the stackmap and patchpoint intrinsics are in both cases i64.
This fixes a zero extend in the SelectionDAGBuilder that still used i32. This
also updates the target independent instructions STACKMAP and PATCHPOINT to use
the correct type.

llvm-svn: 201262
2014-02-12 22:17:10 +00:00
Robert Lougher 7d9084ffa1 Teach the DAGCombiner how to fold concat_vector nodes when the input is two
BUILD_VECTOR nodes, e.g.:

(concat_vectors (BUILD_VECTOR a1, a2, a3, a4), (BUILD_VECTOR b1, b2, b3, b4))
->
(BUILD_VECTOR a1, a2, a3, a4, b1, b2, b3, b4)

This fixes an issue with AVX, where a sequence was not recognized as a 256-bit
vbroadcast due to the concat_vectors.

llvm-svn: 201158
2014-02-11 15:42:46 +00:00
Juergen Ributzka fa0eba6c8b [DAG] Don't pull the binary operation though the shift if the operands have opaque constants.
During DAGCombine visitShiftByConstant assumes that certain binary operations
with only constant operands can always be folded successfully. This is no longer
true when the constant is opaque. This commit fixes visitShiftByConstant by not
performing the optimization for opaque constants. Otherwise we would end up in
an infinite DAGCombine loop.

llvm-svn: 200900
2014-02-06 04:09:06 +00:00
Matt Arsenault 1b55dd9a81 Pass address space to allowsUnalignedMemoryAccesses
llvm-svn: 200888
2014-02-05 23:16:05 +00:00
Matt Arsenault 25793a3f22 Add address space argument to allowsUnalignedMemoryAccess.
On R600, some address spaces have more strict alignment
requirements than others.

llvm-svn: 200887
2014-02-05 23:15:53 +00:00
Craig Topper 7ca1d18055 Add CheckChildInteger to ISelMatcher operations. Removes nearly 2000 bytes from X86 matcher table.
llvm-svn: 200821
2014-02-05 05:44:28 +00:00
Hal Finkel 5c968d9440 Expand vector bswap in LegalizeVectorOps
ISD::BSWAP was missing from the list of node types that should be expanded
element-wise.
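
For example, IR like the following (type hypothetical) now gets expanded
element-wise when vector BSWAP is not legal for the target:

  declare <4 x i32> @llvm.bswap.v4i32(<4 x i32>)

  define <4 x i32> @swap(<4 x i32> %x) {
    %r = call <4 x i32> @llvm.bswap.v4i32(<4 x i32> %x)
    ret <4 x i32> %r
  }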

llvm-svn: 200705
2014-02-03 17:27:25 +00:00
Reid Kleckner f5b76518c9 Implement inalloca codegen for x86 with the new inalloca design
Calls with inalloca are lowered by skipping all stores for arguments
passed in memory and the initial stack adjustment to allocate argument
memory.

Now the frontend is responsible for the memory layout, and the backend
doesn't have to do any work.  As a result these changes are pretty
minimal.

Reviewers: echristo

Differential Revision: http://llvm-reviews.chandlerc.com/D2637

llvm-svn: 200596
2014-01-31 23:50:57 +00:00
Reid Kleckner dfbed59cc2 Don't put non-static allocas in the static alloca map
Allocas marked inalloca are never static, but we were trying to put them
into the static alloca map if they were in the entry block.  Also add an
assertion in x86 fastisel.

llvm-svn: 200593
2014-01-31 23:45:12 +00:00
Manman Ren 413a6cb42b This patch teaches the DAGCombiner how to fold insert_subvector nodes
when the input is a concat_vectors and the insert replaces one of the
concat halves:

Lower half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors Z, Y)
Upper half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors X, Z)

This can be seen with the following IR:

define <8 x float> @lower_half(<4 x float> %v1, <4 x float> %v2, <4 x
float> %v3) {
  %1 = shufflevector <4 x float> %v1, <4 x float> %v2, <8 x i32> <i32
0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %2 = tail call <8 x float> @llvm.x86.avx.vinsertf128.ps.256(<8 x
float> %1, <4 x float> %v3, i8 0)

The vinsertf128 intrinsic is converted into an insert_subvector node
in SelectionDAGBuilder.cpp.

Using AVX, without the patch this generates two vinsertf128 instructions:

vinsertf128 $1, %xmm1, %ymm0, %ymm0
vinsertf128 $0, %xmm2, %ymm0, %ymm0

With the patch this is optimized into:

vinsertf128 $1, %xmm1, %ymm2, %ymm0

Patch by Robert Lougher.

llvm-svn: 200506
2014-01-31 01:10:35 +00:00
Owen Anderson 60a4678c42 DAGCombine should not produce ISD::OR nodes after operation legalization if they're not legal.
llvm-svn: 200503
2014-01-31 00:51:43 +00:00
Manman Ren 4ece7452ba PGO branch weight: update edge weights in SelectionDAGBuilder.
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.

The previous attempt at r200431 was reverted at r200434 because of
two test case failures. I modified my patch a little, but forgot
to re-run "make check-all".

Test case CodeGen/ARM/lsr-unfolded-offset.ll is updated because of
the patch's impact on branch probabilities, which causes changes in
spill placement.

llvm-svn: 200502
2014-01-31 00:42:44 +00:00
Manman Ren 7407e0e31c Revert r200431 due to bot failures.
llvm-svn: 200434
2014-01-30 00:53:27 +00:00
Manman Ren 104e0c80cc PGO branch weight: update edge weights in SelectionDAGBuilder.
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.

llvm-svn: 200431
2014-01-30 00:24:37 +00:00
Andrea Di Biagio b6d39afbda [DAGCombiner] Avoid introducing an illegal build_vector when folding a sign_extend.
Make sure that we don't introduce illegal build_vector dag nodes
when trying to fold a sign_extend of a build_vector.

This fixes a regression introduced by r200234.
Added test CodeGen/X86/fold-vector-sext-crash.ll
to verify that llc no longer crashes with an assertion failure
due to an illegal build_vector of type MVT::v4i64.

Thanks to Ilia Filippov for spotting this regression and for
providing a reproducible test case.

llvm-svn: 200313
2014-01-28 12:53:56 +00:00
Juergen Ributzka 659ce00d60 [TLI] Add a new hook to TargetLowering to query the target if a load of a constant should be converted to simply the constant itself.
Before this patch we used getIntImmCost from TargetTransformInfo to determine if
a load of a constant should be converted to just a constant, but the threshold
for this was set to an arbitrary value. This value works well for the two
targets (X86 and ARM) that implement this target-hook, but it isn't
target-independent at all.

Now targets have the possibility to decide directly if this optimization should
be performed. The default value is set to false to preserve the current
behavior. The target hook has been moved to TargetLowering, which removed the
last use and need of TargetTransformInfo in SelectionDAG.

llvm-svn: 200271
2014-01-28 01:20:14 +00:00
Matt Arsenault 5f2a92a26c Fix sext(setcc) -> select_cc using wrong type for setcc.
Also update the comment, since it actually produces a
select (setcc) instead of select_cc.

It was checking and using the setcc result type for the
type of the sext, instead of the type of the compared items.

In my problem case, the sext was to i32 and was used as the setcc type,
but the expected type was i64.

No test since I haven't been able to hit the problem with
this on any in-tree targets.

llvm-svn: 200249
2014-01-27 21:41:54 +00:00
Andrea Di Biagio f09a357765 [DAGCombiner] Teach how to fold sext/aext/zext of constant build vectors.
This patch teaches the DAGCombiner how to fold a sext/aext/zext dag node when
the operand in input is a build vector of constants (or UNDEFs).

The inability to fold a sext/zext of a constant build_vector was the root
cause of some pcg bugs affecting vselect expansion on x86-64 with AVX support.

Before this change, the DAGCombiner only knew how to fold a sext/zext/aext of a
ConstantSDNode.

llvm-svn: 200234
2014-01-27 18:45:30 +00:00
Stepan Dyatkovskiy 157bb42e27 Fix for PR18102.
The issue comes from DAGCombiner::MergeConsecutiveStores, more precisely from
the sorting of the mem-op sequence.

Consider how MergeConsecutiveStores works for the next example:

store i8 1, a[0]
store i8 2, a[1]
store i8 3, a[1]   ; a[1] again.
return   ; DAG starts here

1. The method collects all 3 stores.
2. It sorts them by distance from the base pointer (farthest with the highest
index).
3. It takes the first consecutive non-overlapping stores and (if possible) replaces
them with a single store instruction.

The point is, we can't determine here which 'store' instruction
would be the second after sorting ('store 2' or 'store 3').
It happens that 'store 3' would be the second, and 'store 2' would be the third.

So after merging we have the following result:

store i16 (1 | 3 << 8), base   ; is a[0] but bit-casted to i16
store i8 2, a[1]

So actually we swapped 'store 3' and 'store 2' and got the wrong contents in a[1].

Fix: in the sort routine, also take the mem-op sequence number into account.
llvm-svn: 200201
2014-01-27 09:18:31 +00:00
Kevin Qin fb9871ff50 [AArch64 NEON] Fix pattern match failed on FP_ROUND from v1f128 to v1f64.
llvm-svn: 200109
2014-01-26 02:19:35 +00:00
Hal Finkel dbebb52a2f Disable the use of TBAA when using AA in CodeGen
There are currently two issues, of which I currently know, that prevent TBAA
from being correctly usable in CodeGen:

  1. Stack coloring does not update TBAA when merging allocas. This is easy
     enough to fix, but is not the largest problem.

  2. CGP inserts ptrtoint/inttoptr pairs when sinking address computations.
     Because BasicAA does not handle inttoptr, we'll often miss basic type punning
     idioms that we need to catch so we don't miscompile real-world code (like LLVM).

I don't yet have a small test case for this, but this fixes self hosting a
non-asserts build of LLVM on PPC64 when using -enable-aa-sched-mi and -misched=shuffle.

llvm-svn: 200093
2014-01-25 19:24:54 +00:00
Hal Finkel 9b2617a5a8 Add combiner-aa-only-func (debug only)
This option (which is !NDEBUG only) allows restricting the use of alias
analysis in DAGCombiner to a specific function. This has proved extremely
valuable to isolating bugs related to this feature, and mirrors the
misched-only-func option provided by the new instruction scheduler.

llvm-svn: 200088
2014-01-25 17:32:39 +00:00
Hal Finkel 5fb07341f1 Improve descriptions of combiner-alias-analysis and combiner-global-alias-analysis
llvm-svn: 200087
2014-01-25 17:32:37 +00:00
Juergen Ributzka f26beda7c7 Revert "Revert "Add Constant Hoisting Pass" (r200034)"
This reverts commit r200058 and adds the using directive for
ARMTargetTransformInfo to silence two g++ overload warnings.

llvm-svn: 200062
2014-01-25 02:02:55 +00:00
Hans Wennborg 4d67a2e85a Revert "Add Constant Hoisting Pass" (r200034)
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.

We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemented in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.

llvm-svn: 200058
2014-01-25 01:18:18 +00:00
Juergen Ributzka 4f3df4ad64 Add Constant Hoisting Pass
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.

llvm-svn: 200034
2014-01-24 20:18:00 +00:00
Hal Finkel 51a9838049 Fix DAGCombiner::GatherAllAliases to account for non-chain dependencies
DAGCombiner::GatherAllAliases, which is only used when AA used is enabled
during DAGCombine, had a fundamentally incorrect assumption for which this
change compensates. GatherAllAliases, which is used to find aliasing
predecessor chain nodes (so that a better chain can be selected for a load or
store to enable subsequent optimizations) assumed that walking up the chain
would always catch all possibly-aliasing loads and stores. This is not true: To
really find all aliases, we also need to search for aliases through the value
operand of a store, etc.  Consider the following situation:

  Token1 = ...
  L1 = load Token1, %52
  S1 = store Token1, L1, %51
  L2 = load Token1, %52+8
  S2 = store Token1, L2, %51+8
  Token2 = Token(S1, S2)
  L3 = load Token2, %53
  S3 = store Token2, L3, %52
  L4 = load Token2, %53+8
  S4 = store Token2, L4, %52+8

If we search for aliases of S3 (which loads address %52), and we look only
through the chain, then we'll miss the trivial dependence on L1 (which loads
from %52). We then might change all loads and stores to use Token1 as their
chain operand, which could result in copying %53 into %52 before copying
%52 into %51 (which should happen first).

The problem is, however, that searching for such data dependencies can become
expensive, and the cost is not directly related to the chain depth. Instead,
we'll rule out such configurations by insisting that we've visited all chain
users (except for users of the original chain, which is not necessary).  When
doing this, we need to look through nodes we don't care about (otherwise,
things like register copies will interfere with trivial use cases).

Unfortunately, I don't have a small test case for this problem. Creating the
underlying situation is not hard (a pair of memcpys will do it), but arranging
for the default instruction schedule to be incorrect is very fragile.

This unbreaks self hosting on PPC64 when using
-mllvm -combiner-global-alias-analysis -mllvm -combiner-alias-analysis.

llvm-svn: 200033
2014-01-24 20:12:02 +00:00
Juergen Ributzka 50e7e80d00 Revert "Add Constant Hoisting Pass"
This reverts commit r200022 to unbreak the build bots.

llvm-svn: 200024
2014-01-24 18:40:30 +00:00
Hal Finkel ccc18e1330 Restrict FindBetterChain DAG combines to unindexed nodes
These transformations obviously won't work for indexed (pre/post-inc) loads and
stores. In practice, I'm not sure there is any benefit to enabling them for
indexed nodes because other transformations that these might enable likely also
won't handle indexed nodes.

I don't have an in-tree test case that hits this problem, but an upcoming bug
fix will make it much more likely.

llvm-svn: 200023
2014-01-24 18:25:26 +00:00
Juergen Ributzka 38b67d0caf Add Constant Hoisting Pass
This pass identifies expensive constants to hoist and coalesces them to
better prepare it for SelectionDAG-based code generation. This works around the
limitations of the basic-block-at-a-time approach.

First it scans all instructions for integer constants and calculates their
cost. If the constant can be folded into the instruction (the cost is
TCC_Free) or the cost is that of a simple operation (TCC_Basic), then we don't
consider it expensive and leave it alone. This is the default behavior and
the default implementation of getIntImmCost will always return TCC_Free.

If the cost is more than TCC_BASIC, then the integer constant can't be folded
into the instruction and it might be beneficial to hoist the constant.
Similar constants are coalesced to reduce register pressure and
materialization code.

When a constant is hoisted, it is also hidden behind a bitcast to force it to
be live-out of the basic block. Otherwise the constant would be just
duplicated and each basic block would have its own copy in the SelectionDAG.
The SelectionDAG recognizes such constants as opaque and doesn't perform
certain transformations on them, which would create a new expensive constant.

This optimization is only applied to integer constants in instructions and
simple (this means not nested) constant cast expressions. For example:
%0 = load i64* inttoptr (i64 big_constant to i64*)

Reviewed by Eric

llvm-svn: 200022
2014-01-24 18:23:08 +00:00
Alp Toker cb40291100 Fix known typos
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.

llvm-svn: 200018
2014-01-24 17:20:08 +00:00
Owen Anderson 77e4d44411 Revert r162101 and replace it with a solution that works for targets where the pointer type is illegal.
This is a horrible bit of code.  We're calling a simplification routine *in the middle* of type legalization.  We tell the
simplification routine that it's running after legalization, but some of the types it will encounter will be illegal!  The
fix is only to invoke the simplification if the types in question were legal, so that none of its invariants will be violated.

llvm-svn: 199847
2014-01-22 22:34:17 +00:00
Elena Demikhovsky 9d56f1e0e5 AVX512: combining setcc and zext is wrong on AVX512
because the vector compare instruction puts its result in a mask register.

llvm-svn: 199798
2014-01-22 12:26:19 +00:00
Owen Anderson fb00d5bc7c Allow SMUL_LOHI and UMUL_LOHI to be narrowed to MUL on targets where MUL is Custom rather than Legal. Even if the target is doing some kind of expansion for MUL, it's pretty much guaranteed to be more efficient than whatever it does for SMUL_LOHI or UMUL_LOHI!
llvm-svn: 199678
2014-01-20 18:41:34 +00:00
Andrea Di Biagio d7c03ec348 [DAGCombiner] Fix a wrong check in method SimplifyVBinOp.
This fixes a regression introduced by r199135.

Revision 199135 tried to simplify part of the logic in method
DAGCombiner::SimplifyVBinOp introducing calls to method BuildVectorSDNode::isConstant().

However, that revision wrongly changed the check performed by method
SimplifyVBinOp to identify dag nodes that can be folded.
Before revision 199135, that method only tried to simplify vector binary operations
if both operands were build_vector of Constant/ConstantFP/Undef only.

After revision 199135, method SimplifyVBinOp tried to
simplify also vector binary operations with only one constant operand.

This fixes the problem restoring the old behavior of SimplifyVBinOp.

llvm-svn: 199328
2014-01-15 19:51:32 +00:00
Jakob Stoklund Olesen b6b35a4955 Always let value types influence register classes.
When creating a virtual register for a def, the value type should be
used to pick the register class. If we only use the register class
constraint on the instruction, we might pick too large a register class.

Some registers can store values of different sizes. For example, the x86
xmm registers can hold f32, f64, and 128-bit vectors. The three
different value sizes are represented by register classes with identical
register sets: FR32, FR64, and VR128. These register classes have
different spill slot sizes, so it is important to use the right one.

The register class constraint on an instruction doesn't necessarily care
about the size of the value it's defining. The value type determines
that.

This fixes a problem where InstrEmitter was picking 32-bit register
classes for 64-bit values on SPARC.

llvm-svn: 199187
2014-01-14 06:18:38 +00:00
Juergen Ributzka 6840282c99 [DAG] Refactor ReassociateOps - no functional change intended.
llvm-svn: 199146
2014-01-13 21:49:25 +00:00
Juergen Ributzka 7384405f23 [DAG] Teach DAG to also reassociate vector operations
This commit teaches DAG to reassociate vector ops, which in turn enables
constant folding of vector op chains that appear later on during custom lowering
and DAG combine.

Reviewed by Andrea Di Biagio

llvm-svn: 199135
2014-01-13 20:51:35 +00:00
Andrew Trick 7daf6a45f4 Hide the pre-RA-sched= option.
This is a very confusing option for a feature that will go away.

-enable-misched is exposed instead to help triage issues with the new
scheduler.

llvm-svn: 199133
2014-01-13 20:08:27 +00:00
Nico Rieck b5262d6d8f Fix non-deterministic SDNodeOrder-dependent codegen
Reset SelectionDAGBuilder's SDNodeOrder to ensure deterministic code
generation.

llvm-svn: 199050
2014-01-12 14:09:17 +00:00
Alp Toker 798060e006 Fix 'ned' typo in doc comment
Patch by Jasper Neumann!

llvm-svn: 199007
2014-01-11 14:01:43 +00:00
Richard Sandiford 15cfc1c33c Handle masked rotate amounts
At the moment we expect rotates to have the form:

   (or (shl X, Y), (shr X, Z))

where Y == bitsize(X) - Z or Z == bitsize(X) - Y.  This form means that
the (or ...) is undefined for Y == 0 or Z == 0.  This undefinedness can
be avoided by using Y == (C * bitsize(X) - Z) & (bitsize(X) - 1) or
Z == (C * bitsize(X) - Y) & (bitsize(X) - 1) for any integer C
(including 0, the most natural choice).
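
For illustration, a minimal IR sketch (names hypothetical) of the masked form
for a 32-bit X, taking C = 1:

  %z  = sub i32 32, %y
  %zm = and i32 %z, 31        ; Z == (1 * bitsize(X) - Y) & (bitsize(X) - 1)
  %hi = shl i32 %x, %y
  %lo = lshr i32 %x, %zm
  %r  = or i32 %hi, %lo       ; rotate left of %x by %y, well-defined for %y == 0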

llvm-svn: 198861
2014-01-09 10:56:42 +00:00
Richard Sandiford 0f264db3c6 Match the InstCombine form of rotates by X+C
InstCombine converts (sub 32, (add X, C)) into (sub 32-C, X),
so a rotate left of a 32-bit Y by X+C could appear as either:

   (or (shl Y, (add X, C)), (shr Y, (sub 32, (add X, C))))

without InstCombine or:

   (or (shl Y, (add X, C)), (shr Y, (sub 32-C, X)))

with it.

We already matched the first form.  This patch handles the second too.

llvm-svn: 198860
2014-01-09 10:49:40 +00:00
Chandler Carruth d48cdbf0c3 Put the functionality for printing a value to a raw_ostream as an
operand into the Value interface just like the core print method is.
That gives a more consistent organization to the IR printing interfaces
-- they are all attached to the IR objects themselves. Also, update all
the users.

This removes the 'Writer.h' header which contained only a single function
declaration.

llvm-svn: 198836
2014-01-09 02:29:41 +00:00
Andrea Di Biagio 23df4e4a2d Teach the DAGCombiner how to fold 'vselect' dag nodes according
to the following two rules:
  1) fold (vselect (build_vector AllOnes), A, B) -> A
  2) fold (vselect (build_vector AllZeros), A, B) -> B
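
As a small IR-level illustration of rule 1 (values hypothetical):

  %r = select <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> %a, <4 x i32> %b

folds to %a, since the condition is a build_vector of all-ones; with an
all-zeros condition it would fold to %b.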

llvm-svn: 198777
2014-01-08 18:33:04 +00:00
Richard Sandiford 95c864d9bd [DAGCombiner] Factor duplicated rotate code into a separate function
No functional change intended.

llvm-svn: 198768
2014-01-08 15:40:47 +00:00
Chandler Carruth 9aca918df9 Move the LLVM IR asm writer header files into the IR directory, as they
are part of the core IR library in order to support dumping and other
basic functionality.

Rename the 'Assembly' include directory to 'AsmParser' to match the
library name and the only functionality left there -- printing has been
in the core IR library for quite some time.

Update all of the #includes to match.

All of this started because I wanted to have the layering in good shape
before I started adding support for printing LLVM IR using the new pass
infrastructure, and commandline support for the new pass infrastructure.

llvm-svn: 198688
2014-01-07 12:34:26 +00:00
Kevin Qin 5cd73c9e0a [AArch64 NEON] Fix invalid constant used in vselect condition.
There is a wrong assumption that the vector element type and the
type of each ConstantSDNode in the build_vector were the same.
However, when promoting the integer operand of a legally typed
build_vector, the operand type and the vector element type do not
need to be the same
(See method 'DAGTypeLegalizer::PromoteIntOp_BUILD_VECTOR' in
LegalizeIntegerTypes.cpp).

  In the AArch64 backend, the following dag sequence:

  C0: i1 = Constant<0>
  C1: i1 = Constant<-1>
  V: v8i1 = BUILD_VECTOR C1, C1, C0, C0, C0, C0, C0, C0

  is type-legalized into:

  NewC0: i32 = Constant<0>
  NewC1: i32 = Constant<1>
  V: v8i8 = BUILD_VECTOR NewC1, NewC1, NewC0, NewC0, NewC0, NewC0, NewC0, NewC0

Force a getZeroExtend to VTBits to ensure that the new constant
is correct.

llvm-svn: 198582
2014-01-06 02:26:10 +00:00
Bill Wendling 908bf814e7 Refactor function that checks that __builtin_returnaddress's argument is constant.
This moves the check up into the parent class so that all targets can use it
without having to copy (and keep in sync) the same error message.

llvm-svn: 198579
2014-01-06 00:43:20 +00:00
Kevin Qin ede9ce1933 Fix a bug in DAGcombiner about zero-extend after setcc.
For the AArch64 backend, if DAGCombiner sees "sext(setcc)", it will
combine them together into a single setcc with an extended value type.
Then if it sees "zext(setcc)", it assumes the setcc is Vxi1 and tries to
create "(and (vsetcc), (1, 1, ...))". When the setcc isn't Vxi1,
DAGCombiner will create the wrong node and wrong code will be emitted.
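
For illustration, a minimal IR sketch (types hypothetical) in which the same
setcc feeds both a sext and a zext:

  %c = icmp eq <4 x i32> %a, %b
  %s = sext <4 x i1> %c to <4 x i32>
  %z = zext <4 x i1> %c to <4 x i32>

After the sext(setcc) combine widens the setcc's result type, the later
zext(setcc) combine can no longer assume a Vxi1 setcc.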

llvm-svn: 198190
2013-12-30 02:05:13 +00:00
Andrea Di Biagio 46dcddb350 Teach DAGCombiner how to fold a SIGN_EXTEND_INREG of a BUILD_VECTOR of
ConstantSDNodes (or UNDEFs) into a simple BUILD_VECTOR.

For example, given the following sequence of dag nodes:

  i32 C = Constant<1>
  v4i32 V = BUILD_VECTOR C, C, C, C
  v4i32 Result = SIGN_EXTEND_INREG V, ValueType:v4i1

The SIGN_EXTEND_INREG node can be folded into a build_vector since
the vector in input is a BUILD_VECTOR of constants.

The optimized sequence is:

  i32 C = Constant<-1>
  v4i32 Result = BUILD_VECTOR C, C, C, C

llvm-svn: 198084
2013-12-27 20:20:28 +00:00
Josh Magee 22b8ba2d67 [stackprotector] Use analysis from the StackProtector pass for stack layout in PEI and LocalStackSlot passes.
This changes the MachineFrameInfo API to use the new SSPLayoutKind information
produced by the StackProtector pass (instead of a boolean flag) and updates a
few pass dependencies (to preserve the SSP analysis).

The stack layout follows the same approach used prior to this change - i.e.,
only LargeArray stack objects will be placed near the canary and everything
else will be laid out normally.  After this change, structures containing large
arrays will also be placed near the canary - a case previously missed by the
old implementation.

Out of tree targets will need to update their usage of
MachineFrameInfo::CreateStackObject to remove the MayNeedSP argument. 

The next patch will implement the rules for sspstrong and sspreq.  The end goal
is to support ssp-strong stack layout rules.

WIP.

Differential Revision: http://llvm-reviews.chandlerc.com/D2158

llvm-svn: 197653
2013-12-19 03:17:11 +00:00
Jim Grosbach ea2db453dd Add a machine code print in DEBUG() following instruction selection.
Make debugging ISel a bit easier by printing out a dump of the generated
code at the end.

llvm-svn: 197456
2013-12-17 02:01:10 +00:00
Juergen Ributzka e82947539e [Stackmap] Liveness Analysis Pass
This optional register liveness analysis pass can be enabled with either
-enable-stackmap-liveness, -enable-patchpoint-liveness, or both. The pass
traverses each basic block in a machine function. For each basic block the
instructions are processed in reversed order and if a patchpoint or stackmap
instruction is encountered the current live-out register set is encoded as a
register mask and attached to the instruction.

Later on during stackmap generation the live-out register mask is processed and
also emitted as part of the stackmap.

This information is optional and intended for optimization purposes only. This
will enable a client of the stackmap to reason about the registers it can use
and which registers need to be preserved.

Reviewed by Andy

llvm-svn: 197317
2013-12-14 06:53:06 +00:00
Andrew Trick 7bcb0100df Revert "Liveness Analysis Pass"
This reverts commit r197254.

This was an accidental merge of Juergen's patch. It will be checked in
shortly, but wasn't meant to go in quite yet.

Conflicts:
	include/llvm/CodeGen/StackMaps.h
	lib/CodeGen/StackMaps.cpp
	test/CodeGen/X86/stackmap-liveness.ll

llvm-svn: 197260
2013-12-13 18:57:20 +00:00
Andrew Trick e8cba373a3 Grow the stackmap/patchpoint format to hold 64-bit IDs.
llvm-svn: 197255
2013-12-13 18:37:10 +00:00
Andrew Trick 8d6a658430 Liveness Analysis Pass
llvm-svn: 197254
2013-12-13 18:37:03 +00:00
Benjamin Kramer 671a596282 SelectionDAG: Fix a typo.
Found by "cppcheck". PR18208.

llvm-svn: 197047
2013-12-11 16:36:09 +00:00
Richard Sandiford d1093636cc Extend (truncate (load)) folding
DAGCombiner could fold (truncate (load)) -> smaller load if the original
load was the width of the truncation result or wider.  This patch extends
it to handle cases where the original load was narrower (and so the
extension type stays the same).

llvm-svn: 197030
2013-12-11 11:37:27 +00:00
Reid Kleckner ee08897fb8 Reland "Fix miscompile of MS inline assembly with stack realignment"
This re-lands commit r196876, which was reverted in r196879.

The tests have been fixed to pass on platforms with a stack alignment
larger than 4.

Update to clang side tests will land shortly.

llvm-svn: 196939
2013-12-10 18:27:32 +00:00
Richard Sandiford 9afe613d12 Add TargetLowering::prepareVolatileOrAtomicLoad
One unusual feature of the z architecture is that the result of a
previous load can be reused indefinitely for subsequent loads, even if
a cache-coherent store to that location is performed by another CPU.
A special serializing instruction must be used if you want to force
a load to be reattempted.

Since volatile loads are not supposed to be omitted in this way,
we should insert a serializing instruction before each such load.
The same goes for atomic loads.

The patch implements this at the IR->DAG boundary, in a similar way
to atomic fences.  It is a no-op for targets other than SystemZ.

llvm-svn: 196905
2013-12-10 10:36:34 +00:00
Reid Kleckner 0a9509f080 Revert "Fix miscompile of MS inline assembly with stack realignment"
This reverts commit r196876.  Its tests failed on the bots, so I'll
figure it out tomorrow.

llvm-svn: 196879
2013-12-10 05:31:27 +00:00
Reid Kleckner 7f10a8cd45 Fix miscompile of MS inline assembly with stack realignment
For stack frames requiring realignment, three pointers may be needed:
- ebp to address incoming arguments
- esi (could be any callee-saved register) to address locals
- esp to address outgoing arguments

We would use esi unconditionally without verifying that it did not
conflict with inline assembly.

This change doesn't do the verification; it simply emits a fatal error
on functions that use stack realignment, dynamic SP adjustments, and
inline assembly.

Because stack realignment is common on Windows, we also no longer assume
that MS inline assembly clobbers esp.  Instead, we analyze the inline
instructions for implicit definitions and check if esp is there.  If so,
we require the use of a base pointer and consider it in the condition
above.

Mostly fixes PR16830, but we could try harder to find a non-conflicting
base pointer.

Reviewers: sunfish

Differential Revision: http://llvm-reviews.chandlerc.com/D1317

llvm-svn: 196876
2013-12-10 05:12:23 +00:00
Nadav Rotem 6eee080450 Fix PR18162 - Incorrect assertion assumed that the SDValue resno is zero.
llvm-svn: 196858
2013-12-10 01:13:59 +00:00
Alp Toker f907b891da Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00
Rafael Espindola d50dbc783b Try harder to get consistent floating point results.
This just extends the existing hack. It should be enough to get a reproducible bootstrap
on 32 bits.

I will open a bug to track getting a real fix for this.

llvm-svn: 196462
2013-12-05 04:14:33 +00:00
Andrew Trick 391dbadb51 StackMap: Implement support for DirectMemRefOp.
A Direct stack map location records the address of a frame index. This
address is itself the value that the runtime requested. This differs
from IndirectMemRefOp locations, which refer to stack locations from
which the requested values must be loaded. Direct locations can
directly communicate the address of an alloca, while IndirectMemRefOp
locations handle register spills.

For example:

entry:
  %a = alloca i64...
  llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a)

Since both the alloca and stackmap intrinsic are in the entry block,
and the intrinsic takes the address of the alloca, the runtime can
assume that LLVM will not substitute alloca with any intervening
value. This must be verified by the runtime by checking that the stack
map's location is a Direct location type. The runtime can then
determine the alloca's relative location on the stack immediately after
compilation, or at any time thereafter. This differs from Register and
Indirect locations, because the runtime can only read the values in
those locations when execution reaches the instruction address of the
stack map.

llvm-svn: 195712
2013-11-26 02:03:25 +00:00
Bill Wendling 9200bb08f9 Unrevert r195599 with testcase fix.
I'm not sure how it was checking for the wrong values...
PR18023.

llvm-svn: 195670
2013-11-25 18:05:22 +00:00
Amara Emerson f59125f5bb Revert r195599 as it broke the builds.
llvm-svn: 195636
2013-11-25 11:24:18 +00:00
Daniel Sanders b021c6fdbd Fixed tryFoldToZero() for vector types that need expansion.
Summary:
Moved the requirement for SelectionDAG::getConstant() to return legally
typed nodes slightly earlier. There were two optional DAGCombine passes
that were missed out and were required to produce type-legal DAGs.

Simplified a code-path in tryFoldToZero() to use SelectionDAG::getConstant().
This provides support for both promoted and expanded vector types whereas the
previous code only supported promoted vector types.

Fixes a "Type for zero vector elements is not legal" assertion detected by
an llvm-stress generated test.

Reviewers: resistor

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2251

llvm-svn: 195635
2013-11-25 11:14:43 +00:00
Bill Wendling e3c48709ed Don't look past volatile loads.
A volatile load should block us from trying to coalesce stores.
PR18023

llvm-svn: 195599
2013-11-25 05:01:21 +00:00
Paul Robinson d89125a5d8 Teach ISel not to optimize 'optnone' functions (revised).
Improvements over r195317:
- Set/restore EnableFastISel flag instead of just running FastISel within
  SelectAllBasicBlocks; the flag is checked in various places, and
  FastISel won't run properly if those places don't do the right thing.
- Test looks for normal ISel versus FastISel behavior, and not
  something more subtle that doesn't work everywhere.

Based on work by Andrea Di Biagio.

llvm-svn: 195491
2013-11-22 19:11:24 +00:00
Andrew Trick 4a1abb7ab5 patchpoint: factor SD builder code for live vars. Plain stackmap also optimizes Constant values now.
llvm-svn: 195488
2013-11-22 19:07:36 +00:00
Andrew Trick a2428e0f40 patchpoint: eliminate hard coded operand indices.
llvm-svn: 195487
2013-11-22 19:07:33 +00:00
Tom Stellard 06c67bcbe4 SelectionDAG: Optimize expansion of vec_type = BITCAST scalar_type
The legalizer can now do this type of expansion for more
type combinations without loading and storing to and
from the stack.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195398
2013-11-22 00:41:05 +00:00
Tom Stellard 9cbd2c5581 Split SETCC if VSELECT requires splitting too.
This patch is a rewrite of the original patch committed in r194542. Instead of
relying on the type legalizer to do the splitting for us, we now perform the
splitting ourselves in the DAG combiner. This is necessary for the case where
the vector mask is a legal type after promotion and still wouldn't require
splitting.

Patch by: Juergen Ributzka

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195397
2013-11-22 00:39:23 +00:00
Daniel Sanders edc071b815 Add support for legalizing SETNE/SETEQ by inverting the condition code and the result of the comparison.
Summary:
LegalizeSetCCCondCode can now legalize SETEQ and SETNE by returning the inverse
condition and requesting that the caller invert the result of the condition.

The caller of LegalizeSetCCCondCode must handle the inverted CC, and they do
so as follows:
  SETCC, BR_CC:
    Invert the result of the SETCC with SelectionDAG::getNOT()
  SELECT_CC:
    Swap the true/false operands.

This is necessary for MSA which lacks an integer SETNE instruction.

Reviewers: resistor

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2229

llvm-svn: 195355
2013-11-21 13:24:49 +00:00
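For illustration only, a rough C-level analogy (not the DAG code itself) of the
two strategies described above for a target that only has an "equal" compare;
the function names are made up:

  #include <stdint.h>

  /* SETCC / BR_CC style: compute the legal SETEQ and invert the result. */
  static int setne_via_invert(int32_t a, int32_t b) {
      return !(a == b);
  }

  /* SELECT_CC style: keep the inverted compare and swap the select arms. */
  static int32_t select_ne(int32_t a, int32_t b, int32_t tval, int32_t fval) {
      return (a == b) ? fval : tval;
  }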
NAKAMURA Takumi 43aa939625 Revert r195317 (and r195333), "Teach ISel not to optimize 'optnone' functions."
It broke, at least, i686 target. It is reproducible with "llc -mtriple=i686-unknown".

FYI, it didn't appear to add either "-O0" or "-fast-isel".

llvm-svn: 195339
2013-11-21 10:55:15 +00:00
Paul Robinson b379efeb53 Teach ISel not to optimize 'optnone' functions.
Based on work by Andrea Di Biagio.

llvm-svn: 195317
2013-11-21 06:33:32 +00:00
Jack Carter d4b22dcbf3 long line correction
llvm-svn: 195179
2013-11-20 00:32:32 +00:00
Jack Carter 5c0af48a11 long lines and white space correction
llvm-svn: 195170
2013-11-19 23:43:22 +00:00
Juergen Ributzka b34871027f [DAG] Refactor vector splitting code in SelectionDAG. No functional change intended.
Reviewed by Tom

llvm-svn: 195156
2013-11-19 21:20:17 +00:00
Andrew Trick 1f54e805f2 Fix patchpoint comments.
llvm-svn: 195103
2013-11-19 05:05:43 +00:00
Benjamin Kramer bb1dd73d3e DAGCombiner: Partially revert r192795, getNOT was fixed not to create illegal constants.
llvm-svn: 194959
2013-11-17 10:40:03 +00:00
Matt Arsenault 64283bd99c Use more getZExtOrTruncs
llvm-svn: 194945
2013-11-17 02:31:26 +00:00
Matt Arsenault 873bb3ea86 Use getZExtOrTrunc instead of repeating the same logic.
llvm-svn: 194944
2013-11-17 02:24:21 +00:00
Matt Arsenault 36f5eb5949 Use right address space pointer size
llvm-svn: 194940
2013-11-17 00:06:39 +00:00
Matt Arsenault dfb3e7092e Fix assert on unaligned access to global with different address space size.
llvm-svn: 194934
2013-11-16 20:50:54 +00:00
Matt Arsenault 19231e630e Fix codegen for null different sized pointer.
llvm-svn: 194932
2013-11-16 20:24:41 +00:00
Bob Wilson 9f3e6b25ee Avoid illegal integer promotion in fastisel
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow.  Results are different when:

    sext(a) + sext(b) != sext(a + b)

Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.

<rdar://problem/15292280>

Patch by Duncan Exon Smith!

llvm-svn: 194840
2013-11-15 19:09:27 +00:00
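For illustration only, a small worked C example of the inequality above (the
values are made up, not taken from the original report); on typical
two's-complement targets the narrow add wraps before the sign extension:

  #include <stdint.h>
  #include <stdio.h>

  int main(void) {
      int8_t a = 100, b = 100;
      int64_t wide   = (int64_t)a + (int64_t)b;    /* sext(a) + sext(b) = 200 */
      int64_t narrow = (int64_t)(int8_t)(a + b);   /* sext(a + b)       = -56 */
      printf("%lld vs %lld\n", (long long)wide, (long long)narrow);
      return 0;
  }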
Daniel Sanders 50b8041066 Fix illegal DAG produced by SelectionDAG::getConstant() for v2i64 type
Summary:
When getConstant() is called for an expanded vector type, it is split into
multiple scalar constants which are then combined using appropriate build_vector
and bitcast operations.

In addition to the usual big/little endian differences, the case where the
element-order of the vector does not have the same endianness as the elements
themselves is also accounted for.  For example, for v4i32 on big-endian MIPS,
the byte-order of the vector is <3210,7654,BA98,FEDC>. For little-endian, it is
<0123,4567,89AB,CDEF>.
Handling this case turns out to be a nop since getConstant() returns a splatted
vector (so reversing the element order doesn't change the value)

This fixes a number of cases in MIPS MSA where calling getConstant() during
operation legalization introduces illegal types (e.g. to legalize v2i64 UNDEF
into a v2i64 BUILD_VECTOR of illegal i64 zeros). It should also handle bigger
differences between illegal and legal types such as legalizing v2i64 into v8i16.

lowerMSASplatImm() in the MIPS backend no longer needs to avoid calling
getConstant() so this function has been updated in the same patch.

For the sake of transparency, the steps I've taken since the review are:
* Added 'virtual' to isVectorEltOrderLittleEndian() as requested. This revealed
  that the MIPS tests were falsely passing because a polymorphic function was
  not actually polymorphic in the reviewed patch.
* Fixed the tests that were now failing. This involved deleting the code to
  handle the MIPS MSA element-order (which was previously doing a byte-order
  swap instead of an element-order swap). This left
  isVectorEltOrderLittleEndian() unused and it was deleted.
* Fixed build failures caused by rebasing beyond r194467-r194472. These build
  failures involved the bset, bneg, and bclr instructions added in these commits
  using lowerMSASplatImm() in a way that was no longer valid after this patch.
  Some of these were fixed by calling SelectionDAG::getConstant() instead,
  others were fixed by a new function getBuildVectorSplat() that provided the
  removed functionality of lowerMSASplatImm() in a more sensible way.

Reviewers: bkramer

Reviewed By: bkramer

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1973

llvm-svn: 194811
2013-11-15 12:56:49 +00:00
Matt Arsenault c5559bb14b Add target hook to prevent folding some bitcasted loads.
This is to avoid this transformation in some cases:
fold (conv (load x)) -> (load (conv*)x)

On architectures that don't natively support some vector
loads efficiently, it is more efficient to cast the load to a
smaller vector of larger types and load that instead.

Patch by Micah Villmow.

llvm-svn: 194783
2013-11-15 04:42:23 +00:00
Matt Arsenault b03bd4d96b Add addrspacecast instruction.
Patch by Michele Scandale!

llvm-svn: 194760
2013-11-15 01:34:59 +00:00
Andrew Trick 561f2218e0 Minor extension to llvm.experimental.patchpoint: don't require a call.
If a null call target is provided, don't emit a dummy call. This
allows the runtime to reserve as little nop space as it needs without
the requirement of emitting a call.

llvm-svn: 194676
2013-11-14 06:54:10 +00:00
Juergen Ributzka 34c652d34d SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
This patch reapplies r193676 with an additional fix for the Hexagon backend. The
SystemZ backend has already been fixed by r194148.

The Type Legalizer recognizes that VSELECT needs to be split, because the type
is too wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.

This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. Now the type
legalizer will split both VSELECT and SETCC.

This allows the following X86 DAG Combine code to successfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.

Reviewed by Nadav

llvm-svn: 194542
2013-11-13 01:57:54 +00:00
Daniel Sanders a1840d2f88 Vector forms of SHL, SRA, and SRL can be constant folded using SimplifyVBinOp too
Reviewers: dsanders

Reviewed By: dsanders

CC: llvm-commits, nadav

Differential Revision: http://llvm-reviews.chandlerc.com/D1958

llvm-svn: 194393
2013-11-11 17:23:41 +00:00
Juergen Ributzka 87ed906b2e [Stackmap] Materialize the jump address within the patchpoint noop slide.
This patch moves the jump address materialization inside the noop slide. This
enables patching of the materialization itself or its complete removal. This
patch also adds the ability to define scratch registers that can be used safely
by the code called from the patchpoint intrinsic. At least one scratch register
is required, because that one is used for the materialization of the jump
address. This patch depends on D2009.

Differential Revision: http://llvm-reviews.chandlerc.com/D2074

Reviewed by Andy

llvm-svn: 194306
2013-11-09 01:51:33 +00:00
Juergen Ributzka 9969d3e6e8 [Stackmap] Add AnyReg calling convention support for patchpoint intrinsic.
The idea of the AnyReg Calling Convention is to provide the call arguments in
registers, but not to force them to be placed in a particular order into a
specified set of registers. Instead it is up to the register allocator to assign
any register as it sees fit. The same applies to the return value (if
applicable).

Differential Revision: http://llvm-reviews.chandlerc.com/D2009

Reviewed by Andy

llvm-svn: 194293
2013-11-08 23:28:16 +00:00
Andrew Trick 6664df12fb Slightly change the way stackmap and patchpoint intrinsics are lowered.
MorphNodeTo is not safe to call during DAG building. It eagerly
deletes dependent DAG nodes which invalidates the NodeMap. We could
expose a safe interface for morphing nodes, but I don't think it's
worth it. Just create a new MachineNode and replaceAllUsesWith.

My understanding of the SD design has been that we want to support
early target opcode selection. That isn't very well supported, but
generally works. It seems reasonable to rely on this feature even if
it isn't widely used.

llvm-svn: 194102
2013-11-05 22:44:04 +00:00
Juergen Ributzka 359c532d36 [Stackmap] Remove erroneous assert.
llvm-svn: 193871
2013-11-01 17:53:27 +00:00
Aaron Ballman 2b7a733b16 Commenting out this assert because it is causing the build bots to fail. This effectively reverts r193861, but needs to be fixed as part of r193769.
llvm-svn: 193862
2013-11-01 15:12:23 +00:00
Aaron Ballman 96321aa523 Fixing an order of evaluation error in an assert.
llvm-svn: 193861
2013-11-01 14:53:14 +00:00
Andrew Trick 153ebe6d2a Add support for stack map generation in the X86 backend.
Originally implemented by Lang Hames.

llvm-svn: 193811
2013-10-31 22:11:56 +00:00
Andrew Trick 74f4c749cf Lower stackmap intrinsics directly to their target opcode in the DAG builder.
llvm-svn: 193769
2013-10-31 17:18:24 +00:00
Andrew Trick d4d1d9c06e whitespace
llvm-svn: 193765
2013-10-31 17:18:07 +00:00
Jim Grosbach 7236678687 Legalize: Improve legalization of long vector extends.
When an extend more than doubles the size of the elements (e.g., a zext
from v16i8 to v16i32), the normal legalization method of splitting the
vectors will run into problems as by the time the destination vector is
legal, the source vector is illegal. The end result is the operation
often becoming scalarized, with the typical horrible performance. For
example, on x86_64, the simple input of:
define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
  %tmp = zext <16 x i8> %a to <16 x i32>
  store <16 x i32> %tmp, <16 x i32>*%p
  ret void
}

Generates:
  .section  __TEXT,__text,regular,pure_instructions
  .section  __TEXT,__const
  .align  5
LCPI0_0:
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .section  __TEXT,__text,regular,pure_instructions
  .globl  _bar
  .align  4, 0x90
_bar:
  vpunpckhbw  %xmm0, %xmm0, %xmm1
  vpunpckhwd  %xmm0, %xmm1, %xmm2
  vpmovzxwd %xmm1, %xmm1
  vinsertf128 $1, %xmm2, %ymm1, %ymm1
  vmovaps LCPI0_0(%rip), %ymm2
  vandps  %ymm2, %ymm1, %ymm1
  vpmovzxbw %xmm0, %xmm3
  vpunpckhwd  %xmm0, %xmm3, %xmm3
  vpmovzxbd %xmm0, %xmm0
  vinsertf128 $1, %xmm3, %ymm0, %ymm0
  vandps  %ymm2, %ymm0, %ymm0
  vmovaps %ymm0, (%rdi)
  vmovaps %ymm1, 32(%rdi)
  vzeroupper
  ret

So instead we can check if there are legal types that enable us to split
more cleverly when the input vector is already legal such that we don't
turn it into an illegal type. If the extend is such that it's more than
doubling the size of the input, we check if
  - the number of vector elements is even,
  - the source type is legal,
  - the type of a split source is illegal,
  - the type of an extended (by doubling element size) source is legal, and
  - the type of that extended source when split is legal.
If the conditions are met, instead of just splitting both the
destination and the source types, we create an extend that only goes up
one "step" (doubling the element width), and the continue legalizing the
rest of the operation normally. The result is that this operates as a
new, more effecient, termination condition for the loop of "split the
operation until the destination type is legal."

With this change, the above example now compiles to:
_bar:
  vpxor %xmm1, %xmm1, %xmm1
  vpunpcklbw  %xmm1, %xmm0, %xmm2
  vpunpckhwd  %xmm1, %xmm2, %xmm3
  vpunpcklwd  %xmm1, %xmm2, %xmm2
  vinsertf128 $1, %xmm3, %ymm2, %ymm2
  vpunpckhbw  %xmm1, %xmm0, %xmm0
  vpunpckhwd  %xmm1, %xmm0, %xmm3
  vpunpcklwd  %xmm1, %xmm0, %xmm0
  vinsertf128 $1, %xmm3, %ymm0, %ymm0
  vmovaps %ymm0, 32(%rdi)
  vmovaps %ymm2, (%rdi)
  vzeroupper
  ret

This generalizes a custom lowering that was added a while back to the
ARM backend. That lowering is no longer necessary, and is removed. The
testcases for it, however, provide excellent ARM tests for this change
and so remain.

rdar://14735100

llvm-svn: 193727
2013-10-31 00:20:48 +00:00
Matt Arsenault 2ba54c3d90 Fix CodeGen for unaligned loads with address spaces
llvm-svn: 193721
2013-10-30 23:30:05 +00:00
Juergen Ributzka 3bd686d493 Revert "SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too."
Now Hexagon and SystemZ are not happy with it :-(

llvm-svn: 193677
2013-10-30 06:36:19 +00:00
Juergen Ributzka 6ad05d6b95 SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.

This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. This mask usually
has the same size as the VSELECT return type (except for Intel KNL). Now the
type legalizer will split both VSELECT and SETCC.

This allows the following X86 DAG Combine code to successfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.

Reviewed by Nadav

llvm-svn: 193676
2013-10-30 05:48:18 +00:00
Alp Toker 6a03374526 Fix "existant" typos
llvm-svn: 193579
2013-10-29 02:35:28 +00:00
Richard Sandiford 981fdeb477 [DAGCombiner] Respect volatility when checking for aliases
Making useAA() default to true for SystemZ showed that the combiner alias
analysis wasn't handling volatile accesses.  This hit many of the SystemZ
tests, but I arbitrarily picked one for the purpose of this patch.

llvm-svn: 193518
2013-10-28 12:00:00 +00:00
Richard Sandiford 39c1ce4dc1 Keep TBAA info when rewriting SelectionDAG loads and stores
Most SelectionDAG code drops the TBAA info when creating a new form of a
load and store (e.g. during legalization, or when converting a plain
load to an extending one).  This patch tries to catch all cases where
the TBAA information can legitimately be carried over.

The patch adds alternative forms of getLoad() and getExtLoad() that take
a MachineMemOperand instead of individual fields.  (The corresponding
getTruncStore() already exists.)  The idea is to use the MachineMemOperand
forms when all fields are carried over (size, pointer info, isVolatile,
isNonTemporal, alignment and TBAA info).  If some adjustment is being
made, e.g. to narrow the load, then we still pass the individual fields
but also pass the TBAA info.

llvm-svn: 193517
2013-10-28 11:17:59 +00:00
Tim Northover a564d329c2 LegalizeDAG: allow libcalls for max/min atomic operations
ARM processors without ldrex/strex need to be able to make libcalls for all
atomic operations, including the newer min/max versions.

The alternative would probably be expanding these operations in terms of
cmpxchg (as x86 does always), but in the configurations where this matters
code-size tends to be paramount so the libcall is more desirable.

llvm-svn: 193398
2013-10-25 09:30:20 +00:00
Nadav Rotem d369d4bdf9 Optimize concat_vectors(X, undef) -> scalar_to_vector(X).
This optimization is not SSE specific so I am moving it to DAGco.
The new scalar_to_vector dag node exposed a missing pattern in the AArch64 target that I needed to add.

llvm-svn: 193393
2013-10-25 06:41:18 +00:00
Tom Stellard 8d7d4deafe SelectionDAG: Pass along the original argument/element type in ISD::InputArg
For some targets, it is useful to be able to look at the original
type of an argument without having to dig through the original IR.

This also fixes a bug in SelectionDAGBuilder where InputArg.PartOffset
was not taking into account the offset of structure elements.

Patch by: Justin Holewinski

Tom Stellard:
  - Changed the type of ArgVT to EVT, so it can store non-simple types
    like v3i32.

llvm-svn: 193214
2013-10-23 00:44:24 +00:00
Wan Xiaofei 2f8dc08b8c Using FoldingSet in SelectionDAG::getVTList.
VTList has a long life cycle through the module and getVTList is frequently called. In the current getVTList, a sequential search over a std::vector is used, which is inefficient for big modules.
This patch uses a FoldingSet to implement a hashing mechanism for the search.

Reviewer: Nadav Rotem
Test    : Pass unit tests & LNT test suite

llvm-svn: 193150
2013-10-22 08:02:02 +00:00
Matt Arsenault b768912db8 Fix CodeGen for different size address space GEPs
llvm-svn: 193111
2013-10-21 20:03:54 +00:00
Matt Arsenault bbd24901cf Reuse variable
llvm-svn: 193107
2013-10-21 19:24:15 +00:00
Bill Schmidt 3684fdd59f [PATCH] Fix PR17168 (DAG scheduler inserts DBG_VALUE before PHI with fast-isel)
PR17168 describes a test case that fails when compiling for debug with
fast-isel.  Investigation showed that the test was failing because a DBG_VALUE
machine instruction was placed prior to a PHI.

For this problem to occur requires the following:
 * Compile for debug
 * Compile with fast-isel
 * In a block B, fast-isel must partially succeed before punting to DAG-isel
 * B must start with a PHI
 * The first unhandled node in the DAG must not generate a machine instruction
 * A debug value with an order less than that of that first node exists

When all of these circumstances apply, the existing test that an instruction
was not inserted won't fire.  Currently it tests whether the block is empty,
or whether the last instruction generated is a phi.  When fast-isel has
partially succeeded, the last instruction generated will not be a phi.
Instead, we need to check whether the current insert position is immediately
following a phi.  This patch adds that check, and adds the test case from the
PR as a regression test.

llvm-svn: 192976
2013-10-18 14:20:11 +00:00
David Majnemer 451b7dd1ef CodeGen: Emit a libcall if the target doesn't support 16-byte wide atomics
There are targets that support i128 sized scalars but cannot emit
instructions that modify them directly.  The proper thing to do is to
emit a libcall.

This fixes PR17481.

llvm-svn: 192957
2013-10-18 08:03:43 +00:00
Richard Sandiford 95f7ba988b Replace sra with srl if a single sign bit is required
E.g. (and (sra (i32 x) 31) 2) -> (and (srl (i32 x) 30) 2).

llvm-svn: 192884
2013-10-17 11:16:57 +00:00
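For illustration only, a quick C check of the equivalence behind this rewrite
(it assumes an arithmetic right shift for signed int32_t, as on typical targets):

  #include <assert.h>
  #include <stdint.h>

  static uint32_t via_sra(int32_t x) { return (uint32_t)(x >> 31) & 2u; }
  static uint32_t via_srl(int32_t x) { return ((uint32_t)x >> 30) & 2u; }

  int main(void) {
      int32_t tests[] = { 0, 1, -1, INT32_MIN, INT32_MAX, 0x40000000 };
      for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; ++i)
          assert(via_sra(tests[i]) == via_srl(tests[i]));
      return 0;
  }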
Andrea Di Biagio 561badf717 Fix edge condition in DAGCombiner to improve codegen of shift sequences.
When canonicalizing dags according to the rule
(shl (zext (shr X, c1) ), c1) ==> (zext (shl (shr X, c1), c1))

remember to add the new shl dag to the DAGCombiner worklist of nodes.
If we don't explicitly add it to the worklist of nodes to visit, we
may not trigger later on the rule that folds the shift left + logical
shift right into a AND instruction with bitmask.

llvm-svn: 192883
2013-10-17 11:02:58 +00:00
Jack Carter d4e9615d1c [projects/test-suite] White space and long line fixes.
No functionality changes.

llvm-svn: 192863
2013-10-17 01:34:33 +00:00
Benjamin Kramer 00eb07b791 DAGCombiner: Don't fold xor into not if getNOT would introduce an illegal constant.
This happens e.g. with <2 x i64> -1 on x86_32. It cannot be generated directly
because i64 is illegal. It would be nice if getNOT would handle this
transparently, but I don't see a way to generate a legal constant there right
now. Fixes PR17487.

llvm-svn: 192795
2013-10-16 14:16:19 +00:00
Richard Sandiford 374a0e50c4 Handle (shl (anyext (shr ...))) in SimplifyDemandedBits
This is really an extension of the current (shl (shr ...)) -> shl optimization.
The main difference is that certain upper bits must also not be demanded.

The motivating examples are the first two in the testcase, which occur
in llvmpipe output.

llvm-svn: 192783
2013-10-16 10:26:19 +00:00
David Blaikie 6004dbc9fa Fix indenting.
That wasn't confusing /at all/...

llvm-svn: 192617
2013-10-14 20:15:04 +00:00
Elena Demikhovsky 82a46ebe0a Fixed a bug in dynamic memory allocation on the stack.
The alignment of the allocated space was wrong; see Bugzilla 17345.

Done by Zvi Rackover <zvi.rackover@intel.com>.

llvm-svn: 192573
2013-10-14 07:26:51 +00:00
Will Dietz ae726a93e3 TargetLowering: Don't index into empty string.
(This is triggered by current lit tests)

llvm-svn: 192549
2013-10-13 03:08:49 +00:00
Quentin Colombet de0e06234c [DAGCombiner] Reapply load slicing (192471) with a test that explicitly set sse4.2 support.
This should fix the buildbots.

Original commit message:
[DAGCombiner] Slice a big load into two loads when the elements are next to each
other in memory and the target has paired load and performs post-isel loads
combining.

E.g., this optimization will transform something like this:
a = load i64* addr
b = trunc i64 a to i32
c = lshr i64 a, 32
d = trunc i64 c to i32

into:
b = load i32* addr1
d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.

One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.

<rdar://problem/14477220>

llvm-svn: 192476
2013-10-11 18:29:42 +00:00
Quentin Colombet 5aee63d9e3 [DAGCombiner] Revert load slicing (r192471), until I figure out why it fails on ubuntu.
llvm-svn: 192474
2013-10-11 18:17:17 +00:00
Quentin Colombet 41dc258f71 [DAGCombiner] Slice a big load into two loads when the elements are next to each
other in memory and the target has paired load and performs post-isel loads
combining.

E.g., this optimization will transform something like this:
 a = load i64* addr
 b = trunc i64 a to i32
 c = lshr i64 a, 32
 d = trunc i64 c to i32

into:
 b = load i32* addr1
 d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.

One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.

<rdar://problem/14477220>

llvm-svn: 192471
2013-10-11 18:01:14 +00:00
Matt Arsenault a98c3b1816 Use getPointerSizeInBits() rather than 8 * getPointerSize()
llvm-svn: 192386
2013-10-10 19:09:05 +00:00
Craig Topper a7afa71494 Fix some assert messages to say the correct opcode name. Looks like one assert got copy and pasted to many places.
llvm-svn: 192078
2013-10-06 22:38:19 +00:00
Craig Topper a1bbc323fa Add OPC_CheckChildSame0-3 to the DAG isel matcher. This replaces sequences of MoveChild, CheckSame, MoveParent. Saves 846 bytes from the X86 DAG isel matcher, ~300 from ARM, ~840 from Hexagon.
llvm-svn: 192026
2013-10-05 05:38:16 +00:00
Adrian Prantl f01b562a15 Debug info: Don't crash in SelectionDAGISel when a vreg that is being
pointed to by a dbg_value belonging to a function argument is eliminated
during instruction selection.
rdar://problem/15094721.

llvm-svn: 192011
2013-10-05 00:08:27 +00:00
Hal Finkel dbc7a8a8a3 Fix DAGCombiner::visitFP_EXTEND to ignore indexed loads
DAGCombiner::visitFP_EXTEND will apply the following transformation:

  fold (fpext (load x)) -> (fpext (fptrunc (extload x)))

but the implementation does not handle indexed loads (pre/post inc.), but did
not specifically ignore them either (unlike for extending loads, which it
already ignored), causing an assert when the transformation was applied to an
indexed load. This is the minimal fix for correctness (causing the
transformation to be skipped for indexed loads).

Unfortunately, I don't have an in-tree test case.

llvm-svn: 191989
2013-10-04 22:18:12 +00:00
Craig Topper d9a6cc031d Revert r191940 to see if it fixes the build bots.
llvm-svn: 191941
2013-10-04 05:52:17 +00:00
Craig Topper a2efe9ebc6 Add OPC_CheckChildSame0-3 to the DAG isel matcher. This replaces sequences of MoveChild, CheckSame, MoveParent. Saves 846 bytes from the X86 DAG isel matcher, ~300 from ARM, ~840 from Hexagon.
llvm-svn: 191940
2013-10-04 05:22:20 +00:00
Jin-Gu Kang 0bf8241d4b Added code to check whether or not the target supports the specific rotate
dag combines listed below.

The corresponding dag patterns are handled in the "DAGCombiner::MatchRotate" function in DAGCombiner.cpp:
Pattern1
// fold (or (shl (*ext x), (*ext y)),
//          (srl (*ext x), (*ext (sub 32, y)))) ->
//   (*ext (rotl x, y))
// fold (or (shl (*ext x), (*ext y)),
//          (srl (*ext x), (*ext (sub 32, y)))) ->
//   (*ext (rotr x, (sub 32, y)))

Pattern2
// fold (or (shl (*ext x), (*ext (sub 32, y))),
//          (srl (*ext x), (*ext y))) ->
//   (*ext (rotl x, y))
// fold (or (shl (*ext x), (*ext (sub 32, y))),
//          (srl (*ext x), (*ext y))) ->
//   (*ext (rotr x, (sub 32, y)))

llvm-svn: 191905
2013-10-03 15:58:48 +00:00
Rafael Espindola 44fee4e0eb Remove several unused variables.
Patch by Alp Toker.

llvm-svn: 191757
2013-10-01 13:32:03 +00:00
Tom Stellard 6aada32dc4 SelectionDAG: Clarify comments from r191600
llvm-svn: 191724
2013-10-01 02:09:00 +00:00
Benjamin Kramer c3c807b3bf Allocate AtomicSDNode operands in SelectionDAG's allocator to stop leakage.
SDNode destructors are never called. As an optimization use AtomicSDNode's
internal storage if we have a small number of operands.

llvm-svn: 191636
2013-09-29 11:18:56 +00:00
Robert Wilhelm f0cfb83bb4 Fix spelling intruction -> instruction.
llvm-svn: 191610
2013-09-28 11:46:15 +00:00
Tom Stellard 45015d9796 SelectionDAG: Silence unused variable warning on release builds
llvm-svn: 191604
2013-09-28 03:10:17 +00:00
Tom Stellard 5694d3090a SelectionDAG: Improve legalization of SELECT_CC with illegal condition codes
SelectionDAG will now attempt to invert an illegal condition in order to
find a legal one and if that doesn't work, it will attempt to swap the
operands using the inverted condition.

There are no new test cases for this, but a number of the existing R600
tests hit this path.

llvm-svn: 191602
2013-09-28 02:50:43 +00:00
Tom Stellard cd42818d86 SelectionDAG: Try to expand all condition codes using getCCSwappedOperands()
This is useful for targets like R600, which only support GT, GE, NE, and EQ
condition codes as it removes the need to handle unsupported condition
codes in target specific code.

There are no tests with this commit, but R600 has been updated to take
advantage of this new feature, so its existing selectcc tests are now
testing the swapped operands path.

llvm-svn: 191601
2013-09-28 02:50:38 +00:00
Tom Stellard 08690a146f SelectionDAG: Clean up LegalizeSetCCCondCode() function
Interpreting the results of this function is not very intuitive, so I
cleaned it up to make it more clear whether or not a SETCC op was
legalized and how it was legalized (either by swapping LHS and RHS or
replacing with AND/OR).

This patch does change functionality in the LHS and RHS swapping case,
but unfortunately there are no in-tree tests for this.  However, this
patch is a prerequisite for R600 to take advantage of the LHS and RHS
swapping, so tests will be added in subsequent commits.

llvm-svn: 191600
2013-09-28 02:50:32 +00:00
Andrea Di Biagio 56ce9c4e78 Re-apply the change from r191393 with fix for pr17380.
This change fixes the problem reported in pr17380 and re-adds the dagcombine
transformation, ensuring that the value types are always legal if the
transformation is triggered after Legalization has taken place.

Added the test case from pr17380.

llvm-svn: 191509
2013-09-27 11:37:05 +00:00
Andrea Di Biagio 549d6605a0 Revert r191393 since it caused pr17380.
llvm-svn: 191438
2013-09-26 16:54:01 +00:00
Amara Emerson b4ad2f396a [ARM] Use the load-acquire/store-release instructions optimally in AArch32.
Patch by Artyom Skrobov.

llvm-svn: 191428
2013-09-26 12:22:36 +00:00
Andrew Trick 71e8bb6d1d Added temp flag -misched-bench for staging in default changes.
llvm-svn: 191423
2013-09-26 05:53:35 +00:00
Andrew Trick 6f5aad7a24 whitespace
llvm-svn: 191422
2013-09-26 05:53:31 +00:00
Andrea Di Biagio 9f3313109f Teach DAGCombiner how to canonicalize dags according to the rule
(shl (zext (shr A, X)), X) => (zext (shl (shr A, X), X)).

The rule only triggers when there are no other uses of the
zext to avoid materializing more instructions.

This helps the DAGCombiner understand that the shl/shr
sequence can then be converted into an and instruction.

llvm-svn: 191393
2013-09-25 19:01:01 +00:00
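For illustration only, a C-level sketch of why the canonicalized form then folds
to an and with a bitmask (function names are made up; x is assumed to be in
0..15 so the shifts stay defined):

  #include <stdint.h>

  /* shl (zext (shr a, x)), x */
  static uint32_t shifted(uint16_t a, unsigned x) {
      return (uint32_t)(uint16_t)(a >> x) << x;
  }

  /* zext (and a, mask) -- the same value, which the combine can now prove. */
  static uint32_t masked(uint16_t a, unsigned x) {
      return (uint32_t)(uint16_t)(a & (0xFFFFu << x));
  }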
Eli Friedman a961d694e2 Add missing check to SETCC optimization.
PR17338.

llvm-svn: 191337
2013-09-24 22:50:14 +00:00
Benjamin Kramer 64bdb29a83 DAGCombiner: Unify rotate matching for extended and unextended amounts.
No functionality change, lots of indentation changes.

llvm-svn: 191303
2013-09-24 14:21:28 +00:00
Jiangning Liu 63dc840fc5 Initial support for Neon scalar instructions.
Patch by Ana Pazos.

1.Added support for v1ix and v1fx types.
2.Added Scalar Pairwise Reduce instructions.
3.Added initial implementation of Scalar Arithmetic instructions.

llvm-svn: 191263
2013-09-24 02:47:27 +00:00
Michael Gottesman 5e3600c1ce [stackprotector] Allow for copies from vreg -> vreg to be in a terminator sequence.
Sometimes a copy from a vreg -> vreg sneaks into the middle of a terminator
sequence. It is safe to slice this into the stack protector success bb.

This fixes PR16979.

llvm-svn: 191260
2013-09-24 01:50:26 +00:00
Kay Tiong Khoo 9195a5b081 fix typo: than -> then
llvm-svn: 191214
2013-09-23 18:43:51 +00:00
Tim Northover 31d093c705 ISelDAG: spot chain cycles involving MachineNodes
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.

Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.

This should fix PR15840.

llvm-svn: 191165
2013-09-22 08:21:56 +00:00
Juergen Ributzka f043a65327 Revert "SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too."
This reverts commit r191130.

llvm-svn: 191138
2013-09-21 15:09:46 +00:00
Juergen Ributzka e9a80fc912 SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is too wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.

This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask for the given target. This mask usually has
the same size as the VSELECT return type (except for Intel KNL). Now the type
legalizer will split both VSELECT and SETCC.

This allows the following X86 DAG Combine code to successfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.

llvm-svn: 191130
2013-09-21 04:55:18 +00:00
David Blaikie 9d117ab7ef Add braces to suppress Clang's dangling-else warning.
These violations were introduced in r191049

llvm-svn: 191059
2013-09-20 00:33:11 +00:00
Kai Nacke d09bb4614b PR16726: extend rol/ror matching
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.

This commit extends the DAGCombiner in the way that the pattern

(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))

is folded into

([az]ext (rotl x, y))

The matching is restricted to aext and zext because in these cases the upper
bits are either undefined or known. Test case is included.

This fixes PR16726.

llvm-svn: 191049
2013-09-19 23:00:28 +00:00
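For context only, a plain-C example of the promotion the message refers to: a
rotate written on unsigned short operands, where C widens everything to int
before the shifts and the or (the function name is made up, n is assumed to be
in 1..15, and whether this exact function matches the new pattern depends on
how the frontend and earlier combines shape the DAG):

  unsigned short ror16(unsigned short x, unsigned n) {
      /* x and the shift operands are promoted to int before the arithmetic. */
      return (unsigned short)((x >> n) | (x << (16 - n)));
  }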
Kai Nacke 2d967b2751 Revert PR16726: extend rol/ror matching
There is a buildbot failure. Need to investigate this.

llvm-svn: 191048
2013-09-19 22:53:36 +00:00
Kai Nacke 4eaf6444fa PR16726: extend rol/ror matching
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.

This commit extends the DAGCombiner in the way that the pattern

(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))

is folded into

([az]ext (rotl x, y))

The matching is restricted to aext and zext because in these cases the upper
bits are either undefined or known. Test case is included.

This fixes PR16726.

llvm-svn: 191045
2013-09-19 22:36:39 +00:00
Benjamin Kramer d443e4a080 DAGCombiner: Don't fold vector muls with constants that look like a splat of a power of 2 but differ in bit width.
PR17283.

llvm-svn: 191000
2013-09-19 13:28:20 +00:00
Adrian Prantl 262bcf4584 Debug info: Get rid of the VLA indirection hack in FastISel.
Use the DIVariable::isIndirect() flag set by the frontend instead of
guessing whether to set the machine location's indirection bit.
Paired commit with CFE.

llvm-svn: 190961
2013-09-18 22:08:59 +00:00
Serge Pavlov 8ec39992c1 Added documentation to getMemsetStores.
llvm-svn: 190866
2013-09-17 16:24:42 +00:00
Quentin Colombet d30a9585b8 [SelectionDAG] Teach the vector scalarizer about TRUNCATE.
When a truncate node defines a legal vector type but uses an illegal
vector type, the legalization process was splitting the vector down to a
<1 x vector> type, but then it was failing to scalarize the node because
it did not know how to handle TRUNCATE.

<rdar://problem/14989896>

llvm-svn: 190830
2013-09-17 00:26:56 +00:00
Adrian Prantl db3e26d193 Debug info: Fix PR16736 and rdar://problem/14990587.
A DBG_VALUE is register-indirect iff the first operand is a register
_and_ the second operand is an immediate.

llvm-svn: 190821
2013-09-16 23:29:03 +00:00
Hal Finkel 31658834e6 Prevent assert in CombinerGlobalAA with null values
DAGCombiner::isAlias can be called with SrcValue1 or SrcValue2 null, and we
can't use AA in this case (if we try, then the casting code in AA will assert).

llvm-svn: 190763
2013-09-15 02:19:49 +00:00
Matt Arsenault bc08ddba58 Remove pointless assertion after r190376
llvm-svn: 190565
2013-09-12 01:07:49 +00:00
Benjamin Kramer 079b96e6f7 Revert "Give internal classes hidden visibility."
It works with clang, but GCC has different rules so we can't make all of those
hidden. This reverts commit r190534.

llvm-svn: 190536
2013-09-11 18:05:11 +00:00
Benjamin Kramer 6a44af3629 Give internal classes hidden visibility.
Worth 100k on a linux/x86_64 Release+Asserts clang.

llvm-svn: 190534
2013-09-11 17:42:27 +00:00
Eli Friedman 8f06d55697 Rename variables for consistency.
No functional change.

llvm-svn: 190466
2013-09-11 00:41:02 +00:00
Eli Friedman 78bffa5767 Fix unused variables.
llvm-svn: 190448
2013-09-10 23:18:14 +00:00
Matt Arsenault d232222f34 Don't use getSetCCResultType for creating a vselect
The vselect mask isn't a setcc.

This breaks in the case when the result of getSetCCResultType
is larger than the vector operands.

e.g. %tmp = select i1 %cmp, <2 x i8> %a, <2 x i8> %b
when getSetCCResultType returns <2 x i32>, the assertion
that MaskTy.getSizeInBits() == Op1.getValueType().getSizeInBits()
is hit.

No test since I don't think I can hit this with any of the current
targets. The R600/SI implementation would break, since it returns a
vector of i1 for this, but it doesn't reach ExpandSELECT for other
reasons.

llvm-svn: 190376
2013-09-10 00:41:56 +00:00
Jack Carter 170a5f2983 white spaces and long lines
llvm-svn: 190358
2013-09-09 22:02:08 +00:00
Bob Wilson e407736a06 Revert patches to add case-range support for PR1255.
The work on this project was left in an unfinished and inconsistent state.
Hopefully someone will eventually get a chance to implement this feature, but
in the meantime, it is better to put things back the way they were.  I have
left support in the bitcode reader to handle the case-range bitcode format,
so that we do not lose bitcode compatibility with the llvm 3.3 release.

This reverts the following commits: 155464, 156374, 156377, 156613, 156704,
156757, 156804 156808, 156985, 157046, 157112, 157183, 157315, 157384, 157575,
157576, 157586, 157612, 157810, 157814, 157815, 157880, 157881, 157882, 157884,
157887, 157901, 158979, 157987, 157989, 158986, 158997, 159076, 159101, 159100,
159200, 159201, 159207, 159527, 159532, 159540, 159583, 159618, 159658, 159659,
159660, 159661, 159703, 159704, 160076, 167356, 172025, 186736

llvm-svn: 190328
2013-09-09 19:14:35 +00:00
Tim Northover 950fcc0577 SelectionDAG: create correct BooleanContent constants
Occasionally DAGCombiner can spot that a SETCC operation is completely
redundant and reduce it to "all true" or "all false". If this happens to a
vector, the value produced has to take account of what a normal comparison
would have produced, which may be an all-1s bitmask.

The fix in SelectionDAG.cpp is tested, however, as far as I can see the code in
TargetLowering.cpp is possibly unreachable and almost certainly irrelevant when
triggered so there are no tests. However, I believe it's still clearly the
right change and may save someone else some hassle if it suddenly becomes
reachable. So I'm doing it anyway.

llvm-svn: 190147
2013-09-06 12:38:12 +00:00
Hal Finkel 5ef4dccdce Use TargetSubtargetInfo::useAA() in DAGCombine
This uses the TargetSubtargetInfo::useAA() function to control the defaults of
the -combiner-alias-analysis and -combiner-global-alias-analysis options.

llvm-svn: 189564
2013-08-29 03:29:55 +00:00
Juergen Ributzka 11c52c601a Fix a typo and coding style of a previous commit. No functional change.
llvm-svn: 189526
2013-08-28 22:33:58 +00:00
Tim Northover 819bfb5a25 DAGCombiner: make sure or/shl/srl really has zero high bits before forming bswap
We want to convert code like (or (srl N, 8), (shl N, 8)) into (srl (bswap N),
const), but this is only valid if the bits above 16 in the source pattern are
0; the checks we were doing on this were slightly wrong before.

llvm-svn: 189348
2013-08-27 13:46:45 +00:00
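For illustration only, the classic 16-bit byte-swap idiom in C that this combine
targets; when x is zero-extended into a wider register the bits above 16 are
known zero, which is the precondition being tightened here (function name made
up):

  #include <stdint.h>

  uint16_t swap16(uint16_t x) {
      return (uint16_t)((x >> 8) | (x << 8));
  }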
Owen Anderson a0260f848d Remove an over-zealous assertion. A pointer type could be illegal if the target is prepared to custom-legalize pointer operands. This assertion was evaluated before the target would have a chance to do so, making it impossible.
llvm-svn: 189299
2013-08-27 00:28:23 +00:00
Tom Stellard 838e2344ec SelectionDAG: Remove unnecessary uses of TargetLowering::getPointerTy()
If we have a binary operation like ISD:ADD, we can set the result type
equal to the result type of one of its operands rather than using
TargetLowering::getPointerTy().

Also, any use of DAG.getIntPtrConstant(C) as an operand for a binary
operation can be replaced with:
DAG.getConstant(C, OtherOperand.getValueType());

llvm-svn: 189227
2013-08-26 15:06:10 +00:00
Tom Stellard 7da047c9fb SelectionDAG: Use correct pointer size when splitting vector stores
llvm-svn: 189224
2013-08-26 15:05:55 +00:00
Tom Stellard fd155828ed SelectionDAG: Use correct pointer size when lowering function arguments v2
This adds minimal support to the SelectionDAG for handling address spaces
with different pointer sizes.  The SelectionDAG should now correctly
lower pointer function arguments to the correct size as well as generate
the correct code when lowering getelementptr.

This patch also updates the R600 DataLayout to use 32-bit pointers for
the local address space.

v2:
  - Add more helper functions to TargetLoweringBase
  - Use CHECK-LABEL for tests

llvm-svn: 189221
2013-08-26 15:05:36 +00:00
Benjamin Kramer b12cf01908 Add a function object to compare the first or second component of a std::pair.
Replace instances of this scattered around the code base.

llvm-svn: 189169
2013-08-24 12:54:27 +00:00
Michael Gottesman 20f25eb958 [stack protector] Work around an issue with the BMOVPCB_CALL instruction on ARM by disabling the does-not-return attribute on __stack_chk_fail.
This is to fix the bots while I look to see if there is something I can do here.

rdar://14811848

llvm-svn: 189076
2013-08-22 23:45:24 +00:00
Michael Gottesman 1adac3582d [stackprotector] When finding the split point to splice off the end of a parentmbb into a successmbb, include any DBG_VALUE MI.
Fix for PR16954.

llvm-svn: 188987
2013-08-22 05:40:50 +00:00
Tom Stellard 1b2c2d8414 SelectionDAG: Make sure stores are always added to the LegalizedNodes list
When truncated vector stores were being custom lowered in
VectorLegalizer::LegalizeOp(), the old (illegal) and new (legal) node pair
was not being added to LegalizedNodes list.  Instead of the legalized
result being passed to VectorLegalizer::TranslateLegalizeResult(),
the result was being passed back into VectorLegalizer::LegalizeOp(),
which ended up adding a (new, new) pair to the list instead.

This was causing an assertion failure when a custom lowered truncated
vector store was the last instruction in a basic block and the VectorLegalizer
was unable to find it in the LegalizedNodes list when updating the
DAG root.

llvm-svn: 188953
2013-08-21 22:42:58 +00:00
Juergen Ributzka 3db39dc1ae Teach BaseIndexOffset::match to identify base pointers in loops.
The small utility function that pattern matches Base + Index +
Offset patterns for loads and stores fails to recognize the base
pointer for loads/stores from/into an array at offset 0 inside a
loop. As a result DAGCombiner::MergeConsecutiveStores was not able
to merge all stores.

This commit fixes the issue by adding an additional pattern match
and also a test case.

Reviewer: Nadav
llvm-svn: 188936
2013-08-21 21:53:38 +00:00
Richard Sandiford 6f6d55161b [SystemZ] Use SRST to optimize memchr
SystemZTargetLowering::emitStringWrapper() previously loaded the character
into R0 before the loop and made R0 live on entry.  I'd forgotten that
allocatable registers weren't allowed to be live across blocks at this stage,
and it confused LiveVariables enough to cause a miscompilation of f3 in
memchr-02.ll.

This patch instead loads R0 in the loop and leaves LICM to hoist it
after RA.  This is actually what I'd tried originally, but I went for
the manual optimisation after noticing that R0 often wasn't being hoisted.
This bug forced me to go back and look at why, now fixed as r188774.

We should also try to optimize null checks so that they test the CC result
of the SRST directly.  The select between null and the SRST GPR result could
then usually be deleted as dead.

llvm-svn: 188779
2013-08-20 09:38:48 +00:00
Michael Gottesman f7e1203d95 Remove unused variables that crept in.
llvm-svn: 188761
2013-08-20 07:17:27 +00:00
Michael Gottesman b27f0f1f6b Teach selectiondag how to handle the stackprotectorcheck intrinsic.
Previously, generation of stack protectors was done exclusively in the
pre-SelectionDAG Codegen LLVM IR Pass "Stack Protector". This necessitated
splitting basic blocks at the IR level to create the success/failure basic
blocks in the tail of the basic block in question. As a result of this,
calls that would have qualified for the sibling call optimization were no
longer eligible for optimization since said calls were no longer right in
the "tail position" (i.e. the immediate predecessor of a ReturnInst
instruction).

Then it was noticed that since the sibling call optimization causes the
callee to reuse the caller's stack, if we could delay the generation of
the stack protector check until later in CodeGen after the sibling call
decision was made, we get both the tail call optimization and the stack
protector check!

A few goals in solving this problem were:

  1. Preserve the architecture independence of stack protector generation.

  2. Preserve the normal IR level stack protector check for platforms like
     OpenBSD for which we support platform specific stack protector
     generation.

The main problem that guided the present solution is that one can not
solve this problem in an architecture independent manner at the IR level
only. This is because:

  1. The decision on whether or not to perform a sibling call on certain
     platforms (for instance i386) requires lower level information
     related to available registers that can not be known at the IR level.

  2. Even if the previous point were not true, the decision on whether to
     perform a tail call is done in LowerCallTo in SelectionDAG which
     occurs after the Stack Protector Pass. As a result, one would need to
     put the relevant callinst into the stack protector check success
     basic block (where the return inst is placed) and then move it back
     later at SelectionDAG/MI time before the stack protector check if the
     tail call optimization failed. The MI level option was nixed
     immediately since it would require platform specific pattern
     matching. The SelectionDAG level option was nixed because
     SelectionDAG only processes one IR level basic block at a time
     implying one could not create a DAG Combine to move the callinst.

To get around this problem a few things were realized:

  1. While one can not handle multiple IR level basic blocks at the
     SelectionDAG Level, one can generate multiple machine basic blocks
     for one IR level basic block. This is how we handle bit tests and
     switches.

  2. At the MI level, tail calls are represented via a special return
     MIInst called "tcreturn". Thus if we know the basic block in which we
     wish to insert the stack protector check, we get the correct behavior
     by always inserting the stack protector check right before the return
     statement. This is a "magical transformation" since no matter where
     the stack protector check intrinsic is, we always insert the stack
     protector check code at the end of the BB.

Given the aforementioned constraints, the following solution was devised:

  1. On platforms that do not support SelectionDAG stack protector check
     generation, allow for the normal IR level stack protector check
     generation to continue.

  2. On platforms that do support SelectionDAG stack protector check
     generation:

    a. Use the IR level stack protector pass to decide if a stack
       protector is required/which BB we insert the stack protector check
       in by reusing the logic already therein. If we wish to generate a
       stack protector check in a basic block, we place a special IR
       intrinsic called llvm.stackprotectorcheck right before the BB's
       returninst or if there is a callinst that could potentially be
       sibling call optimized, before the call inst.

    b. Then when a BB with said intrinsic is processed, we codegen the BB
       normally via SelectBasicBlock. In said process, when we visit the
       stack protector check, we do not actually emit anything into the
       BB. Instead, we just initialize the stack protector descriptor
       class (which involves stashing information/creating the success
       mbb and the failure mbb if we have not created one for this
       function yet) and export the guard variable that we are going to
       compare.

    c. After we finish selecting the basic block, in FinishBasicBlock if
       the StackProtectorDescriptor attached to the SelectionDAGBuilder is
       initialized, we first find a splice point in the parent basic block
       before the terminator and then splice the terminator of said basic
       block into the success basic block. Then we code-gen a new tail for
       the parent basic block consisting of the two loads, the comparison,
       and finally two branches to the success/failure basic blocks. We
       conclude by code-gening the failure basic block if we have not
       code-gened it already (all stack protector checks we generate in
       the same function, use the same failure basic block).

llvm-svn: 188755
2013-08-20 07:00:16 +00:00
Hal Finkel 0c5c01aa4a Add a llvm.copysign intrinsic
This adds a llvm.copysign intrinsic; We already have Libfunc recognition for
copysign (which is turned into the FCOPYSIGN SDAG node). In order to
autovectorize calls to copysign in the loop vectorizer, we need a corresponding
intrinsic as well.

In addition to the expected changes to the language reference, the loop
vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into
an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a
few lists in LegalizeVector{Ops,Types} so that vector copysigns can be
expanded.

In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN
be Expand for vector types. This seems correct for all in-tree targets, and I
think is the right thing to do because, previously, there was no way to generate
vector-valued FCOPYSIGN nodes (and most targets don't specify an action for
vector-typed FCOPYSIGN).
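
A minimal sketch of the loop-vectorizer motivation (illustrative source, not
one of the commit's test cases):

  #include <cmath>

  // With the intrinsic in place, a loop like this can be vectorized into
  // vector FCOPYSIGN nodes instead of staying scalar because of the libcall.
  void copysign_loop(float *dst, const float *mag, const float *sgn, int n) {
    for (int i = 0; i < n; ++i)
      dst[i] = std::copysign(mag[i], sgn[i]);
  }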

llvm-svn: 188728
2013-08-19 23:35:46 +00:00
Paul Redmond 62f840f46a Improve the widening of integral binary vector operations
- split WidenVecRes_Binary into WidenVecRes_Binary and WidenVecRes_BinaryCanTrap
  - WidenVecRes_BinaryCanTrap preserves the original behaviour for operations
    that can trap
  - WidenVecRes_Binary simply widens the operation and improves codegen for
    3-element vectors by allowing widening and promotion on x86 (matches the
    behaviour of unary and ternary operation widening)
- use WidenVecRes_Binary for operations on integers.

Reviewed by: nrotem

llvm-svn: 188699
2013-08-19 20:01:35 +00:00
Hal Finkel e4eb78188c Add ExpandFloatOp_FCOPYSIGN to handle ppcf128-related expansions
We had previously been asserting when faced with a FCOPYSIGN f64, ppcf128 node
because there was no way to expand the FCOPYSIGN node. Because ppcf128 is the
sum of two doubles, and the first double must have the larger magnitude, we
can take the sign from the first double. As a result, in addition to fixing the
crash, this is also an optimization.
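
A scalar model of the expansion (an illustrative sketch, not the legalizer
code itself):

  #include <cmath>

  // ppcf128 behaves like an (hi, lo) pair of doubles with |hi| >= |lo|, so
  // the sign of the whole value is the sign of hi and lo can be ignored.
  double copysign_from_ppcf128(double mag, double sign_hi, double /*sign_lo*/) {
    return std::copysign(mag, sign_hi);
  }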

llvm-svn: 188655
2013-08-19 06:55:37 +00:00
Jim Grosbach 06c2a68125 ARM: Fix more fast-isel verifier failures.
Teach the generic instruction selection helper functions to constrain
the register classes of their input operands. For non-physical register
references, the generic code needs to be careful not to mess that up
when replacing references to result registers. As the comment indicates
for MachineRegisterInfo::replaceRegWith(), it's important to call
constrainRegClass() first.

rdar://12594152

llvm-svn: 188593
2013-08-16 23:37:31 +00:00
Richard Sandiford 0dec06a28c [SystemZ] Use SRST to implement strlen and strnlen
It would also make sense to use it for memchr; I'm working on that now.

llvm-svn: 188547
2013-08-16 11:41:43 +00:00
Richard Sandiford bb83a50f57 [SystemZ] Use MVST to implement strcpy and stpcpy
llvm-svn: 188546
2013-08-16 11:29:37 +00:00
Richard Sandiford ca23271010 [SystemZ] Use CLST to implement strcmp
llvm-svn: 188544
2013-08-16 11:21:54 +00:00
Richard Sandiford e3827751e2 [SystemZ] Fix handling of 64-bit memcmp results
Generalize r188163 to cope with return types other than MVT::i32, just
as the existing visitMemCmpCall code did.  I've split this out into a
subroutine so that it can be used for other upcoming patches.

I also noticed that I'd used the wrong API to record the out chain.
It's a load that uses DAG.getRoot() rather than getRoot(), so the out
chain should go on PendingLoads.  I don't have a testcase for that because
we don't do any interesting scheduling on z yet.

llvm-svn: 188540
2013-08-16 10:55:47 +00:00
Craig Topper d9c2783d8f Replace getValueType().getSimpleVT() with getSimpleValueType().
llvm-svn: 188442
2013-08-15 02:44:19 +00:00
Jim Grosbach 327ccc787e DAG: Combine (and (setne X, 0), (setne X, -1)) -> (setuge (add X, 1), 2)
A common idiom is to use zero and all-ones as sentinel values and to
check for both in a single conditional ("x != 0 && x != (unsigned)-1").
That generates code, for i32, like:
  testl %edi, %edi
  setne %al
  cmpl  $-1, %edi
  setne %cl
  andb  %al, %cl

With this transform, we generate the simpler:
  incl  %edi
  cmpl  $1, %edi
  seta  %al

Similar improvements for other integer sizes and on other platforms. In
general, combining the two setcc instructions into one is better.
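
A scalar C++ model of why the transform is sound (illustrative; it relies on
unsigned wraparound):

  #include <cstdint>

  // x != 0 && x != ~0u holds exactly when x + 1 (mod 2^32) is neither 1 nor
  // 0, i.e. when x + 1 > 1, which is the single compare formed above.
  bool not_sentinel(uint32_t x)          { return x != 0 && x != UINT32_MAX; }
  bool not_sentinel_combined(uint32_t x) { return x + 1u > 1u; }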

rdar://14689217

llvm-svn: 188315
2013-08-13 21:30:58 +00:00
Michael Gottesman 7a8017290a Update makeLibCall to return both the call and the chain associated with the libcall instead of just the call. This allows us to specify libcalls that return void.
LowerCallTo returns a pair with the return value of the call as the first
element and the chain associated with the return value as the second element. If
we lower a call that has a void return value, LowerCallTo returns an SDValue
with a NULL SDNode and the chain for the call. Thus makeLibCall, by returning
just the first value, made it impossible to set up the chain so that the call
is not eliminated as dead code.

I also updated all references to makeLibCall to reflect the new return type.

llvm-svn: 188300
2013-08-13 17:54:56 +00:00
Michael Gottesman 3923bec37b Fixed SelectionDAGBuilder.h C++ filetype declaration to use the canonical C++ instead of c++.
llvm-svn: 188203
2013-08-12 21:02:02 +00:00
Richard Sandiford 564681c88d [SystemZ] Use CLC and IPM to implement memcmp
For now this is restricted to fixed-length comparisons with a length
in the range [1, 256], as for memcpy() and MVC.

llvm-svn: 188163
2013-08-12 10:28:10 +00:00
Craig Topper 0ecb26a79e Change asserts at the top of getVectorShuffle to check that LHS and RHS have the same type as the result.
Previously the asserts were only checking that RHS and LHS were the same type and had the same element type as the result. All downstream code for ISD::VECTOR_SHUFFLE requires the types to be the same.

Also removed one unnecessary check of matched element counts that was present in the code.

llvm-svn: 188051
2013-08-09 04:37:24 +00:00
Craig Topper 9a39b07a60 Remove AllUndef check from one of the loops in getVectorShuffle. It was already handled by the 'AllLHS && AllRHS' check after the previous loop.
llvm-svn: 187965
2013-08-08 08:03:12 +00:00
Craig Topper 309dfefb6f Optimize mask generation for one of the DAG combiner shufflevector cases.
llvm-svn: 187961
2013-08-08 07:38:55 +00:00
Hal Finkel 171817ee8a Add ISD::FROUND for libm round()
All libm floating-point rounding functions, except for round(), had their own
ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm
adding ISD::FROUND so that round() can be custom lowered as well.

For the most part, this is straightforward. I've added an intrinsic
and a matching ISD node just like those for nearbyint() and friends. The
SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed
fround).

This will be used by the PowerPC backend in a follow-up commit.
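
An illustrative source pattern that benefits (a sketch, not a test from this
commit):

  #include <cmath>

  // With ISD::FROUND available, targets that have a native round-to-nearest-
  // away-from-zero instruction can lower this loop without a libcall.
  void round_all(double *v, int n) {
    for (int i = 0; i < n; ++i)
      v[i] = std::round(v[i]);
  }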

llvm-svn: 187926
2013-08-07 22:49:12 +00:00
Tom Stellard d42c594960 TargetLowering: Add getVectorIdxTy() function v2
This virtual function can be implemented by targets to specify the type
to use for the index operand of INSERT_VECTOR_ELT, EXTRACT_VECTOR_ELT,
INSERT_SUBVECTOR, EXTRACT_SUBVECTOR.  The default implementation returns
the result from TargetLowering::getPointerTy()

The previous code was using TargetLowering::getPointerTy() for vector
indices, because this is guaranteed to be legal on all targets.  However,
using TargetLowering::getPointerTy() can be a problem for targets with
pointer sizes that differ across address spaces.  On such targets,
when vectors need to be loaded or stored to an address space other than the
default 'zero' address space (which is the address space assumed by
TargetLowering::getPointerTy()), having an index that
is a different size than the pointer can lead to inefficient
pointer calculations (e.g. 64-bit adds for a 32-bit address space).

There is no intended functionality change with this patch.

llvm-svn: 187748
2013-08-05 22:22:01 +00:00
Eric Christopher e6656ac870 Fix crashing on invalid inline asm with matching constraints.
For a testcase like the following:

 typedef unsigned long uint64_t;

 typedef struct {
   uint64_t lo;
   uint64_t hi;
 } blob128_t;

 void add_128_to_128(const blob128_t *in, blob128_t *res) {
   asm ("PAND %1, %0" : "+Q"(*res) : "Q"(*in));
 }

where we'll fail to allocate the register for the output constraint,
our matching input constraint will not find a register to match,
and could try to search past the end of the current operands array.

On the idea that we'd like to attempt to keep compilation going
to find more errors in the module, change the error cases when
we're visiting inline asm IR to return immediately and avoid
trying to create a node in the DAG. This leaves us with only
a single error message per inline asm instruction, but allows us
to safely keep going in the general case.

llvm-svn: 187470
2013-07-31 01:26:24 +00:00
Eric Christopher 029af15086 Reflow this to be easier to read.
llvm-svn: 187459
2013-07-30 22:50:44 +00:00
Quentin Colombet 6bf4baa408 [DAGCombiner] insert_vector_elt: Avoid building a vector twice.
This patch prevents the following combine when the input vector is used more
than once.
insert_vector_elt (build_vector elt0, ..., eltN), NewEltIdx, idx
=>
build_vector elt0, ..., NewEltIdx, ..., eltN 

The reasons are:
- Building a vector may be expensive, so try to reuse the existing part of a
  vector instead of creating a new one (think big vectors).
- elt0 to eltN now have two users instead of one. This may prevent some other
  optimizations.

llvm-svn: 187396
2013-07-30 00:24:09 +00:00
Nick Lewycky 0b68245ec8 Reimplement isPotentiallyReachable to make nocapture deduction much stronger.
Adds unit tests for it too.

Split BasicBlockUtils into an analysis-half and a transforms-half, and put the
analysis bits into a new Analysis/CFG.{h,cpp}. Promote isPotentiallyReachable
into llvm::isPotentiallyReachable and move it into Analysis/CFG.

llvm-svn: 187283
2013-07-27 01:24:00 +00:00
Justin Holewinski d3f2035a3c Add a target legalize hook for SplitVectorOperand (again)
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.

This also adds a test case for NVPTX that depends on this custom
legalization.

Differential Revision: http://llvm-reviews.chandlerc.com/D1195

Attempt to fix the buildbots by making the X86 test I just added platform independent

llvm-svn: 187202
2013-07-26 13:28:29 +00:00
Rafael Espindola 1d812728cc Revert "Add a target legalize hook for SplitVectorOperand"
This reverts commit 187198. It broke the bots.

The soft float test probably needs a -triple because of name differences.
On the hard float test I am getting a "roundss $1, %xmm0, %xmm0", instead of
"vroundss $1, %xmm0, %xmm0, %xmm0".

llvm-svn: 187201
2013-07-26 13:18:16 +00:00
Justin Holewinski f848a24e50 Add a target legalize hook for SplitVectorOperand
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.

This also adds a test case for NVPTX that depends on this custom
legalization.

Differential Revision: http://llvm-reviews.chandlerc.com/D1195

llvm-svn: 187198
2013-07-26 12:46:39 +00:00
Tom Stellard c54731aa9d DAGCombiner: Pass the correct type to TargetLowering::isF(Abs|Neg)Free
This commit also implements these functions for R600 and removes a test
case that was relying on the buggy behavior.

llvm-svn: 187007
2013-07-23 23:55:03 +00:00
Michael Gottesman f87a6ae65f Add -*- C++ -*- to InstrEmitter.h.
llvm-svn: 186527
2013-07-17 18:53:29 +00:00
Hal Finkel 2f5e8e3d95 Remove invalid assert in DAGTypeLegalizer::RemapValue
There is a comment at the top of DAGTypeLegalizer::PerformExpensiveChecks
which, in part, says:

  // Note that these invariants may not hold momentarily when processing a node:
  // the node being processed may be put in a map before being marked Processed.

Unfortunately, this assert would be valid only if the above-mentioned invariant
held unconditionally. This was causing llc to assert when, in fact,
everything was fine.

Thanks to Richard Sandiford for investigating this issue!

Fixes PR16562.

llvm-svn: 186338
2013-07-15 18:57:05 +00:00
Craig Topper 06b3b6651e Add 'const' qualifier to some arrays.
llvm-svn: 186312
2013-07-15 08:02:13 +00:00
Craig Topper b94011fd28 Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size.
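A minimal sketch of the idiom (illustrative; the helper names are made up, not
code from this change):

  #include "llvm/ADT/SmallVector.h"
  using namespace llvm;

  // Taking SmallVectorImpl<T>& lets callers pass a SmallVector<T, N> of any
  // inline size N without that N leaking into the callee's signature.
  static void appendSquares(SmallVectorImpl<int> &Out, int Count) {
    for (int I = 0; I < Count; ++I)
      Out.push_back(I * I);
  }

  void fillBuffer() {
    SmallVector<int, 8> Buf; // the inline size stays a caller-side detail
    appendSquares(Buf, 4);
  }
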
llvm-svn: 186274
2013-07-14 04:42:23 +00:00
Craig Topper e0b711864c Pass SmallVector by const reference instead of by value.
llvm-svn: 186243
2013-07-13 07:43:40 +00:00
Stephen Lin 10947502e5 Remove trailing whitespace
llvm-svn: 186032
2013-07-10 20:47:39 +00:00
Adrian Prantl a1ffd1a450 Un-break the buildbot by tweaking the indirection flag.
Pulled in a testcase from the debuginfo-test suite.

llvm-svn: 185993
2013-07-10 01:53:37 +00:00
Adrian Prantl facc9f4e3e Document a known limitation of the status quo.
llvm-svn: 185992
2013-07-10 01:53:30 +00:00
Adrian Prantl 418d1d1ea9 Reapply an improved version of r180816/180817.
Change the informal convention of DBG_VALUE machine instructions so that
we can express a register-indirect address with an offset of 0.
The old convention was that a DBG_VALUE is a register-indirect value if
the offset (operand 1) is nonzero. The new convention is that a DBG_VALUE
is register-indirect if the first operand is a register and the second
operand is an immediate. For plain register values the combination reg,
reg is used. MachineInstrBuilder::BuildMI knows how to build the new
DBG_VALUES.

rdar://problem/13658587

llvm-svn: 185966
2013-07-09 20:28:37 +00:00
Hal Finkel e4dd5c29f0 WidenVecRes_BUILD_VECTOR must use the first operand's type
Because integer BUILD_VECTOR operands may have a larger type than the result's
vector element type, and all operands must have the same type, when widening a
BUILD_VECTOR node by adding UNDEFs, we cannot use the vector element type, but
rather must use the type of the existing operands.

Another bug found by llvm-stress.

llvm-svn: 185960
2013-07-09 18:55:10 +00:00
Stephen Lin 73de7bf5de AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all
in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:

1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.

2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.

3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.

The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.
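
A small C++ sketch of the source-level pattern involved (illustrative; whether
it is emitted as @llvm.fmuladd depends on the front end's fp-contract setting):

  // Under fp-contract, a*b + c may become @llvm.fmuladd; the fixed hook then
  // decides, per target and per (possibly illegal) type, whether that should
  // turn into an ISD::FMA node or stay as a separate multiply and add.
  float muladd(float a, float b, float c) {
    return a * b + c;
  }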

llvm-svn: 185956
2013-07-09 18:16:56 +00:00
Hal Finkel 6c29bd9088 DAGCombine tryFoldToZero cannot create illegal types after type legalization
When folding sub x, x (and other similar constructs), where x is a vector, the
result is a vector of zeros. After type legalization, make sure that the input
zero elements have a legal type. This type may be larger than the result's
vector element type.

This was another bug found by llvm-stress.

llvm-svn: 185949
2013-07-09 17:02:45 +00:00
Stephen Lin 8e8424eb17 Style fixes: remove unnecessary braces for one-statement if blocks, no else after return, etc. No functionality change.
llvm-svn: 185893
2013-07-09 00:44:49 +00:00
Hal Finkel 12493bb7d5 Improve the comment from r185794 (re: PromoteIntRes_BUILD_VECTOR)
In response to Duncan's review, I believe that the original comment was not as
clear as it could be. Hopefully, this is better.

llvm-svn: 185824
2013-07-08 14:40:04 +00:00
Hal Finkel 8cb9a0e1d3 Fix PromoteIntRes_BUILD_VECTOR crash with i1 vectors
This fixes a bug (found by llvm-stress) in
DAGTypeLegalizer::PromoteIntRes_BUILD_VECTOR where it assumed that the result
type would always be larger than the original operands. This is not always
true, however, with boolean vectors. For example, promoting a node of type v8i1
(where the operands will be of type i32, the type to which i1 is promoted) will
yield a node with a result vector element type of i16 (and operands of type
i32). As a result, we cannot blindly assume that we can ANY_EXTEND the operands
to the result type.

llvm-svn: 185794
2013-07-08 06:16:58 +00:00
Stephen Lin cfe7f352c7 Remove trailing whitespace from SelectionDAG/*.cpp
llvm-svn: 185780
2013-07-08 00:37:03 +00:00
Stephen Lin 6d715e8699 SelectionDAGBuilder: style fixes (add space between end parentheses and open brace)
llvm-svn: 185768
2013-07-06 21:44:25 +00:00
Benjamin Kramer c7332b2796 DAGCombiner: Don't drop extension behavior when shrinking a load when unsafe.
ReduceLoadWidth unconditionally drops extensions from loads. Limit it to the
case when all of the bits the extension would otherwise produce are dropped by
the shrink. It would be possible to shrink the load in more cases by merging
the extensions, but this isn't trivial and a very rare case. I left a TODO for
that case.

Fixes PR16551.

llvm-svn: 185755
2013-07-06 14:05:09 +00:00
Tim Northover dab4db5372 Stop putting operations after a tail call.
This prevents the emission of DAG-generated vreg definitions after a
tail call be dropping them entirely (on the grounds that nothing could
use them anyway, and they interfere with O0 CodeGen).

llvm-svn: 185754
2013-07-06 12:58:45 +00:00
Jakob Stoklund Olesen db429d9483 Remove the EXCEPTIONADDR, EHSELECTION, and LSDAADDR ISD opcodes.
These exception-related opcodes are not used any longer.

llvm-svn: 185625
2013-07-04 13:54:20 +00:00
Jakob Stoklund Olesen 6a7d68349f Typo.
llvm-svn: 185618
2013-07-04 04:53:49 +00:00
Jakob Stoklund Olesen fee2a20209 Simplify landing pad lowering.
Stop using the ISD::EXCEPTIONADDR and ISD::EHSELECTION when lowering
landing pad arguments. These nodes were previously legalized into
CopyFromReg nodes, but that never worked properly because the
CopyFromReg nodes weren't guaranteed to be scheduled at the top of the
basic block.

This meant the exception pointer and selector registers could be
clobbered before being copied to a virtual register.

This patch copies the two physical registers to virtual registers at
the beginning of the basic block, and lowers the landingpad instruction
directly to two CopyFromReg nodes reading the *virtual* registers. This
is safe because virtual registers don't get clobbered.

A future patch will remove the ISD::EXCEPTIONADDR and ISD::EHSELECTION
nodes.

llvm-svn: 185617
2013-07-04 04:53:45 +00:00
Jakob Stoklund Olesen 3d8560c382 FastISel can only append to basic blocks.
Compute the insertion point from the end of the basic block instead of
skipping labels from the front.

This caused failures in landing pads when live-in copies were inserted
before instruction selection.

llvm-svn: 185616
2013-07-04 04:32:39 +00:00
Jakob Stoklund Olesen a1f5b901a5 Revert r185595-185596 which broke buildbots.
Revert "Simplify landing pad lowering."
Revert "Remove the EXCEPTIONADDR, EHSELECTION, and LSDAADDR ISD opcodes."

llvm-svn: 185600
2013-07-04 00:26:30 +00:00
Jakob Stoklund Olesen f33ec531fa Remove the EXCEPTIONADDR, EHSELECTION, and LSDAADDR ISD opcodes.
These exception-related opcodes are not used any longer.

llvm-svn: 185596
2013-07-03 23:56:31 +00:00
Jakob Stoklund Olesen fa6a7b9b02 Simplify landing pad lowering.
Stop using the ISD::EXCEPTIONADDR and ISD::EHSELECTION when lowering
landing pad arguments. These nodes were previously legalized into
CopyFromReg nodes, but that never worked properly because the
CopyFromReg nodes weren't guaranteed to be scheduled at the top of the
basic block.

This meant the exception pointer and selector registers could be
clobbered before being copied to a virtual register.

This patch copies the two physical registers to virtual registers at
the beginning of the basic block, and lowers the landingpad instruction
directly to two CopyFromReg nodes reading the *virtual* registers. This
is safe because virtual registers don't get clobbered.

A future patch will remove the ISD::EXCEPTIONADDR and ISD::EHSELECTION
nodes.

llvm-svn: 185595
2013-07-03 23:56:24 +00:00
Craig Topper af0ad9e20f Use SmallVectorImpl::const_iterator instead of SmallVector to avoid specifying the vector size.
llvm-svn: 185514
2013-07-03 05:18:47 +00:00
Craig Topper e1c1d363a5 Use SmallVectorImpl instead of SmallVector for iterators and references to avoid specifying the vector size unnecessarily.
llvm-svn: 185512
2013-07-03 05:11:49 +00:00
Tim Northover 6823900e55 DAGCombiner: fix use-counting issue when forming zextload
DAGCombiner was counting all uses of a load node  when considering whether it's
worth combining into a zextload. Really, it wants to ignore the chain and just
count real uses.

rdar://problem/13896307

llvm-svn: 185419
2013-07-02 09:58:53 +00:00
Michael Gottesman fd62bb9d3e Added c++ mode selector to head of SelectionDAGBuilder.h so editors open it in c++ mode instead of c mode.
llvm-svn: 185348
2013-07-01 16:53:41 +00:00
Lang Hames c22e39d83d Add missing case to switch statement - DAGTypeLegalizer::ExpandIntegerResult
should expand ATOMIC_CMP_SWAP nodes the same way that it does for ATOMIC_SWAP.

Since ATOMIC_LOADs on some targets (e.g. older ARM variants) get legalized to
ATOMIC_CMP_SWAPs, the missing case had been causing i64 atomic loads to crash
during isel.

<rdar://problem/14074644>

llvm-svn: 185186
2013-06-28 18:36:42 +00:00
Manman Ren 983a16c08a Debug Info: clean up usage of Verify.
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.

Also update testing cases to make them conform to the format of DI classes.

llvm-svn: 185135
2013-06-28 05:43:10 +00:00
Elena Demikhovsky fed077be03 Fixed a comment.
llvm-svn: 184933
2013-06-26 12:15:53 +00:00
Elena Demikhovsky 6769c50d9e Optimized the integer vector multiplication operation by replacing it with shift/xor/sub when possible. Fixed a bug in SDIV where the constant operand is not a splat constant vector.
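A scalar model of the strength reduction now applied to vector multiplies by
suitable constants (the constant 15 is just an illustrative value):

  #include <cstdint>

  // x * 15 == (x << 4) - x; for vectors this trades a multiply for a shift
  // and a subtract, which is usually cheaper when there is no fast vector
  // multiplier.
  uint32_t mul15(uint32_t x)         { return x * 15u; }
  uint32_t mul15_reduced(uint32_t x) { return (x << 4) - x; }
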
llvm-svn: 184931
2013-06-26 10:55:03 +00:00
Chad Rosier 295bd43adb The getRegForInlineAsmConstraint function should only accept MVT value types.
llvm-svn: 184642
2013-06-22 18:37:38 +00:00
David Blaikie 97c6c5bd98 DebugInfo: Don't lose unreferenced non-trivial by-value parameters
A FastISel optimization was causing us to emit no information for such
parameters & when they go missing we end up emitting a different
function type. By avoiding that shortcut we not only get types correct
(very important) but also location information (handy) - even if it's
only live at the start of a function & may be clobbered later.

Reviewed/discussion by Evan Cheng & Dan Gohman.

llvm-svn: 184604
2013-06-21 22:56:30 +00:00
Michael Liao 62ebfd8786 Fix PR16360
When (srl (anyextend x), c) is folded into (anyextend (srl x, c)), the
high bits are not cleared. Add an 'and' to clear them.

llvm-svn: 184575
2013-06-21 18:45:27 +00:00
Bill Wendling a3cd350249 Access the TargetLoweringInfo from the TargetMachine object instead of caching it. The TLI may change between functions. No functionality change.
llvm-svn: 184360
2013-06-19 21:36:55 +00:00
Bill Wendling 0ccf31007f Don't cache the TLI object since we have access to it through TargetMachine already.
llvm-svn: 184346
2013-06-19 20:32:16 +00:00
Quentin Colombet b51a68681a During SelectionDAG building explicitly set a node to constant zero when the
value is zero.
This allows optimizations to kick in more easily.
Fix some test cases so that they remain meaningful (i.e., not completely dead
code) when optimizations apply.

<rdar://problem/14096009> superfluous multiply by high part of zero-extended
value.
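
The kind of source that exposed the superfluous high-part multiply (an
illustrative sketch, not the original reproducer):

  #include <cstdint>

  // The zero-extended operands have known-zero high halves; with those halves
  // now represented as explicit constant zeros, the "high part" partial
  // products of the widening multiply can be optimized away.
  uint64_t widening_mul(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
  }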

llvm-svn: 184222
2013-06-18 20:14:39 +00:00
David Blaikie 0252265be0 Debug Info: Simplify Frame Index handling in DBG_VALUE Machine Instructions
Rather than using the full power of target-specific addressing modes in
DBG_VALUEs with Frame Indices, simply use Frame Index + Offset. This
reduces the complexity of debug info handling down to two
representations of values (reg+offset and frame index+offset) rather
than three or four.

Ideally we could ensure that frame indices had been eliminated by the
time we reach assembly or DWARF generation, but I haven't spent the
time to figure out where the FIs are leaking through into that & whether
there's a good place to convert them. Some FI+offset=>reg+offset
conversion is done (see PrologEpilogInserter, for example) which is
necessary for some SelectionDAG assumptions about registers, I believe,
but it might be possible to make this a more thorough conversion &
ensure there are no remaining FIs no matter how instruction selection
is performed.

llvm-svn: 184066
2013-06-16 20:34:15 +00:00
Stephen Lin 605207fe75 SelectionDAG: slightly refactor DAGCombiner::visitSELECT_CC to avoid redundant checks...
This doesn't really affect performance, since all the relevant calls are transparent, but it is clearer.

llvm-svn: 184027
2013-06-15 04:03:33 +00:00
Matt Arsenault d2f0332a29 Introduce getSelect usage and use more getSelectCC
llvm-svn: 184012
2013-06-14 22:04:37 +00:00
Stephen Lin 4e69d01b67 SelectionDAG: minor fix to order of operands in comments to match the code
llvm-svn: 184008
2013-06-14 21:33:58 +00:00
Stephen Lin e31f2d2d54 SelectionDAG: Fix incorrect condition checks in some cases of folding FADD/FMUL combinations; also improve accuracy of comments
llvm-svn: 183993
2013-06-14 18:17:35 +00:00
David Majnemer 0fc8670cb0 TargetLowering: Clean up method description comments
llvm-svn: 183623
2013-06-08 23:51:45 +00:00
Bill Wendling f77190855d Cache the TargetLowering info object as a pointer.
Caching it as a pointer allows us to reset it if the TargetMachine object
changes.

llvm-svn: 183361
2013-06-06 00:43:09 +00:00
Bill Wendling 8db01cb262 Don't cache the TargetLoweringInfo object inside of the FunctionLowering object.
The TargetLoweringInfo object is owned by the TargetMachine. In the future, the
TargetMachine object may change, which may also change the TargetLoweringInfo
object.

llvm-svn: 183356
2013-06-06 00:11:39 +00:00
Andrew Trick ad6d08ac6f Order CALLSEQ_START and CALLSEQ_END nodes.
Fixes PR16146: gdb.base__call-ar-st.exp fails after
pre-RA-sched=source fixes.

Patch by Xiaoyi Guo!

This also fixes an unsupported dbg.value test case. Codegen was
previously incorrect but the test was passing by luck.

llvm-svn: 182885
2013-05-29 22:03:55 +00:00
Benjamin Kramer 262b154247 Simplify code. No functionality change.
llvm-svn: 182779
2013-05-28 16:39:36 +00:00
Benjamin Kramer 351d53c225 Remove double semicolons.
llvm-svn: 182778
2013-05-28 16:31:26 +00:00
Preston Gurd 048f99de11 Convert sqrt functions into sqrt instructions when -ffast-math is in effect.
When -ffast-math is in effect (on Linux, at least), clang defines
__FINITE_MATH_ONLY__ > 0 when including <math.h>. This causes the
preprocessor to include <bits/math-finite.h>, which renames the sqrt functions.
For instance, "sqrt" is renamed as "__sqrt_finite". 

This patch adds the 3 new names in such a way that they will be treated
as equivalent to their respective original names.
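
An illustrative source (a sketch, not taken from the commit) showing a call
that reaches the backend under the renamed symbol:

  #include <cmath>

  // Built with -ffast-math against glibc, this sqrt call is emitted as
  // __sqrt_finite; the added libcall names let it still be recognized and
  // turned into a hardware sqrt instruction.
  double hypot_fast(double x, double y) { return std::sqrt(x * x + y * y); }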

llvm-svn: 182739
2013-05-27 15:44:35 +00:00
Andrew Trick c66d26adf0 Fix PR16143: Insert DEBUG_VALUE before terminator.
llvm-svn: 182717
2013-05-26 08:58:50 +00:00
Andrew Trick e2431c64bc Track IR ordering of SelectionDAG nodes 3/4.
Remove the old IR ordering mechanism and switch to new one.  Fix unit
test failures.

llvm-svn: 182704
2013-05-25 03:08:10 +00:00
Andrew Trick ef9de2a739 Track IR ordering of SelectionDAG nodes 2/4.
Change SelectionDAG::getXXXNode() interfaces as well as call sites of
these functions to pass in SDLoc instead of DebugLoc.

llvm-svn: 182703
2013-05-25 02:42:55 +00:00
Andrew Trick 175143bf88 Track IR ordering of SelectionDAG nodes 1/4.
Use a field in the SelectionDAGNode object to track its IR ordering.
This adds fields and utility classes without changing existing
interfaces or functionality.

llvm-svn: 182701
2013-05-25 02:20:36 +00:00
Michael J. Spencer df1ecbd734 Replace Count{Leading,Trailing}Zeros_{32,64} with count{Leading,Trailing}Zeros.
llvm-svn: 182680
2013-05-24 22:23:49 +00:00
Adrian Prantl 0d1e5592a6 Unify formatting of debug output.
llvm-svn: 182495
2013-05-22 18:02:19 +00:00
Justin Holewinski fff1f5f5e2 Drop @llvm.annotation and @llvm.ptr.annotation intrinsics during codegen.
The intrinsic calls are dropped, but the annotated value is propagated.

Fixes PR 15253

Original patch by Zeng Bin!

llvm-svn: 182387
2013-05-21 14:37:16 +00:00
Benjamin Kramer 8aaf197990 DAGCombine: Avoid an edge case where it tried to create an i0 type for (x & 0) == 0.
Fixes PR16083.

llvm-svn: 182357
2013-05-21 08:51:09 +00:00
Matt Arsenault 75865923c9 Add LLVMContext argument to getSetCCResultType
llvm-svn: 182180
2013-05-18 00:21:46 +00:00
Matt Arsenault 04126234e5 Replace redundant code
Use EVT::changeExtendedVectorElementTypeToInteger instead of open-coding
the same computation.

llvm-svn: 182165
2013-05-17 21:43:43 +00:00
Matt Arsenault 52ddb7bcdd Add missing -*- C++ -*- to headers
llvm-svn: 182164
2013-05-17 21:43:39 +00:00
Adrian Prantl 9c93059aa4 Generate debug info for by-value struct args even if they are not used.
radar://problem/13865940

llvm-svn: 182062
2013-05-16 23:44:12 +00:00
Benjamin Kramer fc88c3761f DAGCombine: Also shrink eq compares where the constant is exactly as large as the smaller type.
if ((x & 255) == 255)

before: movzbl  %al, %eax
        cmpl  $255, %eax

after:  cmpb  $-1, %al
llvm-svn: 182038
2013-05-16 18:47:58 +00:00
Hal Finkel 1f6a7f53d8 Fix legalization of SETCC with promoted integer intrinsics
If the input operands to SETCC are promoted, we need to make sure that we
either use the promoted form of both operands (or neither); a mixture is not
allowed. This can happen, for example, if a target has a custom promoted
i1-returning intrinsic (where i1 is not a legal type). In this case, we need to
use the promoted form of both operands.

This change only augments the behavior of the existing logic in the case where
the input types (which may or may not have already been legalized) disagree,
and should not affect existing target code because this case would otherwise
cause an assert in the SETCC operand promotion code.

This will be covered by (essentially all of the) tests for the new PPCCTRLoops
infrastructure.

llvm-svn: 181926
2013-05-15 21:37:27 +00:00
Bob Wilson c5c0823724 Remove redundant variable introduced by r181682.
llvm-svn: 181721
2013-05-13 19:02:31 +00:00
Hao Liu bc60196951 Fix PR15950: a bug in the DAG Combiner involving an undef mask
llvm-svn: 181682
2013-05-13 02:07:05 +00:00
Benjamin Kramer a5d59333b3 DAGCombiner: Generate a correct constant for vector types when folding (xor (and)) into (and (not)).
PR15948.

llvm-svn: 181597
2013-05-10 14:09:52 +00:00
Owen Anderson 32baf99b1d Teach SelectionDAG to constant fold all-constant FMA nodes the same way that it constant folds FADD, FMUL, etc.
llvm-svn: 181555
2013-05-09 22:27:13 +00:00
David Majnemer 386ab7f872 DAGCombiner: Simplify inverted bit tests
Fold (xor (and x, y), y) -> (and (not x), y)

This removes an opportunity for a constant to appear twice.
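
A scalar C++ model of the fold (illustrative):

  #include <cstdint>

  // (x & y) ^ y == ~x & y for every bit, so when y is a constant it no
  // longer has to be materialized twice.
  uint32_t before_fold(uint32_t x, uint32_t y) { return (x & y) ^ y; }
  uint32_t after_fold(uint32_t x, uint32_t y)  { return ~x & y; }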

llvm-svn: 181395
2013-05-08 06:44:42 +00:00
Matt Arsenault a5733dc97e Fix vselect when getSetCCResultType returns a different type from the operands
llvm-svn: 181348
2013-05-07 20:24:18 +00:00
Michael Kuperstein ac868757d0 Fix slightly too aggressive concat_vector optimization.
(Would sometimes optimize away concats used to extend a vector with undef values)

llvm-svn: 181186
2013-05-06 08:06:13 +00:00
Dmitri Gribenko 3238fb7595 Add ArrayRef constructor from None, and do the cleanups that this constructor enables
Patch by Robert Wilhelm.

llvm-svn: 181138
2013-05-05 00:40:33 +00:00
Chad Rosier 8e4824f350 [inline asm] Return an undef SDValue of the expected value type, rather than
report a fatal error.  This allows us to continue processing the translation
unit.  Test case to come on the clang side because we need an inline asm
diagnostics handler in place.
rdar://13446483

llvm-svn: 180873
2013-05-01 19:49:26 +00:00
Nadav Rotem e5a2dda372 Optimize away nop CONCAT_VECTOR nodes.
Optimize CONCAT_VECTOR nodes that merge EXTRACT_SUBVECTOR values that extract from the same vector.

rdar://13402653
PR15866

llvm-svn: 180871
2013-05-01 19:18:51 +00:00
Stephen Lin 699808ceb2 Only pass 'returned' to target-specific lowering code when the value of entire register is guaranteed to be preserved.
llvm-svn: 180825
2013-04-30 22:49:28 +00:00
Adrian Prantl a2888e71eb Temporarily revert "Change the informal convention of DBG_VALUE so that we can express a"
because it breaks some buildbots.

This reverts commit 180816.

llvm-svn: 180819
2013-04-30 22:35:14 +00:00
Adrian Prantl 9a576644e4 Change the informal convention of DBG_VALUE so that we can express a
register-indirect address with an offset of 0.
It used to be that a DBG_VALUE is a register-indirect value if the offset
(operand 1) is nonzero. The new convention is that a DBG_VALUE is
register-indirect if the first operand is a register and the second
operand is an immediate. For plain registers use the combination reg, reg.

rdar://problem/13658587

llvm-svn: 180816
2013-04-30 22:16:46 +00:00
Silviu Baranga af7e8c367f Re-write the address propagation code for pre-indexed loads/stores to take into account some previously misssed cases (PRE_DEC addressing mode, the offset and base address are swapped, etc). This should fix PR15581.
llvm-svn: 180609
2013-04-26 15:52:24 +00:00
Benjamin Kramer d56ffc709d DAGCombiner: Canonicalize vector integer abs in the same way we do it for scalars.
This already helps SSE2 x86 a lot because it lacks an efficient way to
represent a vector select. The long term goal is to enable the backend to match
a canonicalized pattern into a single instruction (e.g. vabs or pabs).
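
One standard branch-free form of the scalar idiom (an illustrative sketch; the
exact canonical node pattern may differ, and like the usual idiom it is
undefined for INT32_MIN):

  #include <cstdint>

  // mask is 0 for non-negative x and all-ones for negative x, so the result
  // is x for x >= 0 and -x otherwise; backends can match this to vabs/pabs.
  int32_t iabs(int32_t x) {
    int32_t mask = x >> 31; // arithmetic shift right
    return (x + mask) ^ mask;
  }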

llvm-svn: 180597
2013-04-26 09:19:19 +00:00
Silviu Baranga 4ad2bc5963 Fix constant folding for one-lane vector types. Constant folding of one-lane vector types now returns a vector instead of a scalar.
llvm-svn: 180254
2013-04-25 09:32:33 +00:00
Chad Rosier 108d5a61b7 [inline asm] Fix a crasher for an invalid value type/register class.
rdar://13731657

llvm-svn: 180226
2013-04-24 22:53:10 +00:00
Owen Anderson 2d4cca35c3 DAGCombine should not aggressively fold SEXT(VSETCC(...)) into a wider VSETCC without first checking the target's vector boolean contents.
This exposed an issue with PowerPC AltiVec where it appears it was setting the wrong vector boolean contents.  The included change
fixes the PowerPC tests, and was OK'd by Hal.

llvm-svn: 180129
2013-04-23 18:09:28 +00:00
Jim Grosbach 563983c8a3 Legalize vector truncates by parts rather than just splitting.
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result type, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.

Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized, but if the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-two vectors. Specifically,
split the input type, but also widen the result element size, then
concatenate the halves and truncate again.  For example on ARM,
To perform a "%res = v8i8 trunc v8i32 %in" we transform to:
  %inlo = v4i32 extract_subvector %in, 0
  %inhi = v4i32 extract_subvector %in, 4
  %lo16 = v4i16 trunc v4i32 %inlo
  %hi16 = v4i16 trunc v4i32 %inhi
  %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
  %res = v8i8 trunc v8i16 %in16

This allows instruction selection to generate three VMOVN instructions
instead of a sequences of moves, stores and loads.

Update the ARMTargetTransformInfo to take this improved legalization
into account.

Consider the simplified IR:

define <16 x i8> @test1(<16 x i32>* %ap) {
  %a = load <16 x i32>* %ap
  %tmp = trunc <16 x i32> %a to <16 x i8>
  ret <16 x i8> %tmp
}

define <8 x i8> @test2(<8 x i32>* %ap) {
  %a = load <8 x i32>* %ap
  %tmp = trunc <8 x i32> %a to <8 x i8>
  ret <8 x i8> %tmp
}

Previously, we would generate the truly hideous:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #20
	bic	sp, sp, #7
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d24, d25}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	vld1.64	{d18, d19}, [r2:128]
	add	r1, r0, #16
	vmovn.i32	d22, q8
	vld1.64	{d16, d17}, [r1:128]
	vmovn.i32	d20, q9
	vmovn.i32	d18, q12
	vmov.u16	r0, d22[3]
	strb	r0, [sp, #15]
	vmov.u16	r0, d22[2]
	strb	r0, [sp, #14]
	vmov.u16	r0, d22[1]
	strb	r0, [sp, #13]
	vmov.u16	r0, d22[0]
	vmovn.i32	d16, q8
	strb	r0, [sp, #12]
	vmov.u16	r0, d20[3]
	strb	r0, [sp, #11]
	vmov.u16	r0, d20[2]
	strb	r0, [sp, #10]
	vmov.u16	r0, d20[1]
	strb	r0, [sp, #9]
	vmov.u16	r0, d20[0]
	strb	r0, [sp, #8]
	vmov.u16	r0, d18[3]
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	vldmia	sp, {d16, d17}
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	mov	sp, r7
	pop	{r7}
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #12
	bic	sp, sp, #7
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d20, d21}, [r0:128]
	vmovn.i32	d18, q8
	vmov.u16	r0, d18[3]
	vmovn.i32	d16, q10
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	ldm	sp, {r0, r1}
	mov	sp, r7
	pop	{r7}
	bx	lr

Now, however, we generate the much more straightforward:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d20, d21}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	add	r1, r0, #16
	vld1.64	{d18, d19}, [r2:128]
	vld1.64	{d22, d23}, [r1:128]
	vmovn.i32	d17, q8
	vmovn.i32	d16, q9
	vmovn.i32	d18, q10
	vmovn.i32	d19, q11
	vmovn.i16	d17, q8
	vmovn.i16	d16, q9
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d18, d19}, [r0:128]
	vmovn.i32	d16, q8
	vmovn.i32	d17, q9
	vmovn.i16	d16, q8
	vmov	r0, r1, d16
	bx	lr

llvm-svn: 179989
2013-04-21 23:47:41 +00:00
Jim Grosbach d4db72db61 Tidy up comment grammar.
llvm-svn: 179986
2013-04-21 21:23:01 +00:00
Tim Northover a2b533906a Remove unused MEMBARRIER DAG node; it's been replaced by ATOMIC_FENCE.
llvm-svn: 179939
2013-04-20 12:32:17 +00:00
Stephen Lin b8bd232a3d Add CodeGen support for functions that always return arguments via a new parameter attribute 'returned', which is taken advantage of in target-independent tail call opportunity detection and in ARM call lowering (when placed on an integral first parameter).
llvm-svn: 179925
2013-04-20 05:14:40 +00:00
Eli Bendersky e80691dc0a Simplify the code in FastISel::tryToFoldLoad, add an assertion and fix a comment.
llvm-svn: 179908
2013-04-19 23:26:18 +00:00
Eli Bendersky 90dd3e7dfd Move TryToFoldFastISelLoad to FastISel, where it belongs. In general, I'm
trying to move as much FastISel logic as possible out of the main path in
SelectionDAGISel - intermixing them just adds confusion.

llvm-svn: 179902
2013-04-19 22:29:18 +00:00
Michael Liao b53d8963ce ArrayRefize getMachineNode(). No functionality change.
llvm-svn: 179901
2013-04-19 22:22:57 +00:00
Eli Bendersky dbeefaa86a Use dbgs() consistently for -debug printouts
llvm-svn: 179894
2013-04-19 21:37:07 +00:00
Eli Bendersky 6084f45f38 Add some more stats for fast isel vs. SelectionDAG, w.r.t lowering function
arguments in entry BBs.

llvm-svn: 179824
2013-04-19 01:04:40 +00:00
Benjamin Kramer bbae991db6 DAGCombiner: Fold a shuffle on CONCAT_VECTORS into a new CONCAT_VECTORS if possible.
This pattern occurs in SROA output due to the way vector arguments are lowered
on ARM.

The testcase from PR15525 now compiles into this, which is better than the code
we got with the old scalarrepl:
_Store:
	ldr.w	r9, [sp]
	vmov	d17, r3, r9
	vmov	d16, r1, r2
	vst1.8	{d16, d17}, [r0]
	bx	lr

Differential Revision: http://llvm-reviews.chandlerc.com/D647

llvm-svn: 179106
2013-04-09 17:41:43 +00:00
Eli Bendersky fc186358f2 Formatting
llvm-svn: 178771
2013-04-04 18:03:41 +00:00
Bill Schmidt 92e26646bc Fix PR15632: No support for ppcf128 floating-point remainder on PowerPC.
For this we need to use a libcall.  Previously LLVM didn't implement
libcall support for frem, so I've added it in the usual
straightforward manner.  A test case from the bug report is included.

llvm-svn: 178639
2013-04-03 13:05:44 +00:00
Arnold Schwaighofer d6c6e868b2 DAGCombiner: Merge store/loads when we have extload/truncstores
This helps on architectures where i8 and i16 are not legal but we have byte and
short loads/stores, allowing us to merge copies like the one below on ARM.

void copy(char *a, char *b, int n) {
  do {
    int t0 = a[0];
    int t1 = a[1];
    b[0] = t0;
    b[1] = t1;
    a += 2;
    b += 2;
    n -= 2;
  } while (n > 0);
}

radar://13536387

llvm-svn: 178546
2013-04-02 15:58:51 +00:00
Arnold Schwaighofer 6752366ed7 Merge load/store sequences with addresses: base + index + offset
We would also like to merge sequences that involve a variable index like in the
example below.

    int index = *idx++;
    int i0 = c[index+0];
    int i1 = c[index+1];
    b[0] = i0;
    b[1] = i1;

By extending the parsing of the base pointer to handle dags that contain a
base, index, and offset we can handle examples like the one above.

The dag for the code above will look something like:

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i8 load %index))))

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                         (i32 1)))))

The code that parses the tree ignores the intermediate sign extensions. However,
if there is a sign extension it needs to be on all indexes.

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (add (i8 load %index)
                                     (i8 1))))
 vs

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                         (i32 1)))))
radar://13536387

llvm-svn: 178483
2013-04-01 18:12:58 +00:00
Benjamin Kramer 9335443236 DAGCombine: visitXOR can replace a node without returning it, bail out in that case.
Fixes the crash reported in PR15608.

llvm-svn: 178429
2013-03-30 21:28:18 +00:00
Chad Rosier dbac025d84 [fast-isel] Add a preemptive fix for the case where we fail to materialize an
immediate in a register.  I don't believe this should ever fail, but I see no
harm in trying to make this code bullet proof.

I've added an assert to ensure my assumption is correct.  If the assertion fires
something is wrong and we should fix it, rather than just silently fall back to
SelectionDAG isel.

llvm-svn: 178305
2013-03-28 23:04:47 +00:00
Michael Liao bb05a1d7b5 Enhance folding of (extract_subvec (insert_subvec V1, V2, IIdx), EIdx)
- Handle the case where the result of 'insert_subvect' is bitcasted
  before 'extract_subvec'. This removes the redundant insertf128/extractf128
  pair on unaligned 256-bit vector load/store on vectors of non 64-bit integer.

llvm-svn: 177945
2013-03-25 23:47:35 +00:00
Shuxin Yang 93b1f12ac1 Disable some unsafe-fp-math DAG-combine transformations after legalization.
For instance, following transformation will be disabled:
    x + x + x => 3.0f * x;

The problem with these transformations is that they introduce an FP constant,
which the following instruction-selection pass cannot handle.

Reviewed by Nadav, thanks a lot!

rdar://13445387

llvm-svn: 177933
2013-03-25 22:52:29 +00:00
Owen Anderson c81616b0a9 Remove the type legality check from the SelectionDAGBuilder when it lowers @llvm.fmuladd to ISD::FMA nodes.
Performing this check unilaterally prevented us from generating FMAs when the incoming IR contained illegal vector types which would eventually be legalized to underlying types that *did* support FMA.
For example, an @llvm.fmuladd on an OpenCL float16 should become a sequence of float4 FMAs, not float4 fmul+fadd's.

NOTE: Because we still call the target-specific profitability hook, individual targets can reinstate the old behavior, if desired, by simply performing the legality check inside their callback hook.  They can also perform more sophisticated legality checks, if, for example, some illegal vector types can be productively implemented as FMAs, but not others.
llvm-svn: 177820
2013-03-23 08:26:53 +00:00
Justin Holewinski 7478f3d776 Make variable name more explicit and eliminate redundant lookup in SDNodeOrdering
llvm-svn: 177600
2013-03-20 23:10:59 +00:00
Nadav Rotem 4536d582fd When computing the demanded bits of Load SDNodes, make sure that we are looking at the loaded-value operand and not the ptr result (in case of pre-inc loads).
rdar://13348420

llvm-svn: 177596
2013-03-20 22:53:44 +00:00
Christian Konig ed34d0ef1a Revert "pre-RA-sched: fix TargetOpcode usage"
This reverts commit 06091513c283c863296f01cc7c2e86b56bb50d02.

The code is obviously wrong, but the trivial fix causes
inefficient code generation on X86. Somebody with more
knowledge of the code needs to take a look here.

Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 177529
2013-03-20 15:43:00 +00:00
Justin Holewinski c2d2c8939c Move SDNode order propagation to SDNodeOrdering, which also fixes a missed
case of order propagation during isel.

Thanks Owen for the suggestion!

llvm-svn: 177525
2013-03-20 14:51:01 +00:00
Christian Konig 9ce2d5b862 pre-RA-sched: fix TargetOpcode usage
TargetOpcodes need to be treated as Machine- and not ISD-Opcodes.

Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 177518
2013-03-20 13:49:22 +00:00
Justin Holewinski d068943809 Propagate DAG node ordering during type legalization and instruction selection
A node's ordering is only propagated during legalization if (a) the new node does
not have an ordering (is not a CSE'd node), or (b) the new node has an ordering
that is higher than the node being legalized.

llvm-svn: 177465
2013-03-20 00:10:32 +00:00
Bill Wendling 965bd58902 Reset some of the target options which affect code generation.
This doesn't reset all of the target options within the TargetOptions
object. This is because some of those are ABI-specific, and it must first be
determined whether it's okay to change them on the fly.

llvm-svn: 176986
2013-03-13 22:26:59 +00:00
Richard Relph 61046a9727 Avoid generating ISD::SELECT for vector operands to SIGN_EXTEND
llvm-svn: 176881
2013-03-12 18:17:18 +00:00
Nick Lewycky 48beb21185 Fix a crasher newly introduced in r176659/r176649, where fast-isel tries to
lower an expect intrinsic that is a constant expression.

llvm-svn: 176830
2013-03-11 21:44:37 +00:00
Jan Wen Voung 7857a64909 Disable statistics on Release builds and move tests that depend on -stats.
Summary:
Statistics are still available in Release+Asserts (any +Asserts builds),
and stats can also be turned on with LLVM_ENABLE_STATS.

Move some of the FastISel stats that were moved under DEBUG()
back out of DEBUG(), since stats are disabled across the board now.

Many tests depend on grepping "-stats" output.  Move those into
an orig_dir/Stats/ directory so that they can be marked as unsupported
when building without statistics.

Differential Revision: http://llvm-reviews.chandlerc.com/D486

llvm-svn: 176733
2013-03-08 22:56:31 +00:00
Benjamin Kramer 0f12a5c2a4 Remove default from fully covered switch.
llvm-svn: 176703
2013-03-08 17:03:19 +00:00
Tom Stellard d93ef7afaa LegalizeDAG: Respect the result of TLI.getBooleanContents() when expanding SETCC
llvm-svn: 176695
2013-03-08 15:37:02 +00:00
Tom Stellard b1588fc057 DAGCombiner: Use correct value type for checking legality of BR_CC v3
LegalizeDAG.cpp uses the value of the comparison operands when checking
the legality of BR_CC, so DAGCombiner should do the same.

v2:
  - Expand more BR_CC value types for NVPTX

v3:
  - Expand correct BR_CC value types for Hexagon, Mips, and XCore.

llvm-svn: 176694
2013-03-08 15:36:57 +00:00
Bill Wendling 2d915e2c15 Revert r176154 in favor of a better approach.
Code generation makes some basic assumptions about the IR it's been given. In
particular, if there is only one 'invoke' in the function, then that invoke
won't be going away. However, with the advent of the `llvm.donothing' intrinsic,
those invokes may go away. If all of them go away, the landing pad no longer has
any users. This confuses the back-end, which asserts.

This happens with SjLj exceptions, because that's the model that modifies the IR
based on there being invokes, etc. in the function.

Remove any invokes of `llvm.donothing' during SjLj EH preparation. This will
give us a CFG that the back-end won't be confused about. If all of the invokes
in a function are removed, then the SjLj EH prepare pass won't insert the bogus
code that relies upon the invokes being there.
<rdar://problem/13228754&13316637>

llvm-svn: 176677
2013-03-08 02:21:08 +00:00
Chad Rosier 3a200e1faf [fast-isel] Seriously, add support for the expect intrinsic.
rdar://13370942

llvm-svn: 176659
2013-03-07 21:38:33 +00:00
Chad Rosier 9c1796f877 [fast-isel] Add support for the expect intrinsic.
rdar://13370942

llvm-svn: 176649
2013-03-07 20:42:17 +00:00
Benjamin Kramer fdf362bd69 ArrayRefize some code. No functionality change.
llvm-svn: 176648
2013-03-07 20:33:29 +00:00
Andrew Trick 0f23b763a9 pre-RA-sched debug-only fix
llvm-svn: 176638
2013-03-07 19:21:08 +00:00