Commit Graph

2993 Commits

Author SHA1 Message Date
Juergen Ributzka 3bd03c7099 [DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI.
The argument list vector is never used after it has been passed to the
CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and
pretty much as cheap as keeping a pointer to it.

llvm-svn: 212135
2014-07-01 22:01:54 +00:00
Tim Northover df58625e3c X86: delegate expanding atomic libcalls to generic code.
On targets without cmpxchg16b or cmpxchg8b, the borderline atomic
operations were slipping through the gaps.

X86AtomicExpand.cpp was delegating to ISelLowering. Generic
ISelLowering was delegating to X86ISelLowering and X86ISelLowering was
asserting. The correct behaviour is to expand to a libcall, preferably
in generic ISelLowering.

This can be achieved by X86ISelLowering deciding it doesn't want the
faff after all.

llvm-svn: 212134
2014-07-01 21:44:59 +00:00
Tim Northover 277066ab43 X86: expand atomics in IR instead of as MachineInstrs.
The logic for expanding atomics that aren't natively supported in
terms of cmpxchg loops is much simpler to express at the IR level. It
also allows the normal optimisations and CodeGen improvements to help
out with atomics, instead of using a limited set of possible
instructions.

rdar://problem/13496295

llvm-svn: 212119
2014-07-01 18:53:31 +00:00
Andrea Di Biagio 53b6830069 [X86] Add support for builtin to read performance monitoring counters.
This patch adds support for a new builtin instruction called
__builtin_ia32_rdpmc.

Builtin '__builtin_ia32_rdpmc' is defined as a 'GCC builtin'; on X86, it can
be used to read performance monitoring counters. It takes as input the index
of the performance counter to read, and returns the value of the specified
performance counter as a 64-bit number.

Calls to this new builtin will map to instruction RDPMC.
The index passed to the builtin call is moved to register %ECX. The result
of the builtin call is the value of the specified performance counter (RDPMC
would return that quantity in registers RDX:RAX).
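
As an illustrative IR-level sketch (assuming the intrinsic is exposed to IR
as 'llvm.x86.rdpmc' with an i64 (i32) signature, per the GCCBuiltin mapping
described above):

  ; read performance counter %idx; expected to select to RDPMC, with the
  ; index in %ecx and the 64-bit result assembled from %edx:%eax
  declare i64 @llvm.x86.rdpmc(i32)

  define i64 @read_pmc(i32 %idx) {
    %val = call i64 @llvm.x86.rdpmc(i32 %idx)
    ret i64 %val
  }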

This patch:
 - Adds builtin int_x86_rdpmc as a GCCBuiltin;
 - Adds a new x86 DAG node called 'RDPMC_DAG';
 - Teaches how to lower this new builtin;
 - Adds an ISel pattern to select instruction RDPMC;
 - Fixes the definition of instruction RDPMC, adding %RAX and %RDX as
   implicit definitions and %ECX as an implicit use;
 - Adds an LLVM test to verify that the new builtin is correctly selected.

llvm-svn: 212049
2014-06-30 17:14:21 +00:00
Chandler Carruth bd0717d7cc [x86] Fix a bug in the v8i16 shuffling exposed by the new splat-like
lowering for v16i8.

ASan and some bots caught this bug with existing test cases. Fixing it
even fixed a miscompile with one of the test cases. I'm still a bit
suspicious of this test case as I've not taken a proper amount of time
to think about it, but the fix here is strict goodness.

llvm-svn: 211976
2014-06-28 05:46:28 +00:00
Chandler Carruth 887c2c3482 [x86] Add handling for splat-like widenings of v16i8 shuffles.
These show up really frequently, not the least with actual splats. =] We
lowered these quite badly before. The new code path tries to widen i8
shuffles to i16 shuffles in a splat-like way. There are still some
inefficiencies in our i16 splat logic though, so we aren't really done
here.

Also, for certain patterns (bit of a gather-and-splat) we still
generate pretty silly code, and I've left a fixme for addressing it.
However, I'm not actually worried about this code pattern as much. The
old shuffle lowering generates a 29 instruction monstrosity for it that
should execute much more slowly.

llvm-svn: 211974
2014-06-28 05:16:40 +00:00
Chandler Carruth a94ef908d9 [x86] Fix another bug hit when bootstrapping with the new shuffle
lowering.

For maximum irony, I had already discovered this bug, diagnosed it, and
left FIXMEs about it in the test cases. =[ I just failed to go back over
those until after I had reduced a bootstrap miscompile down to a single
TU, stared at the assembly for an hour, and figured out the bug. Again.

Oh well.

llvm-svn: 211955
2014-06-27 20:07:40 +00:00
Chandler Carruth dd6470a9dd [x86] Fix a miscompile in the new shuffle lowering uncovered by
a bootstrap.

I managed to mis-remember how PACKUS worked on x86, and was using undef
for the high bytes instead of zero. The fix is fairly obvious.

llvm-svn: 211922
2014-06-27 18:25:23 +00:00
Alexander Kornienko b673b4b187 Clean up unused variable warning in release build.
llvm-svn: 211902
2014-06-27 15:30:55 +00:00
Chandler Carruth ed4a0bc734 [x86] Clean up some unused variables, especially in release builds.
llvm-svn: 211894
2014-06-27 12:04:18 +00:00
Chandler Carruth 688001f042 [x86] Teach the target combine step to aggressively fold pshufd instructions.
Summary:
This allows it to fold pshufd instructions across intervening
half-shuffles and other noise. This pattern actually shows up in the
generic lowering tests, but I've also added direct tests using
intrinsics to make sure that the specific desired functionality is
working even if the lowering stuff changes in the future.

Differential Revision: http://reviews.llvm.org/D4292

llvm-svn: 211892
2014-06-27 11:40:13 +00:00
Chandler Carruth 0d6d1f2b17 [x86] Teach the target-specific combining how to aggressively fold
half-shuffles, even looking through intervening instructions in a chain.

Summary:
This doesn't happen to show up with any test cases I've found for the current
shuffle lowering, but previous attempts would benefit from this and it seems
generally useful. I've tested it directly using intrinsics, which also shows
that it will work with hand vectorized code as well.

Note that even though pshufd isn't directly used in these tests, it gets
exercised because we combine some of the half shuffles into a pshufd
first, and then merge them.

Differential Revision: http://reviews.llvm.org/D4291

llvm-svn: 211890
2014-06-27 11:34:40 +00:00
Chandler Carruth 97ebc2362c [x86] Teach the X86 backend to DAG-combine SSE2 shuffles that are
trivially redundant.

This fixes several cases in the new vector shuffle lowering algorithm
which would generate redundant shuffle instructions for the sake of
simplicity.

I'm also deleting a testcase which was somewhat ridiculous. It was
checking for a bug in 2007 about incorrectly transforming shuffles by
looking for the string "-86" in the output of a pretty substantial
function. This test case doesn't seem to have any value at this point.

Differential Revision: http://reviews.llvm.org/D4240

llvm-svn: 211889
2014-06-27 11:27:52 +00:00
Chandler Carruth 83860cfcfa [x86] Begin a significant overhaul of how vector lowering is done in the
x86 backend.

This sketches out a new code path for vector lowering, hidden behind an
off-by-default flag while it is under development. The fundamental idea
behind the new code path is to aggressively break down the problem space
in ways that ease selecting the odd set of instructions available on
x86, and carefully avoid scalarizing code even when forced to use older
ISAs. Notably, this starts off restricting itself to SSE2 and implements
the complete vector shuffle and blend space for 128-bit vectors in SSE2
without scalarizing. The plan is to layer ISA extensions on top of this,
where we can bail out of the complex SSE2 lowering and opt for
a cheaper, specialized instruction (or set of instructions). It also
needs to be generalized to AVX and AVX512 vector widths.

Currently, this does a decent but not perfect job for SSE2. There are
some specific shortcomings that I plan to address:
- We need a peephole combine to fold together shuffles where possible.
  There are cases where a previous shuffle could be modified slightly to
  arrange for elements to be in the correct position and a later shuffle
  eliminated. Doing this eagerly added quite a bit of complexity, and
  so my plan is to combine away these redundancies afterward.
- There are a lot more clever ways to use unpck and pack that need to be
  added. This is essential for real world shuffles as it turns out...

Once SSE2 is polished a bit I should be able to get interesting numbers
on performance improvements on benchmarks conducive to vectorization.
All of this will be off by default until it is functionally equivalent
of course.

Differential Revision: http://reviews.llvm.org/D4225

llvm-svn: 211888
2014-06-27 11:23:44 +00:00
Andrea Di Biagio 1ee38843ac Silence a warning due to a comparison between signed and unsigned.
No functional change intended.

llvm-svn: 211782
2014-06-26 13:41:10 +00:00
Andrea Di Biagio 7fb85256bc [X86] Improve the selection of SSE3/AVX addsub instructions.
This patch teaches the backend how to canonicalize a vector shuffle
according to the rule:

 - (shuffle (FADD A, B), (FSUB A, B), Mask) ->
       (shuffle (FSUB A, -B), (FADD A, -B), Mask)

Where 'Mask' is:
  <0,5,2,7>            ;; for v4f32 and v4f64 shuffles.
  <0,3>                ;; for v2f64 shuffles.
  <0,9,2,11,4,13,6,15> ;; for v8f32 shuffles.

In general, ISel only knows how to pattern-match a canonical
'fadd + fsub + blendi' dag node sequence into an ADDSUB instruction.

This new rule allows converting a non-canonical dag sequence into a
canonical one that will be matched by a single ADDSUB at ISel stage.

The idea of converting a non-canonical ADDSUB into a canonical one by
swapping the first two operands of the shuffle, and then negating the
second operand of the FADD and FSUB, was originally proposed by Hal Finkel.
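
As a rough IR-level sketch of the canonical form (assuming SSE3 is available,
e.g. -mattr=+sse3, this is the kind of sequence expected to match ADDSUBPS):

  define <4 x float> @addsub_ps(<4 x float> %A, <4 x float> %B) {
    %sub = fsub <4 x float> %A, %B
    %add = fadd <4 x float> %A, %B
    ; even lanes take the FSUB result, odd lanes the FADD result
    %res = shufflevector <4 x float> %sub, <4 x float> %add, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
    ret <4 x float> %res
  }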

llvm-svn: 211771
2014-06-26 10:45:21 +00:00
Andrea Di Biagio 07cdffc324 [X86] Always prefer to lower a VECTOR_SHUFFLE into a BLENDI instead of SHUFP (or VPERM2X128).
This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to
the check for 'isBlendMask'; the idea is that, when possible, we should first
check if a shuffle performs a blend and, if so, try to lower it into a BLENDI
instead of selecting a SHUFP or (worse) a VPERM2X128.

In general:
 - AVX VBLENDPS/D always have better latency and throughput than VPERM2F128;
 - BLENDPS/D instructions tend to have better 'reciprocal throughput'
   than the equivalent SHUFPS/D;
 - Both BLENDPS/D and SHUFPS/D are often decoded into the same number of
   m-ops; however, an m-op obtained from a BLENDPS/D can be scheduled to more
   than one execution port.

This patch:
 - Moves the check for 'isBlendMask' immediately before the check for
   'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE';
 - Updates existing tests for sse/avx shuffle/blend instructions to verify
   that we select (v)blendps/d when possible (instead of (v)shufps/d or
   vperm2f128).

llvm-svn: 211720
2014-06-25 17:41:58 +00:00
Chandler Carruth e5724d7532 [x86] Add intrinsics for the pshufd, pshuflw, and pshufhw instructions.
llvm-svn: 211694
2014-06-25 13:12:54 +00:00
NAKAMURA Takumi 1db5995d14 Re-apply r211399, "Generate native unwind info on Win64" with a fix to ignore SEH pseudo ops in X86 JIT emitter.
--
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI.  It handles all corner cases (I hope), including stack
realignment.

Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.

Patch by Vadim Chugunov!

Reviewed By: rnk

Differential Revision: http://reviews.llvm.org/D4081

llvm-svn: 211691
2014-06-25 12:41:52 +00:00
Andrea Di Biagio 6d9b9e125d [X86] Add target combine rule to select ADDSUB instructions from a build_vector
This patch teaches the backend how to combine a build_vector that implements
an 'addsub' between packed float vectors into a sequence of vector add
and vector sub followed by a VSELECT.

The new VSELECT is expected to be lowered into a BLENDI.
At ISel stage, the sequence 'vector add + vector sub + BLENDI' is
pattern-matched against ISel patterns added at r211427 to select
'addsub' instructions.
Added three more ISel patterns for ADDSUB.

Added test sse3-avx-addsub-2.ll to verify that we correctly emit 'addsub'
instructions.

llvm-svn: 211679
2014-06-25 10:02:21 +00:00
Robert Khasanov 21c836823f vpblend intrinsics were combined as shift intrinsics due to a missing return stmt between them
Fix PR20088

Differential Revision: http://reviews.llvm.org/D4277

llvm-svn: 211617
2014-06-24 18:08:04 +00:00
NAKAMURA Takumi d77cefe633 Revert r211399, "Generate native unwind info on Win64"
It broke Legacy JIT Tests on x86_64-{mingw32|msvc}, aka Windows x64.

llvm-svn: 211480
2014-06-22 22:00:56 +00:00
Filipe Cabecinhas 1af2dfd274 Fix PR20087 by using the source index when changing the vector load
llvm-svn: 211472
2014-06-22 17:21:37 +00:00
Reid Kleckner 4a01230db4 Generate native unwind info on Win64
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI.  It handles all corner cases (I hope), including stack
realignment.

Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.

Patch by Vadim Chugunov!

Reviewed By: rnk

Differential Revision: http://reviews.llvm.org/D4081

llvm-svn: 211399
2014-06-20 20:35:47 +00:00
Chandler Carruth 8366cebeb5 [x86] Make the x86 PACKSSWB, PACKSSDW, PACKUSWB, and PACKUSDW
instructions available as synthetic SDNodes PACKSS and PACKUS that will
select to the correct instruction variants based on the return type.
This allows us to use these rather important instructions when lowering
vector shuffles.

Also moves the relevant instruction definitions to be split out from
the fully generic multiclasses to allow them to match these new SDNodes
in the same way that the UNPCK instructions do.

No functionality should actually be changed here.

llvm-svn: 211332
2014-06-20 01:05:28 +00:00
Andrea Di Biagio 54b0949af9 [X86] Teach how to combine horizontal binop even in the presence of undefs.
Before this change, the backend was unable to fold a build_vector dag
node with UNDEF operands into a single horizontal add/sub.

This patch teaches how to combine a build_vector with UNDEF operands into a
horizontal add/sub when possible. The algorithm conservatively avoids combining
a build_vector with only a single non-UNDEF operand.

Added test haddsub-undef.ll to verify that we correctly fold horizontal binop
even in the presence of UNDEFs.

llvm-svn: 211265
2014-06-19 10:29:41 +00:00
Cameron McInally 0d0489cea6 Hook up vector int_ctlz for AVX512.
llvm-svn: 211024
2014-06-16 14:12:28 +00:00
Tim Northover 51472bc600 X86: lower ATOMIC_CMP_SWAP_WITH_SUCCESS directly
Lowering this new node allows us to fold the almost universal
comparison for success before it's even formed. Instead we can create
a copy from EFLAGS and an X86ISD::SETCC operation since all "cmpxchg"
instructions set the zero-flag to the correct value.

rdar://problem/13201607

llvm-svn: 210923
2014-06-13 17:29:39 +00:00
Tim Northover 420a216817 IR: add "cmpxchg weak" variant to support permitted failure.
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.

As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.
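
For example, the weak form and its two results now look like this in IR:

  %pair    = cmpxchg weak i32* %addr, i32 %expected, i32 %new seq_cst monotonic
  %loaded  = extractvalue { i32, i1 } %pair, 0
  %success = extractvalue { i32, i1 } %pair, 1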

At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.

By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.

Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.

Summary for out of tree users:
------------------------------

+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.

llvm-svn: 210903
2014-06-13 14:24:07 +00:00
Andrea Di Biagio 2dd3b3b674 [X86] Teach how to dump the name of target node RDTSCP_DAG.
When I originally added node RDTSCP_DAG (r207127) I forgot to add
a string name for it in method 'getTargetNodeName'.

No functional change intended.

llvm-svn: 210769
2014-06-12 11:37:24 +00:00
Andrea Di Biagio 972ff97f8c [X86] Teach how to combine AVX and AVX2 horizontal binop on packed 256-bit vectors.
This patch adds target combine rules to match:
 - [AVX] Horizontal add/sub of packed single/double precision floating point
   values from 256-bit vectors;
 - [AVX2] Horizontal add/sub of packed integer values from 256-bit vectors.

llvm-svn: 210761
2014-06-12 10:53:48 +00:00
Tim Northover 4dc9eaa6ba X86: add stringy name for X86ISD::LCMPXCHG16_DAG
I don't know what "target specific node #383" is, and I don't want to
have to.

llvm-svn: 210663
2014-06-11 17:04:08 +00:00
Andrea Di Biagio c7af75f9a7 [X86] Refactor the logic to select horizontal adds/subs to a helper function.
This patch moves part of the logic implemented by the target specific
combine rules added at r210477 to a separate helper function.
This should make it easier to add more rules for matching AVX/AVX2 horizontal
adds/subs.

This patch also fixes a problem caused by a wrong check performed on indices
of extract_vector_elt dag nodes in input to the scalar adds/subs.

New tests have been added to verify that we correctly check indices of
extract_vector_elt dag nodes when selecting a horizontal operation.

llvm-svn: 210644
2014-06-11 07:57:50 +00:00
Eric Christopher 68d7559e97 Use the TargetMachine on the DAG or the MachineFunction instead
of using the cached TargetMachine.

llvm-svn: 210589
2014-06-10 21:25:13 +00:00
Eric Christopher 19b1d73e88 Add a FIXME.
llvm-svn: 210559
2014-06-10 18:31:18 +00:00
Andrea Di Biagio fa508af0fe [X86] Improved target combine rules for selecting horizontal add/sub.
This patch slightly changes the algorithm introduced at revision 210477
to fix a problem where the algorithm was producing incorrect code for 
the VEX.256 encoded versions of horizontal add/sub.

For these cases, we now try to split the two 256-bit vectors into
128-bit chunks before emitting horizontal add/sub dag nodes.

Added a new test case into haddsub-2.ll.

llvm-svn: 210545
2014-06-10 16:42:57 +00:00
Tom Stellard 3787b12255 SelectionDAG: Don't use MVT::Other to determine legality of ISD::SELECT_CC
The SelectionDAG had a special case for ISD::SELECT_CC, where it would
allow targets to specify:

setOperationAction(ISD::SELECT_CC, MVT::Other, Expand);

to indicate that they wanted to expand ISD::SELECT_CC for all types.
This wasn't applied correctly everywhere, and it makes writing new
DAG patterns with ISD::SELECT_CC difficult.

llvm-svn: 210541
2014-06-10 16:01:29 +00:00
Tim Northover 7b9f86da5d Revert "X86: elide comparisons after cmpxchg instructions."
This reverts commit r210523. It was committed prematurely without waiting for
review.

llvm-svn: 210524
2014-06-10 10:50:11 +00:00
Tim Northover 84ad29ca1f X86: elide comparisons after cmpxchg instructions.
The C++ and C semantics of the compare_and_swap operations actually
require us to return a boolean "success" value. In LLVM terms this
means a second comparison of the output of "cmpxchg" against the input
desired value.

However, x86's "cmpxchg" instruction sets all flags for the comparison
formed, so we can skip any secondary comparison. (N.b. this isn't true
for cmpxchg8b/16b, which only set ZF).

rdar://problem/13201607

llvm-svn: 210523
2014-06-10 10:49:07 +00:00
Eric Christopher a08f30bd40 Move all of the x86 subtarget initialized variables down into the x86 subtarget
from the x86 target machine. Should be no functional change.

llvm-svn: 210479
2014-06-09 17:08:19 +00:00
Andrea Di Biagio f99dd64f0a [X86] Add target combine rules for horizontal add/sub.
This patch adds new target specific combine rules to identify horizontal
add/sub idioms from BUILD_VECTOR dag nodes.

This patch also teaches the DAGCombiner how to canonicalize sequences of
insert_vector_elt dag nodes according to the following rule:

  (insert_vector_elt (insert_vector_elt A, I0), I1) ->
    (insert_vector_elt (insert_vector_elt A, I1), I0)

This new canonicalization rule only triggers if the inner insert_vector
dag node has exactly one use; also, both indices must be known constants,
and I1 < I0.
This last rule made it possible to write a simpler algorithm to identify
horizontal add/sub patterns because now we don't have to worry about the
ordering of insert_vector_elt dag nodes.
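
As a sketch of the kind of pattern the new combine rules target (assuming
SSE3, a sequence like this is expected to become a single haddps):

  define <4 x float> @hadd_ps(<4 x float> %a, <4 x float> %b) {
    %a0 = extractelement <4 x float> %a, i32 0
    %a1 = extractelement <4 x float> %a, i32 1
    %a2 = extractelement <4 x float> %a, i32 2
    %a3 = extractelement <4 x float> %a, i32 3
    %b0 = extractelement <4 x float> %b, i32 0
    %b1 = extractelement <4 x float> %b, i32 1
    %b2 = extractelement <4 x float> %b, i32 2
    %b3 = extractelement <4 x float> %b, i32 3
    %s0 = fadd float %a0, %a1
    %s1 = fadd float %a2, %a3
    %s2 = fadd float %b0, %b1
    %s3 = fadd float %b2, %b3
    ; the build_vector of the four scalar adds is what gets matched
    %v0 = insertelement <4 x float> undef, float %s0, i32 0
    %v1 = insertelement <4 x float> %v0, float %s1, i32 1
    %v2 = insertelement <4 x float> %v1, float %s2, i32 2
    %v3 = insertelement <4 x float> %v2, float %s3, i32 3
    ret <4 x float> %v3
  }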

llvm-svn: 210477
2014-06-09 16:54:41 +00:00
Andrea Di Biagio dfbdc71ea1 [X86] Avoid emitting unnecessary test instructions.
This patch teaches the backend how to check for the 'NoSignedWrap' flag on
binary operations to improve the emission of 'test' instructions.

If the result of a binary operation is known not to overflow we know that
resetting the Overflow flag is unnecessary and so we can avoid emitting
the test instruction.

Patch by Marcello Maggioni.

llvm-svn: 210468
2014-06-09 12:34:50 +00:00
Alexey Volkov 5260dba323 [X86] Use ADD/SUB instead of INC/DEC for Silvermont
According to the Intel Software Optimization Manual, on Silvermont INC and
DEC instructions require an additional uop to merge the flags.
As a result, a branch instruction depending on an INC or a DEC instruction
incurs a 1 cycle penalty.

Differential Revision: http://reviews.llvm.org/D3990

llvm-svn: 210466
2014-06-09 11:40:41 +00:00
Craig Topper 66f09ad041 [C++11] Use 'nullptr'.
llvm-svn: 210442
2014-06-08 22:29:17 +00:00
Benjamin Kramer d0700b2919 X86: Don't turn shifts into ands if there's another use that may not check for equality.
Fixes PR19964.

llvm-svn: 210371
2014-06-06 21:08:55 +00:00
Filipe Cabecinhas 5181255696 Fixed a bug in lowering shuffle_vectors to insertps
Summary:
We were being too strict and not accounting for undefs.
Added a test case and fixed another one where we improved codegen.

Reviewers: grosbach, nadav, delena

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4039

llvm-svn: 210361
2014-06-06 18:07:06 +00:00
Andrea Di Biagio 4760813831 [X86] Fix checked arithmetic for i8 on X86.
When lowering a ISD::BRCOND into a test+branch, make sure that we
always use the correct condition code to emit the test operation.

This fixes PR19858: "i8 checked mul is wrong on x86".

Patch by Keno Fisher!

llvm-svn: 210032
2014-06-02 16:00:27 +00:00
Eric Christopher 8995833a34 Have the TLOF creation take a Triple rather than needing a subtarget.
llvm-svn: 209937
2014-05-31 00:07:32 +00:00
Andrea Di Biagio 446a527905 [X86] Add two combine rules to simplify dag nodes introduced during type legalization when promoting nodes with illegal vector type.
This patch teaches the backend how to simplify/canonicalize dag node
sequences normally introduced by the backend when promoting certain dag nodes
with illegal vector type.

This patch adds two new combine rules:
1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) ->
        (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)

2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) ->
        (shuffle (BINOP A, B), Undef, <Mask>).

Both rules are only triggered on the type-legalized DAG.
In particular, rule 1. is a target specific combine rule that attempts
to sink a bitconvert into the operands of a binary operation.
Rule 2. is a target independent rule that attempts to move a shuffle
immediately after a binary operation.

llvm-svn: 209930
2014-05-30 23:17:53 +00:00
Filipe Cabecinhas d3aebaf875 Separate the check for blend shuffle_vector masks
Summary:
Separate the check for blend shuffle_vector masks into isBlendMask.
This function will also be used to check if a vector shuffle is legal. No
change in functionality was intended, but we ended up improving codegen on
two tests, which were being (more) optimized only if the resulting shuffle
was legal.

Reviewers: nadav, delena, andreadb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3964

llvm-svn: 209923
2014-05-30 21:31:21 +00:00
Rafael Espindola a31f3e50dc Delete dead code.
GV is never used past this point. This was probably a copy and paste error.

llvm-svn: 209518
2014-05-23 15:07:51 +00:00
Andrea Di Biagio c8dd1ad85b [X86] Improve the lowering of BITCAST from MVT::f64 to MVT::v4i16/MVT::v8i8.
This patch teaches the x86 backend how to efficiently lower ISD::BITCAST dag
nodes from MVT::f64 to MVT::v4i16 (and vice versa), and from MVT::f64 to
MVT::v8i8 (and vice versa).

This patch extends the logic from revision 208107 to also handle MVT::v4i16
and MVT::v8i8. Also, this patch correctly propagates Undef values when
performing the widening of a vector (example: when widening from v2i32 to
v4i32, the upper 64bits of the resulting vector are 'undef').

llvm-svn: 209451
2014-05-22 16:21:39 +00:00
Quentin Colombet b4d53f1afa [X86] Fix a bug in the lowering of BLENDI introduced in r209043.
ISD::VSELECT mask uses 1 to identify the first argument and 0 to identify the
second argument.
On the other hand, BLENDI uses 0 to identify the first argument and 1 to
identify the second argument.
Fix the generation of the blend mask to account for this difference.

The bug did not show up with r209043, because we were not checking for the
actual arguments of the blend instruction!
This commit also fixes the test cases.

Note: The same mask works for the BLENDr variant because the arguments are
swapped during instruction selection (see the BLENDXXrr patterns).

<rdar://problem/16975435>

llvm-svn: 209324
2014-05-21 22:00:39 +00:00
Simon Atanasyan e7fa2314af Add parentheses to suppress the gcc warning '-Wparentheses'.
No functional changes.

llvm-svn: 209203
2014-05-20 10:23:04 +00:00
Filipe Cabecinhas dc92102766 Added more insertps optimizations
Summary:
When inserting an element that's coming from a vector load or a broadcast
of a vector (or scalar) load, combine the load into the insertps
instruction.
Added PerformINSERTPSCombine for the case where we need to fix the load
(load of a vector + insertps with a non-zero CountS).
Added patterns for the broadcasts.

Also added tests for SSE4.1, AVX, and AVX2.

Reviewers: delena, nadav, craig.topper

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3581

llvm-svn: 209156
2014-05-19 19:45:57 +00:00
Benjamin Kramer f3ad23551d SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not.
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.

llvm-svn: 209123
2014-05-19 13:12:38 +00:00
Saleem Abdulrasool f3a5a5c546 Target: remove old constructors for CallLoweringInfo
This is mostly a mechanical change changing all the call sites to the newer
chained-function construction pattern.  This removes the horrible 15-parameter
constructor for the CallLoweringInfo in favour of setting properties of the call
via chained functions.  No functional change beyond the removal of the old
constructors is intended.

llvm-svn: 209082
2014-05-17 21:50:17 +00:00
Chandler Carruth c85473143c [x86] Fix a bad predicate I spotted by inspection -- pshufhw and pshuflw
were added in SSE2, not SSSE3. Found this while auditing all uses of
SSSE3 in the X86 target. I don't actually expect this to make
a significant difference on anything and I don't have any detailed test
cases but I updated the existing test cases that already covered some of
this code path.

llvm-svn: 209056
2014-05-17 03:29:20 +00:00
Filipe Cabecinhas 89654da069 Implemented special cases for PerformVSELECTCombine.
vselects with constant masks, after legalization, will get turned into
specialized shuffle_vectors so they can be matched to blend+imm
instructions.

Fixed some tests.

llvm-svn: 209044
2014-05-16 22:47:54 +00:00
Filipe Cabecinhas e15551832c Lower vselects into X86ISD::BLENDI when appropriate.
LowerVSELECT will, if possible, generate a X86ISD::BLENDI DAG node if the
condition is constant and we can emit that instruction, given the
subtarget.

This is not enough for all cases. An additional SELECTCombine optimization
will be committed.

Fixed tests that were expecting variable blends but where a blend+imm can
be generated.
Added test where we can't emit blend+immediate.
Added avx2 blend+imm tests.

llvm-svn: 209043
2014-05-16 22:47:49 +00:00
Filipe Cabecinhas 17254aaead Implemented LowerVSELECT to custom lower some instructions.
No functionality change intended. The types that previously were set to
lower as Expand or Legal are doing the same thing with this lowering
function.

llvm-svn: 209042
2014-05-16 22:47:43 +00:00
Rafael Espindola e0098928c9 Delete getAliasedGlobal.
llvm-svn: 209040
2014-05-16 22:37:03 +00:00
Andrea Di Biagio d621120533 [X86] Teach the backend how to fold SSE4.1/AVX/AVX2 blend intrinsics.
Added target specific combine rules to fold blend intrinsics according
to the following rules:
 1) fold(blend A, A, Mask) -> A;
 2) fold(blend A, B, <allZeros>) -> A;
 3) fold(blend A, B, <allOnes>) -> B.

Added two new tests to verify that the new folding rules work for all
the optimized blend intrinsics.

llvm-svn: 208895
2014-05-15 15:18:15 +00:00
Alp Toker beaca19c7c Fix typos
llvm-svn: 208839
2014-05-15 01:52:21 +00:00
Jay Foad a0653a3e6c Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

llvm-svn: 208811
2014-05-14 21:14:37 +00:00
Reid Kleckner 7a59e0845f Try to fix an SDAG dependence issue with sret
r208453 added support for having sret on the second parameter.  In that
change, the code for copying sret into a virtual register was hoisted
into the loop that lowers formal parameters.  This caused a "Wrong
topological sorting" assertion failure during scheduling when a
parameter is passed in memory.  This change undoes that by creating a
second loop that deals with sret.

I'm worried that this fix is incomplete.  I don't fully understand the
dependence issues.  However, with this change we produce the same DAGs
we used to produce, so if they are broken, they are just as broken as
they have always been.

llvm-svn: 208637
2014-05-12 22:01:27 +00:00
Aaron Ballman 29fd7b9b20 Silencing an MSVC warning about not all control paths returning a value (even though the switch is fully covered). No functional change.
llvm-svn: 208565
2014-05-12 14:22:58 +00:00
Benjamin Kramer 3b36b72a9c X86: Make sure that we have SSE4.1 before we generate insertps nodes.
PR19721.

llvm-svn: 208552
2014-05-12 13:12:08 +00:00
NAKAMURA Takumi 6c383021c9 X86ISelLowering.cpp:LowerINTRINSIC_W_CHAIN(): Prune impossible "default:" [-Wcovered-switch-default]
llvm-svn: 208533
2014-05-12 10:16:46 +00:00
Elena Demikhovsky 4f591c0d45 Fixed compilation issue
llvm-svn: 208524
2014-05-12 07:45:41 +00:00
Elena Demikhovsky 8e8fde8e93 AVX-512: changes in intrinsics
1) Changed gather and scatter intrinsics. Now they are aligned with GCC built-ins. There is no more non-masked form. Masked intrinsic receives -1 if all lanes are executed.
2) I changed the function that works with intrinsics inside X86ISelLowering.cpp. I put all intrinsics in one table. I did it for INTRINSICS_W_CHAIN and plan to put all intrinsics from the WO_CHAIN set into the same table in order to avoid the long, long "switch". (I wanted to use the static map initialization allowed by C++11, but I wasn't able to compile it on VS2012.)
3) I added gather/scatter prefetch intrinsics.
4) I fixed MRMm encoding for masked instructions.

llvm-svn: 208522
2014-05-12 07:18:51 +00:00
Hal Finkel f0e086a0bc Pass the value type to TLI::getRegisterByName
We must validate the value type in TLI::getRegisterByName, because if we
don't and the wrong type was used with the IR intrinsic, then we'll assert
(because we won't be able to find a valid register class with which to
construct the requested copy operation). For PPC64, additionally, the type
information is necessary to decide between the 64-bit register and the 32-bit
subregister.

No functionality change.

llvm-svn: 208508
2014-05-11 19:29:07 +00:00
Filipe Cabecinhas 0e3d1cb5d6 Fixed a bug when lowering build_vector (PR19694)
When lowering build_vector to an insertps, we would still lower it, even
if the source vectors weren't v4x32. This would break on avx if the source
was a v8x32. We now check the type of the source vectors.

llvm-svn: 208487
2014-05-11 08:12:56 +00:00
Reid Kleckner 7941856445 Allow sret on the second parameter as well as the first
MSVC always places the implicit sret parameter after the implicit this
parameter of instance methods.  We used to handle this for
x86_thiscallcc by allocating the sret parameter on the stack and leaving
the this pointer in ecx, but that doesn't handle alternative calling
conventions like cdecl, stdcall, fastcall, or the win64 convention.

Instead, change the verifier to allow sret on the second parameter.
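
For instance, with this change the verifier accepts IR like the following
(a sketch of the MSVC instance-method case):

  %struct.S = type { i32, i32, i32 }

  ; implicit 'this' first, implicit sret second
  define x86_thiscallcc void @method(i8* %this, %struct.S* sret %out) {
    ret void
  }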

This also requires changing the Mips and X86 backends to return the
argument with the sret parameter, instead of assuming that the sret
parameter comes first.

The Sparc backend also returns sret parameters in a register, but I
wasn't able to update it to handle secondary sret parameters.  It
currently calls report_fatal_error if you feed it an sret in the second
parameter.

Reviewers: rafael.espindola, majnemer

Differential Revision: http://reviews.llvm.org/D3617

llvm-svn: 208453
2014-05-09 22:32:13 +00:00
Andrea Di Biagio 932ff1ddd7 Fix 80 col violation.
No functional change intended.

llvm-svn: 208405
2014-05-09 11:08:23 +00:00
Filipe Cabecinhas e4b482b3ed Optimize shufflevector that copies an i64/f64 and zeros the rest.
Summary:
Also ran clang-format on the function. The code added is the last else
if block.

Reviewers: nadav, craig.topper, delena

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3518

llvm-svn: 208372
2014-05-08 23:16:08 +00:00
Andrea Di Biagio e85ba4df52 [X86] Add target specific combine rules to fold SSE2/AVX2 packed arithmetic shift intrinsics.
This patch teaches the backend how to combine packed SSE2/AVX2 arithmetic shift
intrinsics.

The rules are:
 - Always fold a packed arithmetic shift by zero to its first operand;
 - Convert a packed arithmetic shift intrinsic dag node into a ISD::SRA only if
   the shift count is known to be smaller than the vector element size.

This patch also teaches the function 'getTargetVShiftByConstNode' how to fold
target specific vector shifts by zero.
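
A sketch of the second rule in IR terms (assuming the SSE2 'psrai.d'
intrinsic): a constant shift count smaller than the element size can be
rewritten as a plain 'ashr', and a shift count of zero folds away entirely.

  declare <4 x i32> @llvm.x86.sse2.psrai.d(<4 x i32>, i32)

  define <4 x i32> @fold_psrai(<4 x i32> %v) {
    ; count 3 < 32, so this may become (ashr %v, <3, 3, 3, 3>)
    %r = call <4 x i32> @llvm.x86.sse2.psrai.d(<4 x i32> %v, i32 3)
    ret <4 x i32> %r
  }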

Added two new tests to verify that the DAGCombiner is able to fold
sequences of SSE2/AVX2 packed arithmetic shift calls.

llvm-svn: 208342
2014-05-08 17:44:04 +00:00
Filipe Cabecinhas 095d9d573a Lower certain build_vectors to insertps instructions
Summary:
Vectors built with zeros and elements in the same order as another
(source) vector are optimized to be built using a single insertps
instruction.
Also optimize when we move one element in a vector to a different place
in that vector while zeroing out some of the other elements.

Further optimizations are possible, described in TODO comments.
I will be implementing at least some of them in the near future.

Added some tests for different cases where this optimization triggers.

Reviewers: nadav, delena, craig.topper

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3521

llvm-svn: 208271
2014-05-08 00:25:16 +00:00
Andrea Di Biagio c14ccc9184 [X86] Improve the lowering of BITCAST dag nodes from type f64 to type v2i32 (and vice versa).
Before this patch, the backend always emitted a store+load sequence to
bitconvert from f64 to i64 the input operand of a ISD::BITCAST dag node that
performed a bitconvert from type MVT::f64 to type MVT::v2i32. The resulting
i64 node was then used to build a v2i32 vector.

With this patch, the backend now produces a cheaper SCALAR_TO_VECTOR from
MVT::f64 to MVT::v2f64. That SCALAR_TO_VECTOR is then followed by a "free"
bitcast to type MVT::v4i32. The elements of the resulting
v4i32 are then extracted to build a v2i32 vector (which is illegal and
therefore promoted to MVT::v2i64).

This is in general cheaper than emitting a stack store+load sequence
to bitconvert the operand from type f64 to type i64.
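
The operation in question, expressed in IR (a minimal sketch):

  define <2 x i32> @cast(double %d) {
    %v = bitcast double %d to <2 x i32>
    ret <2 x i32> %v
  }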

llvm-svn: 208107
2014-05-06 17:09:03 +00:00
Renato Golin c7aea40ec6 Implementing named register intrinsics
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).

So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.

llvm-svn: 208104
2014-05-06 16:51:25 +00:00
Reid Kleckner 4a406d32e9 Fix i128 div/mod on mingw64
The Win64 docs are very clear that anything larger than 8 bytes is
passed by reference, and GCC MinGW64 honors that for __modti3 and
friends.

Patch by Jameson Nash!

llvm-svn: 208029
2014-05-06 01:20:42 +00:00
Filipe Cabecinhas fe59062b75 Revert "Optimize shufflevector that copies an i64/f64 and zeros the rest."
This reverts commit 207992. I misread the phab number on the LGTM.

llvm-svn: 207993
2014-05-05 19:40:36 +00:00
Filipe Cabecinhas 263d98c19f Optimize shufflevector that copies an i64/f64 and zeros the rest.
Summary:
Also ran clang-format on the function. The code added is the last else
if block.

Reviewers: nadav, craig.topper

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3518

llvm-svn: 207992
2014-05-05 19:36:28 +00:00
Craig Topper 2d2aa0ca1f Use makeArrayRef instead of calling ArrayRef<T> constructor directly. I introduced most of these recently.
llvm-svn: 207616
2014-04-30 07:17:30 +00:00
Reid Kleckner fb69308568 Implement X86 code generation for musttail
Currently, musttail codegen is relying on sibcall optimization, and
reporting a fatal error if it fails.  Sibcall optimization fails when stack
arguments need to be modified, which is insufficient for musttail.

The logic for moving arguments in memory safely is already implemented
for GuaranteedTailCallOpt.  This change merely arranges for musttail
calls to use it.

No functional change for GuaranteedTailCallOpt.

Reviewers: espindola

Differential Revision: http://reviews.llvm.org/D3493

llvm-svn: 207598
2014-04-29 23:55:41 +00:00
Elena Demikhovsky 299cf511c4 AVX-512: optimized a shuffle pattern to VINSERTI64x4.
Added intrinsics for VPERMT2PS/PD/D/Q instructions.

llvm-svn: 207513
2014-04-29 09:09:15 +00:00
Quentin Colombet 50efe87e5b [X86] Add more details in the comments of X86TargetLowering::getScalingFactorCost.
llvm-svn: 207432
2014-04-28 18:39:57 +00:00
Craig Topper e73658ddbb [C++] Use 'nullptr'.
llvm-svn: 207394
2014-04-28 04:05:08 +00:00
Craig Topper dd5e16dd34 Convert one last signature of getNode to take an ArrayRef of SDUse.
llvm-svn: 207376
2014-04-27 19:21:06 +00:00
Craig Topper 64941d9786 Convert SelectionDAG::getMergeValues to use ArrayRef.
llvm-svn: 207374
2014-04-27 19:20:57 +00:00
Benjamin Kramer 3693e77cb4 X86: If SSE4.1 is missing lower SMUL_LOHI of v4i32 to pmuludq and fix up the high parts.
This is more expensive than pmuldq but still cheaper than scalarizing the whole thing.

llvm-svn: 207370
2014-04-27 18:47:41 +00:00
Craig Topper 206fcd450a Convert getMemIntrinsicNode to take ArrayRef of SDValue instead of pointer and size.
llvm-svn: 207329
2014-04-26 19:29:41 +00:00
Craig Topper 48d114bed1 Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>.
llvm-svn: 207327
2014-04-26 18:35:24 +00:00
Benjamin Kramer c2ad8f3ef1 Print X86ISD::PMULDQ nodes properly in debug output.
llvm-svn: 207322
2014-04-26 16:26:41 +00:00
Benjamin Kramer 6d2dff61f9 X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available.
llvm-svn: 207318
2014-04-26 14:12:19 +00:00
Benjamin Kramer c9827ab103 X86: Add patterns for MULHU/MULHS of v8i16 and v16i16.
This gets us pretty code for divs of i16 vectors. Turn the existing
intrinsics into the corresponding nodes.

llvm-svn: 207317
2014-04-26 13:01:03 +00:00
Benjamin Kramer ad0168702a Rip out X86-specific vector SDIV lowering, make the corresponding DAGCombiner transform work on vectors.
llvm-svn: 207316
2014-04-26 13:00:53 +00:00
Benjamin Kramer 29139d5cb5 X86: Custom lower v4i32 UMUL_LOHI into 2 pmuludqs.
Test will follow soon.

llvm-svn: 207314
2014-04-26 12:06:11 +00:00
Quentin Colombet ea18933d97 [X86] Implement TargetLowering::getScalingFactorCost hook.
Scaling factors are not free on X86 because every "complex" addressing mode
breaks the related instruction into 2 allocations instead of 1.

<rdar://problem/16730541>

llvm-svn: 207301
2014-04-26 01:11:26 +00:00
Filipe Cabecinhas 363b570d2a Optimization for certain shufflevector by using insertps.
Summary:
If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower
certain shufflevectors to an insertps instruction:
When most of the shufflevector result's elements come from one vector (and
keep their index), and one element comes from another vector or a memory
operand.
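
For instance (a sketch, assuming SSE4.1), a shuffle that keeps lanes 0-2 of
%a in place and inserts element 0 of %b into lane 3 is a candidate for a
single insertps:

  define <4 x float> @ins(<4 x float> %a, <4 x float> %b) {
    %r = shufflevector <4 x float> %a, <4 x float> %b, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
    ret <4 x float> %r
  }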

Added tests for insertps optimizations on shufflevector.
Added support and tests for v4i32 vector optimization.

Reviewers: nadav

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3475

llvm-svn: 207291
2014-04-25 23:51:17 +00:00
Craig Topper 062a2baef0 [C++] Use 'nullptr'. Target edition.
llvm-svn: 207197
2014-04-25 05:30:21 +00:00
Benjamin Kramer 76f753e9a9 X86: Don't transform shifts into ands when the sign bit is tested.
Should unbreak MultiSource/Benchmarks/mediabench/g721/g721encode/encode.

llvm-svn: 207145
2014-04-24 20:51:37 +00:00
Reid Kleckner 5772b77789 Add 'musttail' marker to call instructions
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur.  It also comes with conservative IR
verification rules that ensure that tail call optimization is possible.
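
A minimal IR sketch of the new marker:

  declare i32 @callee(i32)

  define i32 @caller(i32 %x) {
    ; tail call optimization is guaranteed (and enforced) for this call
    %r = musttail call i32 @callee(i32 %x)
    ret i32 %r
  }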

Reviewers: nicholas

Differential Revision: http://llvm-reviews.chandlerc.com/D3240

llvm-svn: 207143
2014-04-24 20:14:34 +00:00
Andrea Di Biagio d1ab866868 [X86] Add support for Read Time Stamp Counter x86 builtin intrinsics.
This patch:
- Adds two new X86 builtin intrinsics ('int_x86_rdtsc' and
   'int_x86_rdtscp') as GCCBuiltin intrinsics;
- Teaches the backend how to lower the two new builtins;
- Introduces a common function to lower READCYCLECOUNTER dag nodes
  and the two new rdtsc/rdtscp intrinsics;
- Improves (and extends) the existing x86 test 'rdtsc.ll'; now test 'rdtsc.ll'
  correctly verifies that both READCYCLECOUNTER and the two new intrinsics
  work fine for both 64bit and 32bit Subtargets.

llvm-svn: 207127
2014-04-24 17:18:27 +00:00
Benjamin Kramer f4575db2fd X86: Emit test instead of constant shift + compare if the shift result is unused.
This allows us to compile
  return (mask & 0x8 ? a : b);
into
  testb $8, %dil
  cmovnel %edx, %esi
instead of
  andl  $8, %edi
  shrl  $3, %edi
  cmovnel %edx, %esi

which we formed previously because dag combiner canonicalizes setcc of and into shift.

llvm-svn: 207088
2014-04-24 08:15:31 +00:00
Elena Demikhovsky acc5c9e83e AVX-512: store and truncstore for i1 values
llvm-svn: 206897
2014-04-22 14:13:10 +00:00
Lang Hames 3067ab2344 [X86] Use tablegen instead of DAG combines to match BZHI instructions, as
suggested by Ben Kramer in review of r206738.

Thanks again Ben!

llvm-svn: 206879
2014-04-22 10:41:56 +00:00
Lang Hames f6f42cac3f [X86] Don't use BZHI for short masks (>=32 bits). Thanks to Ben Kramer for the
review.

llvm-svn: 206869
2014-04-22 07:40:34 +00:00
Chandler Carruth 84e68b2994 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Target/...
edition.

llvm-svn: 206842
2014-04-22 02:41:26 +00:00
Lang Hames 5aa6ee80b6 [X86] ISEL (and X, <constant mask>) to BZHI when BMI2 is available.
Generating BZHI in the variable mask case, i.e. (and X, (sub (shl 1, N), 1)),
was already supported, but we were missing the constant-mask case. This patch
fixes that.
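
A sketch of the constant-mask case (assuming BMI2 is available, e.g.
-mattr=+bmi2):

  define i64 @bzhi_const(i64 %x) {
    ; mask = (1 << 48) - 1; with BMI2 this may now be selected to BZHI
    ; instead of materializing the wide immediate for an AND
    %r = and i64 %x, 281474976710655
    ret i64 %r
  }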

<rdar://problem/15480077>

llvm-svn: 206738
2014-04-21 08:18:53 +00:00
Adam Nemet ee7a3e38c9 [X86] Improve buildFromShuffleMostly for AVX
For a 256-bit BUILD_VECTOR consisting mostly of shuffles of 256-bit vectors,
both the BUILD_VECTOR and its operands may need to be legalized in multiple
steps.  Consider:

(v8f32 (BUILD_VECTOR (extract_vector_elt (v8f32 %vreg0), Constant<1>),
                     (extract_vector_elt %vreg0, Constant<2>),
                     (extract_vector_elt %vreg0, Constant<3>),
                     (extract_vector_elt %vreg0, Constant<4>),
                     (extract_vector_elt %vreg0, Constant<5>),
                     (extract_vector_elt %vreg0, Constant<6>),
                     (extract_vector_elt %vreg0, Constant<7>),
                     %vreg1))

a. We can't build a 256-bit vector efficiently, so we need to split it into
two 128-bit vecs and combine them with VINSERTX128.

b. Operands like (extract_vector_elt (v8f32 %vreg0), Constant<7>) need to be
split into a VEXTRACTX128 and a further extract_vector_elt from the
resulting 128-bit vector.

c. The extract_vector_elt from b. is lowered into a shuffle to the first
element and a movss.

Depending on the order in which we legalize the BUILD_VECTOR and its
operands[1], buildFromShuffleMostly may be faced with:

(v4f32 (BUILD_VECTOR (extract_vector_elt
                      (vector_shuffle<1,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     (extract_vector_elt
                      (vector_shuffle<2,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     (extract_vector_elt
                      (vector_shuffle<3,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     %vreg1))

In order to figure out the underlying vector and their identity we need to see
through the shuffles.

[1] Note that the order in which operations and their operands are legalized is
only guaranteed in the first iteration of LegalizeDAG.

Fixes <rdar://problem/16296956>

llvm-svn: 206634
2014-04-18 19:44:16 +00:00
Andrea Di Biagio aac2eac4c2 [X86] Improve the lowering of packed shifts by constant build_vector.
This patch teaches the backend how to efficiently lower logical and
arithmetic packed shifts on both SSE and AVX/AVX2 machines.

When possible, instead of scalarizing a vector shift, the backend should try
to expand the shift into a sequence of two packed shifts by immediate count
followed by a MOVSS/MOVSD.

Example
  (v4i32 (srl A, (build_vector < X, Y, Y, Y>)))

Can be rewritten as:
  (v4i32 (MOVSS (srl A, <Y,Y,Y,Y>), (srl A, <X,X,X,X>)))

[with X and Y ConstantInt]

The advantage is that the two new shifts from the example would be lowered into
X86ISD::VSRLI nodes. This is always cheaper than scalarizing the vector into
four scalar shifts plus four pairs of vector insert/extract.

llvm-svn: 206316
2014-04-15 19:30:48 +00:00
Nick Lewycky aad475b324 Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* or Value* is strongly encouraged to use a MachinePointerInfo instead.
llvm-svn: 206255
2014-04-15 07:22:52 +00:00
David Blaikie 9027abae53 Change argument order and add explanatory comment to r206130
Changes requested in code review by Eric Christopher of r206130.

llvm-svn: 206219
2014-04-14 22:23:06 +00:00
David Blaikie 269e0fb2e4 Fix instruction debug info location during legalization
I found this from a particular GDB test suite case of inlining
(something similar is provided as a test case) but came across a few
other related cases (other callers of the same functions, and one other
instance of the same coding mistake in a separate function).

I'm not sure what the best way to test this is (let alone to cover the
other cases I discovered), so hopefully this suffices - open to ideas.

llvm-svn: 206130
2014-04-13 06:39:55 +00:00
Reid Kleckner 9c6582129a Move the segmented stack switch to a function attribute
This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.

Patch by Luqman Aden and Alex Crichton!

llvm-svn: 205997
2014-04-10 22:58:43 +00:00
Elena Demikhovsky cf0b9bafc3 AVX-512: insert element to mask vector; store i1 data
Implemented INSERT_VECTOR_ELT operation for v16i1 and v8i1 vectors;
Implemented "store" for i1 type

llvm-svn: 205850
2014-04-09 12:37:50 +00:00
Elena Demikhovsky 3dcfbdfa54 AVX-512: Added fp_to_uint and uint_to_fp patterns.
llvm-svn: 205754
2014-04-08 07:24:02 +00:00
Matt Arsenault cf6f688a40 Add DAG parameter to ComputeNumSignBitsForTargetNode
This way, you can check the number of sign bits in the
operands. The depth parameter it already has is pretty useless
without this.

llvm-svn: 205649
2014-04-04 20:13:13 +00:00
Craig Topper 840beec2d0 Make consistent use of MCPhysReg instead of uint16_t throughout the tree.
llvm-svn: 205610
2014-04-04 05:16:06 +00:00
Yaron Keren 2895496852 Added isTargetWindowsMSVC(), renamed isTargetMingw() to isTargetWindowsGNU()
and isTargetCygwin() to isTargetWindowsCygwin() to be consistent with the
four Windows environments in Triple.h.

Suggestion by Saleem Abdulrasool!

llvm-svn: 205393
2014-04-02 04:27:51 +00:00
Yaron Keren 48d68d439a If isKnownWindowsMSVCEnvironment then getOS == Triple::Win32 and
Environment == Triple::MSVC so it will never be MinGW or Cygwin.

llvm-svn: 205349
2014-04-01 18:52:55 +00:00
Yaron Keren 136fe7db46 isTargetWindows() renamed to isTargetKnownWindowsMSVC()
to reflect its current functionality.

Based on Takumi NAKAMURA suggestion.

llvm-svn: 205338
2014-04-01 18:15:34 +00:00
Aaron Ballman 8bf5a548ea Attempting to fix r205124, which had failed asserts when built with MSVC.
Suggestion from Yaron Keren.

llvm-svn: 205313
2014-04-01 13:56:35 +00:00
Alexey Volkov 1328b28dc6 [x86] Do not convert to cmp32 for Atom arch by Sergey Okunev
Differential Revision: http://llvm-reviews.chandlerc.com/D2824

llvm-svn: 205288
2014-04-01 08:13:07 +00:00
Rafael Espindola 24a669d225 Prevent alias from pointing to weak aliases.
This adds back r204781.

Original message:

Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias2 are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204934
2014-03-27 15:26:56 +00:00
Rafael Espindola 65481d7b97 Revert "Prevent alias from pointing to weak aliases."
This reverts commit r204781.

I will follow up with the msan folks to see what they
were trying to do with aliases to weak aliases.

llvm-svn: 204784
2014-03-26 06:14:40 +00:00
Rafael Espindola 3b712a84a9 Prevent alias from pointing to weak aliases.
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias2 are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204781
2014-03-26 04:48:47 +00:00
Adam Nemet 4beef4c90d [X86] Generate VPSHUFB for in-place v16i16 shuffles
This used to resort to splitting the 256-bit operation into two 128-bit
shuffles and then recombining the results.

Fixes <rdar://problem/16167303>

llvm-svn: 204735
2014-03-25 17:47:06 +00:00
Adam Nemet ac6d6383a3 [X86] Factor out new helper getPSHUFB
I found three implementations of this.  This splits it out into a new function
and uses it from the three places.

My plan is to add a fourth use when lowering a vector_shuffle:v16i16.

Compared the assembly output of test/CodeGen/X86 before and after.

The only change is due to how the first PSHUFB was generated in
LowerVECTOR_SHUFFLEv8i16.  If the shuffle mask specified undef (i.e. -1), the
old implementation would write -1 * 2 and -1 * 2 + 1 (254 and 255) in the
control mask.  Now we write 0x80.  These are of course interchangeable since
bit 7 decides if a constant zero is written in the result byte.  The other
instances of this code use 0x80 consistently.

Related to <rdar://problem/16167303>

llvm-svn: 204734
2014-03-25 17:47:03 +00:00
Adam Nemet b47372f555 [X86] Fix non-determinism in LowerVectorAllZeroTest
This can be observed with the old testcase of CodeGen/X86/pr12312.ll:

47c47
<       vorps   %ymm0, %ymm1, %ymm0
---
>       vorps   %ymm1, %ymm0, %ymm0
97c97
<       vorps   %ymm1, %ymm0, %ymm0
---
>       vorps   %ymm0, %ymm1, %ymm0

The vector VecIns is populated with all the values from VecInMap. This is done
while iterating VecInMap.  VecInMap uses a hash of pointer values so the
resulting order can vary depending on the memory layout.

The fix is to populate the vector VecIns earlier as VecInMap is populated.
This is done in DAG traversal order.

Fixes <rdar://problem/16398806>

llvm-svn: 204623
2014-03-24 16:52:08 +00:00
Craig Topper c6d4efa1e5 Prune includes in X86 target.
llvm-svn: 204216
2014-03-19 06:53:25 +00:00
Adam Nemet 8a130a5f86 [X86] Fix unused variable warning with NDEBUG from r204058
llvm-svn: 204063
2014-03-17 17:32:53 +00:00
Adam Nemet 24381f1cb7 [VectorLegalizer/X86] Don't unvectorize fp_to_uint for v8f32->v8i16
Rather than LegalizeAction::Expand, this needs LegalizeAction::Promote to get
promoted to fp_to_sint v8f32->v8i32.  This is a legal operation on AVX.

For that to work properly, we also need to teach the legalizer about the
specific promotion required here.  The default vector promotion uses
bitcasting to a vector type of the same total size.  We want to promote the
vector element type, effectively widening the operation and then truncating
the result.  This is analogous to the current logic of how int_to_fp is
promoted.

The change also factors out some code from the int_to_fp promotion code to
ValueType::widenIntegerVectorElementType.  This is now shared between
int_to_fp and fp_to_int.

There is no longer need for the custom lowering of fp_to_sint f32->v8i16 in
X86.  It can now go through the new target-independent fp_to_*int promotion
logic.

I also checked that no other target uses Promote for these ops yet, so there
shouldn't be any unexpected change in behavior.

Fixes <rdar://problem/16202247>

llvm-svn: 204058
2014-03-17 17:06:14 +00:00
Arnaud A. de Grandmaison 75c9e6dedf Remove some dead assignments found by scan-build
llvm-svn: 204013
2014-03-15 22:13:15 +00:00
Owen Anderson 16c6bf49b7 Phase 2 of the great MachineRegisterInfo cleanup. This time, we're changing
operator* on the by-operand iterators to return a MachineOperand& rather than
a MachineInstr&.  At this point they almost behave like normal iterators!

Again, this requires making some existing loops more verbose, but should pave
the way for the big range-based for-loop cleanups in the future.

llvm-svn: 203865
2014-03-13 23:12:04 +00:00
Hans Wennborg 6c37f8b985 X86: Don't generate 64-bit movd after cmpneqsd in 32-bit mode (PR19059)
This fixes the bug where we would bitcast the 64-bit floating point result
of cmpneqsd to a 64-bit integer even on 32-bit targets.

Differential Revision: http://llvm-reviews.chandlerc.com/D3009

llvm-svn: 203581
2014-03-11 15:49:24 +00:00
Tim Northover e94a518a22 IR: add a second ordering operand to cmpxhg for failure
The syntax for "cmpxchg" should now look something like:

	cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic

where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).

rdar://problem/15996804

llvm-svn: 203559
2014-03-11 10:48:52 +00:00
Jim Grosbach c94d993adf X86: Enable ISel of 16-bit MOVBE instructions.
When the MOVBE instructions are available, use them for 16-bit endian
swapping as well as for 32 and 64 bit.

The patterns were already present on the instructions, but weren't being
matched because the operation was unconditionally marked as 'Expand'.
Change that to be conditional on whether the MOVBE instructions are
available. Use 'rolw' to implement the in-register version (32 and 64
bit have the dedicated 'bswap' instruction for that).

Patch by Louis Gerbarg <lgg@apple.com>.

rdar://15479984

llvm-svn: 203524
2014-03-11 00:44:14 +00:00
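A minimal IR sketch of the operation covered by the change above (function name is hypothetical); with MOVBE available the memory forms can now select movbe, and the in-register form uses rolw:

declare i16 @llvm.bswap.i16(i16)

define i16 @swap16(i16 %x) {
entry:
  %r = call i16 @llvm.bswap.i16(i16 %x)
  ret i16 %r
}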
Cameron McInally 791ae9927c Lower AVX v4i64->v4i32 truncate to one shuffle.
llvm-svn: 202996
2014-03-05 19:41:16 +00:00
Chandler Carruth 219b89b987 [Modules] Move CallSite into the IR library where it belogs. It is
abstracting between a CallInst and an InvokeInst, both of which are IR
concepts.

llvm-svn: 202816
2014-03-04 11:01:28 +00:00
Benjamin Kramer b6d0bd48bd [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Elena Demikhovsky 9737e3886b AVX-512: Fixed extract_vector_elt for v8i1 vector
llvm-svn: 202624
2014-03-02 09:19:44 +00:00
Quentin Colombet 85c9e16291 Lower unsigned vsetcc to psubus in certain cases
The current approach to lower a vsetult is to flip the sign bit of the
operands, swap the operands and then use a (signed) pcmpgt.  psubus (unsigned
saturating subtract) can be used to emulate a vsetult more efficiently:

+    case ISD::SETULT: {
+      // If the comparison is against a constant we can turn this into a
+      // setule.  With psubus, setule does not require a swap.  This is
+      // beneficial because the constant in the register is no longer
+      // destructed as the destination so it can be hoisted out of a loop.

I also enable lowering via psubus in a few other cases where it's clearly
beneficial: setule and setuge if minu/maxu cannot be used.
    
rdar://problem/14338765

Patch by Adam Nemet <anemet@apple.com>.

llvm-svn: 202301
2014-02-26 21:39:12 +00:00
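A minimal IR sketch of the case the change above targets (the constant 100 and the function name are illustrative only): an unsigned less-than against a constant, which can now be lowered with psubus instead of a sign-bit flip, operand swap, and pcmpgt:

define <8 x i16> @ult_const(<8 x i16> %x) {
entry:
  %cmp = icmp ult <8 x i16> %x, <i16 100, i16 100, i16 100, i16 100, i16 100, i16 100, i16 100, i16 100>
  %res = sext <8 x i1> %cmp to <8 x i16>
  ret <8 x i16> %res
}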
Tim Northover aeb8e06d4c X86 CodeGenPrep: sink shufflevectors before shifts
On x86, shifting a vector by a scalar is significantly cheaper than shifting a
vector by another fully general vector. Unfortunately, because SelectionDAG
operates on just one basic block at a time, the shufflevector instruction that
reveals whether the right-hand side of a shift *is* really a scalar is often
not visible to CodeGen when it's needed.

This adds another handler to CodeGenPrepare, to sink any useful shufflevector
instructions down to the basic block where they're used, predicated on a target
hook (since on other architectures, doing so will often just introduce extra
real work).

rdar://problem/16063505

llvm-svn: 201655
2014-02-19 10:02:43 +00:00
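A minimal IR sketch of the situation described above (all names hypothetical): the splat shufflevector sits in the entry block while the shift sits in another block, so without sinking, SelectionDAG cannot see that the shift amount is really a splatted scalar:

define <4 x i32> @shift_by_splat(<4 x i32> %x, i32 %amt, i1 %cond) {
entry:
  %ins = insertelement <4 x i32> undef, i32 %amt, i32 0
  %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
  br i1 %cond, label %use, label %done
use:
  ; CodeGenPrepare can now sink %splat down into this block.
  %shl = shl <4 x i32> %x, %splat
  ret <4 x i32> %shl
done:
  ret <4 x i32> %x
}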
Tim Northover f06df5866f X86: use vpsllvd (& friends) for 16-bit shifts on Haswell
llvm-svn: 201558
2014-02-18 11:15:32 +00:00
Elena Demikhovsky 1fad075974 AVX-512: simplified BUILD_VECTOR for masks; fixed cmp/test sequence
llvm-svn: 201487
2014-02-16 11:34:23 +00:00
Andrea Di Biagio 386d566395 [X86] Teach the backend how to lower vector shift left into multiply rather than scalarizing it.
Instead of expanding a packed shift into a sequence of scalar shifts,
the backend now tries (when possible) to convert the vector shift into a
vector multiply.

Before this change, a shift of a MVT::v8i16 vector by a
build_vector of constants was always scalarized into a long sequence of "vector
extracts + scalar shifts + vector insert".
With this change, if there is SSE2 support, we emit a single vector multiply.

This change also affects SSE4.1, AVX, AVX2 shifts:
 - A shift of an MVT::v4i32 vector by a build_vector of non-uniform constants
is now lowered when possible into a single SSE4.1 vector multiply.
 - Packed v16i16 shifts left by a constant build_vector are now expanded when
possible into a single AVX2 vpmullw.
This change also improves the lowering of AVX512f vector shifts.

Added test CodeGen/X86/vec_shift6.ll with some code examples that are affected
by this change.

llvm-svn: 201271
2014-02-12 23:42:28 +00:00
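A minimal IR sketch of the v8i16 case mentioned above (shift amounts chosen arbitrarily); with SSE2 this can now be emitted as a single vector multiply by the corresponding powers of two rather than a long extract/shift/insert sequence:

define <8 x i16> @shl_by_consts(<8 x i16> %x) {
entry:
  %r = shl <8 x i16> %x, <i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7>
  ret <8 x i16> %r
}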
Elena Demikhovsky 1f32c313f1 AVX: fixed a bug in LowerVECTOR_SHUFFLE
llvm-svn: 201140
2014-02-11 10:21:53 +00:00
Elena Demikhovsky 2aafc22ed9 AVX-512: Optimized BUILD_VECTOR pattern;
fixed encoding of VEXTRACTPS instruction.

llvm-svn: 201134
2014-02-11 07:25:59 +00:00
Elena Demikhovsky 9f423d6f25 AVX-512: Fixed extract_vector_elt for v16i1 and v8i1 vectors.
llvm-svn: 201066
2014-02-10 07:02:39 +00:00
Tim Northover 546b57b011 X86: deduplicate V[SZ]EXT_MOVL and V[SZ]EXT nodes
I believe VZEXT_MOVL means "zero all vector elements except the first" (and
should have identical input & output types) whereas VZEXT means "zero extend
each element of a vector (discarding higher elements if necessary)".

For example:
    (v4i32 (vzext (v16i8 ...)))

should zero extend the low 4 bytes of the incoming vector to 32-bits,
discarding higher bytes.

However, somewhere in the past, these two concepts had become confused, even
leading to a nonsensical VSEXT_MOVL.

This re-merges the nodes where appropriate (all VSEXT_MOVL -> VSEXT, VZEXT_MOVL
-> VZEXT when it's an actual extension).

rdar://problem/15981990

llvm-svn: 200918
2014-02-06 09:54:51 +00:00
Matt Arsenault 25793a3f22 Add address space argument to allowsUnalignedMemoryAccess.
On R600, some address spaces have more strict alignment
requirements than others.

llvm-svn: 200887
2014-02-05 23:15:53 +00:00
Elena Demikhovsky 0b79be8ab2 AVX-512: optimized icmp -> sext -> icmp pattern
llvm-svn: 200849
2014-02-05 16:17:36 +00:00
Craig Topper 7ee163842f Move matching for x86 BMI BLSI/BLSMSK/BLSR instructions to isel patterns instead of DAG combine. This weakens the ability to fold loads with them because we aren't able to match patterns that load the same thing twice. But maybe we should fix that if we care. The peephole optimizer will be able to fold some loads in its absence.
llvm-svn: 200824
2014-02-05 07:09:40 +00:00
Reid Kleckner f5b76518c9 Implement inalloca codegen for x86 with the new inalloca design
Calls with inalloca are lowered by skipping all stores for arguments
passed in memory and the initial stack adjustment to allocate argument
memory.

Now the frontend is responsible for the memory layout, and the backend
doesn't have to do any work.  As a result these changes are pretty
minimal.

Reviewers: echristo

Differential Revision: http://llvm-reviews.chandlerc.com/D2637

llvm-svn: 200596
2014-01-31 23:50:57 +00:00
Reid Kleckner c5d9e159eb x86: Rename NumBytesForCalleeToPush to ...Pop for accuracy
If we have a callee cleanup convention, the callee is going to pop the
arguments off the stack, not push them on.

llvm-svn: 200566
2014-01-31 19:07:18 +00:00
Andrea Di Biagio 2ea61f17ad [X86] Add extra rules for combining vselect dag nodes into movsd.
This improves the fix committed at revision 199683 adding the
following new target specific combine rules:

1) fold (v4i32: vselect <0,0,-1,-1>, A, B) ->
        (v4i32 (bitcast (movsd (v2i64 (bitcast A)), (v2i64 (bitcast B))) ))

2) fold (v4f32: vselect <0,0,-1,-1>, A, B) ->
        (v4f32 (bitcast (movsd (v2f64 (bitcast A)), (v2f64 (bitcast B))) ))

3) fold (v4i32: vselect <-1,-1,0,0>, A, B) ->
        (v4i32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) ))

4) fold (v4f32: vselect <-1,-1,0,0>, A, B) ->
        (v4f32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) ))

llvm-svn: 200324
2014-01-28 18:14:21 +00:00
Juergen Ributzka 659ce00d60 [TLI] Add a new hook to TargetLowering to query the target if a load of a constant should be converted to simply the constant itself.
Before this patch we used getIntImmCost from TargetTransformInfo to determine if
a load of a constant should be converted to just a constant, but the threshold
for this was set to an arbitrary value. This value works well for the two
targets (X86 and ARM) that implement this target-hook, but it isn't
target-independent at all.

Now targets have the possibility to decide directly if this optimization should
be performed. The default value is set to false to preserve the current
behavior. The target hook has been moved to TargetLowering, which removed the
last use and need of TargetTransformInfo in SelectionDAG.

llvm-svn: 200271
2014-01-28 01:20:14 +00:00
Juergen Ributzka e758ddcd16 [X86] Prevent the creation of redundant ops for sadd and ssub with overflow.
This commit teaches the X86 backend to create the same X86 instructions when it
lowers an sadd/ssub with overflow intrinsic and a conditional branch that uses
that overflow result. This allows SelectionDAG to recognize and remove one of
the redundant operations.

This fixes <rdar://problem/15874016> and <rdar://problem/15661073>.

Reviewed by Nadav

llvm-svn: 199976
2014-01-24 06:47:57 +00:00
Lang Hames b1ce33379a Add a few missing cases from r199933. Testcase coming shortly.
llvm-svn: 199938
2014-01-23 21:27:27 +00:00
Lang Hames 23de211c5d Replace vfmaddxx213 instructions with their 231-type equivalents in accumulator
loops. Writing back to the accumulator (231-type) allows the coalescer to
eliminate an extra copy.

llvm-svn: 199933
2014-01-23 20:23:36 +00:00
Elena Demikhovsky a5d38a39a0 AVX-512: added VPERM2D VPERM2Q VPERM2PS VPERM2PD instructions,
they give better sequences than VPERMI

llvm-svn: 199893
2014-01-23 14:27:26 +00:00
Andrea Di Biagio 450d1661be [X86] Teach how to combine a vselect into a movss/movsd
Add target specific rules for combining vselect dag nodes into movss/movsd
when possible.

If the vector type of the vselect dag node in input is either MVT::v4i32 or
MVT::v4f32, then try to fold according to rules:

  1) fold (vselect (build_vector (0, -1, -1, -1)), A, B) -> (movss A, B)
  2) fold (vselect (build_vector (-1, 0, 0, 0)), A, B) -> (movss B, A)

If the vector type of the vselect dag node in input is either MVT::v2i64 or
MVT::v2f64 (and we have SSE2), then try to fold according to rules:

  3) fold (vselect (build_vector (0, -1)), A, B) -> (movsd A, B)
  4) fold (vselect (build_vector (-1, 0)), A, B) -> (movsd B, A)

llvm-svn: 199683
2014-01-20 19:35:22 +00:00
Michael Gottesman 8347c34e67 Move the retrieval of VT after all of the early exits from PerformOrCombine that do not use VT. NFC.
llvm-svn: 199612
2014-01-19 21:06:00 +00:00
Elena Demikhovsky d1487261a0 AVX-512: fixed a compare pattern
llvm-svn: 199366
2014-01-16 08:45:54 +00:00
David Majnemer dee105772c WinCOFF: Transform IR expressions featuring __ImageBase into image relative relocations
MSVC on x64 requires that we create image relative symbol
references to refer to RTTI data. Seeing as how there is no way to
explicitly make reference to a given relocation type in LLVM IR, pattern
match expressions of the form &foo - &__ImageBase.

Differential Revision: http://llvm-reviews.chandlerc.com/D2523

llvm-svn: 199312
2014-01-15 09:16:42 +00:00
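A sketch of the kind of constant expression the change above pattern-matches (global names other than __ImageBase are hypothetical; the trunc to i32 reflects the 32-bit image-relative offset):

@__ImageBase = external global i8
@rtti_data = global i8 0
@imgrel = global i32 trunc (i64 sub (i64 ptrtoint (i8* @rtti_data to i64),
                                     i64 ptrtoint (i8* @__ImageBase to i64)) to i32)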
Elena Demikhovsky 79b75d9048 Fixed indentation.
llvm-svn: 199301
2014-01-15 07:18:11 +00:00
Lang Hames 06234ec147 Add FPExt option to CCValAssign::LocInfo. When generating calling-convention
promotion code, Tablegen will now select FPExt for floating point promotions
(previously it had returned AExt, which is not valid for floating point types).

Any out-of-tree targets that were relying on AExt being returned for FP
promotions will need to update their code to check for FPExt instead.

llvm-svn: 199252
2014-01-14 19:56:36 +00:00
Nico Rieck 7157bb765e Decouple dllexport/dllimport from linkage
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.

Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:

  define available_externally dllimport void @f() {}
  @Var = dllexport global i32 1, align 4

Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.

llvm-svn: 199218
2014-01-14 15:22:47 +00:00
Elena Demikhovsky 767fc967b4 AVX-512: optimized scalar compare patterns
removed AVX512SI format, since it is similar to AVX512BI.

llvm-svn: 199217
2014-01-14 15:10:08 +00:00
Andrea Di Biagio 5448a3c771 [X86] Fix assertion failure caused by a wrong folding of vector shifts by immediate count.
This fixes a regression introduced by r198113.

Revision r198113 introduced an algorithm that tries to fold a vector shift
by immediate count into a build_vector if the input vector is a known vector
of constants.

However the algorithm only worked under the assumption that the input vector
type and the shift type are exactly the same.

This patch disables the folding of vector shift by immediate count if the
input vector type and the shift value type are not the same.

llvm-svn: 199213
2014-01-14 13:17:12 +00:00
Nico Rieck 9d2e0df049 Revert "Decouple dllexport/dllimport from linkage"
Revert this for now until I fix an issue in Clang with it.

This reverts commit r199204.

llvm-svn: 199207
2014-01-14 12:38:32 +00:00
Nico Rieck e43aaf7967 Decouple dllexport/dllimport from linkage
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.

Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:

  define available_externally dllimport void @f() {}
  @Var = dllexport global i32 1, align 4

Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.

llvm-svn: 199204
2014-01-14 11:55:03 +00:00
Elena Demikhovsky 172a27c750 AVX-512: Added more intrinsics for pmin/pmax, pabs, blend, pmuldq.
llvm-svn: 198745
2014-01-08 10:54:22 +00:00
Bill Wendling 13199b17f8 Remove unnecessary #includes.
llvm-svn: 198585
2014-01-06 06:00:00 +00:00
Bill Wendling 908bf814e7 Refactor function that checks that __builtin_returnaddress's argument is constant.
This moves the check up into the parent class so that all targets can use it
without having to copy (and keep in sync) the same error message.

llvm-svn: 198579
2014-01-06 00:43:20 +00:00
Elena Demikhovsky f404e054a1 AVX-512: changed property name from "neverHasSideEffects=1" to "hasSideEffects=0", added this property to VMOVSS/VMOVSD;
Optimized a truncate pattern.

llvm-svn: 198562
2014-01-05 14:21:07 +00:00
Elena Demikhovsky 52e4a0e109 AVX-512: Added more intrinsics for convert and min/max.
Removed vzeroupper from AVX-512 mode - our optimization guide does not recommend inserting vzeroupper at all.

llvm-svn: 198557
2014-01-05 10:46:09 +00:00
Bill Wendling df7dd28dc8 Emit an error message if the value passed to __builtin_returnaddress isn't a constant
__builtin_returnaddress requires that the value passed in be a constant.
However, at -O0 even a constant expression may not be converted to a constant.
Emit an error message instead of crashing.

llvm-svn: 198531
2014-01-05 01:47:20 +00:00
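For reference, a minimal IR sketch (function name hypothetical) of a well-formed use; the intrinsic argument must be a constant, and per the change above a non-constant argument now produces an error message rather than a crash:

declare i8* @llvm.returnaddress(i32)

define i8* @caller_address() {
entry:
  %ra = call i8* @llvm.returnaddress(i32 0)
  ret i8* %ra
}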
Craig Topper a448bd868f Make more of the x86 lowering helper functions static.
llvm-svn: 198146
2013-12-29 01:48:38 +00:00
Craig Topper 059e8e0da1 Switch from EVT to MVT in more of the x86 instruction lowering code.
llvm-svn: 198144
2013-12-29 01:10:06 +00:00
Craig Topper bf096926c9 Use getSimpleValueType in a few spots where the type should be simple.
llvm-svn: 198117
2013-12-28 18:35:48 +00:00
Craig Topper e829fe42af Minor indentation fix to match other switch statements. Change llvm_unreachable text to match similar places.
llvm-svn: 198116
2013-12-28 17:37:32 +00:00
Andrea Di Biagio eaceba0ed0 [X86] Teach the backend how to fold target specific dag node for packed
vector shift by immedate count (VSHLI/VSRLI/VSRAI) into a build_vector when
the vector in input to the shift is a build_vector of all constants or UNDEFs.

Target specific nodes for packed shifts by immediate count are in
general introduced by function 'getTargetVShiftByConstNode' (in
X86ISelLowering.cpp) when lowering shift operations, SSE/AVX immediate
shift intrinsics and (only in very few cases) SIGN_EXTEND_INREG dag
nodes.

This patch adds extra rules for simplifying vector shifts inside
function 'getTargetVShiftByConstNode'.

Added file test/CodeGen/X86/vec_shift5.ll to verify that packed
shifts by immediate are correctly folded into a build_vector when the
input vector to the shift dag node is a vector of constants or undefs.

llvm-svn: 198113
2013-12-28 11:11:52 +00:00
Elena Demikhovsky b64d7e8586 AVX-512: Result type of scalar SETCC is MVT::i1 for AVX-512.
llvm-svn: 198008
2013-12-25 10:06:40 +00:00
Elena Demikhovsky 64c9548d66 AVX-512: fixed some patterns for MVT::i1
llvm-svn: 197981
2013-12-24 14:24:07 +00:00
Elena Demikhovsky fe24a30e38 AVX512: SETCC returns i1 for AVX-512 and i8 for all others
llvm-svn: 197876
2013-12-22 10:13:18 +00:00
Duncan P. N. Exon Smith ab5dbebc11 Assert that the last operand is actually EFLAGS
This is another follow-up to r197503, after a post-commit review by
Andy.

<rdar://problem/15627766>

llvm-svn: 197520
2013-12-17 20:28:21 +00:00
Duncan P. N. Exon Smith 512601d77f Revert "Revert "Mark vastart_save_xmm_regs as changing EFLAGS""
This reverts commit r197481, recommiting r197469 with an extra fix.

The vastart_save_xmm_regs pseudo-instruction expands to a test and a
branch, so it modifies EFLAGS.  Mark it so, or else the scheduler might
place it in the middle of another test+branch.

This fixes a bug exposed by r192750, which changed the initial scheduler
to source-order as part of enabling the MI Scheduler for X86.

This re-commit changes the VASTART_SAVE_XMM_REGS custom inserter not to
try to save %flags, and adds a test that catches the bad behavior of
r197469.

<rdar://problem/15627766>

llvm-svn: 197503
2013-12-17 15:54:45 +00:00
Stepan Dyatkovskiy 7f7c2710e0 Fix for PR18045:
http://llvm.org/bugs/show_bug.cgi?id=18045

Short issue description:
For X86 machines with sse < sse4.1 we got failures for some
particular load/store vector sequences:

$ clang-trunk -m32 -O2 test-case.c
fatal error: error in backend: Cannot select: 0x4200920: v4i32,ch = load 0x41d6ab0, 0x4205850,
      0x41dcb10<LD16[getelementptr inbounds ([4 x i32]* @e, i32 0, i32 0)](align=4)> [ORD=82]
      [ID=58]
  0x4205850: i32 = X86ISD::Wrapper 0x41d5490 [ORD=26] [ID=43]
    0x41d5490: i32 = TargetGlobalAddress<[4 x i32]* @e> 0 [ORD=26] [ID=23]
  0x41dcb10: i32 = undef [ID=2]

The reason is that EltsFromConsecutiveLoads could emit such a load instruction
both before and after the legalize stage, even though this instruction is not
legal for machines with SSSE3 and lower.

The fix: in EltsFromConsecutiveLoads, if we have passed the legalize stage, we
check whether the nodes it emits are legal.

P.S.: If a failure shows up between 12:00 and 22:00 (UTC-8), I may be slow to
respond, so feel free to revert this commit. Thanks!

llvm-svn: 197492
2013-12-17 12:07:33 +00:00
Elena Demikhovsky c5f6726a24 AVX-512: Added implementation of CONCAT_VECTORS for v8i1 vectors (by Alexey Bader).
Added implementation of "truncate" from integer type (i64/i32/i16/i8) to i1.

llvm-svn: 197482
2013-12-17 08:33:15 +00:00
Elena Demikhovsky 47fc44e52e AVX-512: Added legal type MVT::i1 and VK1 register for it.
Added scalar compare VCMPSS, VCMPSD.
Implemented LowerSELECT for scalar FP operations.
I replaced FSETCCss, FSETCCsd with one node type FSETCCs.
Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1.

llvm-svn: 197384
2013-12-16 13:52:35 +00:00
Benjamin Kramer e723bb10b0 X86: When lowering shl_parts, don't emit shift amounts larger than the bit width.
While it's safe for the X86-specific shift nodes, dag combining will
kill generic nodes. Insert an AND to make it safe, isel will nuke it
as x86's shift instructions have an implicit AND.

Fixes PR16108, which contains a contraption to hit this case in between
constant folders.

llvm-svn: 197228
2013-12-13 13:40:24 +00:00
Rafael Espindola 32cb5ac904 Switch to the new MingW ABI.
GCC 4.7 changed the MingW ABI. On the LLVM side it means that sret functions
don't pop the stack.

llvm-svn: 197163
2013-12-12 16:06:58 +00:00
Tim Northover 9653eb5759 Make Triple's isOSBinFormatXXX functions partition triple-space.
Most users would be surprised if "isCOFF" and "isMachO" were simultaneously
true, unless they'd put the compiler in a box with a gun attached to a photon
detector.

This makes sure precisely one of the three formats is true for any triple and
simplifies some target logic based on that.

llvm-svn: 196934
2013-12-10 16:57:43 +00:00
Elena Demikhovsky e382c3fdcd AVX-512: changed intrinsics for mask operations
llvm-svn: 196918
2013-12-10 13:53:10 +00:00
Cameron McInally c5f420e129 Suppress '(x < y) ? a : 0 -> (x < y) & a' transform on X86 architectures with dedicated mask registers.
Patch by Aleksey Bader.

llvm-svn: 196386
2013-12-04 14:52:33 +00:00
Lang Hames 39609996d9 Refactor a lot of patchpoint/stackmap related code to simplify and make it
target independent.

Most of the x86 specific stackmap/patchpoint handling was necessitated by the
use of the native address-mode format for frame index operands. PEI has now
been modified to treat stackmap/patchpoint similarly to DEBUG_INFO, allowing
us to use a simple, platform independent register/offset pair for frame
indexes on stackmap/patchpoints.

Notes:
  - Folding is now platform independent and automatically supported.
  - Emitting patchpoints with direct memory references now just involves calling
    the TargetLoweringBase::emitPatchPoint utility method from the target's
    XXXTargetLowering::EmitInstrWithCustomInserter method. (See
    X86TargetLowering for an example).
  - No more ugly platform-specific operand parsers.

This patch shouldn't change the generated output for X86. 

llvm-svn: 195944
2013-11-29 03:07:54 +00:00
Michael Liao d617a3015d Fix PR18054
- Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG
  lowering where we need to check whether x is a vector type (in-reg
  type) of i8, i16 or i32; otherwise, that optimization is not valid.

llvm-svn: 195779
2013-11-26 20:31:31 +00:00
Andrew Trick 391dbadb51 StackMap: Implement support for DirectMemRefOp.
A Direct stack map location records the address of a frame index. This
address is itself the value that the runtime requested. This differs
from IndirectMemRefOp locations, which refer to stack locations from
which the requested values must be loaded. Direct locations can
directly communicate the address of an alloca, while IndirectMemRefOp
locations handle register spills.

For example:

entry:
  %a = alloca i64...
  llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a)

Since both the alloca and stackmap intrinsic are in the entry block,
and the intrinsic takes the address of the alloca, the runtime can
assume that LLVM will not substitute alloca with any intervening
value. This must be verified by the runtime by checking that the stack
map's location is a Direct location type. The runtime can then
determine the alloca's relative location on the stack immediately after
compilation, or at any time thereafter. This differs from Register and
Indirect locations, because the runtime can only read the values in
those locations when execution reaches the instruction address of the
stack map.

llvm-svn: 195712
2013-11-26 02:03:25 +00:00
Andrew Trick d3ab37cfeb whitespace
llvm-svn: 195711
2013-11-26 02:03:20 +00:00
Jim Grosbach 860934a924 X86: Perform integer comparisons at i32 or larger.
Utilizing the 8 and 16 bit comparison instructions, even when an input can
be folded into the comparison instruction itself, is typically not worth it.
There are too many partial register stalls as a result, leading to significant
slowdowns. By always performing comparisons on at least 32-bit
registers, performance of the calculation chain leading to the
comparison improves. Continue to use the smaller comparisons when
minimizing size, as that allows better folding of loads into the
comparison instructions.

rdar://15386341

llvm-svn: 195496
2013-11-22 19:57:47 +00:00
Michael Liao 02160d580b Fix PR18014
- When simplifying the mask generation for BLEND, check whether that mask is
  also consumed by other non-BLEND insns. If true, skip that simplification.

llvm-svn: 195476
2013-11-22 17:56:57 +00:00
Rafael Espindola 5a8e985ad3 Don't produce tail calls when the caller is x86_thiscallcc.
The callee will not pop the stack for us.

llvm-svn: 195467
2013-11-22 15:18:28 +00:00
Kostya Serebryany 4007009815 Revert r195318 as it causes miscompilation (PR18029)
llvm-svn: 195439
2013-11-22 10:30:39 +00:00
Ekaterina Romanova d5fa55470c SHLD/SHRD are VectorPath (microcode) instructions known to have poor latency on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or, and, and lea, which are DirectPath instructions. These alternative instructions not only have lower latency but also increase the decode bandwidth by allowing simultaneous decoding of a third DirectPath instruction.
AMD's processor families K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size.

It might be beneficial to disable this folding for some of Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel.

llvm-svn: 195383
2013-11-21 23:21:26 +00:00
Bill Wendling 07787f8747 The basic problem is that some mainstream programs cannot deal with the way
clang optimizes tail calls, as in this example:

int foo(void);
int bar(void) {
 return foo();
}

where the call is transformed to:

  calll .L0$pb
.L0$pb:
  popl  %eax
.Ltmp0:
  addl  $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax
  movl  foo@GOT(%eax), %eax
  popl  %ebp
  jmpl  *%eax                   # TAILCALL

However, the GOT references must all be resolved at dlopen() time, and so this
approach cannot be used with lazy dynamic linking (e.g. using RTLD_LAZY), which
usually populates the PLT with stubs that perform the actual resolving.

This patch changes X86TargetLowering::LowerCall() to skip tail call
optimization, if the called function is a global or external symbol.

Patch by Dimitry Andric!

PR15086

llvm-svn: 195318
2013-11-21 07:04:30 +00:00
NAKAMURA Takumi 3dedf827f8 X86ISelLowering.cpp: Mark a variable VT as LLVM_ATTRIBUTE_UNUSED. [-Wunused-variable]
llvm-svn: 195238
2013-11-20 10:55:22 +00:00
NAKAMURA Takumi f2392ebb94 Whitespace.
llvm-svn: 195237
2013-11-20 10:55:15 +00:00
Elena Demikhovsky a5967af97d Fixed compilation error.
llvm-svn: 195230
2013-11-20 09:23:22 +00:00
Elena Demikhovsky e1f9bf054f AVX-512: Concat 4 128-bit vectors in one 512-bit vector.
llvm-svn: 195229
2013-11-20 09:10:40 +00:00
Cameron McInally ad41f1f693 Add AVX512 unmasked FMA intrinsics and support.
llvm-svn: 194824
2013-11-15 17:01:14 +00:00
Matt Arsenault b03bd4d96b Add addrspacecast instruction.
Patch by Michele Scandale!

llvm-svn: 194760
2013-11-15 01:34:59 +00:00
Elena Demikhovsky 0a74b7da35 AVX-512: Handled extractelement from mask vector;
Added VMOVSHDUP/VMOVSLDUP shuffle instructions.

llvm-svn: 194691
2013-11-14 11:29:27 +00:00
Juergen Ributzka 34c652d34d SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
This patch reapplies r193676 with an additional fix for the Hexagon backend. The
SystemZ backend has already been fixed by r194148.

The Type Legalizer recognizes that VSELECT needs to be split, because the type
is too wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.

This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. Now the type
legalizer will split both VSELECT and SETCC.

This allows the following X86 DAG Combine code to successfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.

Reviewed by Nadav

llvm-svn: 194542
2013-11-13 01:57:54 +00:00
Juergen Ributzka 87ed906b2e [Stackmap] Materialize the jump address within the patchpoint noop slide.
This patch moves the jump address materialization inside the noop slide. This
enables patching of the materialization itself or its complete removal. This
patch also adds the ability to define scratch registers that can be used safely
by the code called from the patchpoint intrinsic. At least one scratch register
is required, because that one is used for the materialization of the jump
address. This patch depends on D2009.

Differential Revision: http://llvm-reviews.chandlerc.com/D2074

Reviewed by Andy

llvm-svn: 194306
2013-11-09 01:51:33 +00:00
Juergen Ributzka 9969d3e6e8 [Stackmap] Add AnyReg calling convention support for patchpoint intrinsic.
The idea of the AnyReg Calling Convention is to provide the call arguments in
registers, but not to force them to be placed in a particular order into a
specified set of registers. Instead it is up to the register allocator to assign
any register as it sees fit. The same applies to the return value (if
applicable).

Differential Revision: http://llvm-reviews.chandlerc.com/D2009

Reviewed by Andy

llvm-svn: 194293
2013-11-08 23:28:16 +00:00
Eric Christopher 542c8d934d Check for both styles of clobbers, those produced by dragonegg and
those produced by clang for the inline asm bswap conversion.

Modified from a patch by Chris Smowton.

llvm-svn: 194016
2013-11-04 21:41:21 +00:00
Elena Demikhovsky 496656900e AVX-512: Implemented CMOV for 512-bit vectors
llvm-svn: 193747
2013-10-31 13:15:32 +00:00
Juergen Ributzka 3bd686d493 Revert "SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too."
Now Hexagon and SystemZ are not happy with it :-(

llvm-svn: 193677
2013-10-30 06:36:19 +00:00
Juergen Ributzka 6ad05d6b95 SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.

This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. This mask usually
has the same size as the VSELECT return type (except for Intel KNL). Now the
type legalizer will split both VSELECT and SETCC.

This allows the following X86 DAG Combine code to successfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.

Reviewed by Nadav

llvm-svn: 193676
2013-10-30 05:48:18 +00:00
Elena Demikhovsky 199c823555 AVX-512: PMIN/PMAX intrinsics and patterns
Patch by Cameron McInally <cameron.mcinally@nyu.edu>

llvm-svn: 193497
2013-10-27 08:18:37 +00:00
Nadav Rotem d369d4bdf9 Optimize concat_vectors(X, undef) -> scalar_to_vector(X).
This optimization is not SSE specific so I am moving it to DAGco.
The new scalar_to_vector dag node exposed a missing pattern in the AArch64 target that I needed to add.

llvm-svn: 193393
2013-10-25 06:41:18 +00:00
Yaron Keren 79bb266346 (this is a corrected patch)
Calling _chkstk is required on ELF as well as COFF on Windows. Without 
_chkstk, functions requiring a large stack crash in initialization code.

Previous code tested for the COFF format but not Mach-O, and this patch modifies
the code to test for the Windows OS (both the Windows target and the MinGW target)
but not the Mach-O object format: it looks like the Mach-O environment was used to
build some EFI code.
 
Credits to Andrew MacPherson.

llvm-svn: 193289
2013-10-23 23:37:01 +00:00
Rafael Espindola bca3ab0905 Revert "Calling _chkstk is required on ELF as well as COFF on Windows. Without _chkstk functions requiring large stack crash in initialization code. Previous code tested for COFF format but not Mach-O and this patch modifies the code to test for Windows."
This reverts commit r193263.

It is causing CodeGen/X86/mingw-alloca.ll to fail.

llvm-svn: 193275
2013-10-23 21:45:09 +00:00
Benjamin Kramer 0ccab2d66c X86: Custom lower sext v16i8 to v16i16, and the corresponding truncate.
Also update the cost model.

llvm-svn: 193270
2013-10-23 21:06:07 +00:00
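A minimal IR sketch (function name hypothetical) of the sign extension that is now custom lowered per the change above:

define <16 x i16> @sext_v16i8(<16 x i8> %x) {
entry:
  %r = sext <16 x i8> %x to <16 x i16>
  ret <16 x i16> %r
}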
Yaron Keren 03ac82edf5 Calling _chkstk is required on ELF as well as COFF on Windows.
Without _chkstk, functions requiring a large stack crash in
initialization code. The previous code tested for the COFF format but
not Mach-O, and this patch modifies the code to test for Windows.

Credits to Andrew MacPherson.

llvm-svn: 193263
2013-10-23 19:40:07 +00:00
Benjamin Kramer da8446b833 X86: Custom lower zext v16i8 to v16i16.
On sandy bridge (PR17654) we now get
	vpxor	%xmm1, %xmm1, %xmm1
	vpunpckhbw	%xmm1, %xmm0, %xmm2
	vpunpcklbw	%xmm1, %xmm0, %xmm0
	vinsertf128	$1, %xmm2, %ymm0, %ymm0

On haswell it's a simple
	vpmovzxbw	%xmm0, %ymm0

There is a maze of duplicated and dead transforms and patterns in this
area. Remove the dead custom lowering of zext v8i16 to v8i32, that's
already handled by LowerAVXExtend.

llvm-svn: 193262
2013-10-23 19:19:04 +00:00
Jim Grosbach 005e5b55a7 X86: Make concat_vectors combine a bit more conservative.
Per Nadav's review comments for r192866.

llvm-svn: 193252
2013-10-23 17:37:40 +00:00
Lang Hames 2783993fca X86 vector element shift-by-immediate instructions take i8 immediates. Make
the instruction definitions and ISEL reflect this.

Prior to this patch these instructions took an i32i8imm, and the high bits were
dropped during encoding. This led to incorrect behavior for shifts by
immediates higher than 255. This patch fixes that issue by detecting large
immediate shifts and returning constant zero (for logical shifts) or capping
the shift amount at an encodable value (for arithmetic shifts).

Fixes <rdar://problem/14968098>

llvm-svn: 193096
2013-10-21 17:51:24 +00:00
Elena Demikhovsky 665c90e184 AVX-512: MUL operation lowering for v8i64
llvm-svn: 193083
2013-10-21 13:27:34 +00:00
Jim Grosbach c044c65470 x86: Move bitcasts outside concat_vector.
Consider the following:

typedef unsigned short ushort4U __attribute__((ext_vector_type(4),
aligned(2)));
typedef unsigned short ushort4 __attribute__((ext_vector_type(4)));
typedef unsigned short ushort8 __attribute__((ext_vector_type(8)));
typedef int int4 __attribute__((ext_vector_type(4)));

int4 __bbase_cvt_int(ushort4 v) {
  ushort8 a;
  a.lo = v;
  return _mm_cvtepu16_epi32(a);
}

This generates the, not unreasonable, IR:
define <4 x i32> @foo0(double %v.coerce) nounwind ssp {
  %tmp = bitcast double %v.coerce to <4 x i16>
  %tmp1 = shufflevector <4 x i16> %tmp, <4 x i16> undef, <8 x i32> <i32 0,
  i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
  %tmp2 = tail call <4 x i32> @llvm.x86.sse41.pmovzxwd(<8 x i16> %tmp1)
  ret <4 x i32> %tmp2
}

The problem is when type legalization gets hold of the v4i16. It
legalizes that by spilling to the stack, then doing a zero-extending
load. Things go even more silly from there, ending up with something
like:
_foo0:
  movsd %xmm0, -8(%rsp)       <== Spill to the stack.
  movq  -8(%rsp), %xmm0       <== Reload it right back out.
  pmovzxwd  %xmm0, %xmm1      <== Here's what we actually asked for.
  pblendw $1, %xmm1, %xmm0    <== We don't need this at all
  pmovzxwd  %xmm0, %xmm0      <== We already did this
  ret

The v8i8 to v8i16 zext intrinsic gives even worse results, with two
table lookups via pshufb instructions(!!).

To avoid all that, we can move the bitcasting until after we've formed
the wider (legal) vector type. Then our normal codegen flows along
nicely and we get the expected:
_foo0:
  pmovzxwd  %xmm0, %xmm0
  ret

rdar://15245794

llvm-svn: 192866
2013-10-17 02:58:06 +00:00
Michael Liao ad71659def Fix PR17546
- The type of the index used in extract_vector_elt or insert_vector_elt is
  supposed to be TLI.getVectorIdxTy(), which is a pointer type on most targets.
  It's better to truncate it (or zero-extend, in case it's changed later) to the
  mask element type to guarantee they match, instead of asserting that.

llvm-svn: 192722
2013-10-15 17:51:58 +00:00
Michael Liao 8ba068211d Fix PR16807
- Lower signed division by constant powers-of-2 to target-independent
  DAG operators instead of target-dependent ones to support them better
  on targets where vector types are legal but shift operators on those
  types are illegal. E.g., on AVX, PSRAW is only available on <8 x i16>
  though <16 x i16> is a legal type.

llvm-svn: 192721
2013-10-15 17:51:02 +00:00
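A minimal IR sketch of the case described above (divisor chosen for illustration): on AVX the <16 x i16> type is legal, but the arithmetic shift needed by the old target-dependent lowering only exists for <8 x i16>:

define <16 x i16> @sdiv_by_4(<16 x i16> %x) {
entry:
  %r = sdiv <16 x i16> %x, <i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4,
                            i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4>
  ret <16 x i16> %r
}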
Eric Christopher 755711e510 Reformat this routine slightly.
llvm-svn: 192630
2013-10-14 21:52:23 +00:00
Elena Demikhovsky 82a46ebe0a Fixed a bug in dynamic memory allocation on the stack.
The alignment of the allocated space was wrong; see Bugzilla 17345.

Done by Zvi Rackover <zvi.rackover@intel.com>.

llvm-svn: 192573
2013-10-14 07:26:51 +00:00
Benjamin Kramer 7b5e159450 X86: Fix type check. Just because an integer type is illegal doesn't mean it's i64.
Fixes PR17495, where an i24 triggered this code. It's intended to
optimize i64 loads on 32 bit x86.

llvm-svn: 192123
2013-10-07 19:11:35 +00:00
Elena Demikhovsky 2e408aefe0 AVX-512: added scalar convert instructions and intrinsics.
Fixed load folding in VPERM2I instruction.

llvm-svn: 192063
2013-10-06 13:11:09 +00:00
Elena Demikhovsky 462a2d235b AVX-512: fixed shuffle lowering
in case of BLEND and added VSHUFPS patterns.

llvm-svn: 192055
2013-10-06 06:11:18 +00:00
Craig Topper b01cd1aa74 Add patterns for selecting TBM instructions from logical operations. Patch from Yunzhong Gao.
llvm-svn: 191871
2013-10-03 04:16:45 +00:00
Rafael Espindola 44fee4e0eb Remove several unused variables.
Patch by Alp Toker.

llvm-svn: 191757
2013-10-01 13:32:03 +00:00
Robert Wilhelm f0cfb83bb4 Fix spelling intruction -> instruction.
llvm-svn: 191610
2013-09-28 11:46:15 +00:00
Juergen Ributzka f043a65327 Revert "SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too."
This reverts commit r191130.

llvm-svn: 191138
2013-09-21 15:09:46 +00:00
Juergen Ributzka c2551eb4ff Fix the buildbot
llvm-svn: 191133
2013-09-21 05:15:01 +00:00
Juergen Ributzka ab930591c7 [X86] Emulate AVX 256bit MIN/MAX support by splitting the vector.
In AVX, 256-bit vectors are valid vectors and therefore the Type Legalizer doesn't
split the VSELECT and SETCC nodes. AVX only supports MIN/MAX on 128-bit vectors,
and this fix enables vector splitting for this special case in the X86 DAG
Combiner.

This fix is related to PR16695, PR17002, and <rdar://problem/14594431>.

llvm-svn: 191131
2013-09-21 04:55:22 +00:00
Juergen Ributzka e9a80fc912 SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.

This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask for the given target. This mask usually has
the same size as the VSELECT return type (except for Intel KNL). Now the type
legalizer will split both VSELECT and SETCC.

This allows the following X86 DAG Combine code to successfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.

llvm-svn: 191130
2013-09-21 04:55:18 +00:00
Elena Demikhovsky 8952974e29 AVX-512: implemented extractelement with variable index.
Added parsing of the mask register and "zeroing" semantics, like {%k1} {z}.

llvm-svn: 190595
2013-09-12 08:55:00 +00:00
Juergen Ributzka 53d0b492f5 [X86] Perform VSELECT DAG combines also before DAG type legalization.
If the DAG already has only legal types, then the second round of DAG combines
is skipped. In this case VSELECT+SETCC patterns that match a more efficient
instruction (e.g. min/max) are never recognized.

This fix allows VSELECT+SETCC combines if the types are already legal before DAG
type legalization.

Reviewer: Nadav
llvm-svn: 190105
2013-09-05 23:02:56 +00:00
Craig Topper b25f0f5538 Create BEXTR instructions for (and ((sra or srl) x, imm), (2**size - 1)). Fixes PR17028.
llvm-svn: 189742
2013-09-02 07:53:17 +00:00
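A minimal IR sketch (names and constants hypothetical) of the shift-then-mask pattern the change above matches to BEXTR:

define i32 @bextr_like(i32 %x) {
entry:
  %sh = lshr i32 %x, 4      ; start at bit 4
  %r = and i32 %sh, 255     ; 2^8 - 1, i.e. extract 8 bits
  ret i32 %r
}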
Elena Demikhovsky 4def4b088f AVX-512: Added GATHER and SCATTER instructions.
llvm-svn: 189729
2013-09-01 14:24:41 +00:00
Craig Topper f78c19c3bb Fixup BZHI selection to remove an unneeded zero extension.
llvm-svn: 189656
2013-08-30 07:16:16 +00:00
Craig Topper 0bccad2d43 Teach X86 backend to create BMI2 BZHI instructions from (and X, (add (shl 1, Y), -1)). Fixes PR17038.
llvm-svn: 189653
2013-08-30 06:52:21 +00:00
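A minimal IR sketch (names hypothetical) of the (and X, (add (shl 1, Y), -1)) pattern the change above matches to BZHI:

define i32 @bzhi_like(i32 %x, i32 %y) {
entry:
  %bit = shl i32 1, %y
  %mask = add i32 %bit, -1    ; (1 << y) - 1
  %r = and i32 %x, %mask      ; keep only the low y bits of %x
  ret i32 %r
}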
Elena Demikhovsky 980c6b08b1 AVX-512: added extend and truncate instructions.
llvm-svn: 189580
2013-08-29 11:56:53 +00:00
Elena Demikhovsky 12f24673e0 AVX-512: Added FMA instructions.
llvm-svn: 189326
2013-08-27 08:39:25 +00:00
Elena Demikhovsky 0a2b6290f1 AVX-512: Added shuffle instructions -
VPSHUFD, VPERMILPS, VMOVDDUP, VMOVLHPS, VMOVHLPS, VSHUFPS, VALIGN
 single and double forms.

llvm-svn: 189215
2013-08-26 12:45:35 +00:00
Elena Demikhovsky f8f478b19d AVX-512: added UNPACK instructions and tests for all-zero/all-ones vectors
llvm-svn: 189189
2013-08-25 12:54:30 +00:00
Elena Demikhovsky 33d447a2d6 AVX-512: Added SHIFT instructions.
llvm-svn: 188899
2013-08-21 09:36:02 +00:00
Elena Demikhovsky 1490c5eb5b AVX-512: added arithmetic and logical operations.
ADD, SUB, MUL integer and FP types. OR, AND, XOR.
Added embeded broadcast form for these instructions.

llvm-svn: 188673
2013-08-19 13:26:14 +00:00
Elena Demikhovsky 3ce8dbbac2 AVX-512: Added VMOVD, VMOVQ, VMOVSS, VMOVSD instructions.
llvm-svn: 188637
2013-08-18 13:08:57 +00:00
Craig Topper e6861c9ce5 Make more of the lowering helpers static. Also use MVT instead of EVT in a couple places.
llvm-svn: 188629
2013-08-18 08:53:01 +00:00
Craig Topper 8dbc7e9d35 Revert r188449 as it turns out we're just missing the instructions that need the v16i32/v16f32 matching.
llvm-svn: 188454
2013-08-15 08:38:25 +00:00
Craig Topper 2ffd06528d Don't let isPermImmMask handle v16i32 since VPERMI doesn't match on that type. Remove 128-bit vector handling from isPermImmMask too, it's covered by isPSHUFDMask.
llvm-svn: 188449
2013-08-15 07:30:51 +00:00
Craig Topper 6f4dd2dacf Use MVT in place of EVT in more X86 operation lowering functions.
llvm-svn: 188445
2013-08-15 05:33:45 +00:00
Craig Topper 5671010cbb Replace getValueType().getSimpleVT() with getSimpleValueType(). Also remove one weird cast from MVT->EVT just to call getSimpleVT().
llvm-svn: 188441
2013-08-15 02:33:50 +00:00
Craig Topper d03748cf5e Make more helper methods into static functions.
llvm-svn: 188366
2013-08-14 07:53:41 +00:00
Craig Topper 7b7b159574 Remove tab characters.
llvm-svn: 188365
2013-08-14 07:35:18 +00:00
Craig Topper d905fded68 Make some helper methods static.
llvm-svn: 188364
2013-08-14 07:34:43 +00:00
Craig Topper 60769e050d Use MVT in more lowering code.
llvm-svn: 188363
2013-08-14 07:04:42 +00:00
Craig Topper 52b00359b1 Replace EVT with MVT in isVectorShift. Keeps compiler from generating unneeded checks and handling for extended types.
llvm-svn: 188362
2013-08-14 06:21:10 +00:00
Craig Topper 67476d7485 Replace EVT with MVT in many of the shuffle lowering functions. Keeps compiler from generating unneeded checks and handling for extended types.
llvm-svn: 188361
2013-08-14 05:58:39 +00:00
Evgeniy Stepanov 7dee697faa Fix compiler warnings.
../lib/Target/X86/X86ISelLowering.cpp:9715:7: error: unused variable 'OpVT' [-Werror,-Wunused-variable]
  EVT OpVT = Op0.getValueType();
      ^
../lib/Target/X86/X86ISelLowering.cpp:9763:14: error: unused variable 'NumElems' [-Werror,-Wunused-variable]
    unsigned NumElems = VT.getVectorNumElements();

llvm-svn: 188269
2013-08-13 14:04:20 +00:00
Elena Demikhovsky 60b1f289f2 AVX-512: Added CMP and BLEND instructions.
Lowering for SETCC.

llvm-svn: 188265
2013-08-13 13:24:07 +00:00
Elena Demikhovsky 5fed3b95db AVX-512: Added more tests for BROADCAST
llvm-svn: 188148
2013-08-11 12:29:16 +00:00
Elena Demikhovsky cf5b1458e6 AVX-512: Added VPERM* instructons and MOV* zmm-to-zmm instructions.
Added a test for shuffles using VPERM.

llvm-svn: 188147
2013-08-11 07:55:09 +00:00
Elena Demikhovsky 45c54ad8dc AVX-512 set: Added BROADCAST instructions
with lowering logic and a test.

llvm-svn: 187884
2013-08-07 12:34:55 +00:00
Craig Topper c5b0ad27ab Simplify code. No functional change intended.
llvm-svn: 187870
2013-08-07 08:16:07 +00:00
Tim Northover a4415854db Refactor isInTailCallPosition handling
This change came about primarily because of two issues in the existing code.
Neither of:

define i64 @test1(i64 %val) {
  %in = trunc i64 %val to i32
  tail call i32 @ret32(i32 returned %in)
  ret i64 %val
}

define i64 @test2(i64 %val) {
  tail call i32 @ret32(i32 returned undef)
  ret i32 42
}

should be tail calls, and the function sameNoopInput is responsible. The main
problem is that it is completely symmetric in the "tail call" and "ret" value,
but in reality different things are allowed on each side.

For these cases:
1. Any truncation should lead to a larger value being generated by "tail call"
   than needed by "ret".
2. Undef should only be allowed as a source for ret, not as a result of the
   call.

Along the way I noticed that a mismatch between what this function treats as a
valid truncation and what the backends see can lead to invalid calls as well
(see x86-32 test case).

This patch refactors the code so that instead of being based primarily on
values which it recurses into when necessary, it starts by inspecting the type
and considers each fundamental slot that the backend will see in turn. For
example, given a pathological function that returned {{}, {{}, i32, {}}, i32}
we would consider each "real" i32 in turn, and ask if it passes through
unchanged. This is much closer to what the backend sees as a result of
ComputeValueVTs.

Aside from the bug fixes, this eliminates the recursion that's going on and, I
believe, makes the bulk of the code significantly easier to understand. The
trade-off is the nasty iterators needed to find the real types inside a
returned value.

llvm-svn: 187787
2013-08-06 09:12:35 +00:00
Craig Topper cf969eadaf Simplify vector lane handling math a bit. No functional change intended.
llvm-svn: 187783
2013-08-06 07:23:12 +00:00
Craig Topper 7418ff460c Simplify math a little bit.
llvm-svn: 187781
2013-08-06 06:54:25 +00:00
Craig Topper 9bc00b65b6 Replace EVT with MVT in isHorizontalBinOp as it is only called with legal types.
llvm-svn: 187779
2013-08-06 06:05:05 +00:00
Craig Topper 47d7c5c8fe Simplify code slightly. No functional change.
llvm-svn: 187771
2013-08-06 04:12:40 +00:00
Aaron Ballman 5b4634576e Silencing an MSVC11 type conversion warning.
llvm-svn: 187727
2013-08-05 13:47:03 +00:00
Elena Demikhovsky 40864b690b AVX-512 set: added mask operations, lowering BUILD_VECTOR for i1 vector types.
Added intrinsics and tests.

llvm-svn: 187717
2013-08-05 08:52:21 +00:00
Benjamin Kramer 5bc180c14f X86: Turn fp selects into mask operations.
double test(double a, double b, double c, double d) { return a<b ? c : d; }

before:
_test:
	ucomisd	%xmm0, %xmm1
	ja	LBB0_2
	movaps	%xmm3, %xmm2
LBB0_2:
	movaps	%xmm2, %xmm0

after:
_test:
	cmpltsd	%xmm1, %xmm0
	andpd	%xmm0, %xmm2
	andnpd	%xmm3, %xmm0
	orpd	%xmm2, %xmm0

Small speedup on Benchmarks/SmallPT

llvm-svn: 187706
2013-08-04 12:05:16 +00:00
Tim Northover ecc018c7b7 X86: correct tail return address calculation
Due to the weird and wonderful usual arithmetic conversions, some
calculations involving negative values were getting performed in
uint32_t and then promoted to int64_t, which is really not a good
idea.

Patch by Katsuhiro Ueno.

llvm-svn: 187703
2013-08-04 09:35:57 +00:00
Elena Demikhovsky b1266b5447 EVEX and compressed displacement encoding for AVX512
llvm-svn: 187576
2013-08-01 13:34:06 +00:00
Elena Demikhovsky b0a75431ad Fixed assertion in Extract128BitVector()
llvm-svn: 187493
2013-07-31 12:03:08 +00:00
Elena Demikhovsky 67b05fc0b3 Added INSERT and EXTRACT instructions from the AVX-512 ISA.
All insertf*/extractf* functions replaced with insert/extract since we have insertf and inserti forms.
Added lowering for INSERT_VECTOR_ELT / EXTRACT_VECTOR_ELT for 512-bit vectors.
Added lowering for EXTRACT/INSERT subvector for 512-bit vectors.
Added a test.

llvm-svn: 187491
2013-07-31 11:35:14 +00:00
Nico Rieck 06d17c80cc Proper va_arg/va_copy lowering on win64
Win64 uses CharPtrBuiltinVaList instead of X86_64ABIBuiltinVaList like
other 64-bit targets.

llvm-svn: 187355
2013-07-29 13:07:06 +00:00
Justin Holewinski d3f2035a3c Add a target legalize hook for SplitVectorOperand (again)
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.

This also adds a test case for NVPTX that depends on this custom
legalization.

Differential Revision: http://llvm-reviews.chandlerc.com/D1195

Attempt to fix the buildbots by making the X86 test I just added platform independent

llvm-svn: 187202
2013-07-26 13:28:29 +00:00
Rafael Espindola 1d812728cc Revert "Add a target legalize hook for SplitVectorOperand"
This reverts commit 187198. It broke the bots.

The soft float test probably needs a -triple because of name differences.
On the hard float test I am getting a "roundss $1, %xmm0, %xmm0", instead of
"vroundss $1, %xmm0, %xmm0, %xmm0".

llvm-svn: 187201
2013-07-26 13:18:16 +00:00
Justin Holewinski f848a24e50 Add a target legalize hook for SplitVectorOperand
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.

This also adds a test case for NVPTX that depends on this custom
legalization.

Differential Revision: http://llvm-reviews.chandlerc.com/D1195

llvm-svn: 187198
2013-07-26 12:46:39 +00:00
Elena Demikhovsky 8cfb43f73b I'm starting to commit KNL backend. I'll push patches one-by-one. This patch includes support for the extended register set XMM16-31, YMM16-31, ZMM0-31.
The full ISA you can see here: http://software.intel.com/en-us/intel-isa-extensions

llvm-svn: 187030
2013-07-24 11:02:47 +00:00
Juergen Ributzka 3d527d80b8 [X86] Use min/max to optimize unsigned vector comparison on X86
Use PMIN/PMAX for UGE/ULE vector comparisons to reduce the number of required
instructions. This trick also works for UGT/ULT, but there is no advantage in
doing so. It wouldn't reduce the number of instructions and it would actually
reduce performance.

Reviewer: Ben

radar:5972691

llvm-svn: 186432
2013-07-16 18:20:45 +00:00
Craig Topper 202fbc2c9b Add 'static' keyword to some const arrays for consistency.
llvm-svn: 186308
2013-07-15 06:54:12 +00:00
Craig Topper b94011fd28 Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size.
llvm-svn: 186274
2013-07-14 04:42:23 +00:00
Stephen Lin fda967fdea X86: fold SSE2/AVX2 logical shift by immediate amount into zero vector when possible
Patch by Andrea Di Biagio

llvm-svn: 186165
2013-07-12 15:31:36 +00:00
Charles Davis e8f297ca94 Target/X86: Add explicit Win64 and System V/x86-64 calling conventions.
Summary:
This patch adds explicit calling convention types for the Win64 and
System V/x86-64 ABIs. This allows code to override the default, and use
the Win64 convention on a target that wants to use SysV (and
vice-versa). This is needed to implement the `ms_abi` and `sysv_abi` GNU
attributes.

Reviewers:

CC:

llvm-svn: 186144
2013-07-12 06:02:35 +00:00
Stephen Lin 73de7bf5de AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all
in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:

1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.

2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.

3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.

The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.

llvm-svn: 185956
2013-07-09 18:16:56 +00:00
Nico Rieck 51969be724 Reuse %rax after calling __chkstk on win64
Reapply this as I reverted the wrong commit.

llvm-svn: 185807
2013-07-08 11:20:11 +00:00
Nico Rieck 4801303ce1 Revert "Proper va_arg/va_copy lowering on win64"
This reverts commit 2b52880592a525cfe04d8f9008a35da8c2ea94c3.

Needs review.

llvm-svn: 185806
2013-07-08 11:19:44 +00:00
Nico Rieck 43b51056d6 Revert "Reuse %rax after calling __chkstk on win64"
This reverts commit 01f8d579f7672872324208ac5bc4ac311e81b22e.

llvm-svn: 185781
2013-07-08 01:30:57 +00:00
Nico Rieck 7adf6111a8 Reuse %rax after calling __chkstk on win64
llvm-svn: 185778
2013-07-07 16:48:39 +00:00
Nico Rieck 99ef2890c0 Proper va_arg/va_copy lowering on win64
llvm-svn: 185763
2013-07-06 18:08:19 +00:00
Jakob Stoklund Olesen db429d9483 Remove the EXCEPTIONADDR, EHSELECTION, and LSDAADDR ISD opcodes.
These exception-related opcodes are not used any longer.

llvm-svn: 185625
2013-07-04 13:54:20 +00:00
Jakob Stoklund Olesen a1f5b901a5 Revert r185595-185596 which broke buildbots.
Revert "Simplify landing pad lowering."
Revert "Remove the EXCEPTIONADDR, EHSELECTION, and LSDAADDR ISD opcodes."

llvm-svn: 185600
2013-07-04 00:26:30 +00:00
Jakob Stoklund Olesen f33ec531fa Remove the EXCEPTIONADDR, EHSELECTION, and LSDAADDR ISD opcodes.
These exception-related opcodes are not used any longer.

llvm-svn: 185596
2013-07-03 23:56:31 +00:00
Craig Topper 31ee5866de Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size.
llvm-svn: 185540
2013-07-03 15:07:05 +00:00
Elena Demikhovsky 6769c50d9e Optimized the integer vector multiplication operation by replacing it with shift/xor/sub when possible. Fixed a bug in SDIV where the const operand is not a splat constant vector.
llvm-svn: 184931
2013-06-26 10:55:03 +00:00
Chad Rosier 295bd43adb The getRegForInlineAsmConstraint function should only accept MVT value types.
llvm-svn: 184642
2013-06-22 18:37:38 +00:00
Bill Wendling 8f26840c5a Don't cache the instruction and register info from the TargetMachine, because
the internals of TargetMachine could change.

No functionality change intended.

llvm-svn: 183571
2013-06-07 21:00:34 +00:00
Andrew Trick ad6d08ac6f Order CALLSEQ_START and CALLSEQ_END nodes.
Fixes PR16146: gdb.base__call-ar-st.exp fails after
pre-RA-sched=source fixes.

Patch by Xiaoyi Guo!

This also fixes an unsupported dbg.value test case. Codegen was
previously incorrect but the test was passing by luck.

llvm-svn: 182885
2013-05-29 22:03:55 +00:00
Andrew Trick ef9de2a739 Track IR ordering of SelectionDAG nodes 2/4.
Change SelectionDAG::getXXXNode() interfaces as well as call sites of
these functions to pass in SDLoc instead of DebugLoc.

llvm-svn: 182703
2013-05-25 02:42:55 +00:00
Michael J. Spencer df1ecbd734 Replace Count{Leading,Trailing}Zeros_{32,64} with count{Leading,Trailing}Zeros.
llvm-svn: 182680
2013-05-24 22:23:49 +00:00
Nadav Rotem 7b66c47051 X86: Fix a bug in EltsFromConsecutiveLoads. We can't generate new loads without chains.
llvm-svn: 182507
2013-05-22 19:28:41 +00:00
Benjamin Kramer d76cc186fc X86: When expanding PCMPGTQ to PCMPGTD we always want to compare the lower halves as unsigned.
Take #2 on fixing PR15977.

llvm-svn: 182486
2013-05-22 17:01:12 +00:00
Benjamin Kramer 18ef6b22b9 X86: When emulating unsigned PCMPGTQ with PCMPGTD, fix the sign bit for the smaller type.
Otherwise we'll get a mix of signed and unsigned compares.
Fixes PR15977.

llvm-svn: 182364
2013-05-21 09:58:54 +00:00
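A rough C++ intrinsics sketch of the trick the two PCMPGTQ entries above describe (the helper name is made up, and this illustrates the idea rather than the backend's actual DAG output): a signed 64-bit greater-than built from 32-bit compares, with the low dwords biased so the signed pcmpgtd acts as an unsigned compare on them.

  #include <emmintrin.h>  // SSE2 only

  // Per-lane signed 64-bit a > b without SSE4.2's pcmpgtq (sketch).
  static __m128i cmpgt_epi64_sse2(__m128i a, __m128i b) {
    // Flip the sign bit of the low dwords so signed pcmpgtd compares them
    // as unsigned; the high dwords keep their sign and compare signed.
    const __m128i bias = _mm_set1_epi64x(0x80000000LL);
    __m128i gt = _mm_cmpgt_epi32(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
    __m128i eq = _mm_cmpeq_epi32(a, b);
    // result = hi_gt | (hi_eq & lo_gt), broadcast to both dwords of each lane.
    __m128i hi_gt = _mm_shuffle_epi32(gt, _MM_SHUFFLE(3, 3, 1, 1));
    __m128i lo_gt = _mm_shuffle_epi32(gt, _MM_SHUFFLE(2, 2, 0, 0));
    __m128i hi_eq = _mm_shuffle_epi32(eq, _MM_SHUFFLE(3, 3, 1, 1));
    return _mm_or_si128(hi_gt, _mm_and_si128(hi_eq, lo_gt));
  }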
Matt Arsenault 75865923c9 Add LLVMContext argument to getSetCCResultType
llvm-svn: 182180
2013-05-18 00:21:46 +00:00
Benjamin Kramer fc33e1d99b X86: Make shuffle -> shift conversion more aggressive about undefs.
Shuffles that only move an element into position 0 of the vector are common in
the output of the loop vectorizer and often generate suboptimal code when SSSE3
is not available. Lower them to vector shifts if possible.

We still prefer palignr over psrldq because it has higher throughput on
sandybridge.

llvm-svn: 182102
2013-05-17 14:48:34 +00:00
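For instance, a v4i32 shuffle whose only defined output lane moves element 1 into position 0 can be emitted as a whole-register byte shift; a hand-written intrinsics equivalent of that pattern (illustrative only) is:

  #include <emmintrin.h>

  // Shuffle mask <1, undef, undef, undef>: lane 0 takes the old lane 1.
  // Without SSSE3's palignr, a 4-byte psrldq puts the element in place.
  static __m128i move_lane1_to_front(__m128i v) {
    return _mm_srli_si128(v, 4);  // psrldq xmm, 4
  }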
David Majnemer 66fb70de38 Remove a recently redundant transform from X86ISelLowering.
X86ISelLowering has support to treat:
(icmp ne (and (xor %flags, -1), (shl 1, flag)), 0)

as if it were actually:
(icmp eq (and %flags, (shl 1, flag)), 0)

However, r179386 has code at the InstCombine level to handle this.

llvm-svn: 181145
2013-05-05 02:00:10 +00:00
Nadav Rotem 42932bdcd0 Fix an odd comment.
llvm-svn: 181136
2013-05-04 23:24:56 +00:00
Michael Liao 06badde1ac 80-col fixup.
llvm-svn: 180915
2013-05-02 09:22:04 +00:00
Michael Liao afafa98fa8 Avoid duplicating logic on frame register selecting when lowering eh_return
No functionality change

llvm-svn: 180914
2013-05-02 09:18:38 +00:00
Michael Liao 31d39a4a47 Avoid duplicating logic on frame register selecting when lowering frameaddr
No functionality change

llvm-svn: 180912
2013-05-02 08:21:56 +00:00
Tim Northover 16aba17024 Remove unused ShouldFoldAtomicFences flag.
I think it's almost impossible to fold atomic fences profitably under
LLVM/C++11 semantics. As a result, this is now unused and just
cluttering up the target interface.

llvm-svn: 179940
2013-04-20 12:32:43 +00:00
Tim Northover a2b533906a Remove unused MEMBARRIER DAG node; it's been replaced by ATOMIC_FENCE.
llvm-svn: 179939
2013-04-20 12:32:17 +00:00
Michael Liao b53d8963ce ArrayRefize getMachineNode(). No functionality change.
llvm-svn: 179901
2013-04-19 22:22:57 +00:00
Michael Liao e28fab22c4 Use 'array_lengthof' as possible to avoid magic numbers
llvm-svn: 179833
2013-04-19 04:03:37 +00:00
Benjamin Kramer c557828805 X86: Add an SSE2 lowering for 64 bit compares when pcmpgtq (SSE4.2) isn't available.
This pattern started popping up in vectorized min/max reductions.

llvm-svn: 179797
2013-04-18 21:37:45 +00:00
Michael Liao 55658d4222 Optimize vector select from all 0s or all 1s
As packed comparisons in AVX/SSE produce all 0s or all 1s in each SIMD lane,
a vector select can be simplified to AND/OR or removed if one or both of the
values being selected are all 0s or all 1s.

llvm-svn: 179267
2013-04-11 05:15:54 +00:00
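A small intrinsics illustration of the equivalence the entry above exploits, assuming every lane of the mask is all zeros or all ones (as packed compares produce); when one selected value is the zero vector, the OR side drops away and a single AND remains:

  #include <emmintrin.h>

  // With an all-0s/all-1s lane mask: select(mask, a, b) == (mask & a) | (~mask & b).
  static __m128i select_by_mask(__m128i mask, __m128i a, __m128i b) {
    return _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
  }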
Michael Liao f7bf87051a Enhance bool simplification in X86 to handle more cases
This patch is revised based on a patch from Victor Umansky
<victor.umansky@intel.com>. More cases are handled in X86's bool
simplification, i.e.:
- SETCC_CARRY
- values truncated to i1 with AND

As a by-product, PR5443 is also fixed.

llvm-svn: 179265
2013-04-11 04:43:09 +00:00
Evan Cheng ac0469c5d0 __sincosf_stret returns sinf / cosf in bits 0:31 and 32:63 of xmm0, not in
xmm0 / xmm1.

rdar://13599493

llvm-svn: 179141
2013-04-10 01:26:07 +00:00
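A hedged C++-level view of the layout the entry above describes (the struct and function names are assumed for illustration): a two-float aggregate classifies as a single SSE eightbyte under the SysV x86-64 ABI, so both results come back packed in %xmm0 rather than split across %xmm0/%xmm1.

  #include <cmath>

  // Illustrative only: the first field lands in bits 0:31 of %xmm0 and the
  // second in bits 32:63, matching the __sincosf_stret return described above.
  struct SinCosF { float sin_val; float cos_val; };

  SinCosF sincosf_pair(float x) { return { std::sin(x), std::cos(x) }; }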
Bill Wendling eb108bad50 Use the target options specified on a function to reset the back-end.
During LTO, the target options on functions within the same Module may
change. This would necessitate resetting some of the back-end. Do this for X86,
because it's a Friday afternoon.

llvm-svn: 178917
2013-04-05 21:52:40 +00:00
Benjamin Kramer b60633fb87 X86: Promote sitofp <8 x i16> to <8 x i32> when AVX is available.
A vector sext + sitofp is a lot cheaper than 8 scalar conversions.

llvm-svn: 178448
2013-03-31 12:49:15 +00:00
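A rough intrinsics sketch of the promoted form (assuming SSE4.1 widening is available alongside AVX; the helper is made up): sign-extend the eight i16 lanes to i32 and do one packed int-to-float conversion instead of eight scalar cvtsi2ss operations.

  #include <immintrin.h>

  // sitofp <8 x i16> -> <8 x float> via a vector sext to <8 x i32> (sketch).
  static __m256 sitofp_v8i16(__m128i v) {
    __m128i lo = _mm_cvtepi16_epi32(v);                    // pmovsxwd (SSE4.1)
    __m128i hi = _mm_cvtepi16_epi32(_mm_srli_si128(v, 8));
    __m256i wide = _mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1);
    return _mm256_cvtepi32_ps(wide);                       // one vcvtdq2ps
  }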
Benjamin Kramer 70671b9937 Remove the old CodePlacementOpt pass.
It was superseded by MachineBlockPlacement and disabled by default since LLVM 3.1.

llvm-svn: 178349
2013-03-29 17:14:24 +00:00
Michael Liao a486a11dcf Add support of RDSEED defined in AVX2 extension
llvm-svn: 178314
2013-03-28 23:41:26 +00:00
Michael Liao 5fff5c7b26 Enhance boolean simplification to handle 16-/64-bit RDRAND
- RDRAND always clears the destination value when a random value is not
  available (i.e. CF == 0). This value is truncated or zero-extended as
  the false boolean value to be returned. Boolean simplification needs
  to skip this 'zext' or 'trunc' node.

llvm-svn: 178312
2013-03-28 23:38:52 +00:00
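The C-level shape of the code this targets looks roughly like the loop below (illustrative; requires -mrdrnd): the intrinsic returns the carry flag, and the simplification lets the branch test CF directly instead of re-checking the widened result.

  #include <immintrin.h>

  // _rdrand64_step returns CF; the destination is zeroed whenever CF == 0.
  static unsigned long long next_random64() {
    unsigned long long v;
    while (!_rdrand64_step(&v)) {
      // no entropy available yet; retry
    }
    return v;
  }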
Michael Liao 96b42608ab Skip moving call address loading into callseq when targets prefer register indirect calls.
To enable a load of a call address to be folded with that call, the
load is moved from outside the callseq into the callseq. Such a move
adds a non-glued node (the load) into a glued sequence. The non-glued
load is only removed when DAG selection folds it into a memory-form
call instruction. When that instruction selection is disabled, the
move breaks DAG scheduling.

To prevent this, the move is disabled when the target favors register
indirect calls.

Previous workaround disabling CALL32m/CALL64m insn selection is removed.

llvm-svn: 178308
2013-03-28 23:13:21 +00:00
Timur Iskhodzhanov a2fd5fdd7a Make Win32 put the SRet address into EAX, fixes PR15556
llvm-svn: 178291
2013-03-28 21:30:04 +00:00
Preston Gurd 663e6f9558 For the current Atom processor, the fastest way to handle a call
indirect through a memory address is to load the memory address into
a register and then call indirect through the register.

This patch implements this improvement by modifying SelectionDAG to
force a function address which is a memory reference to be loaded
into a virtual register.

Patch by Sriram Murali.

llvm-svn: 178171
2013-03-27 19:14:02 +00:00
Hal Finkel 1996f3d87f Fix typo (common to both X86 and PPC)
Thanks to Bill Schmidt for pointing this out during code review!

llvm-svn: 178170
2013-03-27 19:10:42 +00:00
Michael Liao 03f9ad0e67 Add XTEST codegen support
llvm-svn: 178083
2013-03-26 22:47:01 +00:00
Michael Liao 5fbcd81793 Revise alignment checking/calculation on 256-bit unaligned memory access
- It's still considered aligned when the specified alignment is larger
  than the natural alignment;
- The new alignment for the high 128-bit vector should be min(16,
  alignment) as the pointer is advanced by 16, a power-of-2 offset.

llvm-svn: 177947
2013-03-25 23:50:10 +00:00
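Stated as a one-line sketch: after the split, the upper half lives 16 bytes past the original pointer, so its guaranteed alignment is the smaller of 16 and the original alignment (the offset itself is a power of two).

  // Alignment in bytes (a power of two) of the high 128-bit half after the
  // split: p + 16 is aligned to min(16, align(p)).
  static unsigned highHalfAlignment(unsigned originalAlign) {
    return originalAlign < 16 ? originalAlign : 16;
  }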
Michael Liao 0f4ea0c4a9 Fix PR15296
- Move SRA/SRL/SHL lowering support from DAG combining to DAG lowering
  to support 256-bit integer vectors on targets with AVX but not AVX2.

llvm-svn: 177478
2013-03-20 02:33:21 +00:00
Michael Liao 5a4e81d2e8 Mark all variable shifts needing customizing
- Prepare moving logic from DAG combining into DAG lowering. There's no
  functionality change.

llvm-svn: 177477
2013-03-20 02:28:20 +00:00
Michael Liao 48e8a3727c Move scalar immediate shift lowering into a dedicated func
- no functionality change

llvm-svn: 177476
2013-03-20 02:20:36 +00:00
Nadav Rotem 0f1bc60d51 Optimize sext <4 x i8> and <4 x i16> to <4 x i64>.
Patch by Ahmad, Muhammad T <muhammad.t.ahmad@intel.com>

llvm-svn: 177421
2013-03-19 18:38:27 +00:00
Anton Korobeynikov 3e7005f1c1 TLS support for MinGW targets.
MinGW is almost completely compatible to MSVC, with the exception of the _tls_array global not being available.

Patch by David Nadlinger!

llvm-svn: 177257
2013-03-18 08:12:28 +00:00
Michael Liao 20d287044c Fix PR15309
- Fix a typo in the type checking

llvm-svn: 177010
2013-03-14 06:57:42 +00:00
Tom Stellard b1588fc057 DAGCombiner: Use correct value type for checking legality of BR_CC v3
LegalizeDAG.cpp uses the value type of the comparison operands when checking
the legality of BR_CC, so DAGCombiner should do the same.

v2:
  - Expand more BR_CC value types for NVPTX

v3:
  - Expand correct BR_CC value types for Hexagon, Mips, and XCore.

llvm-svn: 176694
2013-03-08 15:36:57 +00:00
Benjamin Kramer 2c3d0df8ee X86: Fold EXTRACT_SUBVECTORs of a BUILD_VECTOR into a smaller BUILD_VECTOR.
That can usually be lowered efficiently and is common in sandybridge code.
It would be nice to do this in DAGCombiner but we can't insert arbitrary
BUILD_VECTORs this late.

Fixes PR15462.

llvm-svn: 176634
2013-03-07 18:48:40 +00:00
Michael Liao d5cac37dc5 Fix two remaining issues after fixing PR15355 when CMOV is not available
- Phi nodes should be replaced/updated after lowering CMOV into a branch
  because the 'mainMBB' operand updated in the Phi node has changed.
- Add EFLAGS as a live-in before lowering the 2nd CMOV. This is necessary as
  we will reuse the EFLAGS generated before the 1st lowered CMOV, which
  won't clobber EFLAGS; however, we need to specify that explicitly.
- '-attr=-cmov' test cases are added.

llvm-svn: 176598
2013-03-07 01:01:29 +00:00
Michael Liao da22b30be5 Fix PR15355
- Clear 'mayStore' flag when loading from the atomic variable before the
  spin loop
- Clear the kill flag when registers forming the address of that atomic
  variable go from one use to multiple uses
- Don't use a physical register as a live-in register in a BB (neither entry
  nor landing pad); copy it into a virtual register instead

(patch by Cameron Zwarich)

llvm-svn: 176538
2013-03-06 00:17:04 +00:00
Preston Gurd 485296d1e8 Bypass Slow Divides
* Only apply the divide bypass optimization when not optimizing for size.
* Fixed a bug caused by generating the 0 constant with type Int32;
  the dividend's type is now used to generate the constant instead.
* For Atom x86-64, apply the divide bypass to use 16-bit divides instead of
  64-bit divides when the operand values are small enough.
* Added lit tests for 64-bit divide bypass.

Patch by Tyler Nowicki!

llvm-svn: 176442
2013-03-04 18:13:57 +00:00
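A scalar sketch of the bypass idea (not the pass's actual output; the helper is made up, and b is assumed non-zero): when both operands fit in the narrow type, a cheap narrow divide produces the same quotient.

  #include <cstdint>

  // 64-bit divide with a 16-bit fast path, in the spirit of the Atom change.
  static uint64_t div_with_bypass(uint64_t a, uint64_t b) {
    if (((a | b) >> 16) == 0)                 // both values fit in 16 bits
      return (uint16_t)(a) / (uint16_t)(b);   // cheap 16-bit divide
    return a / b;                             // full 64-bit divide
  }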
Michael Liao 6af16fc3b7 Fix PR10475
- ISD::SHL/SRL/SRA must have either both scalar or both vector operands,
  but TLI.getShiftAmountTy() so far only returns a scalar type. As a
  result, backend logic assuming matching operand types breaks.
- Rename the original TLI.getShiftAmountTy() to
  TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
  return the target-specified scalar type or the same vector type as the
  1st operand.
- Fix most target-independent codegen logic that assumed
  TLI.getShiftAmountTy() returns a simple scalar type.

llvm-svn: 176364
2013-03-01 18:40:30 +00:00
Michael Liao 609a527286 Refine fix to PR10499, no functionality change
- Put the expensive check after the simple one

llvm-svn: 176060
2013-02-25 23:16:36 +00:00
Michael Liao ab97668061 Fix PR10499
- Check whether SSE is available before lowering an all-1s vector build with
  PCMPEQD, which is only available from SSE2 onward

llvm-svn: 176058
2013-02-25 23:01:03 +00:00
Nadav Rotem b532fca92c Revert r169638 because it broke Mesa llvmpipe tests.
Fix PR15239.

llvm-svn: 175985
2013-02-24 07:09:35 +00:00
Jim Grosbach 341ad3e72a Update TargetLowering ivars for name policy.
http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

ivars should be camel-case and start with an upper-case letter. A few in
TargetLowering were starting with a lower-case letter.

No functional change intended.

llvm-svn: 175667
2013-02-20 21:13:59 +00:00
Elena Demikhovsky 0ccdd1315b I optimized the following patterns:
 sext <4 x i1> to <4 x i64>
 sext <4 x i8> to <4 x i64>
 sext <4 x i16> to <4 x i64>

I run a combine on SIGN_EXTEND_IN_REG that rewrites the SEXT pattern:
 (sext_in_reg (v4i64 anyext (v4i32 x)), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x, ExtraVT)))

The sext_in_reg (v4i32 x) may be lowered to shl+sar operations.
A "sar" does not exist for 64-bit vector elements, so lowering sext_in_reg (v4i64 x) has no vector solution.

I also added the cost of these operations to the AVX cost table.

llvm-svn: 175619
2013-02-20 12:42:54 +00:00
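As a scalar analogue of the lowering mentioned above (illustrative only): sign_extend_in_reg of the low 8 bits of a 32-bit value is a left shift followed by an arithmetic right shift, and the vector path needs the same per-element shift pair, which is exactly what is missing for 64-bit elements.

  #include <cstdint>

  // sign_extend_in_reg i8-in-i32 as shl + sar; relies on the usual
  // two's-complement arithmetic right shift, as on x86.
  static int32_t sext_in_reg_i8(int32_t x) {
    return (int32_t)((uint32_t)x << 24) >> 24;
  }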
Craig Topper f371e89264 Fix capitalization in comment to match function name.
llvm-svn: 175497
2013-02-19 07:43:59 +00:00
Jakub Staszak 1f199a0ef2 Use array_pod_sort instead of std::sort.
llvm-svn: 175472
2013-02-18 23:18:22 +00:00
Jakub Staszak d784d96074 Minor cleanups. No functionality change.
llvm-svn: 175359
2013-02-16 13:34:26 +00:00
Nadav Rotem accb0c747c 80-col
llvm-svn: 175189
2013-02-14 18:20:48 +00:00
Elena Demikhovsky d0a0cc80cd Fixed a bug in X86TargetLowering::LowerVectorIntExtend() (assertion failure).
Added a test.

llvm-svn: 175144
2013-02-14 08:20:26 +00:00
Nick Lewycky beba972659 Don't build tail calls to functions with three inreg arguments on x86-32 PIC.
Fixes PR15250!

llvm-svn: 175092
2013-02-13 21:59:15 +00:00
Eric Christopher 389ee71b0a Check i1 as well as i8 variables for 8 bit registers for x86 inline
assembly.

llvm-svn: 175036
2013-02-13 06:01:05 +00:00
Jakob Stoklund Olesen dc69f6fbca Move MRI liveouts to X86 return instructions.
llvm-svn: 174402
2013-02-05 17:59:48 +00:00
Benjamin Kramer 2c9da989c2 X86: Open up some opportunities for constant folding by postponing shift lowering.
Fixes PR15141.

llvm-svn: 174327
2013-02-04 15:19:33 +00:00
Benjamin Kramer 0611298446 X86: Simplify code. No functionality change.
llvm-svn: 174326
2013-02-04 15:19:25 +00:00
Eric Christopher 258c867c0b Whitespace.
llvm-svn: 174009
2013-01-31 00:50:48 +00:00
Eric Christopher 4e3e94c13d Check and allow floating point registers to select the size of the
register for inline asm. This conforms to how gcc allows for effective
casting of inputs into gprs (fprs is already handled).

llvm-svn: 174008
2013-01-31 00:50:46 +00:00
Evan Cheng d2ca4e2ed9 Restrict sin/cos optimization to 64-bit only for now. 32-bit is a bit messy and less critical.
llvm-svn: 173987
2013-01-30 22:56:35 +00:00
Evan Cheng 27e41c9f70 Remove dead code.
llvm-svn: 173812
2013-01-29 18:08:22 +00:00
Evan Cheng 0e88c7d897 Teach SDISel to combine fsin / fcos into an fsincos node if the following
conditions are met:
1. They share the same operand and are in the same BB.
2. Both outputs are used.
3. The target has a native instruction that maps to ISD::FSINCOS node or
   the target provides a sincos library call.

Implemented the generic optimization in sdisel and enabled it for
Mac OSX. Also added an additional optimization for x86_64 Mac OSX by
using an alternative entry point __sincos_stret which returns the two
results in xmm0 / xmm1.

rdar://13087969
PR13204

llvm-svn: 173755
2013-01-29 02:32:37 +00:00
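The source pattern the combine looks for is roughly the following (illustrative function): sin and cos of the same operand, in the same basic block, with both results used, which can then become a single fsincos node or __sincos_stret call.

  #include <cmath>

  // Candidate for the fsin/fcos -> fsincos combine described above.
  static void polar_to_cartesian(double r, double theta, double &x, double &y) {
    double s = std::sin(theta);  // same operand, same block...
    double c = std::cos(theta);  // ...and both results are used
    x = r * c;
    y = r * s;
  }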
Craig Topper 8fb09f0abb Fix inconsistent usage of PALIGN and PALIGNR when referring to the same instruction.
llvm-svn: 173667
2013-01-28 06:48:25 +00:00
Benjamin Kramer 6a93596538 X86: Decode PALIGN operands so I don't have to do it in my head.
llvm-svn: 173572
2013-01-26 13:31:37 +00:00
Benjamin Kramer 99c68dd964 X86: Do splat promotion later, so the optimizer can chew on it first.
This catches many cases where we can emit a more efficient shuffle for a
specific mask or when the mask contains undefs. Once the splat is lowered to
unpacks we can't do that anymore.

There is a possibility of moving the promotion after pshufb matching, but I'm
not sure if pshufb with a mask loaded from memory is faster than 3 shuffles, so
I avoided that for now.

llvm-svn: 173569
2013-01-26 11:44:21 +00:00
Eli Bendersky 597fc1233a In this patch, we teach X86_64TargetMachine that it has an ILP32
(defined by the x32 ABI) mode, in which case its pointers are 32-bits
in size. This knowledge is also added to X86RegisterInfo that now
returns the appropriate registers in getPointerRegClass.

There are many outcomes to this change. In order to keep the patches
separate and manageable, we start by focusing on some simple testable
cases. The patch adds a test with passing a pointer to a function -
focusing on the difference between the two data models for x86-64.
Another test is added for handling of 'sret' arguments (and
functionality is added in X86ISelLowering to make it work).

A note on naming: the "x32 ABI" document refers to the AMD64
architecture (in LLVM it's distinguished by being is64Bits() in the
x86 subtarget) with two variations: the LP64 (default) data model, and
the ILP32 data model. This patch adds predicates to the subtarget
which are consistent with this naming scheme.

llvm-svn: 173503
2013-01-25 22:07:43 +00:00
Michael Liao 3dffc5e2b7 Fix a DAG scheduling issue with pseudo atomic instructions
- Add the list of physical registers clobbered by pseudo atomic insts
  Physical registers are clobbered when pseudo atomic instructions are
  expanded. Add them to the clobber list to prevent the DAG scheduler from
  mis-scheduling them after these insns are declared side-effect free.
- Add test case from Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 173200
2013-01-22 21:47:38 +00:00
Tim Northover 29178a348a Make APFloat constructor require explicit semantics.
Previously we tried to infer it from the bit width, with an added
IsIEEE argument for the PPC/IEEE 128-bit case, which had a default
value. This default value allowed bugs to creep in, where it was
inappropriate.

llvm-svn: 173138
2013-01-22 09:46:31 +00:00
Craig Topper 66163a35ee Use <0 checks in place of ==-1 because it results in simpler code.
llvm-svn: 173010
2013-01-21 07:25:16 +00:00
Craig Topper 9b29486f42 Use MVT instead of EVT in LowerVECTOR_SHUFFLEtoBlend.
llvm-svn: 173009
2013-01-21 07:19:54 +00:00
Craig Topper 32c5406dcf Remove trailing whitespace.
llvm-svn: 173008
2013-01-21 06:57:59 +00:00
Craig Topper 5c84c25bf4 Fix some 80 column violations.
llvm-svn: 173006
2013-01-21 06:21:54 +00:00
Craig Topper 2cd375896a Make helper method static.
llvm-svn: 173005
2013-01-21 06:13:28 +00:00
Craig Topper cf93977920 Convert more EVT's to MVT's in the lowering methods.
llvm-svn: 172995
2013-01-20 21:50:27 +00:00
Craig Topper e65a08be64 Capitalize lowerTRUNCATE so that it matches the other lower functions in this file despite it not matching coding standards.
llvm-svn: 172994
2013-01-20 21:34:37 +00:00
Craig Topper ce61fdf0a3 Make LowerVSETCC a static function and use MVT instead of EVT.
llvm-svn: 172969
2013-01-20 09:02:22 +00:00
Nadav Rotem 9450fcfff1 Revert 172708.
The optimization handles esoteric cases but adds a lot of complexity both to the X86 backend and to other backends.
This optimization disables an important canonicalization of chains of SEXT nodes and makes SEXT and ZEXT asymmetrical.
Disabling the canonicalization of consecutive SEXT nodes into a single node disables other DAG optimizations that assume
that there is only one SEXT node. The AVX mask optimization is one example. Additionally, this optimization does not update the cost model.

llvm-svn: 172968
2013-01-20 08:35:56 +00:00
Craig Topper 9976974cc6 Make some helper methods static.
llvm-svn: 172936
2013-01-20 00:50:58 +00:00
Craig Topper 4ac87da529 Remove DebugLoc argument from static function. It can easily be obtained from the SVOp passed in.
llvm-svn: 172935
2013-01-20 00:43:42 +00:00
Craig Topper 3da6507c41 Use MVT instead of EVT in more instruction lowering code.
llvm-svn: 172933
2013-01-20 00:38:18 +00:00
Craig Topper 53c7fbabbf Use MVT instead of EVT in more of the shuffle lowering code.
llvm-svn: 172930
2013-01-19 23:36:09 +00:00
Craig Topper bb772d27a7 Capitalize LowerVectorIntExtend to be consistent with all the other lower functions in this file.
llvm-svn: 172927
2013-01-19 23:14:09 +00:00
Nadav Rotem 7b3120b9ae On Sandy Bridge, split unaligned 256-bit stores into two xmm-sized stores.
llvm-svn: 172894
2013-01-19 08:38:41 +00:00
Craig Topper 84b01120bc Use MVT instead of EVT when computing shuffle immediates since they can only be for legal types. Keeps compiler from generating unneeded checks and handling for extended types.
llvm-svn: 172893
2013-01-19 08:27:45 +00:00
Nadav Rotem 7431211214 On Sandy Bridge, loading an unaligned 256-bit value using two XMM loads (vmovups and vinsertf128) is faster than using a single 256-bit vmovups instruction.
llvm-svn: 172868
2013-01-18 23:10:30 +00:00
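An intrinsics-level picture of the split form the two Sandy Bridge entries above describe (the helper name is made up): two 128-bit unaligned loads plus a vinsertf128 instead of one 256-bit vmovups.

  #include <immintrin.h>

  // Unaligned 256-bit load as two XMM loads + vinsertf128 (Sandy Bridge form).
  static __m256 loadu_256_split(const float *p) {
    __m128 lo = _mm_loadu_ps(p);      // vmovups xmm
    __m128 hi = _mm_loadu_ps(p + 4);  // vmovups xmm
    return _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
  }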
Craig Topper 1cb8aa581b Calculate vector element size more directly for VINSERTF128/VEXTRACTF128 immediate handling. Also use MVT since this only called on legal types during pattern matching.
llvm-svn: 172797
2013-01-18 08:41:28 +00:00