Commit Graph

4271 Commits

Author SHA1 Message Date
Craig Topper ae21810ce4 [X86] Pre-allocate a SmallVector instead of using push_back in a loop. NFC
llvm-svn: 273114
2016-06-19 15:37:33 +00:00
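A minimal sketch of the pattern this commit applies (function name and element type are illustrative, not from the commit): sizing the SmallVector once up front avoids repeated growth from push_back in a loop.
```
#include "llvm/ADT/SmallVector.h"

// Illustrative sketch: size the vector once instead of growing it with
// push_back on every iteration (the function and types are hypothetical).
llvm::SmallVector<int, 8> makeIdentityMask(unsigned NumElts) {
  llvm::SmallVector<int, 8> Mask(NumElts); // single allocation up front
  for (unsigned I = 0; I != NumElts; ++I)
    Mask[I] = int(I); // write in place; no reallocation, no size checks
  return Mask;
}
```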
Craig Topper 4181c03c54 [X86] Use SmallVector::assign instead of resize to ensure we really start with a vector of all -1s. Otherwise we're trusting the caller to pass the right thing.
This should be no functional change with current code.

llvm-svn: 273113
2016-06-19 15:37:30 +00:00
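A short sketch of the distinction the commit relies on (names hypothetical): resize(N, -1) only writes -1 into newly created elements, so pre-existing entries keep whatever the caller left there, while assign(N, -1) rewrites the entire contents.
```
#include "llvm/ADT/SmallVector.h"

// Hypothetical illustration: guarantee a vector of N elements, all -1.
void resetMask(llvm::SmallVectorImpl<int> &Mask, unsigned N) {
  // Mask.resize(N, -1) would keep any existing elements untouched and
  // only fill *new* slots with -1; assign replaces everything.
  Mask.assign(N, -1);
}
```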
Craig Topper 1f083543c9 [X86] Pre-size several SmallVectors instead of calling push_back in a loop. NFC
llvm-svn: 272997
2016-06-17 12:20:50 +00:00
Craig Topper 07984f2068 [X86] Fix formatting. NFC
llvm-svn: 272996
2016-06-17 12:20:48 +00:00
Rafael Espindola 498b9e06c8 Refactor more duplicated code.
llvm-svn: 272939
2016-06-16 19:30:55 +00:00
Rafael Espindola ed44cf6ccd Refactor duplicated code.
llvm-svn: 272936
2016-06-16 18:50:12 +00:00
Craig Topper 97b1fc92e8 [X86] Pre-size some SmallVectors using the constructor in the shuffle lowering code instead of using push_back. Some of these already did this but used resize or assign instead of the constructor. NFC
llvm-svn: 272872
2016-06-16 03:58:45 +00:00
Craig Topper 66f1a8b608 [X86] Remove else after return. NFC
llvm-svn: 272871
2016-06-16 03:58:42 +00:00
Craig Topper ceda65bdc4 [X86] Inline a couple lambdas into their callers since they are only used once and it all fits on a single line. NFC
llvm-svn: 272869
2016-06-16 03:11:00 +00:00
Sanjay Patel 1a4569df54 [x86] add folds for x86 vector compare nodes (PR27924)
Ideally, we can get rid of most x86 LLVM intrinsics by transforming them to IR (and some of that happened 
with http://reviews.llvm.org/rL272807), but it doesn't cost much to have some simple folds in the backend
too while we're working on that and as a backstop.

This fixes:
https://llvm.org/bugs/show_bug.cgi?id=27924

Differential Revision: http://reviews.llvm.org/D21356

llvm-svn: 272828
2016-06-15 20:26:58 +00:00
Kevin B. Smith acbda9ef30 [X86]: Updated r272801 to promote 16 bit compares with immediate operand
to 32 bits. This is in response to a comment by Eli Friedman.

llvm-svn: 272814
2016-06-15 18:18:05 +00:00
Sanjay Patel 30e0456562 [x86] fix function name; NFC
llvm-svn: 272805
2016-06-15 17:12:29 +00:00
Kevin B. Smith 54566a0e9a [X86]: Quit promoting 8 and 16 bit compares to 32 bit.
Differential Revision: http://reviews.llvm.org/D21144

llvm-svn: 272801
2016-06-15 16:37:46 +00:00
Wei Mi b799a625f9 [X86] Reduce the width of multiplication when its operands are extended from i8 or i16
For <N x i32> type mul, pmuludq will be used for targets without SSE41, which
often introduces many extra pack and unpack instructions in the vectorized loop
body because pmuludq generates a <N/2 x i64> type value. However, when the
operands of the <N x i32> mul are extended from smaller types like i8 and i16,
the mul may be shrunk to use pmullw + pmulhw/pmulhuw instead of pmuludq, which
generates better code. For targets with SSE41, pmulld is supported, so no
shrinking is needed.

Differential Revision: http://reviews.llvm.org/D20931

llvm-svn: 272694
2016-06-14 18:53:20 +00:00
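A hypothetical source-level example of the affected pattern: the i32 multiply's operands are zero-extended from i16, so on pre-SSE4.1 targets the backend can now shrink the multiply to 16-bit pmullw/pmulhuw pairs instead of going through pmuludq.
```
// Hypothetical C++ loop exhibiting the pattern: operands extended from
// i16 to i32 before the multiply, so the mul width can be shrunk.
void mulWidened(const unsigned short *A, const unsigned short *B,
                unsigned *Out, int N) {
  for (int I = 0; I < N; ++I)
    Out[I] = unsigned(A[I]) * unsigned(B[I]); // i16 inputs, i32 product
}
```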
Haojian Wu 7900ca1e7e Fix an enumeral mismatch warning.
Summary:
The "-Werror=enum-compare" shows that the statement is using two different enums:

enumeral mismatch in conditional expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType'

A follow-up fix on D21235.

Reviewers: klimek

Subscribers: spatel, cfe-commits

Differential Revision: http://reviews.llvm.org/D21278

llvm-svn: 272539
2016-06-13 09:03:45 +00:00
Benjamin Kramer bdc4956bac Pass DebugLoc and SDLoc by const ref.
This used to be free, but copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.

llvm-svn: 272512
2016-06-12 15:39:02 +00:00
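A minimal sketch of the signature change, using a getNode-style declaration as a stand-in (the exact functions touched are not listed in the message): since the metadata rewrite, copying a DebugLoc/SDLoc performs metadata track/untrack bookkeeping, so take it by const reference.
```
#include "llvm/CodeGen/SelectionDAG.h"

// Illustrative declarations only (names hypothetical):
llvm::SDValue getNodeBefore(unsigned Opcode, llvm::SDLoc DL,
                            llvm::EVT VT);        // copies DL: track/untrack
llvm::SDValue getNodeAfter(unsigned Opcode, const llvm::SDLoc &DL,
                           llvm::EVT VT);         // no copy
```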
Sanjay Patel 977530a8c9 [x86, SSE] change patterns for CMPP to float types to allow matching with SSE1 (PR28044)
This patch is intended to solve:
https://llvm.org/bugs/show_bug.cgi?id=28044

By changing the definition of X86ISD::CMPP to use float types, we allow it to be created 
and pass legalization for an SSE1-only target where v4i32 is not legal.

The motivational trail for this change includes:
https://llvm.org/bugs/show_bug.cgi?id=28001

and eventually makes this trigger:
http://reviews.llvm.org/D21190

I.e., after this step, we should be free to have Clang generate FP compare IR instead of x86
intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM 
sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86
intrinsics, a big pile of generic optimizations can trigger.

Differential Revision: http://reviews.llvm.org/D21235

llvm-svn: 272511
2016-06-12 15:03:25 +00:00
Simon Pilgrim 5b9bade8dd [X86][SSSE3] Added PSHUFB LUT implementation of BITREVERSE
PSHUFB can speed up BITREVERSE of byte vectors by performing LUT on the low/high nibbles separately and ORing the results. Wider integer vector types are already BSWAP'd beforehand so also make use of this approach.

llvm-svn: 272477
2016-06-11 15:44:13 +00:00
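A scalar sketch of the nibble-LUT idea (the vector version does the same 16-entry lookup on every byte at once with PSHUFB): reverse the low and high nibbles via the table, then OR the swapped halves.
```
#include <cstdint>

// Scalar analogue of the PSHUFB LUT: RevNibble[n] is n with its 4 bits
// reversed; a byte is reversed by swapping the looked-up nibbles.
uint8_t reverseByte(uint8_t B) {
  static const uint8_t RevNibble[16] = {0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
                                        0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF};
  return uint8_t(RevNibble[B & 0xF] << 4) | RevNibble[B >> 4];
}
```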
Craig Topper 504fba5c8a [AVX512] Lower v8i64 and v16i32 to pshufd when possible.
llvm-svn: 272473
2016-06-11 13:43:21 +00:00
Simon Pilgrim 6800a45790 [X86][SSE] Added PSLLDQ/PSRLDQ as a target shuffle type
Ensure that PALIGNR/PSLLDQ/PSRLDQ are byte vectors so that they can be correctly decoded for target shuffle combining

llvm-svn: 272471
2016-06-11 13:38:28 +00:00
Simon Pilgrim 255fdd0666 [X86][SSE] Use vXi8 return type for PSLLDQ/PSRLDQ instructions
These are byte shift instructions and it will make shuffle combining a lot more straightforward if we can assume a vXi8 vector of bytes so decoded shuffle masks match the return type's number of elements

llvm-svn: 272468
2016-06-11 12:54:37 +00:00
Craig Topper 40abd1cc61 [AVX512] Add support for lowering v32i16 shuffles with repeated lanes. This allows us to create 512-bit PSHUFLW/PSHUFHW.
llvm-svn: 272450
2016-06-11 03:27:42 +00:00
Craig Topper b9b86fcfff [AVX512] No need to check for BWI being enabled before lowering v32i16 and v64i8 shuffles. If we get this far the types are already legal which means BWI must be enabled.
llvm-svn: 272449
2016-06-11 03:27:37 +00:00
Sanjoy Das 39c226fdba [STLExtras] Introduce and use llvm::count_if; NFC
(This is split out from what was D21115)

llvm-svn: 272435
2016-06-10 21:18:39 +00:00
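A small usage sketch, assuming the range-based helper mirrors std::count_if as the commit title suggests (the example function and predicate are hypothetical):
```
#include <cstddef>

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"

// Hypothetical use: count sentinel (negative) entries in a shuffle mask.
// Before: std::count_if(Mask.begin(), Mask.end(), Pred);
std::ptrdiff_t countSentinels(llvm::ArrayRef<int> Mask) {
  return llvm::count_if(Mask, [](int M) { return M < 0; });
}
```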
Simon Pilgrim 47c76e201a [X86][AVX512] Fixed issue with v16i32 shuffles lowering to VPALIGNR
llvm-svn: 272307
2016-06-09 20:53:12 +00:00
Simon Pilgrim 0ab9d3026a [X86][AVX512] Added support for lowering 512-bit vector shuffles to bit/byte shifts
512-bit VPSLLDQ/VPSRLDQ can only be used for avx512bw targets so lowerVectorShuffleAsShift had to be adjusted to include the subtarget

llvm-svn: 272300
2016-06-09 20:13:58 +00:00
Igor Breger f635367e2b [AVX512] Remove masked_move/blendm intrinsic from back-end.
This is a complement patch to D21060.

Differential Revision: http://reviews.llvm.org/D21174

llvm-svn: 272257
2016-06-09 11:46:55 +00:00
Craig Topper 565a5b5451 [X86] Fix bad comment in assert. NFC
llvm-svn: 272248
2016-06-09 07:06:33 +00:00
Benjamin Kramer 46e38f3678 Avoid copies of std::strings and APInt/APFloats where we only read from it
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.

llvm-svn: 272126
2016-06-08 10:01:20 +00:00
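An illustration of the clang-tidy fix pattern (names hypothetical); the lifetime caveat in the message is exactly why each change was audited under ASan: the reference must not outlive the object it binds to.
```
#include <string>

void consume(const std::string &S);

void example(const std::string &Source) {
  // Before: std::string Name = Source;  // unnecessary copy
  const std::string &Name = Source;      // read-only alias, no copy
  consume(Name);
}
```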
Etienne Bergeron 22bfa83208 [stack-protection] Add support for MSVC buffer security check
Summary:
This patch adds support for the MSVC buffer security check implementation.

The buffer security check is turned on with the '/GS' compiler switch.
  * https://msdn.microsoft.com/en-us/library/8dbf701c.aspx
  * To be added to clang here: http://reviews.llvm.org/D20347

Some overview of buffer security check feature and implementation:
  * https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx
  * http://www.ksyash.com/2011/01/buffer-overflow-protection-3/
  * http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html


For the following example:
```
int example(int offset, int index) {
  char buffer[10];
  memset(buffer, 0xCC, index);
  return buffer[index];
}
```

The MSVC compiler adds these instructions to perform the stack integrity check:
```
        push        ebp  
        mov         ebp,esp  
        sub         esp,50h  
  [1]   mov         eax,dword ptr [__security_cookie (01068024h)]  
  [2]   xor         eax,ebp  
  [3]   mov         dword ptr [ebp-4],eax  
        push        ebx  
        push        esi  
        push        edi  
        mov         eax,dword ptr [index]  
        push        eax  
        push        0CCh  
        lea         ecx,[buffer]  
        push        ecx  
        call        _memset (010610B9h)  
        add         esp,0Ch  
        mov         eax,dword ptr [index]  
        movsx       eax,byte ptr buffer[eax]  
        pop         edi  
        pop         esi  
        pop         ebx  
  [4]   mov         ecx,dword ptr [ebp-4]  
  [5]   xor         ecx,ebp  
  [6]   call        @__security_check_cookie@4 (01061276h)  
        mov         esp,ebp  
        pop         ebp  
        ret  
```

The instrumentation above:
  * [1] loads the global security canary,
  * [3] stores the locally computed ([2]) canary to the guard slot,
  * [4] loads the guard slot and ([5]) re-computes the global canary,
  * [6] validates the resulting canary with '__security_check_cookie' and performs error handling.

Overview of the current stack-protection implementation:
  * lib/CodeGen/StackProtector.cpp
    * There is a default stack-protection implementation applied to the intermediate representation.
    * The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
    * An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
    * Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and error handling.
    * Guard manipulation and comparison are added directly to the intermediate representation.

  * lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
  * lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
    * There is an implementation that adds instrumentation during instruction selection (for better handling of sibling calls).
      * see long comment above 'class StackProtectorDescriptor' declaration.
    * The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST return nullptr).
      * 'getSDagStackGuard' returns the appropriate stack guard (security cookie)
    * The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.

  * include/llvm/Target/TargetLowering.h
    * Contains a function to retrieve the default Guard 'Value'; should be overridden by each target to select which implementation is used and provide the Guard 'Value'.

  * lib/Target/X86/X86ISelLowering.cpp
    * Contains the x86 specialisation; provides the Guard 'Value' used by the SelectionDAG algorithm.

Function-based Instrumentation:
  * MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instruction.
  * To support function-based instrumentation, this patch is
    * adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h),
      * If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue.
    * modifying (SelectionDAGISel.cpp) to avoid producing basic blocks used for inline instrumentation,
    * generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp),
    * if FastISEL (not SelectionDAG), using the fallback, which relies on the same function-based check implemented over the intermediate representation (StackProtector.cpp).

Modifications
  * adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
  * adding support for function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)

Results

  * IR generated instrumentation:
```
clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
```

```
*** Final LLVM Code input to ISel ***

; Function Attrs: nounwind sspstrong
define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
entry:
  %StackGuardSlot = alloca i8*                                                  <<<-- Allocated guard slot
  %0 = call i8* @llvm.stackguard()                                              <<<-- Loading Stack Guard value
  call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot)                  <<<-- Prologue intrinsic call (store to Guard slot)
  %index.addr = alloca i32, align 4
  %offset.addr = alloca i32, align 4
  %buffer = alloca [10 x i8], align 1
  store i32 %index, i32* %index.addr, align 4
  store i32 %offset, i32* %offset.addr, align 4
  %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
  %1 = load i32, i32* %index.addr, align 4
  call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
  %2 = load i32, i32* %index.addr, align 4
  %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
  %3 = load i8, i8* %arrayidx, align 1
  %conv = sext i8 %3 to i32
  %4 = load volatile i8*, i8** %StackGuardSlot                                  <<<-- Loading Guard slot
  call void @__security_check_cookie(i8* %4)                                    <<<-- Epilogue function-based check
  ret i32 %conv
}
```

  * SelectionDAG generated instrumentation:

```
clang-cl /GS test.cc /O1 /c /FA
```

```
"?example@@YAHHH@Z":                    # @"\01?example@@YAHHH@Z"
# BB#0:                                 # %entry
        pushl   %esi
        subl    $16, %esp
        movl    ___security_cookie, %eax                                        <<<-- Loading Stack Guard value
        movl    28(%esp), %esi
        movl    %eax, 12(%esp)                                                  <<<-- Store to Guard slot
        leal    2(%esp), %eax
        pushl   %esi
        pushl   $204
        pushl   %eax
        calll   _memset
        addl    $12, %esp
        movsbl  2(%esp,%esi), %esi
        movl    12(%esp), %ecx                                                  <<<-- Loading Guard slot
        calll   @__security_check_cookie@4                                      <<<-- Epilogue function-based check
        movl    %esi, %eax
        addl    $16, %esp
        popl    %esi
        retl
```

Reviewers: kcc, pcc, eugenis, rnk

Subscribers: majnemer, llvm-commits, hans, thakis, rnk

Differential Revision: http://reviews.llvm.org/D20346

llvm-svn: 272053
2016-06-07 20:15:35 +00:00
Simon Pilgrim ca1da1bf07 [X86][SSE] Improved blend+zero target shuffle combining to use combined shuffle mask directly
We currently only combine to blend+zero if the target value type has 8 elements or less, but this was missing a lot of cases where the combined mask had been widened.

This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases.

llvm-svn: 272003
2016-06-07 12:20:14 +00:00
Igor Breger edafb0595e [KNL] Fix UMULO lowering.
Differential Revision: http://reviews.llvm.org/D21013

llvm-svn: 271891
2016-06-06 12:24:52 +00:00
Craig Topper 143446d5c1 [AVX512] Add PALIGNR shuffle lowering for v32i16 and v16i32.
llvm-svn: 271870
2016-06-06 05:39:10 +00:00
Simon Pilgrim 64c6de4525 [X86][XOP] Added VPERMIL2PD/VPERMIL2PS raw mask decoding for target shuffle combines
llvm-svn: 271834
2016-06-05 15:21:30 +00:00
Simon Pilgrim 478295dadd [X86][XOP] Added VPERMIL2PD/VPERMIL2PS as a target shuffle type
llvm-svn: 271831
2016-06-05 15:01:45 +00:00
Craig Topper 8eeda57a40 [AVX512] Add support for lowering PALIGNR for v64i8.
Could do this for other types too, but this is what's needed to replace the intrinsic with native IR in clang.

llvm-svn: 271828
2016-06-05 06:29:12 +00:00
Craig Topper 9f51c9ef15 [AVX512] Fix PANDN combining for v4i32/v8i32 when VLX is enabled.
v4i32/v8i32 ANDs aren't promoted to v2i64/v4i64 when VLX is enabled.

llvm-svn: 271826
2016-06-05 05:35:11 +00:00
Craig Topper e609bd6600 [X86] Add the VR128L/H and VR256L/H to the list of vector register classes for inline asm constraints. Also fix the comment on the function.
llvm-svn: 271802
2016-06-04 20:15:08 +00:00
Saleem Abdulrasool 1fcdc23a6e X86: enable TLS on Windows itanium
Windows itanium is nearly identical to windows-msvc (MS ABI for C, itanium for
C++).  Enable the TLS support for the target similar to the MSVC model.

llvm-svn: 271797
2016-06-04 18:27:22 +00:00
Simon Pilgrim fd2eda4f64 [X86][AVX2] Fix v16i16 SHL lowering (PR27730)
The AVX2 v16i16 shift lowering works by unpacking to 2 x v8i32, performing the shift and then truncating the result.

The unpacking is used to place the values in the upper 16-bits so that we can correctly sign-extend for SRA shifts. Unfortunately we weren't ensuring that the lower 16-bits were zero to ensure that SHL correctly shifts in zero bits.

llvm-svn: 271796
2016-06-04 16:45:33 +00:00
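A scalar sketch of the invariant being fixed (all names hypothetical, shift amount assumed < 16): the 16-bit value is placed in the upper half of a 32-bit lane so SRA sign-extends naturally, which means the lower half must hold zeros or SHL drags stray bits into the result.
```
#include <cstdint>

// Scalar analogue of the widened-lane shift: keep the low 16 bits zero
// so a left shift brings in zeros, matching a true 16-bit SHL.
uint16_t shl16ViaWideLane(uint16_t V, unsigned Amt) {
  uint32_t Lane = uint32_t(V) << 16;     // value in upper half, zeros below
  return uint16_t((Lane << Amt) >> 16);  // truncate back to 16 bits
}
```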
Simon Pilgrim e85506b6e0 [X86][XOP] Support for VPERMIL2PD/VPERMIL2PS 2-input shuffle instructions
This patch begins adding support for lowering to the XOP VPERMIL2PD/VPERMIL2PS shuffle instructions - adding the X86ISD::VPERMIL2 opcode and cleaning up the usage.

The internal llvm intrinsics were assuming the shuffle mask operand was the same type as the float/double input operands (I guess to simplify the intrinsic definitions in X86InstrXOP.td to a single value type). These needed changing to integer types (matching the clang builtin and the AMD intrinsics definitions); an auto-upgrade path is added to convert old calls.

Mask decoding/target shuffle support will be added in future patches.

Differential Revision: http://reviews.llvm.org/D20049

llvm-svn: 271633
2016-06-03 08:06:03 +00:00
Dimitry Andric 6a482a73d6 Only attempt to detect AVG if SSE2 is available
Summary:
In PR29973 Sanjay Patel reported an assertion failure when a certain
loop was optimized, for a target without SSE2 support.  It turned out
this was because of the AVG pattern detection introduced in rL253952.

Prevent the assertion failure by bailing out early in
`detectAVGPattern()`, if the target does not support SSE2.

Also add a minimized test case.

Reviewers: congh, eli.friedman, spatel

Subscribers: emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D20905

llvm-svn: 271548
2016-06-02 17:30:49 +00:00
Craig Topper 5bb9cda620 [AVX512] Remove LOADA/LOADU/STOREA/STOREU intrinsic types now that they are unused.
llvm-svn: 271479
2016-06-02 04:19:40 +00:00
Yaron Keren a34bfa000f Do not modify a std::vector while looping it.
Introduced in r271244, this is probably undefined behaviour and asserts when
compiled with Visual C++ debug mode. 

On a further note, the loop is quadratic with regard to the number of successors,
since removeSuccessor is linear; it could probably be modified to run in linear time.

llvm-svn: 271278
2016-05-31 13:45:05 +00:00
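A sketch of the bug class (container and names illustrative): erasing from a std::vector invalidates the iterators a range-for is using, so the loop must control its index explicitly. This version is still quadratic, matching the note above about removeSuccessor being linear.
```
#include <cstddef>
#include <vector>

// Erase every matching element without range-iterating a container that
// is being mutated out from under the loop.
void removeAllSuccessors(std::vector<int> &Succs, int Victim) {
  // Buggy: for (int S : Succs) { ... Succs.erase(...); }  // UB
  for (std::size_t I = 0; I != Succs.size();) {
    if (Succs[I] == Victim)
      Succs.erase(Succs.begin() + I); // linear erase => quadratic overall
    else
      ++I;
  }
}
```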
Igor Breger 73ee8ba9b0 [AVX512] Fix intrinsic vcvtps2ph lowering.
Differential Revision: http://reviews.llvm.org/D20788

llvm-svn: 271255
2016-05-31 08:04:21 +00:00
Igor Breger 52bd1d5fcc Fix intrinsic vbroadcast{i32|f32}x2 lowering.
Differential Revision: http://reviews.llvm.org/D20780

llvm-svn: 271254
2016-05-31 07:43:39 +00:00
Saleem Abdulrasool d2f705ddf9 X86: permit using SjLj EH on x86 targets as an option
This adds support to the backend to actually support SjLj EH as an exception
model.  This is *NOT* the default model, and requires explicitly opting into it
from the frontend.  GCC supports this model, and for MinGW it can still be
enabled via the `--using-sjlj-exceptions` option.

Addresses PR27749!

llvm-svn: 271244
2016-05-31 01:48:07 +00:00
Ahmed Bougacha a3dc1ba142 [X86] Try to zero elts when lowering 256-bit shuffle with PSHUFB.
Otherwise we fallback to a blend of PSHUFBs later on.

Differential Revision: http://reviews.llvm.org/D19661

llvm-svn: 271113
2016-05-28 14:38:04 +00:00
Michael Kuperstein a75c77b127 [X86] Detect SAD patterns and emit psadbw instructions.
This recommits r267649 with a fix for PR27539.

Differential Revision: http://reviews.llvm.org/D20598

llvm-svn: 271033
2016-05-27 18:53:22 +00:00
Ahmed Bougacha 346b98011c [X86] Clarify PSHUFB+blend lowering function name. NFC.
Also guard against v32i8 users.

llvm-svn: 271024
2016-05-27 17:58:17 +00:00
Simon Pilgrim cf340bd9c1 [X86][SSE] When lowering a 256-bit shuffle as PMOVZX, reduce the input vector to the lower 128-bit subvector.
More often than not, this is what it started out as; the extraction is zero-cost on AVX and the PMOVZX/PMOVSX folding logic is based around 128-bit loads.

llvm-svn: 270858
2016-05-26 15:40:36 +00:00
Igor Breger 8437bb70fd [AVX512] Fix intrinsic cmp{sd|ss} lowering.
Differential Revision: http://reviews.llvm.org/D20615

llvm-svn: 270843
2016-05-26 12:42:25 +00:00
Simon Pilgrim 4810683ddf Simplify std::all_of/any_of predicates by using llvm::all_of/any_of. NFCI.
llvm-svn: 270753
2016-05-25 20:41:11 +00:00
Igor Breger 23c2090606 [llvm][AVX512][intrinsics] Fix vperm{b|w|d|q|ps|pd} intrinsics. The index is the second argument to the builtin function, but it is the first instruction operand.
Differential Revision: http://reviews.llvm.org/D20515

llvm-svn: 270548
2016-05-24 11:06:22 +00:00
Igor Breger 2ba64ab9ae [AVX512] Implement missing patterns for any_extend load lowering.
Differential Revision: http://reviews.llvm.org/D20513

llvm-svn: 270357
2016-05-22 10:21:04 +00:00
Michael Zuckerman a63a129749 [Clang][AVX512][intrinsics] Fix rcp and sqrt intrinsics.
Differential Revision: http://reviews.llvm.org/D20438

llvm-svn: 270322
2016-05-21 14:44:18 +00:00
Michael Zuckerman 11b55b29d1 [Clang][AVX512][intrinsics] Fix vscalef intrinsics.
Differential Revision: http://reviews.llvm.org/D20324

llvm-svn: 270321
2016-05-21 11:09:53 +00:00
Simon Pilgrim 55ef3da27b [X86][AVX] Generalized matching for target shuffle combines
This patch is a first step towards a more extendible method of matching combined target shuffle masks.

Initially this just pulls out the existing basic mask matches and adds support for some 256/512 bit equivalents. Future patterns will require a number of features to be added but I wanted to keep this patch simple.

I hope we can avoid duplication between shuffle lowering and combining and share more complex pattern match functions in future commits.

Differential Revision: http://reviews.llvm.org/D19198

llvm-svn: 270230
2016-05-20 16:19:30 +00:00
Rafael Espindola c7e9813228 Refactor X86 symbol access classification.
This refactors the logic in X86 to avoid code duplication. It also
splits it in two steps: it first decides if a symbol is local to the DSO
and then uses that information to decide how to access it.

The first part is implemented by shouldAssumeDSOLocal. It is not in any
way specific to X86. In a follow-up patch I intend to move it
somewhere common and reuse it in other backends.

llvm-svn: 270209
2016-05-20 12:20:10 +00:00
Rafael Espindola ab03eb007c Record a TargetMachine instead of a Reloc::Model.
Addresses r270095's code review.

llvm-svn: 270147
2016-05-19 22:07:57 +00:00
Hans Wennborg 172eee9cfc X86: Don't reset the stack after calls that don't return (PR27117)
Since the calls don't return, the instruction afterwards will never run,
and is just taking up unnecessary space in the binary.

Differential Revision: http://reviews.llvm.org/D20406

llvm-svn: 270109
2016-05-19 20:15:33 +00:00
Rafael Espindola 46107b9e62 Remember the relocation model. NFC.
This avoids passing a TargetMachine in a few places.

llvm-svn: 270095
2016-05-19 18:49:29 +00:00
Rafael Espindola cb2d266360 Style fixes. NFC.
llvm-svn: 270093
2016-05-19 18:34:20 +00:00
Hans Wennborg 8eb336c14e Re-commit r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions"
with an additional fix to make RegAllocFast ignore undef physreg uses. It would
previously get confused about the "push %eax" instruction's use of eax. That
method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate
as well, but since that runs after register-allocation, we didn't run into the
RegAllocFast issue before.

llvm-svn: 269949
2016-05-18 16:10:17 +00:00
Ashutosh Nema 348af9cc6b Add new flag and intrinsic support for MWAITX and MONITORX instructions
Summary:

MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT
pair while adding a timer function, such that the MWAITX instruction also
terminates when the timer expires. The presence of the MONITORX and
MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29.

The MONITORX and MWAITX instructions are intercepted by the same bits that
intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be
monitored. MWAITX instruction causes the processor to stop instruction execution
and enter an implementation-dependent optimized state until occurrence of a
class of events.

The opcode of the MONITORX instruction is "0F 01 FA", and the opcode of the
MWAITX instruction is "0F 01 FB". This opcode information is used to add
tests for the disassembler.

These instructions are enabled for AMD's bdver4 architecture.

Patch by Ganesh Gopalasubramanian!

Reviewers: echristo, craig.topper, RKSimon
Subscribers: RKSimon, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19795

llvm-svn: 269911
2016-05-18 11:59:12 +00:00
Hans Wennborg 759af30109 Revert r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions"
Seems to have broken the Windows ASan bot. Reverting while investigating.

llvm-svn: 269833
2016-05-17 20:38:56 +00:00
Hans Wennborg c3fb51171e X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions
This patch moves the expansion of WIN_ALLOCA pseudo-instructions
into a separate pass that walks the CFG and lowers the instructions
based on a conservative estimate of the offset between the stack
pointer and the lowest accessed stack address.

The goal is to reduce binary size and run-time costs by removing
calls to _chkstk. While it doesn't fix all the code quality problems
with inalloca calls, it's an incremental improvement for PR27076.

Differential Revision: http://reviews.llvm.org/D20263

llvm-svn: 269828
2016-05-17 20:13:29 +00:00
Michael Kuperstein ac2088d122 [X86] Remove transformVSELECTtoBlendVECTOR_SHUFFLE
The new X86 shuffle lowering can do just fine without transforming vselects
into vector_shuffles. It looks like the only thing this code does right now
is cause trouble - in particular, it can lead to combine/legalization infinite
loops.

Note that it's not completely NFC, since some of the shuffle masks get inverted,
which may cause slight differences further down the line. We may want to find
a way to invert those masks, but that's orthogonal to this commit.

This fixes the hang in PR27689.

llvm-svn: 269676
2016-05-16 18:27:00 +00:00
Simon Pilgrim 0d05484db6 Fixed unused variable warning
llvm-svn: 269650
2016-05-16 11:48:54 +00:00
Simon Pilgrim 265995ef53 [X86][SSSE3] Lower vector CTLZ with PSHUFB lookups
This patch uses PSHUFB to lower vector CTLZ and avoid (slower) scalarizations.

The leading zero count of each 4-bit nibble of the vector is determined by using a PSHUFB lookup. Pairs of results are then repeatedly combined up to the original element width.

Differential Revision: http://reviews.llvm.org/D20016

llvm-svn: 269646
2016-05-16 11:19:11 +00:00
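A scalar sketch of the per-nibble lookup described above (the vector form evaluates this for all bytes at once with PSHUFB, then combines pairs of results up to the element width): if the high nibble is nonzero its count is the answer, otherwise add 4 to the low nibble's count.
```
#include <cstdint>

// NibbleLZ[n] = number of leading zeros in the 4-bit value n.
uint32_t clz8(uint8_t B) {
  static const uint8_t NibbleLZ[16] = {4, 3, 2, 2, 1, 1, 1, 1,
                                       0, 0, 0, 0, 0, 0, 0, 0};
  uint8_t Hi = B >> 4;
  return Hi ? NibbleLZ[Hi] : 4u + NibbleLZ[B & 0xF];
}
```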
Simon Pilgrim 2d4bf1042b [X86][SSE] Simplify zero'th index extract element matching
llvm-svn: 269615
2016-05-15 20:22:50 +00:00
Simon Pilgrim fbe97bc15a [X86][SSE] Removed duplicate variables. NFCI.
Removed duplicate getOperand / getSimpleValueType calls.

llvm-svn: 269614
2016-05-15 20:11:10 +00:00
Elena Demikhovsky e79b716daf Fixed lowering of _comi_ intrinsics from all sets - SSE/SSE2/AVX/AVX-512
Differential revision http://reviews.llvm.org/D19261

llvm-svn: 269569
2016-05-14 15:06:09 +00:00
Quentin Colombet b47b9b2de7 [X86] Strengthen the setting of inline asm constraints for fp regclasses.
This is similar to r268953, but for floating point and vector register
classes.

Explanations:
The setting of the inline asm constraints was implicitly relying on the
order of the register classes in the file generated by tablegen.
Since, we do not have any control on that order, make sure we do not
depend on it anymore.

llvm-svn: 268973
2016-05-09 21:24:31 +00:00
Quentin Colombet 86098ab10b Reapply [X86] Add a new LOW32_ADDR_ACCESS_RBP register class.
This reapplies commit r268796, with a fix for the setting of the inline asm
constraints. I.e., "mark" LOW32_ADDR_ACCESS_RBP as a GR variant, so that the
regular processing of the GR operands (setting of the subregisters) happens.

Original commit log:
[X86] Add a new LOW32_ADDR_ACCESS_RBP register class.

ABIs like NaCl use 32-bit addresses but have a 64-bit frame.
The new register class reflects those constraints when choosing a
register class for an address access.

llvm-svn: 268955
2016-05-09 19:01:46 +00:00
Quentin Colombet bb15ce3d1f [X86] Strengthen the setting of inline asm constraints.
The setting of the inline asm constraints was implicitly relying on the
order of the register classes in the file generated by tablegen.
Since we do not have any control over that order, make sure we do not
depend on it anymore.

llvm-svn: 268953
2016-05-09 19:01:35 +00:00
David Majnemer eac58d8f68 [X86] Promote several single precision FP libcalls on Windows
A number of libcalls don't exist in any particular lib but are, instead,
defined in math.h as inline functions (even in C mode!).  Don't rely on
their existence when lowering @llvm.{cos,sin,floor,..}.f32, promote them
instead.

N.B. We had logic to handle FREM but were missing out on a number of
others.  This change generalizes the FREM handling.

llvm-svn: 268875
2016-05-08 08:15:50 +00:00
Craig Topper d681e23336 [X86] Lower 256-bit vector all-zero constants to v8i32 even with AVX1 only. Either way a 256-bit VXORPS will be used.
llvm-svn: 268873
2016-05-08 07:10:54 +00:00
Simon Pilgrim 96e5307d4e [X86] Pulled out duplicate mask width calculation. NFCI.
llvm-svn: 268861
2016-05-07 18:04:24 +00:00
Sanjay Patel c2751e7050 [x86, BMI] add TLI hook for 'andn' and use it to simplify comparisons
For the sake of minimalism, this patch is x86 only, but I think that at least
PPC, ARM, AArch64, and Sparc probably want to do this too.

We might want to generalize the hook and pattern recognition for a target like
PPC that has a full assortment of negated logic ops (orc, nand).

Note that http://reviews.llvm.org/D18842 will cause this transform to trigger
more often.

For reference, this relates to:
https://llvm.org/bugs/show_bug.cgi?id=27105
https://llvm.org/bugs/show_bug.cgi?id=27202
https://llvm.org/bugs/show_bug.cgi?id=27203
https://llvm.org/bugs/show_bug.cgi?id=27328

Differential Revision: http://reviews.llvm.org/D19087

llvm-svn: 268858
2016-05-07 15:03:40 +00:00
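A hypothetical source pattern of the kind this targets: negated logic feeding a comparison, which BMI's ANDN can compute while setting flags, saving the separate NOT (see the linked bug reports for the isfinite-style motivating cases).
```
#include <cstdint>

// (~X & Mask) != 0 — with BMI this can be a single ANDN whose flags feed
// the branch, instead of NOT + AND/TEST.
bool anyMaskedBitClear(uint32_t X, uint32_t Mask) {
  return (~X & Mask) != 0;
}
```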
Marcin Koscielnicki 0275fac2c9 [X86] Extend some Linux special cases to cover kFreeBSD.
Both Linux and kFreeBSD use glibc, so follow similar code paths.
Add isTargetGlibc to check for this, and use it instead of isTargetLinux
in a few places.

Fixes PR22248 for kFreeBSD.

Differential Revision: http://reviews.llvm.org/D19104

llvm-svn: 268624
2016-05-05 11:35:51 +00:00
David Majnemer 911d0e3c21 [X86] Use the right type when folding xor (truncate (shift)) -> setcc
The result type of setcc is dependent on whether or not AVX512 is
present.
We had an X86-specific DAG-combine which assumed that the result type
should be i8 when it could be i1.
This meant that we would generate illegal setccs which LowerSETCC did
not like.

Instead, use an appropriate type and zero extend to i8.

Also, there were some scenarios where the fold should have fired but
didn't because we were overly cautious about the types.  This meant that
we generated:

        shrl    $31, %edi
        andl    $1, %edi
        kmovw   %edi, %k0
        kxnorw  %k0, %k0, %k1
        kshiftrw        $15, %k1, %k1
        kxorw   %k1, %k0, %k0
        kmovw   %k0, %eax

instead of:

        testl   %edi, %edi
        setns   %al

This fixes PR27638.

llvm-svn: 268609
2016-05-05 06:00:56 +00:00
Simon Pilgrim be439d7f1a [X86] Tidied up SDValue's SDNode referencing. NFCI.
llvm-svn: 268445
2016-05-03 21:44:45 +00:00
Simon Pilgrim d2752708a3 [X86][SSE] Added target shuffle combine to MOVQ
llvm-svn: 268391
2016-05-03 15:05:13 +00:00
Igor Breger ab076c683c [AVX512] Fix lowerV4X128VectorShuffle to correctly select input operands.
Differential Revision: http://reviews.llvm.org/D19803

llvm-svn: 268368
2016-05-03 08:08:44 +00:00
Simon Pilgrim 52f8693263 [X86][SSE] Added placeholder for 128/256-bit wide shuffle combines
Begun adding placeholder for future support for vperm2f128/vshuff64x2 style 128/256-bit wide shuffles

llvm-svn: 268306
2016-05-02 21:12:48 +00:00
Simon Pilgrim e5e04baf95 [X86][SSE] Dropped X86ISD::FGETSIGNx86 and use MOVMSK instead for FGETSIGN lowering
movmsk.ll tests are unchanged.

llvm-svn: 268237
2016-05-02 14:58:22 +00:00
Craig Topper 33772c5375 [CodeGen] Default CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF to Expand in TargetLoweringBase. This is what the majority of the targets want and removes a bunch of code. Set it to Legal explicitly in the few cases where that's the desired behavior.
llvm-svn: 267853
2016-04-28 03:34:31 +00:00
Quentin Colombet d6dbec4c6f [X86] Fix the lowering of TLS calls.
The callseq_end node must be glued with the TLS calls, otherwise,
the generic code will miss the uses of the returned value and will
mark it dead.
Moreover, TLSCall 64-bit pseudo must not set an implicit-use on RDI,
the pseudo uses the symbol address at this point not RDI and the
lowering will do the right thing.

llvm-svn: 267797
2016-04-27 21:37:37 +00:00
Kevin B. Smith c378a99ba5 [X86]: Quit promoting 16 bit loads to 32 bit.
Differential Revision: http://reviews.llvm.org/D19592

llvm-svn: 267773
2016-04-27 19:58:03 +00:00
Nico Weber e69b9548b8 Revert r267649, it caused PR27539.
llvm-svn: 267723
2016-04-27 15:16:54 +00:00
Ahmed Bougacha 19a2ee591a [X86] Don't assume that MMX extractelts are from index 0.
It's probably the case for all 3 MMX users out there, but with
hand-crafted IR, you can trigger selection failures. Fix that.

llvm-svn: 267652
2016-04-27 01:35:29 +00:00
Ahmed Bougacha e68363a03c [X86] Re-enable MMX i32 extractelt combine.
This effectively adds back the extractelt combine removed by r262358:
the direct case can still occur (because x86_mmx is special, see
r262446), but it's the indirect case that's now superseded by the
generic combine.

llvm-svn: 267651
2016-04-27 01:35:25 +00:00
Cong Hou 6f879d9eb1 Detects the SAD pattern on X86 so that much better code will be emitted once the pattern is matched.
Differential revision: http://reviews.llvm.org/D14840

llvm-svn: 267649
2016-04-27 01:29:18 +00:00
Ahmed Bougacha 128f8732a5 [CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI.
Differential Revision: http://reviews.llvm.org/D17176

llvm-svn: 267606
2016-04-26 21:15:30 +00:00
Manman Ren 1c3f65a18c Swift Calling Convention: use %RAX for sret.
We don't need to copy the sret argument into %rax upon return.
rdar://25671494

llvm-svn: 267579
2016-04-26 18:08:06 +00:00
Craig Topper 03734c7ce1 [X86] Replace a SmallVector used to pass 2 values to an ArrayRef parameter with a fixed size array. NFC
llvm-svn: 267377
2016-04-25 04:30:29 +00:00
Simon Pilgrim dd748b83aa [X86][SSE] getTargetShuffleMaskIndices - dropped (unused) UNDEF handling
We aren't currently making use of this in any successful mask decode and it's actually incorrect, as it inserts the wrong number of SM_SentinelUndef mask elements.

llvm-svn: 267350
2016-04-24 16:49:53 +00:00
Simon Pilgrim 7c25ef92a3 [X86][SSE] Use range loop. NFCI.
llvm-svn: 267349
2016-04-24 16:33:35 +00:00
Simon Pilgrim 9f5697ef68 [X86][SSE] Improved support for decoding target shuffle masks through bitcasts
Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transferred to xmm.

This fixes an issue preventing PSHUFB target shuffle masks decoding rematerialized scalar constants and also exposes the XOP VPPERM bug described in PR27472.

llvm-svn: 267343
2016-04-24 14:53:54 +00:00
Craig Topper dbc981f71f [X86] Merge LowerCTLZ and LowerCTLZ_ZERO_UNDEF into a single function that branches internally for the one difference, allowing the rest of the code to be common. NFC
llvm-svn: 267331
2016-04-24 06:27:39 +00:00
Craig Topper 6469a39f51 [X86] No need to check if AVX512 is supported when lowering vector CTLZ. The CTLZ operation is only Custom for vectors if AVX512 is enabled, so if a vector gets here AVX512 is implied. NFC
llvm-svn: 267330
2016-04-24 06:27:35 +00:00
Craig Topper 59479e7208 [AVX512] Teach lowering to use vplzcntd/q to implement 128/256-bit CTTZ_ZERO_UNDEF even without VLX support. We can just extend to 512-bits and extract like we do for CTLZ.
llvm-svn: 267100
2016-04-22 03:22:38 +00:00
Craig Topper 21690db05a [AVX512] Add CTTZ support for v8i64 and v16i32 vectors.
llvm-svn: 266968
2016-04-21 07:30:06 +00:00
Craig Topper 340ad0a0c9 [AVX512] Add support for lowering CTTZ v64i8 and v32i16 with BWI instructions.
llvm-svn: 266963
2016-04-21 06:39:34 +00:00
Craig Topper 7dedfdc60a [X86] Remove redundant calls to setOperationAction for EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT from SSE41 block. They were already done in an earlier block. NFC
llvm-svn: 266962
2016-04-21 06:39:32 +00:00
Craig Topper 032e985cbc [X86] Remove some operations from the default Expand all vector ops loop. Instead let them stay Legal and mark them Expand for specific types where needed. Reduces overall number of calls to setOperationAction. NFC
llvm-svn: 266961
2016-04-21 06:39:29 +00:00
Craig Topper 98c855d480 [X86] Remove old leftover MMX code that sets various 64-bit vector operations to Expand. These vector types aren't legal so these operations would never make it far enough to need to expand. NFC
llvm-svn: 266960
2016-04-21 06:39:26 +00:00
Craig Topper 3e6be4c27a [X86] Remove unnecessary setting of CTTZ_ZERO_UNDEF to Custom for vector types where we can't do any better than the Custom lowering of CTTZ. LegalizeVectorOps will expand to CTTZ since it's marked Custom.
CTTZ_ZERO_UNDEF can be custom lowered specially if CTLZ is supported. Otherwise CTTZ and CTTZ_ZERO_UNDEF are handled the same way by using CTPOP and bitmath.

llvm-svn: 266952
2016-04-21 04:44:00 +00:00
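The CTPOP-and-bitmath expansion the message refers to, sketched in scalar form: isolate the lowest set bit, subtract one to get a mask of exactly the trailing zeros, and popcount that mask.
```
#include <bitset>
#include <cstdint>

uint32_t cttz32(uint32_t X) {
  // (X & -X) isolates the lowest set bit; subtracting 1 turns it into a
  // mask of the trailing zeros. Yields 32 for X == 0.
  return uint32_t(std::bitset<32>((X & (0u - X)) - 1).count());
}
```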
Craig Topper 3dd625ce79 [AVX512] Add support for popcount of v8i64 and v16i32 with and without BWI instructions.
Without BWI we have to split the vectors into 256-bit vectors so we can use AVX2 pshufb and then concatenate the results.

llvm-svn: 266950
2016-04-21 03:57:24 +00:00
Asaf Badouh 89406d1815 [X86] enable PIE for functions
Call locally defined functions directly for PIE/fPIE

Differential Revision: http://reviews.llvm.org/D19226

llvm-svn: 266863
2016-04-20 08:32:57 +00:00
Craig Topper 99e60e9f1f [AVX512] Add popcount support for v32i16 and v64i8.
llvm-svn: 266858
2016-04-20 05:18:55 +00:00
Craig Topper 3e8f1e483c [X86] Mark some floating point operations that are always expanded for vector types as Expand in a floating point only loop instead of looping through all vector types.
llvm-svn: 266850
2016-04-20 01:57:44 +00:00
Craig Topper 7f28d55a00 [X86] Don't mark vector loads and shifts Expand in advance. Loads are always marked Legal or Promote for all the legal types later. Shifts are always marked custom. NFC
llvm-svn: 266849
2016-04-20 01:57:42 +00:00
Craig Topper ab7497dd6e [X86] Merge the two different SSE2 blocks in the X86TargetLowering constructor. Also qualify the XOP block with !useSoftFloat to match the other vector blocks.
llvm-svn: 266848
2016-04-20 01:57:40 +00:00
Craig Topper 397968ea16 [X86] Don't set vector FADD,FSUB,FMUL,FDIV,FNEG,FSQRT to Expand early. For every legal FP type we either set them to Legal or Custom anyway. So let them stay defaulted to Legal and only change when they need to be Custom.
llvm-svn: 266847
2016-04-20 01:57:38 +00:00
Tim Shen a1d8bc5597 [PPC, SSP] Support PowerPC Linux stack protection.
llvm-svn: 266809
2016-04-19 20:14:52 +00:00
Tim Shen e885d5e4d3 [SSP, 2/2] Create llvm.stackguard() intrinsic and lower it to LOAD_STACK_GUARD
With this change, ideally an IR pass can always generate an llvm.stackguard
call to get the stack guard; but for now there are still IR-form stack
guard customizations around (see getIRStackGuard()). Future SSP
customization should go through LOAD_STACK_GUARD.

There is a behavior change: stack guard values are not CSEed anymore,
since we should never reuse the value in case it has been spilled (and
corrupted). See ssp-guard-spill.ll. This also causes changes in stack
size and codegen in X86 and AArch64 test cases.

Ideally we'd like to know if the guard created in llvm.stackprotector() gets
spilled or not. If the value is spilled, discard the value and reload
stack guard; otherwise reuse the value. This can be done by teaching
register allocator to know how to rematerialize LOAD_STACK_GUARD and
force a rematerialization (which seems hard), or check for spilling in
expandPostRAPseudo. It only makes sense when the stack guard is a global
variable, which requires more instructions to load. Anyway, this seems to be
outside the scope of the current patch.

llvm-svn: 266806
2016-04-19 19:40:37 +00:00
Simon Pilgrim 32b1c9fe7f [X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary shuffles
Using VPERMQ/VPERMPD allows memory folding of the (repeated) input where VINSERTI128/VINSERTF128 can not.

Differential Revision: http://reviews.llvm.org/D19228

llvm-svn: 266728
2016-04-19 12:26:40 +00:00
Craig Topper 221e1c2b1f [X86] Be explicit about calls to setOperationAction for AVX2 and AVX512 rather than just looping over all vector types and conditionally matching them. NFC
llvm-svn: 266577
2016-04-17 22:49:46 +00:00
Simon Pilgrim dd153476fd [X86] Added TODO comment for target shuffle mask decoding of bitcasted masks
llvm-svn: 266559
2016-04-17 11:34:18 +00:00
Asaf Badouh aec79651c1 [X86] Remove unneeded variables
no functional change.
ExtraLoad and WrapperKind are used only if (OpFlags == X86II::MO_GOTPCREL).

Differential Revision: http://reviews.llvm.org/D18942

llvm-svn: 266557
2016-04-17 08:28:40 +00:00
Craig Topper 75869d5701 [AVX512] ISD::MUL v2i64/v4i64 should only be legal if DQI and VLX features are enabled.
llvm-svn: 266554
2016-04-17 07:25:39 +00:00
Craig Topper 1663e7a472 [X86] Use ternary operator to reduce code slightly. NFC
llvm-svn: 266534
2016-04-16 19:09:32 +00:00
Simon Pilgrim fd4b9b02a3 [X86][XOP] Added VPPERM constant mask decoding and target shuffle combining support
Added additional test that peeks through bitcast to v16i8 mask

llvm-svn: 266533
2016-04-16 17:52:07 +00:00
Craig Topper ea46b592ab Add a setOperationPromotedToType convenience method that sets an operation to promoted and set the type in one call. Use it so save code in X86.
llvm-svn: 266413
2016-04-15 06:20:18 +00:00
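A sketch of the savings, as it would appear inside a target's TargetLowering constructor (the op/type pair here is an illustrative choice, not taken from the commit):
```
// Before: two calls per promoted operation/type pair.
//   setOperationAction(ISD::AND, MVT::v4i32, Promote);
//   AddPromotedToType(ISD::AND, MVT::v4i32, MVT::v2i64);
// After: one call records both the Promote action and the target type.
setOperationPromotedToType(ISD::AND, MVT::v4i32, MVT::v2i64);
```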
Craig Topper 13e9dc66e4 [X86] AND, OR, and XOR of vectors are always legal no need to set them legal explicitly.
llvm-svn: 266412
2016-04-15 06:20:14 +00:00
Craig Topper 5e20fd3e7c [X86] Combine an if and else block that had the same set of calls to setOperationAction that only varied in Legal/Custom. Use the ternary operator on that argument instead. NFC
llvm-svn: 266410
2016-04-15 04:57:09 +00:00
Matthias Braun 46b0f03e12 TargetLowering: Factor out common code for tail call eligibility checking; NFC
llvm-svn: 266270
2016-04-14 01:10:42 +00:00
Matthias Braun 588d1cdad4 X86: Use a callee save register for the swiftself parameter.
It is very likely that the swiftself parameter is alive throughout most
of the function, so putting it into a callee-save register should
avoid spills for the callers with only a minimal amount of extra spills
in the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callee's parameter.

Differential Revision: http://reviews.llvm.org/D18902

llvm-svn: 266252
2016-04-13 21:43:21 +00:00
Sanjay Patel e6a0a23e08 fix indentation; NFC
llvm-svn: 266097
2016-04-12 18:01:48 +00:00
Manman Ren 5751814eda Swift Calling Convention: swifterror target support.
Differential Revision: http://reviews.llvm.org/D18716

llvm-svn: 265997
2016-04-11 21:08:06 +00:00
Simon Pilgrim d263fdc512 [X86][AVX512BW] Add support for v64i8 multiplies
Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets.

I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL.

Differential Revision: http://reviews.llvm.org/D18937

llvm-svn: 265902
2016-04-10 17:02:48 +00:00
Craig Topper 35db8ecb50 [X86] Use for loops over types to reduce code for setting up operation actions.
llvm-svn: 265893
2016-04-10 05:39:32 +00:00
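A sketch of the shape of the cleanup, as it would sit in the X86TargetLowering constructor (the specific ops and types are illustrative):
```
// Iterate the affected vector types once instead of repeating the same
// setOperationAction lines per type.
for (auto VT : {MVT::v16i8, MVT::v8i16, MVT::v4i32, MVT::v2i64}) {
  setOperationAction(ISD::MUL, VT, Custom);
  setOperationAction(ISD::SRA, VT, Custom);
}
```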
Craig Topper dcc8f49bf0 [X86] Remove unnecessary setOperationAction for SRA v2i64/v4i64 when VLX is supported. This is already done for SSE2/AVX2 which VLX implies. NFC
llvm-svn: 265892
2016-04-10 05:39:28 +00:00
Sanjay Patel 4abae4e0fa [x86] use BMI 'andn' for logic + compare ops
With BMI, we can use 'andn' to save an instruction when the result is only used in a compare.
This is related to one of the potential sequences to check 'isfinite' in:
https://llvm.org/bugs/show_bug.cgi?id=27164

Differential Revision: http://reviews.llvm.org/D18910

llvm-svn: 265875
2016-04-09 16:02:52 +00:00
Craig Topper f027107094 [X86] Use for loops over types to reduce code for setting up operation actions. NFC
llvm-svn: 265871
2016-04-09 06:31:02 +00:00
Craig Topper e801ed9e15 [X86] Remove calls to setOperationAction that set CTLZ_ZERO_UNDEF for some vector types to Expand. Expand is already set for all operations for all vector types earlier so this is redundant. NFC
llvm-svn: 265870
2016-04-09 05:53:48 +00:00
Tim Shen 0012756489 [SSP] Remove llvm.stackprotectorcheck.
This is a cleanup patch for SSP support in LLVM. There is no functional change.
llvm.stackprotectorcheck is not needed, because SelectionDAG isn't
actually lowering it in SelectBasicBlock; rather, it adds check code in
FinishBasicBlock, ignoring the position where the intrinsic is inserted
(See FindSplitPointForStackProtector()).

llvm-svn: 265851
2016-04-08 21:26:31 +00:00
Hans Wennborg e25b65bdb7 Rangeify a loop. NFC.
llvm-svn: 265846
2016-04-08 20:46:09 +00:00
Hans Wennborg 74ff770670 Remove some redundant variables from X86TargetLowering::LowerDYNAMIC_STACKALLOC
These are already defined, with the same values, a few lines up. NFC.

llvm-svn: 265845
2016-04-08 20:46:00 +00:00
Kevin B. Smith 3802c4af59 [X86]: Fix for PR27251.
Differential Revision: http://reviews.llvm.org/D18850

llvm-svn: 265690
2016-04-07 16:15:34 +00:00
Simon Pilgrim d54bae6525 [X86][SSE] Add support for VZEXT constant folding
llvm-svn: 265646
2016-04-07 07:52:45 +00:00
Ahmed Bougacha 1cf67fb9cb [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
Re-apply r265450 which caused PR27245 and was reverted in r265559
because of a wrong generalization: the fetch_and_add->add_and_fetch
combine only works in specific, but pretty common, cases:
  (icmp slt x, 0) -> (icmp sle (add x, 1), 0)
  (icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
  (icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
  (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)

Original Message:

We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265636
2016-04-07 02:07:10 +00:00
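A hypothetical source pattern covered by the first listed case: the fetch_add result is used only by a signed comparison against zero, so (icmp slt x, 0) -> (icmp sle (add x, 1), 0) lets the backend emit a `lock add` and branch on the flags it sets instead of falling back to LXADD.
```
#include <atomic>

// Result of the atomic add is consumed only by the comparison.
bool oldValueWasNegative(std::atomic<int> &V) {
  return V.fetch_add(1) < 0; // lock add; test the flags of the new value
}
```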
JF Bastien 800f87a871 NFC: make AtomicOrdering an enum class
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.

The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).

This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.

As a follow-up I'll add:
  bool operator<(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>(AtomicOrdering, AtomicOrdering) = delete;
  bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.

Reviewers: jyknight, reames

Subscribers: jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D18775

llvm-svn: 265602
2016-04-06 21:19:33 +00:00
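A minimal sketch of the tightening, using the ordering names LLVM had at the time (the deleted operator mirrors the follow-up list above): an enum class blocks the implicit int conversions the old enum allowed, and deleting the relational operators forbids the ill-defined 'stronger than' comparisons.
```
enum class AtomicOrdering {
  NotAtomic, Unordered, Monotonic, Acquire, Release,
  AcquireRelease, SequentiallyConsistent
};
// Follow-up direction: make "ordering A < ordering B" a compile error.
bool operator<(AtomicOrdering, AtomicOrdering) = delete;
```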
Hans Wennborg 6849f8f15f Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC."
It caused ASan 32-bit tests to hang (PR27245).

llvm-svn: 265559
2016-04-06 16:44:38 +00:00
Evgeniy Stepanov dde29e2799 Faster stack-protector for Android/AArch64.
Bionic has a defined thread-local location for the stack protector
cookie. Emit a direct load instead of going through __stack_chk_guard.

llvm-svn: 265481
2016-04-05 22:41:50 +00:00
Ahmed Bougacha 50e6cd4a3a [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265450
2016-04-05 20:02:57 +00:00
Ahmed Bougacha 629446ba03 [X86] Simplify early-exit check. NFC.
llvm-svn: 265447
2016-04-05 20:02:22 +00:00
Matthias Braun 870c34f0cf ARM, AArch64, X86: Check preserved registers for tail calls.
We can only perform a tail call to a callee that preserves all the
registers that the caller needs to preserve.

This situation happens with calling conventions like preserve_mostcc or
cxx_fast_tls. It was explicitly handled for fast_tls but failing for
preserve_most. This patch generalizes the check to any calling
convention.

Related to rdar://24207743

Differential Revision: http://reviews.llvm.org/D18680

llvm-svn: 265329
2016-04-04 18:56:13 +00:00
Elena Demikhovsky e99c561391 AVX-512: Truncating store for i1 vectors
Implemented truncstore for KNL and skylake-avx512.
Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes.

Differential Revision: http://reviews.llvm.org/D18740

llvm-svn: 265283
2016-04-04 07:17:47 +00:00
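A scalar sketch of the storage rule described above (bit order is an assumption for illustration): the i1 elements are packed as bits, so v32i1 occupies 4 bytes.
```
#include <cstdint>

// Pack 32 boolean lanes into 32 bits, LSB-first (illustrative ordering).
uint32_t packV32I1(const bool (&Lanes)[32]) {
  uint32_t Bits = 0;
  for (int I = 0; I < 32; ++I)
    Bits |= uint32_t(Lanes[I]) << I;
  return Bits;
}
```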
Simon Pilgrim 0edd3d771a [X86] Removed duplicate code.
llvm-svn: 265274
2016-04-03 20:40:35 +00:00
Simon Pilgrim cd0dfc93eb [X86][SSE] Support for MOVMSK signbit extraction instructions
Add support for lowering with the MOVMSK instruction to extract vector element signbits to a GPR.

This is an early step towards more optimal handling of vector comparison results.

Differential Revision: http://reviews.llvm.org/D18741

llvm-svn: 265266
2016-04-03 18:22:03 +00:00
Elena Demikhovsky 5e426f7356 AVX-512: Load and Extended Load for i1 vectors
Implemented load+{sign|zero}_extend for i1 vectors
Fixed failures in i1 vector load.
Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX.

Differential Revision: http://reviews.llvm.org/D18737

llvm-svn: 265259
2016-04-03 08:41:12 +00:00
Sanjay Patel 9f413364d5 [x86] avoid intermediate splat for non-zero memsets (PR27100)
Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 -
where we noticed that an intermediate splat was being generated for memsets of
non-zero chars.

That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.

The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.

Note that the SSE1 path is not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.

llvm-svn: 265161
2016-04-01 17:36:45 +00:00
Sanjay Patel a05e0ff223 [x86] avoid intermediate splat for non-zero memsets (PR27100)
Follow-up to D18566 - where we noticed that an intermediate splat was being
generated for memsets of non-zero chars.

That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.

The tests that were added in the last patch are now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.
In the new tests, the splat via shuffling looks ok to me, but there might be some
room for improvement depending on uarch there.

Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.

Differential Revision: http://reviews.llvm.org/D18676

llvm-svn: 265148
2016-04-01 16:27:14 +00:00
Andrea Di Biagio 8c48841907 [x86] Remove redundant call to setTargetDAGCombine for BUILD_VECTOR node type.
Since revision 235394, we no longer perform target specific combines on
build_vector nodes. No functional change intended.

llvm-svn: 265138
2016-04-01 12:25:44 +00:00
Sanjay Patel 92d5ea5e07 [x86] use SSE/AVX ops for non-zero memsets (PR27100)
Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast
targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping
into a codegen sinkhole while trying to splat a byte into an XMM reg.

Follow-on bugs exposed by the current codegen are:
https://llvm.org/bugs/show_bug.cgi?id=27141
https://llvm.org/bugs/show_bug.cgi?id=27143

Differential Revision: http://reviews.llvm.org/D18566

llvm-svn: 265029
2016-03-31 17:30:06 +00:00
Nirav Dave 83ce54aac2 Prevent X86ISelLowering from merging volatile loads
Change isConsecutiveLoads to check that loads are non-volatile as this
is a requirement for any load merges. Propagate change to two callers.

Reviewers: RKSimon

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18546

llvm-svn: 265013
2016-03-31 13:40:55 +00:00
Craig Topper d2aa03a60a [X86] Use MVT instead of EVT in code called after legalization.
llvm-svn: 264992
2016-03-31 04:37:41 +00:00
Matthias Braun 8d41436004 CodeGen: Factor out code for tail call result compatibility check; NFC
llvm-svn: 264959
2016-03-30 22:46:04 +00:00
Simon Pilgrim c49bd2ede0 [X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros
Fix for an issue introduced in D17297, where we broke early from the loop detecting consecutive loads, which could leave us thinking a consecutive load with zeros was possible.

llvm-svn: 264922
2016-03-30 20:52:24 +00:00
Simon Pilgrim b87ffe8519 [X86][XOP] BITREVERSE lowering using VPPERM
XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction.

llvm-svn: 264870
2016-03-30 14:14:00 +00:00
Chandler Carruth 8e06a10d1f [x86] Fix a horrible bug in our lowering of x86 floating point atomic
operations.

Specifically, we had code that tried to badly approximate reconstructing
all of the possible variations on addressing modes in two x86
instructions based on those in one pseudo instruction. This is not the
first bug uncovered with doing this, so stop doing it altogether.
Instead generically and pedantically copy every operand from the address
over to both new instructions, and strip kill flags from any register
operands.

This fixes a subtle bug seen in the wild where we would mysteriously
drop parts of the addressing mode, causing for example the index
argument in the added test case to just be completely ignored.

Hypothetically, this was an extremely bad miscompile because it actually
caused a predictable and leveragable write of a 64-bit quantity to an
unintended offset (the first element of the array instead of whatever
other element was intended). As a consequence, in theory this could even
have introduced security vulnerabilities.

However, this was only something that could happen with an atomic
floating point add. No other operation could trigger this bug, so it
seems extremely unlikely to have occurred widely in the wild.

But it did in fact occur, and frequently in scientific applications
which were using relaxed atomic updates of a floating point value after
adding a delta. Those would end up being quite badly miscompiled by
LLVM, which is how we found this. Of course, this often looks like
a race condition in the code, but it was actually a miscompile.

I suspect that this whole RELEASE_FADD thing was a complete mistake.
There is no such operation, and I worry that anything other than add
will get remarkably worse code generation. But that's not for this
change....

llvm-svn: 264845
2016-03-30 08:41:59 +00:00
Chandler Carruth 81c3ddeb1c [x86] Extract a helper function to compute the full addressing mode from
an x86 MachineInstr's operands. This will be super useful to fix some
bad atomics code in my next commit.

No functionality changed.

llvm-svn: 264819
2016-03-30 03:10:24 +00:00
Simon Pilgrim d3df400fa9 [X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op for all their scalar elements.
If all of a BUILD_VECTOR's source elements are the same bit operation (AND/XOR/OR) and each has one constant operand, lower to a pair of BUILD_VECTORs and just apply the bit operation to the vectors.

The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent.

It's not in our interest to start making a general-purpose vectorizer from this, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to support them at least.
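
A minimal IR sketch of the pattern (names and constants hypothetical; the
combine itself runs on BUILD_VECTOR DAG nodes):

  %a0 = and i32 %x0, 1
  %a1 = and i32 %x1, 2
  %t  = insertelement <2 x i32> undef, i32 %a0, i32 0
  %v  = insertelement <2 x i32> %t, i32 %a1, i32 1

can now be lowered as a single vector op on a BUILD_VECTOR of the sources:

  %s0 = insertelement <2 x i32> undef, i32 %x0, i32 0
  %s1 = insertelement <2 x i32> %s0, i32 %x1, i32 1
  %v  = and <2 x i32> %s1, <i32 1, i32 2>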

Differential Revision: http://reviews.llvm.org/D18492

llvm-svn: 264666
2016-03-28 21:33:52 +00:00
Elena Demikhovsky 83f0647d85 AVX-512: Fixed ICMP instruction selection for i1 operands
ICMP instruction selection fails on SKX and KNL for i1 operands.
I use XOR to resolve:
(A == B) is equivalent to (A xor B) == 0
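
A minimal IR sketch of the rewrite (names hypothetical):

  %x = xor i1 %a, %b
  %r = icmp eq i1 %x, false    ; same result as icmp eq i1 %a, %b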

Differential Revision: http://reviews.llvm.org/D18511

llvm-svn: 264566
2016-03-28 07:47:58 +00:00
Simon Pilgrim dcdf85033c [X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targets
Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization

llvm-svn: 264517
2016-03-26 18:32:13 +00:00
Simon Pilgrim e4dbeb40c6 [X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targets
Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization

Differential Revision: http://reviews.llvm.org/D18307

llvm-svn: 264512
2016-03-26 15:44:55 +00:00
Simon Pilgrim 3eef33a806 [X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors
Currently this is mainly to prevent scalarization of integer division by constants.

Differential Revision: http://reviews.llvm.org/D18307

llvm-svn: 264511
2016-03-26 15:27:20 +00:00
Simon Pilgrim 7379a70677 [X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 multiplies.
Only pre-AVX512BW targets need to split v32i8 vectors.

llvm-svn: 264509
2016-03-26 09:44:27 +00:00
Simon Pilgrim ff7b7141cd [X86][SSE] Don't duplicate Lower256IntArith functionality in LowerMul. NFC.
LowerMul v32i8 on AVX2 needs to split the 256-bit sources to allow sign-extension back to v16i16 to occur. Since this is basically the same as Lower256IntArith we simplify by using that here instead.

llvm-svn: 264506
2016-03-26 09:29:04 +00:00
David Majnemer 020e890a19 [X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddr
We forgot to add the second machine operand to our ADJCALLSTACKDOWN,
resulting in crashes in PEI.

This fixes PR27071.

llvm-svn: 264465
2016-03-25 21:49:11 +00:00
Simon Pilgrim ac04923b0f [X86][SSE] Don't duplicate Lower256IntArith functionality in LowerShift. NFC.
LowerShift was using the same code as Lower256IntArith to split 256-bit vectors into 2 x 128-bit vectors, so now we just call Lower256IntArith.

llvm-svn: 264403
2016-03-25 14:17:54 +00:00
Elena Demikhovsky abc9c04ab7 fixed typo
llvm-svn: 264395
2016-03-25 10:08:36 +00:00
Elena Demikhovsky 95f3173ce9 AVX-512: Generate KTEST instead of TEST for i1 vectors
KTEST instruction may be used instead of TEST in this case:

%int_sel3 = bitcast <8 x i1> %sel3 to i8
%res = icmp eq i8 %int_sel3, 0
br i1 %res, label %L2, label %L1

Differential Revision: http://reviews.llvm.org/D18444

llvm-svn: 264298
2016-03-24 15:53:45 +00:00
Simon Pilgrim 572ca71573 [X86][XOP] Support for VPPERM byte shuffle instruction
This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode.

Differential Revision: http://reviews.llvm.org/D18189

llvm-svn: 264260
2016-03-24 11:52:43 +00:00
Sanjay Patel 7876f180b5 [x86] make peekThroughBitcasts() a helper function
This should be hoisted further up so it can be used in DAGCombiner and other backends,
but I'm limiting the scope in the interest of patch minimalism.

It's not quite NFC because some of the replaced code was using an 'if' check rather
than a 'while' loop, so those cases would only look through a single bitcast.

llvm-svn: 264186
2016-03-23 20:16:37 +00:00
Andrey Turetskiy 6a3d561ea0 [X86] Introduction of FeatureX87.
Add FeatureX87 in the X86 backend to be able to define CPUs which don't have x87.

Differential Revision: http://reviews.llvm.org/D13979

llvm-svn: 264148
2016-03-23 11:13:54 +00:00
Simon Pilgrim 25fb4177fb [X86][SSE] Reapplied: Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).

Reapplied with a fix for PR26953 (missing vector widening legalization).

Differential Revision: http://reviews.llvm.org/D17932

llvm-svn: 264062
2016-03-22 16:22:08 +00:00
Simon Pilgrim fcc4532afa [X86][SSE] Tidyup setTargetShuffleZeroElements to match computeZeroableShuffleElements
Based on feedback for D14261

llvm-svn: 263911
2016-03-20 17:43:07 +00:00
Simon Pilgrim c44472a5bc [X86][SSE] Detect zeroable shuffle elements from different value types
Improve computeZeroableShuffleElements to be able to peek through bitcasts to extract zero/undef values from BUILD_VECTOR nodes with different element sizes than the shuffle mask.

Differential Revision: http://reviews.llvm.org/D14261

llvm-svn: 263906
2016-03-20 15:45:42 +00:00
Igor Breger 3ea8af5108 AVX512BW: Enable v32i1/v64i1 BUILD_VECTOR
Differential Revision: http://reviews.llvm.org/D18211

llvm-svn: 263898
2016-03-20 13:09:43 +00:00
Manman Ren 4865d89653 [CXX_FAST_TLS] Disable tail call when calling conventions are mismatched.
Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller
and callee have mismatched calling conventions.

llvm-svn: 263856
2016-03-18 23:41:51 +00:00
Simon Pilgrim 0f37fbac51 [X86][SSE] Simplified blend-with-zero combining
We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in an endless loop of contrasting combines

This patch stops the combine if we already have a blend in place (meaning we miss some domain corrections)

llvm-svn: 263717
2016-03-17 15:59:36 +00:00
Sanjay Patel be37e62e0c fix function names; NFC
llvm-svn: 263646
2016-03-16 18:00:09 +00:00
Igor Breger 0ba7b04f5f AVX512BW: Fix SRA v64i8 lowering. Use PCMPGTM (cmp result in k register) for 512-bit vectors because PCMPGT is supported only for 128/256-bit.
Differential Revision: http://reviews.llvm.org/D18204

llvm-svn: 263624
2016-03-16 08:48:26 +00:00
Eric Christopher da8b3f1914 Temporarily Revert "[X86][SSE] Simplify vector LOAD + EXTEND on
pre-SSE41 hardware" as it seems to be causing crashes during code
generation in halide. PR forthcoming.

This reverts commit r263303.

llvm-svn: 263512
2016-03-14 23:59:57 +00:00
Sanjay Patel 7506852709 [DAG] use !isUndef() ; NFCI
llvm-svn: 263453
2016-03-14 18:09:43 +00:00
Sanjay Patel 5719584129 [DAG] use isUndef() ; NFCI
llvm-svn: 263448
2016-03-14 17:28:46 +00:00
Sanjay Patel 62d707c8d9 [x86, AVX] replace masked load with full vector load when possible
Converting masked vector loads to regular vector loads for x86 AVX should always be a win.
I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any
objections.

1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.
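
A sketch of the transform in IR terms (types, mask and intrinsic mangling
illustrative; the actual change is in the DAG): a masked load with a
constant mask such as

  %r = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %p, i32 4,
           <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x i32> %pt)

becomes a full vector load blended with the pass-through value:

  %ld = load <4 x i32>, <4 x i32>* %p, align 4
  %r  = select <4 x i1> <i1 true, i1 true, i1 true, i1 false>,
           <4 x i32> %ld, <4 x i32> %pt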

Differential Revision: http://reviews.llvm.org/D18094

llvm-svn: 263446
2016-03-14 16:54:43 +00:00
Igor Breger a949100532 AVX512: icmp operations should always be lowered to the CMPM (AVX-512) instruction on SKX.
implemented by delena

Differential Revision: http://reviews.llvm.org/D18054

llvm-svn: 263417
2016-03-14 10:26:39 +00:00
Simon Pilgrim 035b19ecf5 [X86][SSE41] Avoid variable blend for constant v8i16 shifts
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if it is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.
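
For example (shift amounts illustrative), a constant per-lane shift such as

  %r = shl <8 x i16> %x, <i16 1, i16 1, i16 1, i16 1,
                          i16 4, i16 4, i16 4, i16 4>

can select between the two shifted results with shuffles, as in the
pre-SSE41 lowering, instead of a (v)pblendvb variable blend.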

llvm-svn: 263383
2016-03-13 18:35:59 +00:00
Quentin Colombet cf9732b417 [X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer.
cmpxchg[8|16]b uses RBX as one of its arguments.
In other words, using this instruction clobbers RBX as it is defined to hold one
of the inputs. When the backend uses a dynamically allocated stack, RBX is used as a
reserved register for the base pointer.

Reserved registers have special semantics that only the target understands and
enforces. Because of that, the register allocator doesn't use them, but also
doesn't try to make sure they are used properly (remember, it does not know how
they are supposed to be used).

Therefore, when RBX is used as a reserved register but defined by something that
is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. This is the responsibility of the target to do the right thing with
its reserved register.

To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that saves rbx.
That pseudo takes two more arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
  comparison.
- The other is the virtual register holding the save of the value of RBX as the
  base pointer. This saving is done as part of isel (i.e., we emit a copy from
  rbx).

cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp

This gets expanded into:
rbx = copy input_for_rbx_reg
cmpxchg <regular cmpxchg args>
rbx = save_of_rbx_as_bp

Note: The actual modeling of the pseudo is a bit more complicated to make sure
the interferences that appear after the pseudo is expanded are properly modeled
before that expansion.

This fixes PR26883.

llvm-svn: 263325
2016-03-12 02:25:27 +00:00
Simon Pilgrim 33d57c7547 [X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).

Differential Revision: http://reviews.llvm.org/D17932

llvm-svn: 263303
2016-03-11 22:18:05 +00:00
Simon Pilgrim 7b2164ffe0 Fix spelling.
llvm-svn: 263266
2016-03-11 17:31:43 +00:00
Simon Pilgrim 7ca9614c71 [X86][AVX] Fixed issue where a long chain of shuffles could attempt to combine to a single (illegal) PSHUFB instruction.
It's not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for the 256/512-bit vector cases.

llvm-svn: 263239
2016-03-11 14:39:10 +00:00
Sanjay Patel 0181943b89 [x86] don't use a shuffle when a vselect will do; NFCI
Looking at the IR definition of a masked load made me realize
there was no reason to use a shuffle here, so we don't need
to convert the format of the mask at all.

llvm-svn: 263167
2016-03-10 22:35:33 +00:00
Simon Pilgrim 61eb49e437 [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).

Differential Revision: http://reviews.llvm.org/D17691

llvm-svn: 263159
2016-03-10 20:40:26 +00:00
Elena Demikhovsky cd9967d160 AVX-512: Fixed a bug in i1 vector zero extending. (Skylake-avx512)
(it failed in the instruction selection phase)

Differential Revision: http://reviews.llvm.org/D17924

llvm-svn: 263111
2016-03-10 13:44:22 +00:00
Simon Pilgrim 13d4056795 [X86][AVX] Improve target shuffle combining of BLEND+zero
The BLEND+zero combine was failing to combine equivalent BLEND masks.

Follow up to D17483 and D17858

llvm-svn: 263105
2016-03-10 11:50:15 +00:00
Simon Pilgrim 16d11785a5 [X86][SSE] Basic combining of unary target shuffles of binary target shuffles.
This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask.

This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining.

There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for.

Differential Revision: http://reviews.llvm.org/D17858

llvm-svn: 263102
2016-03-10 11:23:51 +00:00
Elena Demikhovsky 38f78a2b92 AVX-512: Fixed a bug in shuffle for v64i8 type
Operation SCALAR_TO_VECTOR for v64i8 and v32i16 should be lowered if BW feature is "on".

Differential Revision: http://reviews.llvm.org/D17994

llvm-svn: 263097
2016-03-10 08:32:09 +00:00
Sanjay Patel 4a8dd89128 [x86, AVX] optimize masked loads with constant masks
Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper.

Differential Revision: http://reviews.llvm.org/D17899

llvm-svn: 263067
2016-03-09 22:12:08 +00:00
Quentin Colombet 4340b55593 Revert r262759 and r262760.
The fix, which consists of using the library call for atomic compare and swap when
the instruction is not safe to use, may be incorrect. Indeed, the library call may
not exist on all platforms. In other words, we need a better fix!

llvm-svn: 262943
2016-03-08 17:29:11 +00:00
Hans Wennborg e00b6e7249 Revert r262599 "[X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG"
This caused PR26870.

llvm-svn: 262935
2016-03-08 16:21:41 +00:00
Simon Pilgrim 253ca348b2 [X86][AVX512] Fixed VPERMT2* shuffle mask decoding and enabled target shuffle combining.
Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles.

This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64 bit shuffle mask elements - the bit masking wasn't being correctly computed.

Removed non-constant pool mask decode path as we have no way of testing it right now.

Differential Revision: http://reviews.llvm.org/D17916

llvm-svn: 262809
2016-03-06 21:54:52 +00:00
Igor Breger 4d94d4d5f7 AVX512BW: Support llvm intrinsic masked vector load/store for i8/i16 element types on SKX
Differential Revision: http://reviews.llvm.org/D17913

llvm-svn: 262803
2016-03-06 12:38:58 +00:00
Igor Breger f1bd761e00 AVX512: Remove VSHRI kmask patterns from TD file. It is incorrect to use kshiftw to implement VSHRI v4i1: bits 15-4 are undef, so the upper bits of v4i1 may not be zeroed. v4i1 should be zero-extended to v16i1 (or any natively supported vector).
Differential Revision: http://reviews.llvm.org/D17763

llvm-svn: 262797
2016-03-06 07:46:03 +00:00
Simon Pilgrim 40e1a71cdd [X86][AVX] Improved VPERMILPS variable shuffle mask decoding.
Added support for decoding VPERMILPS variable shuffle masks that aren't in the constant pool.

Added target shuffle mask decoding for SCALAR_TO_VECTOR+VZEXT_MOVL cases - these can happen for v2i64 constant re-materialization

Followup to D17681

llvm-svn: 262784
2016-03-05 22:53:31 +00:00
Quentin Colombet 2a7676b442 [X86] Fix the lowering of setjmp intrinsic on i386.
When the lowering of the setjmp intrinsic requires
a global base pointer to be set, make sure such a pointer
gets defined by the CGBR pass.

This fixes PR26742.

llvm-svn: 262762
2016-03-05 00:31:04 +00:00
Quentin Colombet 13b524597d [X86] Do not use cmpxchgXXb when we need the base pointer (RBX).
cmpxchgXXb uses RBX as one of its implicit arguments. I.e., when
we use that instruction we need to clobber RBX. This is generally
fine, except when RBX is a reserved register, because in that case
the register allocator will not track its value and will not
save and restore it when interferences occur.

rdar://problem/24851412

llvm-svn: 262759
2016-03-04 23:29:39 +00:00
David Majnemer d2f767d2f6 [X86] Support cleaning more than 2**16 bytes of stack
The x86 ret instruction has a 16 bit immediate indicating how many bytes
to pop off of the stack beyond the return address.

There is a problem when extremely large structs are passed by value: we
might not be able to fit the number of bytes to pop into the return
instruction.

To fix this, expand RET_FLAG a little later and use a special sequence
to clean the stack:

pop  %ecx     ; return address is now in %ecx
add  $n, %esp ; clean the stack
push %ecx     ; bring the return address back on the stack
ret           ; pop the return address and jmp to its value

llvm-svn: 262755
2016-03-04 22:56:17 +00:00
Simon Pilgrim abcee45b7a [X86][AVX] Better support for the variable mask form of VPERMILPD/VPERMILPS
The variable mask forms of VPERMILPD/VPERMILPS were only partially implemented, with much of the handling still performed as an intrinsic.

This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle.

Differential Revision: http://reviews.llvm.org/D17681

llvm-svn: 262635
2016-03-03 18:13:53 +00:00
Simon Pilgrim 022afe2538 [X86] Tidied up 256-bit -> 2 x 128-bit vector shift extraction.
lowerShift was manually splitting BUILD_VECTOR cases when it could just call Extract128BitVector which does this anyway.

llvm-svn: 262633
2016-03-03 17:54:35 +00:00
Simon Pilgrim 0107d24810 [X86] Pulled out repeated code testing for constant vector shift amount. NFCI.
llvm-svn: 262631
2016-03-03 17:35:43 +00:00
Ahmed Bougacha 671795a985 [X86] Don't assume that shuffle non-mask operands start at #0.
That's not the case for VPERMV/VPERMV3, which cover all possible
combinations (the C intrinsics use a different order; the AVX vs
AVX512 intrinsics are different still).

Since:
  r246981 AVX-512: Lowering for 512-bit vector shuffles.
VPERMV is recognized in getTargetShuffleMask.

This breaks assumptions in most callers, as they expect
the non-mask operands to start at index 0.
VPERMV has the mask as operand #0; VPERMV3 has it in the middle.

Instead of the faulty assumption, have getTargetShuffleMask return
its operands as well.

One alternative we considered was to change the operand order of
VPERMV, but we agreed to stick to the instruction order, as there
is more AVX512 weirdness to cover (vpermt2/vpermi2 in particular).

Differential Revision: http://reviews.llvm.org/D17041

llvm-svn: 262627
2016-03-03 16:53:50 +00:00
Igor Breger 639fde79b0 AVX512: Combine AND + TESTM instructions.
Differential Revision: http://reviews.llvm.org/D17844

llvm-svn: 262621
2016-03-03 14:18:38 +00:00
Simon Pilgrim 91dd0a796c [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

Differential Revision: http://reviews.llvm.org/D17691

llvm-svn: 262599
2016-03-03 09:43:28 +00:00
Hans Wennborg 153e4b0f11 [X86] Enable forwarding bool arguments in tail calls (PR26305)
The code was previously not able to track a boolean argument
at a call site back to the formal argument of the caller.

Differential Revision: http://reviews.llvm.org/D17786

llvm-svn: 262575
2016-03-03 02:06:32 +00:00
David Majnemer 1ef654024f [X86] Don't give catch objects a displacement of zero
Catch objects with a displacement of zero do not initialize a catch
object.  The displacement is relative to %rsp at the end of the
function's prologue for x86_64 targets.

If we place an object at the top-of-stack, we will end up with a
displacement of zero, resulting in our catch object remaining
uninitialized.

Address this by creating our catch objects as fixed objects.  We will
ensure that the UnwindHelp object is created after the catch objects so
that no catch object will have a displacement of zero.

Differential Revision: http://reviews.llvm.org/D17823

llvm-svn: 262546
2016-03-03 00:01:25 +00:00
Reid Kleckner 65f9d9cd32 Revert "[X86] Elide references to _chkstk for dynamic allocas"
This reverts commit r262370.

It turns out there is code out there that does sequences of allocas
greater than 4K: http://crbug.com/591404

The goal of this change was to improve the code size of inalloca call
sequences, but we got tangled up in the mess of dynamic allocas.
Instead, we should come back later with a separate MI pass that uses
dominance to optimize the full sequence. This should also be able to
remove the often unneeded stacksave/stackrestore pairs around the call.

llvm-svn: 262505
2016-03-02 19:20:59 +00:00
Simon Pilgrim c02b72627a [X86][SSE] Lower 128-bit MOVDDUP with existing VBROADCAST mechanisms
We have a number of useful lowering strategies for VBROADCAST instructions (both from memory and register element 0) which the 128-bit form of the MOVDDUP instruction can make use of.

This patch tweaks lowerVectorShuffleAsBroadcast to enable it to broadcast 2f64 args using MOVDDUP as well.

It does require a slight tweak to the lowerVectorShuffleAsBroadcast mechanism as the existing MOVDDUP lowering uses isShuffleEquivalent which can match binary shuffles that can lower to (unary) broadcasts.

Differential Revision: http://reviews.llvm.org/D17680

llvm-svn: 262478
2016-03-02 11:43:05 +00:00
David Majnemer 5aadde1ecc [X86] Permit reading of the FLAGS register without it being previously defined
We modeled the RDFLAGS{32,64} operations as "using" {E,R}FLAGS.
While technically correct, this is not desirable for folks who want
to examine aspects of the FLAGS register which are not related to
computation, like whether or not CPUID is a valid instruction.

Differential Revision: http://reviews.llvm.org/D17782

llvm-svn: 262465
2016-03-02 06:46:52 +00:00
Sanjay Patel 18988ae66c [x86] use getBitcast()
This isn't quite NFC because some of the SDLocs may change which could
cause scheduling differences. But no regression tests are affected and
there is no functional change intended.

llvm-svn: 262391
2016-03-01 20:47:02 +00:00
David Majnemer 791b88b6da [X86] Elide references to _chkstk for dynamic allocas
The _chkstk function is called by the compiler to probe the stack in an
order consistent with Windows' expectations.  However, it is possible to
elide the call to _chkstk and manually adjust the stack pointer if we
can prove that the allocation is fixed size and smaller than the probe
size.

This shrinks chrome.dll, chrome_child.dll and chrome.exe by a
cumulative ~133 KB.

Differential Revision: http://reviews.llvm.org/D17679

llvm-svn: 262370
2016-03-01 19:20:23 +00:00
Sanjay Patel 8a37a988e6 fix function names; NFC
llvm-svn: 262367
2016-03-01 19:14:09 +00:00
Matt Arsenault 03dac8d8e4 DAGCombiner: Turn extract of bitcasted integer into truncate
This reduces the number of bitcast nodes and generally cleans up the
DAG when bitcasting between integers and vectors everywhere.
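
A sketch of the fold, written as IR for readability (the combine itself
runs on DAG nodes):

  %v = bitcast i64 %x to <2 x i32>
  %e = extractelement <2 x i32> %v, i32 0

On a little-endian target element 0 is the low half, so this folds to:

  %e = trunc i64 %x to i32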

llvm-svn: 262358
2016-03-01 18:01:37 +00:00
Hans Wennborg e64cf9dddb [X86] Check that attribute parameters match for tail calls (PR26590)
In the code below on 32-bit targets, x would previously get forwarded to g()
without sign-extension to 32 bits as required by the parameter attribute.

  void g(signed short);
  void f(unsigned short x) {
    g(x);
  }

llvm-svn: 262352
2016-03-01 17:45:23 +00:00
Sanjay Patel 2ca144f14c fix documentation comments; NFC
llvm-svn: 262351
2016-03-01 17:25:35 +00:00
Sanjay Patel 9fea531fec function names start with a lowercase letter; NFC
llvm-svn: 262347
2016-03-01 16:17:48 +00:00
Ahmed Bougacha bb5d7d7ed8 [X86] Move the ATOMIC_LOAD_OP ISel from DAGToDAG to ISelLowering. NFCI.
This is long-standing dirtiness, as acknowledged by r77582:

    The current trick is to select it into a merge_values with
    the first definition being an implicit_def. The proper solution is
    to add new ISD opcodes for the no-output variant.

Doing this before selection will let us combine away some constructs.

Differential Revision: http://reviews.llvm.org/D17659

llvm-svn: 262244
2016-02-29 19:28:07 +00:00
Simon Pilgrim 9e10b1655c Tidyup for loops - don't repeat upper limit evaluation if you don't have to. NFCI.
llvm-svn: 262137
2016-02-27 13:26:58 +00:00
Simon Pilgrim 4d1a088323 Strip trailing whitespace. NFCI.
llvm-svn: 262083
2016-02-26 22:28:50 +00:00
Sanjay Patel 51488ed2d5 [x86] refactor to eliminate duplicated code; NFCI
llvm-svn: 262062
2016-02-26 20:59:05 +00:00
Sanjay Patel 155193c3aa [x86, AVX] fold 'isPositive' 256-bit vector integer operations (PR26701)
This extends the fold introduced with:
http://reviews.llvm.org/rL262036

llvm-svn: 262047
2016-02-26 18:42:50 +00:00
Sanjay Patel 4402a32b32 [x86, SSE] fold 'isPositive' vector integer operations (PR26701)
This is one of the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701

Shift and negate is what InstCombine appears to prefer, so I've started with that pattern. 
Note that the 'pcmpeq' instructions are always used to generate the all-ones (-1) constant for the actual
'pcmpgt' comparison in each case (side note: why isn't there an alias mnemonic for that?).
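
A sketch of the 'isPositive' pattern in IR, as InstCombine canonicalizes it
(shift and negate; names hypothetical):

  %sh = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
  %r  = xor <4 x i32> %sh, <i32 -1, i32 -1, i32 -1, i32 -1>

Here %r is all-ones exactly in the lanes where %x is non-negative, which is
what a single compare-greater-than against an all-ones vector computes.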

Differential Revision: http://reviews.llvm.org/D17630

llvm-svn: 262036
2016-02-26 16:56:03 +00:00
Simon Pilgrim e4178ae510 [X86][SSE3] Added combine support for MOVDDUP/MOVSHDUP/MOVSLDUP target shuffles
Now that PerformShuffleCombine can handle unary shuffles.

llvm-svn: 261843
2016-02-25 09:12:12 +00:00
Simon Pilgrim 3b6feeaa7c [X86][SSE41] Combine vector blends with zero
Part 2 of 2
This patch adds support for combining target shuffles into blends-with-zero.

Differential Revision: http://reviews.llvm.org/D17483

llvm-svn: 261745
2016-02-24 15:14:21 +00:00
Simon Pilgrim dd01f70085 [X86][SSE41] Combine insertion of zero scalars into vector blends with zero
Part 1 of 2
This patch attempts to replace the insertion of zero scalars with a vector blend with zero, avoiding the use of the integer insertion instructions (which are particularly slow on many targets).
(Part 2 will add support for combining multiple blends-with-zero).
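
A minimal sketch (element index hypothetical): inserting a zero scalar

  %r = insertelement <4 x i32> %v, i32 0, i32 2

can instead be lowered as a blend of %v with a zero vector:

  %r = shufflevector <4 x i32> %v, <4 x i32> zeroinitializer,
           <4 x i32> <i32 0, i32 1, i32 6, i32 3>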

Differential Revision: http://reviews.llvm.org/D17483

llvm-svn: 261743
2016-02-24 14:53:27 +00:00
Simon Pilgrim c291c03702 [X86][SSE] Don't get target shuffle operands prematurely.
PerformShuffleCombine should be usable by unary and binary target shuffles, but was attempting to get the first two operands whatever the instruction type. Since these are only used for VECTOR_SHUFFLE instructions for one particular combine I've moved them inside the relevant if statement.

llvm-svn: 261727
2016-02-24 09:07:47 +00:00
Davide Italiano 62b7f7a398 [X86ISelLowering] Stop typing the same return over and over and over.
llvm-svn: 261666
2016-02-23 18:39:38 +00:00
Davide Italiano 2ec4717c2c [X86ISelLowering] Consolidate duplicated code in a single place.
llvm-svn: 261573
2016-02-22 21:06:46 +00:00
Simon Pilgrim e9093adae0 [X86][AVX] Add shuffle masking support for EltsFromConsecutiveLoads
Add support for the case where we have a consecutive load (which must include the first + last elements) with a mixture of undef/zero elements. We load the vector and then apply a shuffle to clear the zero'd elements.

Differential Revision: http://reviews.llvm.org/D17297

llvm-svn: 261490
2016-02-21 19:15:48 +00:00
Simon Pilgrim ecb0433599 [X86][SSE] Fixed issue with commutation of 'faux unary' target shuffles (PR26667)
Fixed a bug introduced by D16683 when a binary shuffle is simplified to a unary shuffle (with undef/zero sentinel mask indices) - if this resulted in only the second input being used, combineX86ShuffleChain failed to take this into account and still referenced the first input.

llvm-svn: 261434
2016-02-20 14:39:45 +00:00
Simon Pilgrim ccf2cce67c [X86][SSE] Move all undef/zero cases before target shuffle combining.
First small step towards fixing PR26667 - we need to ensure that combineX86ShuffleChain only gets called with a valid shuffle input node (a similar issue was found in D17041).

llvm-svn: 261433
2016-02-20 12:57:32 +00:00
Davide Italiano 228978c0dc [X86ISelLowering] Fix TLSADDR lowering when shrink-wrapping is enabled.
TLSADDR nodes are lowered into actual calls inside MC. In order to prevent
shrink-wrapping from pushing prologue/epilogue past them (which would result
in TLS variables being accessed before the stack frame is set up), we
put markers, so that the stack gets adjusted properly.
Thanks to Quentin Colombet for guidance/help on how to fix this problem!

llvm-svn: 261387
2016-02-20 00:44:47 +00:00
Davide Italiano a8f1f2efaf [X86ISelLowering] Provide a more informative assert message.
I stumbled upon this while debugging a lowering bug.

llvm-svn: 261371
2016-02-19 22:18:49 +00:00
Davide Italiano 4cfe2a9e38 [X86ISelLowering] Merge two conditions inside a single if.
llvm-svn: 261370
2016-02-19 22:01:07 +00:00
Sanjay Patel 0adbea4b5c [x86] fix initialization of PredictableSelectIsExpensive
This is effectively NFC because Atom is the only in-order x86 subtarget currently,
but the predicate would have become wrong if any other in-order CPU came along.

See related discussion in:
http://reviews.llvm.org/D16836

llvm-svn: 261275
2016-02-18 23:08:48 +00:00
Davide Italiano 440a676136 [X86ISelLowering] Use isPowerof2 instead of rewriting it. NFC.
llvm-svn: 261255
2016-02-18 20:43:15 +00:00
Hans Wennborg 23cdc643b9 Revert to extend i8/i16 return values on Darwin (PR26665)
In r260133, LLVM was changed to no longer extend i8/i16 return values,
as it's not required by the ABI. However, code was found in the wild
that relies on the old behaviour on Darwin, so this commit reverts
back to that old behaviour for Darwin.

On other platforms, it's less likely that code would be depending on
the old behaviour, as GCC and MSVC haven't been extending such return
values.

llvm-svn: 261235
2016-02-18 18:17:05 +00:00
Igor Breger ac02f1bb62 AVX512: Fix LowerMSCATTER() return value.
Bug description:
  The bug was discovered when a test was compiled with -O0.
  In case the scatter result is the DAG root, VectorLegalizer failed (assert) because LowerMSCATTER() returned a kmask as the result.
Change LowerMSCATTER() to return a chain, as the original node does.

Differential Revision: http://reviews.llvm.org/D17331

llvm-svn: 261090
2016-02-17 14:04:33 +00:00
Simon Pilgrim c5b5dcb985 [X86][AVX] Support bit-blend integer shuffles for 256-bit integer vectors
AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.

This patch adds the ability to lower using the bit-blend patterns before defaulting to the splitting behaviour.

Part 2 of 2

Differential Revision: http://reviews.llvm.org/D17292

llvm-svn: 261082
2016-02-17 10:50:06 +00:00
Simon Pilgrim a50e8d3627 [X86][AVX] Support bit-mask integer shuffles for 256-bit integer vectors
AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.

This patch adds the ability to lower using the bit-mask patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors.

Part 1 of 2

Differential Revision: http://reviews.llvm.org/D17292

llvm-svn: 261081
2016-02-17 10:37:49 +00:00
Simon Pilgrim 9904924e6b [X86][SSE] Tidyup BUILD_VECTOR operand collection. NFCI.
Avoid reuse of operand variables, keep them local to a particular lowering - the operand collection is unique to each case anyhow.

Renamed from V to Ops to more closely match their purpose.

llvm-svn: 261078
2016-02-17 10:12:30 +00:00
Ahmed Bougacha f3cccab1e0 [X86] Remove the now-unused X86ISD::PSIGN. NFC.
llvm-svn: 261025
2016-02-16 22:14:12 +00:00
Ahmed Bougacha af60a429c9 [X86] Generalize logic blend of (x, -x) combine to match (-x, x).
I suspect this is what let PR26110 lie dormant for so long.

llvm-svn: 261024
2016-02-16 22:14:07 +00:00
Ahmed Bougacha 132fbf5476 [X86] Don't turn (c?-v:v) into (c?-v:0) by blindly using PSIGN.
Currently, we sometimes miscompile this vector pattern:
    (c ? -v : v)
We lower it to (because "c" is <4 x i1>, lowered as a vector mask):
    (~c & v) | (c & -v)

When we have SSSE3, we incorrectly lower that to PSIGN, which does:
    (c < 0 ? -v : c > 0 ? v : 0)
in other words, when c is either all-ones or all-zero:
    (c ? -v : 0)
While this is an old bug, it rarely triggers because the PSIGN combine
is too sensitive to operand order. This will be improved separately.

Note that the PSIGN tests are also incorrect. Consider:
    %b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
    %sub = sub nsw <4 x i32> zeroinitializer, %a
    %0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
    %1 = and <4 x i32> %a, %0
    %2 = and <4 x i32> %b.lobit, %sub
    %cond = or <4 x i32> %1, %2
    ret <4 x i32> %cond
if %b is zero:
    %b.lobit = <4 x i32> zeroinitializer
    %sub = sub nsw <4 x i32> zeroinitializer, %a
    %0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
    %1 = <4 x i32> %a
    %2 = <4 x i32> zeroinitializer
    %cond = or <4 x i32> %a, zeroinitializer
    ret <4 x i32> %a
whereas we currently generate:
    psignd %xmm1, %xmm0
    retq
which returns 0, as %xmm1 is 0.

Instead, use a pure logic sequence, as described in:
https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate

Fixes PR26110.

Differential Revision: http://reviews.llvm.org/D17181

llvm-svn: 261023
2016-02-16 22:14:03 +00:00
Ahmed Bougacha f211f685e4 [X86] Extract PSIGN/BLENDVP combine. NFC.
llvm-svn: 261021
2016-02-16 22:13:55 +00:00
Ahmed Bougacha 7502768c6d [X86] Extract ANDNP combine. NFC.
This makes it IMO more readable and reduces indentation.

llvm-svn: 261020
2016-02-16 22:13:49 +00:00
Ahmed Bougacha 71c992d853 [X86] Remove now-dead variable and redundant assert. NFC.
The variable was made dead in NDEBUG by r260901, but the assert
was redundant anyway: get rid of both.

llvm-svn: 260908
2016-02-15 19:32:54 +00:00
Ahmed Bougacha 93cff7fb82 [CodeGen] Document and use getConstant's splat-building feature. NFC.
Differential Revision: http://reviews.llvm.org/D17229

llvm-svn: 260901
2016-02-15 18:07:29 +00:00
Igor Breger 4dc7d390db AVX512: Change store size of kmask. The store size of v8i1, v4i1, v2i1 and i1 is changed to 16 bits.
If KMOVB is not supported (it requires AVX512DQ), only KMOVW can be used, so the store size should be 2 bytes.

Differential Revision: http://reviews.llvm.org/D17138

llvm-svn: 260878
2016-02-15 08:25:28 +00:00
Simon Pilgrim 08ba012973 [X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shuffles
This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations.

On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle.

This patch has several benefits:

 * Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling.
 * Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure).
 * Matching the repeating shuffle makes use of a lot of existing shuffle lowering.

There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle; I intend to fix this in a follow-up patch for review.

Differential Revision: http://reviews.llvm.org/D16537

llvm-svn: 260834
2016-02-13 21:54:04 +00:00
Sanjay Patel e9bf993cee [x86-64] allow mfence even with -mno-sse (PR23203)
As shown in:
https://llvm.org/bugs/show_bug.cgi?id=23203
...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64,
but the instruction def doesn't know that.

I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :)

Differential Revision: http://reviews.llvm.org/D17219

llvm-svn: 260828
2016-02-13 17:26:29 +00:00
Sanjay Patel ac42fecf74 [x86] simplify getZeroVector() ; NFCI
Let DAG.getConstant() handle the splatting; there's no need
to repeat that logic here.

See also:
http://reviews.llvm.org/rL258833
http://reviews.llvm.org/rL260582

llvm-svn: 260609
2016-02-11 22:17:04 +00:00
Sanjay Patel d76d4aabdd [x86] refactor masked load/store combine logic ; NFCI
llvm-svn: 260426
2016-02-10 20:02:45 +00:00
JF Bastien 08499a0110 X86: Remove useless semicolon
llvm-svn: 260359
2016-02-10 04:04:12 +00:00
Sanjay Patel c7dde5f502 [x86] convert masked load of exactly one element to scalar load
This is the load counterpart to the store optimization that was added in:
http://reviews.llvm.org/rL260145
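
A sketch in IR terms (types and mask illustrative; intrinsic mangling
abbreviated): a masked load whose mask has exactly one set bit, e.g.

  %r = call <4 x float> @llvm.masked.load.v4f32(<4 x float>* %p, i32 4,
           <4 x i1> <i1 false, i1 false, i1 true, i1 false>,
           <4 x float> undef)

only reads lane 2, so it can become a scalar load plus an insert:

  %q = getelementptr <4 x float>, <4 x float>* %p, i64 0, i64 2
  %s = load float, float* %q, align 4
  %r = insertelement <4 x float> undef, float %s, i32 2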

llvm-svn: 260325
2016-02-09 23:44:35 +00:00
Ahmed Bougacha f8dfb47c02 [CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI.
llvm-svn: 260316
2016-02-09 22:54:12 +00:00
Ahmed Bougacha 244cd98474 [X86] Don't reuse an unrelated variable, create a new one. NFC.
Using Op makes it look like we're doing something with it.
We're really not.

llvm-svn: 260315
2016-02-09 22:54:05 +00:00
Ahmed Bougacha 46db084c71 [X86] Remove unnecessary assignment. NFC.
llvm-svn: 260314
2016-02-09 22:53:58 +00:00
Sanjay Patel 73200f72de [SelectionDAG] make getMemBasePlusOffset() accessible; NFCI
I reinvented this functionality in http://reviews.llvm.org/D16828 because it was
hidden away as a static function. The changes in x86 are not based on a complete
audit. I suspect there are other possible uses there, and there are almost certainly
more potential users in other targets.

llvm-svn: 260295
2016-02-09 21:42:04 +00:00
Sanjay Patel 62dde825d8 [x86] make getOneTrueElt() a helper function ; NFC
As mentioned in http://reviews.llvm.org/D16828 , the related masked load transform
will need this logic, so I'm moving it out to make that patch smaller.

llvm-svn: 260240
2016-02-09 17:39:58 +00:00
Simon Pilgrim 7e671e06a2 [X86][AVX2] Fix SIGN_EXTEND vector handling on AVX2 targets.
On AVX2 targets we are poorly legalizing SIGN_EXTEND ops for which the input's legalized type doesn't have the same number of elements as the destination, resulting in an ANY_EXTEND followed by a SIGN_EXTEND_INREG.

This patch uses the existing SIGN_EXTEND -> SIGN_EXTEND_VECTOR_INREG combine to extend the input to the size of the result and using SIGN_EXTEND_VECTOR_INREG instead.

Differential Revision: http://reviews.llvm.org/D16994

llvm-svn: 260210
2016-02-09 08:19:19 +00:00
Simon Pilgrim a207436b01 [X86][SSE1] Add MOVLHPS/MOVHLPS lowering and memory folding support
As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents.

This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines.

Differential Revision: http://reviews.llvm.org/D16956

llvm-svn: 260168
2016-02-08 23:03:46 +00:00
Sanjay Patel 264d7e5b68 [x86] convert masked store of one element to scalar store
Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set'
transform in InstCombine, but this should be a win for any AVX machine.

Code comments note that this transform could be extended for other targets / cases.
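
The store-side transform, sketched in IR (mask illustrative; intrinsic
mangling abbreviated):

  call void @llvm.masked.store.v4f32(<4 x float> %v, <4 x float>* %p,
           i32 4, <4 x i1> <i1 false, i1 true, i1 false, i1 false>)

becomes an extract plus a plain scalar store of the one live lane:

  %e = extractelement <4 x float> %v, i32 1
  %q = getelementptr <4 x float>, <4 x float>* %p, i64 0, i64 1
  store float %e, float* %q, align 4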

Differential Revision: http://reviews.llvm.org/D16828

llvm-svn: 260145
2016-02-08 21:05:08 +00:00
Hans Wennborg 850ec6ca18 [X86] Don't zero/sign-extend i1, i8, or i16 return values to 32 bits (PR22532)
This matches GCC and MSVC's behaviour, and saves on code size.

We were already not extending i1 return values on x86_64 after r127766. This
takes that patch further by applying it to the x86 target as well, and also for i8
and i16.

The ABI docs have been unclear about the required behaviour here. The new i386
psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return
values do not need to be extended beyond 8 bits. The x86_64 ABI doc is being
updated to say the same [2].

Differential Revision: http://reviews.llvm.org/D16907

 [1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf
 [2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ

llvm-svn: 260133
2016-02-08 19:34:30 +00:00
Simon Pilgrim f116e4acc7 [X86][SSE] Resolve target shuffle inputs to sentinels to permit more combines
The combineX86ShufflesRecursively only supports unary shuffles, but was missing the opportunity to combine binary shuffles with a zero / undef second input.

This patch resolves target shuffle inputs, converting the shuffle mask elements to SM_SentinelUndef/SM_SentinelZero where possible. It then resolves the updated mask to check if we have created a faux unary shuffle.

Additionally, we now attempt to recursively call combineX86ShufflesRecursively for all input operands (we used to just recurse for unary integer shuffles and unary unpacks) - it safely returns early if its not a target shuffle.

Differential Revision: http://reviews.llvm.org/D16683

llvm-svn: 260063
2016-02-07 22:51:06 +00:00
Asaf Badouh ad5c3fc47d [X86][AVX512] add intrinsics of Scalar FP to integer conversion with rounding mode
Differential Revision: http://reviews.llvm.org/D16629

llvm-svn: 260033
2016-02-07 14:59:13 +00:00
Simon Pilgrim 73fc26b44a [X86][SSE] Pulled out repeated target shuffle decodes into helper functions. NFCI.
Pulled out the code used by PSHUFB/VPERMV/VPERMV3 shuffle mask decoding into common helper functions.

The helper functions handle masks coming from BROADCAST/BUILD_VECTOR and ConstantPool nodes respectively.

llvm-svn: 260032
2016-02-07 14:33:03 +00:00
Simon Pilgrim 9e369f2a51 [X86][SSE] Don't replace an existing 32-bit load with its duplicate
If we are already loading a single 32-bit float/integer then just reuse it.

Fix for regression in D16729

llvm-svn: 259991
2016-02-06 15:37:09 +00:00
Simon Pilgrim 11e4d1146f Comment fix
llvm-svn: 259990
2016-02-06 14:21:49 +00:00
Simon Pilgrim 7823fd2535 [X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads
Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type.

This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain.....

llvm-svn: 259816
2016-02-04 19:27:51 +00:00
Simon Pilgrim 6788f33cf2 [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads
This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load.

Differential Revision: http://reviews.llvm.org/D16729

llvm-svn: 259796
2016-02-04 16:12:56 +00:00
Michael Zuckerman 7d73360479 [AVX512] add vfmadd132ss and vfmadd132sd Intrinsic
Differential Revision: http://reviews.llvm.org/D16589

llvm-svn: 259789
2016-02-04 14:41:08 +00:00
Simon Pilgrim 1d2d6c5a57 [X86] Moved SEXT -> SIGN_EXTEND_VECTOR_INREG combine into helper. NFC.
llvm-svn: 259771
2016-02-04 09:27:19 +00:00
Sanjay Patel 460ce9cd9b clean up; NFC
llvm-svn: 259720
2016-02-03 22:37:37 +00:00
Simon Pilgrim 18bcf93efb [X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to EltsFromConsecutiveLoads
Follow up to D16217 and D16729

This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed). I can't find any use/reason for this, so I have removed the pattern and replaced it so that only the 1st i64 element is loaded and the upper bits are all zeroed. This matches the description of X86ISD::VZEXT_LOAD.

Differential Revision: http://reviews.llvm.org/D16768

llvm-svn: 259635
2016-02-03 09:41:59 +00:00
Asaf Badouh 5a3a0231f4 [X86][AVX512VBMI] add encoding and intrinsics for Multishift
Differential Revision: http://reviews.llvm.org/D16399

llvm-svn: 259363
2016-02-01 15:48:21 +00:00
Igor Breger 56b039ea17 AVX512: fix mask handling for gather/scatter/prefetch intrinsics.
Differential Revision: http://reviews.llvm.org/D16755

llvm-svn: 259346
2016-02-01 09:57:15 +00:00
Simon Pilgrim 1358d86659 [X86][SSE] Find source of the inserted element of INSERTPS
Minor patch to trace back through target shuffles to the source of the inserted element in a (V)INSERTPS shuffle.

Differential Revision: http://reviews.llvm.org/D16652

llvm-svn: 259343
2016-02-01 08:59:30 +00:00
Igor Breger 6cc9115cec AVX512: Fix SETCCE lowering for KNL 32 bit.
Differential Revision: http://reviews.llvm.org/D16752

llvm-svn: 259342
2016-02-01 07:56:09 +00:00
Mitch Bodart e5cadbbcdd [X86] Test commit, fixed typos in comments. NFC.
llvm-svn: 259057
2016-01-28 16:40:51 +00:00
Simon Pilgrim de16172d9d [x86] Merge multiple calls to DAG.getTargetLoweringInfo(). NFC.
llvm-svn: 259050
2016-01-28 15:29:11 +00:00
Igor Breger fca0a34398 AVX512: Fix truncate v32i8 to v32i1 lowering implementation.
Enable truncation of 128/256-bit packed byte/word vectors with AVX512BW but without AVX512VL by using 512-bit instructions.

Differential Revision: http://reviews.llvm.org/D16531

llvm-svn: 259044
2016-01-28 13:19:25 +00:00
Simon Pilgrim d3b78430d1 [X86][SSE] Move setTargetShuffleZeroElements closer to getTargetShuffleMask. NFCI.
Keep target shuffle mask helper functions closer together.

llvm-svn: 259034
2016-01-28 09:45:01 +00:00
Igor Breger d6c187b038 AVX512: Add store mask patterns.
Differential Revision: http://reviews.llvm.org/D16596

llvm-svn: 258914
2016-01-27 08:43:25 +00:00
Sanjay Patel 06fe9183b0 [x86] make the subtarget member a const reference, not a pointer ; NFCI
It's passed in as a reference; it's not optional; it's not a pointer.

llvm-svn: 258867
2016-01-26 22:08:58 +00:00
Simon Pilgrim 00adc1e105 [X86] Add support for zeroed shuffle elements to getShuffleScalarElt
Enable handling of SM_SentinelZero shuffle elements to getShuffleScalarElt. Improves VZEXT_LOAD matches in EltsFromConsecutiveLoads.

llvm-svn: 258865
2016-01-26 21:39:25 +00:00
Sanjay Patel 3e1701da29 [x86] add materializeVectorConstant() helper function; NFC
LowerBUILD_VECTOR is still over 300 lines long, but it's a start...

llvm-svn: 258858
2016-01-26 21:05:00 +00:00
Sanjay Patel 70fa79fdf2 [x86] simplify getOnesVector() ; NFCI
Let DAG.getConstant() handle the splatting; there's no need
to repeat that logic here.

llvm-svn: 258833
2016-01-26 18:49:36 +00:00
Simon Pilgrim 46696ef93c [X86][SSE] Add zero element and general 64-bit VZEXT_LOAD support to EltsFromConsecutiveLoads
This patch adds support for trailing zero elements to VZEXT_LOAD loads (and checks that no zero elts occur within the consecutive load).

It also generalizes the 64-bit VZEXT_LOAD load matching to work for loads other than 2x32-bit loads.

After this patch it will also be easier to add support for other basic load patterns like 32-bit VZEXT_LOAD loads, PMOVZX and subvector load insertion.

Differential Revision: http://reviews.llvm.org/D16217

llvm-svn: 258798
2016-01-26 09:30:08 +00:00
Matthias Braun 4e67e5c91a X86ISelLowering: Fix cmov(cmov) special lowering bug
There's a special case in EmitLoweredSelect() that produces an improved
lowering for cmov(cmov) patterns. However, this special lowering is
currently broken if the inner cmov has multiple users, so this patch
stops using it in this case.

If you wonder why this wasn't fixed by continuing to use the special
lowering and inserting a 2nd PHI for the inner cmov: I believe this
would incur additional copies/register pressure so the special lowering
does not improve upon the normal one anymore in this case.

This fixes http://llvm.org/PR26256 (= rdar://24329747)

llvm-svn: 258729
2016-01-25 22:08:25 +00:00
Asaf Badouh 655822ab7e [X86][IFMA] adding intrinsics and encoding for multiply and add of unsigned 52bit integer
VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators
VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Integers and Add the High 52-bit Products to 64-bit Accumulators

Differential Revision: http://reviews.llvm.org/D16407

llvm-svn: 258680
2016-01-25 11:14:24 +00:00
Igor Breger 1e5bafbc82 AVX512: VMOVDQU8/16/32/64 (load) intrinsic implementation.
Differential Revision: http://reviews.llvm.org/D16137

llvm-svn: 258657
2016-01-24 08:04:33 +00:00
Simon Pilgrim 0423b382d3 [X86][SSE] Generalised TRUNC -> PACKSS/PACKUS code. NFC.
Generalised mask generation / subvector extraction to use the input/output types directly instead of an if/else through all the currently accepted types.

llvm-svn: 258645
2016-01-23 22:02:48 +00:00
Simon Pilgrim ead22d095e Added missing comment. NFC.
llvm-svn: 258624
2016-01-23 14:38:02 +00:00
Simon Pilgrim fd66169341 [X86][SSE] Remove INSERTPS dependencies from unreferenced operands.
If the INSERTPS zeroes out all the referenced elements from either of the 2 input vectors (and the input is not already UNDEF), then set that input to UNDEF to reduce dependencies.

llvm-svn: 258622
2016-01-23 13:37:07 +00:00
Sanjay Patel c4efadb665 fix typos; NFC
llvm-svn: 258567
2016-01-22 22:09:41 +00:00
Simon Pilgrim 5ba1c127fc [X86][SSE] Improve i16 splatting shuffles
Better handling of the annoying pshuflw/pshufhw ops which only shuffle lower/upper halves of a vector.

Added vXi16 unary shuffle support for cases where i16 elements (from the same half of the source) are being splatted to the whole of one of the halves. This avoids the general lowering case which must shuffle the 32-bit elements first - meaning that we used to end up with unnecessary duplicate pshuflw/pshufhw shuffles.
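
A rough sketch of a newly handled case (mask illustrative):

  %r = shufflevector <8 x i16> %v, <8 x i16> undef,
           <8 x i32> <i32 1, i32 1, i32 1, i32 1,
                      i32 4, i32 5, i32 6, i32 7>

splats element 1 across the low half while passing the high half through,
which a single pshuflw can do directly.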

Note this has the side effect of a lot of SSSE3 test cases no longer needing to use PSHUFB, as it falls below the 3 op combine threshold for when PSHUFB is typically worth it. I've raised PR26183 to discuss if the threshold should be changed and whether we need to make it more specific to the target CPU.

Differential Revision: http://reviews.llvm.org/D14901

llvm-svn: 258440
2016-01-21 22:07:41 +00:00
Igor Breger d3341f5021 AVX512: Store (MOVNTPD, MOVNTPS, MOVNTDQ) using non-temporal hint intrinsic implementation.
Differential Revision: http://reviews.llvm.org/D16350

llvm-svn: 258309
2016-01-20 13:11:47 +00:00
Simon Pilgrim 4b919b2ab3 [X86][SSE] Add VZEXT_MOVL target shuffle decoding.
Add support for decoding VZEXT_MOVL target shuffle masks, allowing it to be used as a source in target shuffle combines.

llvm-svn: 258215
2016-01-19 23:04:56 +00:00
Simon Pilgrim e74653b67a [X86][SSE] Add INSERTPS target shuffle combines.
As vector shuffles can only reference two inputs, many (V)INSERTPS patterns end up being split over two target shuffles.

This patch adds combines to attempt to combine (V)INSERTPS nodes with input/output nodes that are just zeroing out these additional vector elements.

Differential Revision: http://reviews.llvm.org/D16072

llvm-svn: 258205
2016-01-19 22:24:12 +00:00
Asaf Badouh d4a0d9a78c [X86][AVX512] fix dag & add intrinsics for fixupimm
cover all widths and types (pd/ps/sd/ss) of the fixupimm instruction and intrinsics

Differential Revision: http://reviews.llvm.org/D16313

llvm-svn: 258124
2016-01-19 14:21:39 +00:00
Simon Pilgrim 3e5fb61978 [X86][AVX2] Broadcast subvectors
AVX2 can only broadcast from the zero'th element of a vector, but if the broadcastable element is the zero'th element of a 128-bit subvector it's advantageous to extract the subvector, broadcast from that, and avoid loading the shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception is when the source type is 4f64 or 4i64, which can use the immediate shuffles VPERMPD/VPERMQ directly.
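
An illustrative case (indices hypothetical): broadcasting element 4 of a
v8i32, i.e. the zero'th element of its upper 128-bit subvector,

  %b = shufflevector <8 x i32> %v, <8 x i32> undef,
           <8 x i32> <i32 4, i32 4, i32 4, i32 4,
                      i32 4, i32 4, i32 4, i32 4>

can extract the upper subvector and broadcast from its lane 0 instead of
loading a shuffle mask for VPERMD.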

Differential Revision: http://reviews.llvm.org/D16050

llvm-svn: 258081
2016-01-18 20:59:04 +00:00
Igor Breger 239fda676c AVX512: Masked store intrinsic implementation.
Implemented intrinsics for the following (store) instructions: VMOVDQU8/16/32/64, VMOVDQA32/64, VMOVAPS/PD, VMOVUPS/PD.

Differential Revision: http://reviews.llvm.org/D16271

llvm-svn: 258047
2016-01-18 13:52:57 +00:00
Igor Breger e1f273d900 AVX512: Use MemIntrinsicSDNode to implement load/store intrinsic.
Differential Revision: http://reviews.llvm.org/D16184

llvm-svn: 258009
2016-01-17 12:10:24 +00:00
Simon Pilgrim 20f31fa31a [X86][AVX] Enable extraction of upper 128-bit subvectors for 'half undef' shuffle lowering
Added support for the extraction of the upper 128-bit subvectors for lower/upper half undef shuffles if it would reduce the number of extractions/insertions or avoid loads of AVX2 permps/permd shuffle masks.

Minor follow up to D15477.

llvm-svn: 258000
2016-01-16 22:30:20 +00:00
NAKAMURA Takumi 33ff1dda6a [Cygwin] Use -femulated-tls by default since r257718 introduced the new pass.
FIXME: Add more targets to use emutls into clang/test/Driver/emulated-tls.cpp.
FIXME: Add cygwin tests into llvm/test/CodeGen/X86. Work in progress.
llvm-svn: 257984
2016-01-16 03:44:52 +00:00
Manman Ren 4fe01bd8f9 CXX_FAST_TLS calling convention: fix issue on X86-64.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.

I will commit fixes to other platforms as well.

PR26136

llvm-svn: 257925
2016-01-15 19:35:42 +00:00
David Majnemer 3463e696fb [X86] Don't alter HasOpaqueSPAdjustment after we've relied on it
We rely on HasOpaqueSPAdjustment not changing after we've calculated
things based on it.  Things like whether or not we can use 'rep;movs' to
copy bytes around, that sort of thing.  If it changes, invariants in the
backend will quietly break.  This situation arose when we had a call to
memcpy *and* a COPY of the FLAGS register where we would attempt to
reference local variables using %esi, a register that was clobbered by
the 'rep;movs'.

This fixes PR26124.

llvm-svn: 257730
2016-01-14 01:20:03 +00:00
Michael Zuckerman 6b35f460ac Fixed a warning by adding the X86ISD::VROTRI case.
Differential Revision: http://reviews.llvm.org/D16052 

llvm-svn: 257607
2016-01-13 15:48:42 +00:00
Michael Zuckerman 2ddcbcf464 [AVX512] adding PROLQ and PROLD Intrinsics
Differential Revision: http://reviews.llvm.org/D16048

llvm-svn: 257523
2016-01-12 21:19:17 +00:00
Igor Breger ea8e8e9f97 AVX512: VPMOVAPS/PD and VPMOVUPS/PD (load) intrinsic implementation.
Differential Revision: http://reviews.llvm.org/D16042

llvm-svn: 257463
2016-01-12 10:02:32 +00:00
Manman Ren ed967f3752 CXX_FAST_TLS calling convention: performance improvement for x86-64.
This is the same change on x86-64 as r255821 on AArch64.
rdar://9001553

llvm-svn: 257428
2016-01-12 01:08:46 +00:00
Elena Demikhovsky 542dfcf44c Optimized instruction sequence for sitofp operation on X86-32
Optimized sitofp i64 %x to double. The current sequence

movl %ecx, 8(%esp) 
movl %edx, 12(%esp) 
fildll 8(%esp)

is replaced with:

movd %ecx, %xmm0 
movd %edx, %xmm1 
punpckldq %xmm1, %xmm0 
movq %xmm0, 8(%esp)

Differential Revision: http://reviews.llvm.org/D15946

llvm-svn: 257285
2016-01-10 09:41:22 +00:00
Simon Pilgrim c7bebcbfd8 [X86][AVX] Match broadcast loads through a bitcast
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through any bitcast to check for a load node to allow broadcasts to occur.

This is a re-commit of r257055 after r257264 fixed 32-bit broadcast loads of i64 scalars.

llvm-svn: 257266
2016-01-09 20:59:39 +00:00
Simon Pilgrim 2e7a1849c9 [X86][AVX] Add support for i64 broadcast loads on 32-bit targets
Added 32-bit AVX1/AVX2 broadcast tests.

llvm-svn: 257264
2016-01-09 19:59:27 +00:00
Nico Weber 4324b9b236 Revert r257055, it caused PR26064.
llvm-svn: 257066
2016-01-07 15:01:46 +00:00
Simon Pilgrim bcc11a059e [X86][AVX] Match broadcast loads through a bitcast
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur.

Follow up to D15310

llvm-svn: 257055
2016-01-07 11:34:27 +00:00
Simon Pilgrim 83e44c66ae [X86][SSE] Add INSERTPS as a target shuffle
Follow-up to D15378: added INSERTPS to the list of decodable target shuffles, enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero, and tested this with INSERTPS.

llvm-svn: 257046
2016-01-07 10:24:19 +00:00
Simon Pilgrim bc82dedd26 [X86] Determine if target shuffle can contain zero elements
getTargetShuffleMask may return shuffle masks with SM_SentinelZero (-2) values (previously just for PSHUFB; with this patch, for VPERM2X128 as well). Although some calling functions can make use of this (mainly for shuffle combining), others cannot, and their inclusion makes shuffle mask comparisons more difficult.

This patch adds a flag to getTargetShuffleMask indicating whether the calling function can't handle SM_SentinelZero; getTargetShuffleMask will then return false if SM_SentinelZero occurs, making handling much easier.

I've tidied up some uses of getTargetShuffleMask to better indicate what is going on - more could be done but at present I don't have test cases to demonstrate it.

Some upcoming patches will make use of this to both support more uses where SM_SentinelZero is not permitted (e.g. combineShuffleToAddSub), and also will allow us to add INSERTPS support to getTargetShuffleMask as part of better zero handling discussed in D14261.

Differential Revision: http://reviews.llvm.org/D15378

llvm-svn: 256992
2016-01-06 23:24:40 +00:00
Quentin Colombet eb61e8e6b0 [X86] Correctly model TLS calls w.r.t. frame requirements.
TLS calls need the stack frame to be properly set up and this
implies that such calls need ADJUSTSTACK_xxx markers.

Fixes PR25820.

llvm-svn: 256959
2016-01-06 19:09:26 +00:00
Sanjay Patel ab69e9f497 refactor divrem8 lowering; NFCI
The code duplication contributed to PR25754:
https://llvm.org/bugs/show_bug.cgi?id=25754

llvm-svn: 256957
2016-01-06 18:47:09 +00:00
Artyom Skrobov 51f2d11be9 PR25754: avoid generating UDIVREM8_ZEXT_HREG nodes with i64 result
Reviewers: spatel, srking

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15331

llvm-svn: 256924
2016-01-06 09:41:10 +00:00
Simon Pilgrim 267163e713 [X86][SSE] There is no zmm addsubpd/addsubps instruction.
Replace the assert in combineShuffleToAddSub with an early out.

llvm-svn: 256922
2016-01-06 09:08:49 +00:00
Simon Pilgrim eaabd64a11 [X86][SSE] An empty target shuffle mask is always a failure.
As discussed on D15378, move the mask.empty() tests to after the switch statement and consider any shuffle decode where the extracted target shuffle mask is empty as a failure.

llvm-svn: 256921
2016-01-06 08:59:32 +00:00
David Majnemer 861a0ae349 [X86] Determine if we have an OpaqueSPAdjustment earlier
We queried hasFP before we hit ExpandISelPseudos.  ExpandISelPseudos
manipulated state that hasFP relied on, potentially changing the result
after it has been queried elsewhere.

While I am not aware of any particular bug due to this state of affairs,
it seems best to avoid it entirely by changing the state during DAG
construction.

llvm-svn: 256849
2016-01-05 17:46:36 +00:00
Simon Pilgrim d47ac60f00 [X86][SSE] Merge PerformBLENDICombine into PerformShuffleCombine
PBLEND/BLENDPD/BLENDPS are no different to the other target shuffles and this will make future improvements to the target shuffle combines more straightforward.

llvm-svn: 256819
2016-01-05 09:12:17 +00:00
Simon Pilgrim e6955f3211 [X86][SSE] Ensure BLENDPD/BLENDPS/PBLEND inputs are both of the correct input type
llvm-svn: 256782
2016-01-04 21:41:11 +00:00
David Majnemer ca1c9f074f [X86] Make hasFP constant time
We need a frame pointer if there is a push/pop sequence after the
prologue in order to unwind the stack.  Scanning the instructions to
figure out if this happened made hasFP not constant-time which is a
violation of expectations.  Let's compute this up-front and reuse that
computation when we need it.

llvm-svn: 256730
2016-01-04 04:49:41 +00:00
David Majnemer 011980cd50 [X86] Add intrinsics for reading and writing to the flags register
LLVM's targets need to know if stack pointer adjustments occur after the
prologue.  This is needed to correctly determine if the red-zone is
appropriate to use or if a frame pointer is required.

Normally, LLVM can figure this out very precisely by reasoning about the
contents of the MachineFunction.  There is an interesting corner case:
inline assembly.

The vast majority of inline assembly which performs a push or pop does so
to pair up with pushf or popf as appropriate.  Unfortunately, this inline
assembly doesn't mark the stack pointer as clobbered because, well, it
isn't.  The stack pointer is decremented and then immediately incremented.
Because of this, LLVM was changed in r256456 to conservatively assume that
inline assembly contains a sequence of stack operations.  This is
unfortunate because the vast majority of inline assembly will not end up
manipulating the stack pointer in any way at all.

Instead, let's provide a more principled solution: an intrinsic.
FWIW, other compilers (MSVC and GCC among them) also provide this
functionality as an intrinsic.
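
A minimal usage sketch, assuming the 64-bit intrinsic names introduced here are llvm.x86.flags.read.u64 and llvm.x86.flags.write.u64:

  declare i64 @llvm.x86.flags.read.u64()
  declare void @llvm.x86.flags.write.u64(i64)

  define i64 @save_flags() {
    ; should lower to pushfq + popq without pessimizing the whole function
    %f = call i64 @llvm.x86.flags.read.u64()
    ret i64 %f
  }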

llvm-svn: 256685
2016-01-01 06:50:01 +00:00
Craig Topper 69653af748 [X86] Move shuffle decoding for constant pool into the X86CodeGen library to remove a layering violation in the Util library.
llvm-svn: 256680
2015-12-31 22:40:45 +00:00
Asaf Badouh af6569afd2 [X86][PKU] Add {RD,WR}PKRU intrinsics
Differential Revision: http://reviews.llvm.org/D15808

llvm-svn: 256670
2015-12-31 08:31:13 +00:00
Sanjay Patel b3c53e512f [x86] lower calls to fmin and llvm.minnum.* using minss/minsd/minps/minpd (PR24475)
This is a follow-on to:
http://reviews.llvm.org/rL255700
http://reviews.llvm.org/rL256454
http://reviews.llvm.org/rL256510
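
A minimal IR sketch of the kind of call affected (illustrative, not from the patch):

  declare float @llvm.maxnum.f32(float, float)

  define float @fmax32(float %x, float %y) {
    %r = call float @llvm.maxnum.f32(float %x, float %y)
    ret float %r
  }
  ; can now select maxss plus NaN-handling code instead of a libcall to fmaxf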

llvm-svn: 256522
2015-12-28 21:16:55 +00:00
Sanjay Patel 9da2b647c7 [x86] lower calls to fmax and llvm.maxnum.* using maxps/maxpd (PR24475)
This is a follow-on to:
http://reviews.llvm.org/rL255700
http://reviews.llvm.org/rL256454

llvm-svn: 256510
2015-12-28 19:20:19 +00:00
Michael Kuperstein 2ea81baf3a [X86] Better support for the MCU psABI (LLVM part)
This adds support for the MCU psABI in a way different from r251223 and r251224,
basically reverting most of these two patches. The problem with the approach
taken in r251223/4 is that it only handled libcalls that originated from the backend.
However, the mid-end also inserts quite a few libcalls and assumes these use the
platform's default calling convention.

The previous patch tried to insert inregs when necessary both in the FE and,
somewhat hackily, in the CG. Instead, we now define a new default calling convention
for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does.

Differential Revision: http://reviews.llvm.org/D15054

llvm-svn: 256494
2015-12-28 14:39:21 +00:00
Asaf Badouh fba562004b [X86][AVX512] Lower broadcast sub vector to vector intrinsics
Lower broadcast<type>x<vector> to shuffles. There are two cases:
1. src is 128 bits and dest is 512 bits: lower to a shuffle with imm = 0 (see the sketch after this list).
2. src is 256 bits and dest is 512 bits: lower to a shuffle with imm = 01000100b (0x44), which broadcasts the 256-bit source, ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3], and then masks the result with the passthru value (in case it's a masked op).
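
A sketch of case 1 in IR terms (illustrative only):

  define <16 x float> @bcast_128_to_512(<4 x float> %src) {
    ; imm = 0: repeat the 128-bit source in all four 128-bit lanes
    %b = shufflevector <4 x float> %src, <4 x float> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
    ret <16 x float> %b
  }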



Differential Revision: http://reviews.llvm.org/D15790

llvm-svn: 256490
2015-12-28 08:26:26 +00:00
Craig Topper f3ed5c115c [AVX512] Remove separate instruction and patterns for lowering ctlz_zero_undef. Change the operation for CTLZ_ZERO_UNDEF to Expand so SelectionDAG will convert them to CTLZ before lowering.
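
For example (illustrative IR, assuming an AVX512CD target):

  declare <16 x i32> @llvm.ctlz.v16i32(<16 x i32>, i1)

  define <16 x i32> @ctlz_undef(<16 x i32> %x) {
    ; the i1 true flag marks a zero input as undefined
    %r = call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %x, i1 true)
    ret <16 x i32> %r
  }
  ; SelectionDAG now expands CTLZ_ZERO_UNDEF to plain CTLZ, which selects vplzcntd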
llvm-svn: 256477
2015-12-27 21:33:50 +00:00
Igor Breger 756c289dd8 AVX512: Change VPMOVB2M DAG lowering to use a CVT2MASK node instead of TRUNCATE.
Fix TRUNCATE lowering of vector-to-vector i1: use the LSB, not the MSB.
Implement the VPMOVB/W/D/Q2M intrinsics.

Differential Revision: http://reviews.llvm.org/D15675

llvm-svn: 256470
2015-12-27 13:56:16 +00:00
Sanjay Patel bcff3f7d92 [x86] lower calls to llvm.maxnum.v4f32 using maxps
This is a follow-on to:
http://reviews.llvm.org/rL255700

llvm-svn: 256454
2015-12-26 21:44:55 +00:00
Craig Topper fa5f35e6ad [X86] Fold some variable declarations and initializations into if statements. NFC
llvm-svn: 256451
2015-12-26 19:48:37 +00:00
Craig Topper 91dab7baee [X86] Replace MVT::SimpleValueType in the AsmParser library and getX86SubSuperRegister with just an unsigned representing size.
This is a step towards fixing a layering violation so the X86 AsmParser won't depend on CodeGen types.

llvm-svn: 256425
2015-12-25 22:09:45 +00:00
Igor Breger 268f6f53c5 AVX512: VPMOVM2B/W/D/Q intrinsic implementation.
Differential Revision: http://reviews.llvm.org/D15747

llvm-svn: 256364
2015-12-24 07:11:53 +00:00
Simon Pilgrim 17377bdd45 [X86][AVX] Only shuffle the lower half of vectors if the upper half is undefined
First step towards making better use of AVX's implicit zeroing of the upper half of a 256-bit vector by instructions that only act on the lower 128-bit vector - discussed on D14151.

As well as the fact that 128-bit shuffle instructions are generally more capable, this can be performant for older CPUs with 128-bit ALUs (e.g. Jaguar, Sandy Bridge) that must treat 256-bit vectors as multiple micro-ops.

Moved the similar subvector extraction shuffle combines from PerformShuffleCombine256 to lowerVectorShuffle as well.

Note: I've avoided combining shuffles that reference elements from the upper halves of the input vectors - this may be reviewed in future work as well (AVX1 would probably always gain, but AVX2 does have some cross-lane shuffle instructions).
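
A hypothetical shuffle that benefits (illustrative only):

  define <8 x float> @lower_half_shuffle(<8 x float> %a) {
    ; the upper four lanes are undef and the mask only reads the lower half
    %s = shufflevector <8 x float> %a, <8 x float> undef, <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 undef, i32 undef, i32 undef, i32 undef>
    ret <8 x float> %s
  }
  ; can be lowered as a single 128-bit shuffle; the upper ymm half is implicitly zeroed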

Differential Revision: http://reviews.llvm.org/D15477

llvm-svn: 256332
2015-12-23 13:10:07 +00:00
Igor Breger 7b46b4e798 AVX512BW: Enable packed word shifts for 512-bit vectors. Enable lowering of scalar immediate shifts for v64i8. Fix the predicate for AVX1/2 shifts.
Differential Revision: http://reviews.llvm.org/D15713

llvm-svn: 256324
2015-12-23 08:06:50 +00:00
Cong Hou 8df93ce455 [X86][SSE] Transform truncations between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine.
This patch transforms truncations between vectors of integers into
X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in the
lowering phase because, after type legalization, the original truncation
will be turned into a BUILD_VECTOR with each element extracted from a
vector and then truncated, and from that form it is difficult to do this
optimization. This greatly improves the performance of truncations on some
specific types.

Cost table is updated accordingly.
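
An illustrative case (not from the patch):

  define <8 x i16> @trunc_v8i32(<8 x i32> %x) {
    %t = trunc <8 x i32> %x to <8 x i16>
    ret <8 x i16> %t
  }
  ; when the known bits of %x allow it, this can select packssdw/packusdw
  ; instead of a long extract-and-rebuild shuffle sequence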


Differential revision: http://reviews.llvm.org/D14588

llvm-svn: 256194
2015-12-21 20:42:43 +00:00
Igor Breger 44b60a3687 AVX512BW: Enable AND/OR/XOR packed byte/word operations by promoting to qword, which is natively supported.
llvm-svn: 256157
2015-12-21 14:40:36 +00:00
Amjad Aboud 60b5e1b6c0 Implemented support for IA interrupt and exception handlers:
http://lists.llvm.org/pipermail/cfe-dev/2015-September/045171.html

Differential Revision: http://reviews.llvm.org/D15567

llvm-svn: 256155
2015-12-21 14:07:14 +00:00
NAKAMURA Takumi 9ec6a826dd [Cygwin] Enable TLS as emutls.
It resolves clang self-hosting with std::once() for Cygwin.

FIXME: It may be EmulatedTLS-generic also for X86-Android.
FIXME: Pass EmulatedTLS to LLVM CodeGen from Clang with -femulated-tls.
llvm-svn: 256134
2015-12-21 02:37:23 +00:00
Michael Kuperstein e75e6e2a23 [X86] Improve shift combining
This folds (ashr (shl a, [56,48,32,24,16]), SarConst)
into       (shl (sext a), [56,48,32,24,16] - SarConst)
or into    (lshr (sext a), SarConst - [56,48,32,24,16])
depending on the sign of (SarConst - [56,48,32,24,16]).

The sexts on x86 are MOVs (MOVSX).
These MOVs have the same code size as the SHIFTs above (only a SHIFT by 1 has smaller code size).
However, MOVs have two advantages over SHIFTs on x86:
1. A MOV can write to a register that differs from its source.
2. MOVs accept memory operands.

This fixes PR24373.
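
A hand-worked instance (illustrative; SarConst = 58 here):

  define i64 @ashr_of_shl(i64 %a) {
    %s = shl i64 %a, 56
    %r = ashr i64 %s, 58
    ret i64 %r
  }
  ; before: shlq $56 followed by sarq $58
  ; after:  a sign-extending byte mov followed by a shift by 58 - 56 = 2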

Patch by: evgeny.v.stupachenko@intel.com
Differential Revision: http://reviews.llvm.org/D13161

llvm-svn: 255761
2015-12-16 11:22:37 +00:00
Reid Kleckner 7850c9f5ca [WinEH] Make llvm.x86.seh.recoverfp work on x64
It adjusts from RSP-after-prologue to RBP, which is what SEH filters
need to do before they can use llvm.localrecover.

Fixes SEH filter captures, which were broken in r250088.

Issue reported by Alex Crichton.

llvm-svn: 255707
2015-12-15 23:40:58 +00:00
Sanjay Patel 271efcdf20 [x86] inline calls to fmaxf / llvm.maxnum.f32 using maxss (PR24475)
This patch improves on the suggested codegen from PR24475:
https://llvm.org/bugs/show_bug.cgi?id=24475

but only for the fmaxf() case to start, so we can sort out any bugs before
extending to fmin, f64, and vectors.

The fmax / maxnum definitions provide us flexibility for signed zeros, so the
only thing we have to worry about in this replacement sequence is NaN handling.

Note 1: It may be better to implement this as lowerFMAXNUM(), but that exposes
a problem: SelectionDAGBuilder::visitSelect() transforms compare/select
instructions into FMAXNUM nodes if we declare FMAXNUM legal or custom. Perhaps
that should be checking for NaN inputs or global unsafe-math before transforming?
As it stands, that bypasses a big set of optimizations that the x86 backend 
already has in PerformSELECTCombine().

Note 2: The v2f32 test reveals another bug; the vector is extended to v4f32, so
we have completely unnecessary operations happening on undef elements of the 
vector.

Differential Revision: http://reviews.llvm.org/D15294

llvm-svn: 255700
2015-12-15 23:11:43 +00:00
Reid Kleckner d7045faa10 [WinEH] Remove unused intrinsic llvm.x86.seh.restoreframe
We can clean this up now that we have the X86 CATCHRET instruction to
restore the FP, SP, and BP.

llvm-svn: 255677
2015-12-15 21:41:34 +00:00
Elena Demikhovsky 6015f5c823 Type legalizer for masked gather and scatter intrinsics.
A full type legalizer that works with all vector lengths from 2 to 16, for i32, i64, float, and double.

This intrinsic, for example:
void @llvm.masked.scatter.v2f32(<2 x float> %data, <2 x float*> %ptrs, i32 align, <2 x i1> %mask)
requires type widening for the data and type promotion for the mask.

Differential Revision: http://reviews.llvm.org/D13633

llvm-svn: 255629
2015-12-15 08:40:41 +00:00
Chih-Hung Hsieh 7993e18e80 [X86] Part 2 to fix x86-64 fp128 calling convention.
Part 1 was submitted in http://reviews.llvm.org/D15134.
Changes in this part:
* X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class.
* X86CallingConv.td: Pass f128 values in XMM registers or on stack.
* X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td:
  Add instruction selection patterns for f128.
* X86ISelLowering.cpp:
  When target has MMX registers, configure MVT::f128 in FR128RegClass,
  with TypeSoftenFloat action, and custom actions for some opcodes.
  Add missed cases of MVT::f128 in places that handle f32, f64, or vector types.
  Add TODO comment to support f128 type in inline assembly code.
* SelectionDAGBuilder.cpp:
  Fix infinite loop when f128 type can have
  VT == TLI.getTypeToTransformTo(Ctx, VT).
* Add unit tests for x86-64 fp128 type.
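
A minimal IR sketch exercising the new convention (illustrative; the function name is mine):

  define fp128 @add_f128(fp128 %a, fp128 %b) {
    ; %a and %b now arrive in %xmm0/%xmm1 rather than on the stack
    %r = fadd fp128 %a, %b
    ret fp128 %r
  }
  ; the fadd itself is softened to a libcall such as __addtf3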

Differential Revision: http://reviews.llvm.org/D11438

llvm-svn: 255558
2015-12-14 22:08:36 +00:00
Chen Li 1b26b9ec9d [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
Summary: This patch adds support for converting (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) when the multiplication cannot be converted to LEA + SHL or LEA + LEA. LLVM already supports this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive; I am planning to add another patch to support negative cases after this.
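
For example (illustrative), multiplying by 17 = 2^4 + 1, which a single LEA cannot express:

  define i32 @mul17(i32 %x) {
    %m = mul i32 %x, 17
    ret i32 %m
  }
  ; (mul x, 17) => (add (shl x, 4), x)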

Reviewers: craig.topper, RKSimon

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14603

llvm-svn: 255415
2015-12-12 01:04:15 +00:00
Chen Li 02ef2e1385 Revert rL255391: [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
because it broke a buildbot.

llvm-svn: 255395
2015-12-12 00:08:37 +00:00
Chen Li e8f9387e0c [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
Summary: This patch adds support for converting (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) when the multiplication cannot be converted to LEA + SHL or LEA + LEA. LLVM already supports this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive; I am planning to add another patch to support negative cases after this.

Reviewers: craig.topper, RKSimon

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14603

llvm-svn: 255391
2015-12-11 23:39:32 +00:00
Simon Pilgrim 323e00d9c7 [X86][AVX] Fold loads + splats into broadcast instructions
On AVX and AVX2, BROADCAST instructions can load a scalar into all elements of a target vector.

This patch improves the lowering of 'splat' shuffles of a loaded vector into a broadcast - currently the lowering only works for cases where we are splatting the zeroth element, which is now generalised to any element.
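
A hypothetical case that now becomes a broadcast (names are illustrative):

  define <4 x float> @splat_elt2(<4 x float>* %p) {
    %v = load <4 x float>, <4 x float>* %p
    ; splat element 2, not element 0
    %s = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
    ret <4 x float> %s
  }
  ; can fold to vbroadcastss 8(%rdi) instead of a full load plus shuffle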

Fix for PR23022

Differential Revision: http://reviews.llvm.org/D15310

llvm-svn: 255061
2015-12-08 22:17:11 +00:00
Elena Demikhovsky 291fe0159f AVX-512: Fixed a bug in FP logic operation lowering
FP logic instructions are only supported by the DQ extension on AVX-512 targets;
without it, I use integer operations instead.
Added tests.
I also enabled FABS in this patch in order to check ANDPS.
The operations are FOR, FXOR, FAND, FANDN.
The instructions supported for 512-bit vectors under DQ are:
VORPS/PD, VXORPS/PD, VANDPS/PD, VANDNPS/PD.
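
For example, FABS (illustrative sketch, not from the patch):

  declare <16 x float> @llvm.fabs.v16f32(<16 x float>)

  define <16 x float> @fabs512(<16 x float> %x) {
    %r = call <16 x float> @llvm.fabs.v16f32(<16 x float> %x)
    ret <16 x float> %r
  }
  ; without DQ: bitcast to <16 x i32>, clear the sign bits with an integer AND, bitcast back
  ; with DQ: a single vandps on zmm registers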

Differential Revision: http://reviews.llvm.org/D15110

llvm-svn: 254913
2015-12-07 14:33:34 +00:00
Elena Demikhovsky 33e61eceb4 AVX-512: Fixed masked load / store instruction selection for KNL.
Patterns were missing for the KNL target for <8 x i32> and <8 x float> masked load/store.

This intrinsic comes with all legal types:
<8 x float> @llvm.masked.load.v8f32(<8 x float>* %addr, i32 align, <8 x i1> %mask, <8 x float> %passThru),
but still requires lowering, because VMASKMOVPS, VMASKMOVDQU32 work with 512-bit vectors only.

All data operands should be widened to 512-bit vector.
The mask operand should be widened to v16i1 with zeroes.

Differential Revision: http://reviews.llvm.org/D15265

llvm-svn: 254909
2015-12-07 13:39:24 +00:00
Igor Breger 3ab6f17530 AVX-512: implement kunpck intrinsics.
Differential Revision: http://reviews.llvm.org/D14821

llvm-svn: 254908
2015-12-07 13:25:18 +00:00
Igor Breger 076dfe5c12 AVX512: support AVX512BW intrinsics in 32-bit mode.
Differential Revision: http://reviews.llvm.org/D15076

llvm-svn: 254873
2015-12-06 11:35:18 +00:00
Hans Wennborg 5000ce8a63 X86: Don't emit SAHF/LAHF for 64-bit targets unless explicitly supported
These instructions are not supported by all CPUs in 64-bit mode. Emitting them
causes Chromium to crash on start-up for users with such chips.

(GCC puts these instructions behind -msahf on 64-bit for the same reason.)

This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets
and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering
from before r244503 when the instructions are not available.

Differential Revision: http://reviews.llvm.org/D15240

llvm-svn: 254793
2015-12-04 23:00:33 +00:00
Reid Kleckner 93fc520339 [X86] Put no-op ADJCALLSTACK markers around all dynamic lowerings
Summary:
These ADJCALLSTACK markers don't generate code, but they keep dynamic
alloca code that calls chkstk out of the prologue.

This slightly pessimizes inalloca calls by preventing some register copy
coalescing, but I can live with that.

Reviewers: qcolombet

Subscribers: hans, llvm-commits

Differential Revision: http://reviews.llvm.org/D15200

llvm-svn: 254645
2015-12-03 20:46:59 +00:00
David Majnemer 70497c696a Move EH-specific helper functions to a more appropriate place
No functionality change is intended.

llvm-svn: 254562
2015-12-02 23:06:39 +00:00
Simon Pilgrim 3fc3454a0c [X86][FMA] Optimize FNEG(FMUL) Patterns
On FMA targets, we can avoid having to load a constant to negate a float/double multiply by instead using an FNMSUB (-(X*Y)-0).
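
Sketch (illustrative IR; the function name is mine):

  define double @fneg_fmul(double %x, double %y) {
    %m = fmul double %x, %y
    %n = fsub double -0.000000e+00, %m   ; FNEG
    ret double %n
  }
  ; on FMA targets this can select vfnmsub, computing -(x*y) - 0, with no
  ; constant-pool load for a sign-bit mask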

Fix for PR24366

Differential Revision: http://reviews.llvm.org/D14909

llvm-svn: 254495
2015-12-02 09:07:55 +00:00
Asaf Badouh 2489f350c0 [X86][AVX512] add comi with Sae
add builtin_ia32_vcomiss and builtin_ia32_vcomisd

Differential Revision: http://reviews.llvm.org/D14331

llvm-svn: 254493
2015-12-02 08:17:51 +00:00
Craig Topper f419a1f69a [X86] Change getZeroVector to take an MVT instead of EVT. One minor change needed to only try to perform 256-bit shuffle combines on legal vector types.
llvm-svn: 254490
2015-12-02 06:39:19 +00:00
Craig Topper 6164297f46 [X86] Fix weird indentation. NFC
llvm-svn: 254487
2015-12-02 05:24:38 +00:00
Sanjay Patel 60216f6943 [x86] add a convenience method to check for FMA capability; NFCI
llvm-svn: 254425
2015-12-01 17:27:55 +00:00
Sanjay Patel 239be1fb0d fix formatting; NFC
llvm-svn: 254310
2015-11-30 17:52:02 +00:00
Aaron Ballman 33c95f08b0 Silencing a 32-bit to 64-bit implicit conversion warning; NFC.
llvm-svn: 254302
2015-11-30 14:52:33 +00:00
Craig Topper aad5f11e5f [AVX512] The vpermi2 instructions require an integer vector for the index vector. This is reflected correctly in the intrinsics, but was not reflected in the isel patterns.
For the floating point types, this requires adding a bitcast to the index vector when it's passed through to the output.

llvm-svn: 254277
2015-11-30 00:13:24 +00:00
Craig Topper ecae476e4c [X86] int_x86_avx2_permps and X86ISD::VPERMV should take an integer vector for their shuffle indices.
llvm-svn: 254269
2015-11-29 22:53:22 +00:00
Simon Pilgrim 88aa627c0b [X86][SSE] Added support for lowering to ADDSUBPS/ADDSUBPD with commuted inputs
We could already recognise shuffle(FSUB, FADD) -> ADDSUB; this allows us to recognise shuffle(FADD, FSUB) -> ADDSUB by commuting the shuffle mask prior to matching.
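
Illustrative IR for the commuted form (not from the patch):

  define <4 x float> @addsub_commuted(<4 x float> %a, <4 x float> %b) {
    %add = fadd <4 x float> %a, %b
    %sub = fsub <4 x float> %a, %b
    ; even lanes take the FSUB result, odd lanes the FADD result
    %s = shufflevector <4 x float> %add, <4 x float> %sub, <4 x i32> <i32 4, i32 1, i32 6, i32 3>
    ret <4 x float> %s
  }
  ; matches addsubps once the mask is commuted to <0, 5, 2, 7> over (%sub, %add)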

llvm-svn: 254259
2015-11-29 16:41:04 +00:00
Craig Topper 0009656335 [X86] Split ISD node for Vfpclass and Vfpclasss so that we can write strong type constraints for each that don't cause ambiguous isel.
llvm-svn: 254172
2015-11-26 19:41:34 +00:00
Artyom Skrobov 314ee04268 Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC)
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.

Reviewers: MatzeB, andreadb, tstellarAMD

Subscribers: arsenm, jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D14945

llvm-svn: 254085
2015-11-25 19:41:11 +00:00
Elena Demikhovsky f07df9fcac AVX-512: Fixed a bug in VPERMT2* intrinsic.
The operands were in the wrong order (from intrinsic to DAG node).
I added a stricter type specification for instruction selection.

Differential Revision: http://reviews.llvm.org/D14942

llvm-svn: 254059
2015-11-25 08:17:56 +00:00
Kaelyn Takata d0955312d9 Fix an asan error where NumElements > 32 for at least one case in
test/CodeGen/X86/avg.ll.

llvm-svn: 254043
2015-11-25 00:03:29 +00:00
Simon Pilgrim 1b4fecb098 [X86][FMA] Optimize FNEG(FMA) Patterns
X86 needs to use its own FMA opcodes, preventing the standard FNEG(FMA) pattern table recognition method used by other platforms. This patch adds support for lowering FNEG(FMA(X,Y,Z)) into a single suitably negated FMA instruction.
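
Sketch (illustrative IR):

  declare double @llvm.fma.f64(double, double, double)

  define double @fneg_fma(double %x, double %y, double %z) {
    %f = call double @llvm.fma.f64(double %x, double %y, double %z)
    %n = fsub double -0.000000e+00, %f   ; FNEG
    ret double %n
  }
  ; -(x*y + z) = -(x*y) - z, so this can select a single suitably negated
  ; FMA (e.g. vfnmsub) instead of an FMA followed by a sign flip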

Fix for PR24364

Differential Revision: http://reviews.llvm.org/D14906

llvm-svn: 254016
2015-11-24 20:31:46 +00:00
Cong Hou db6220f84d [X86] Fix several issues related to X86's psadbw instruction.
This patch fixes the following issues:

1. Fix the return type of X86psadbw: it should not be the same type as the inputs.
   For vNi8 inputs the output should be vMi64, where M = N/8.
2. Fix the return type of int_x86_avx512_psad_bw_512 accordingly.
3. Fix the definitions of PSADBW, VPSADBW, and VPSADBWY accordingly.
4. Adjust the return type when building a DAG node of X86ISD::PSADBW type.
5. Update related tests.
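
For reference, the existing SSE2 intrinsic already has the corrected shape (shown for illustration):

  declare <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8>, <16 x i8>)

  define <2 x i64> @sad(<16 x i8> %a, <16 x i8> %b) {
    ; <16 x i8> inputs, <2 x i64> result: one sum of absolute differences per 64-bit half
    %r = call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %a, <16 x i8> %b)
    ret <2 x i64> %r
  }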


Differential revision: http://reviews.llvm.org/D14897

llvm-svn: 254010
2015-11-24 19:51:26 +00:00
Cong Hou bed60d35ed [X86][SSE] Detect AVG pattern during instruction combine for SSE2/AVX2/AVX512BW.
This patch detects the AVG pattern in vectorized code, which is simply
c = (a + b + 1) / 2, where a, b, and c have the same type: vectors of
unsigned i8 or unsigned i16. In the IR, i8/i16 will be promoted to
i32 before any arithmetic operations. The following IR shows such an example:

%1 = zext <N x i8> %a to <N x i32>
%2 = zext <N x i8> %b to <N x i32>
%3 = add nuw nsw <N x i32> %1, <i32 1 x N>
%4 = add nuw nsw <N x i32> %3, %2
%5 = lshr <N x i32> %4, <i32 1 x N>
%6 = trunc <N x i32> %5 to <N x i8>

and with this patch it will be converted to an X86ISD::AVG instruction.

The pattern recognition is done when combining instructions just before type
legalization during instruction selection. We do it here because after type
legalization, it is much more difficult to do pattern recognition based
on many instructions that are doing type conversions. Therefore, for
target-specific instructions (like X86ISD::AVG), we need to take care of type
legalization by ourselves. However, as X86ISD::AVG behaves similarly to
ISD::ADD, I am wondering if there is a way to legalize operands and result
types of X86ISD::AVG together with ISD::ADD. It seems that the current design
doesn't support this idea.

Tests are added for SSE2, AVX2, and AVX512BW, with both i8 and i16 types at
various vector sizes.


Differential revision: http://reviews.llvm.org/D14761

llvm-svn: 253952
2015-11-24 05:44:19 +00:00
Elena Demikhovsky 0fd11526e2 AVX-512: Optimized INSERT_SUBVECTOR for i1 vector types
INSERT_SUBVECTOR for i1 vectors may be done with shifts when we insert into the lower part, into the upper part, or into an all-zero vector.
CONCAT_VECTORS uses INSERT_SUBVECTOR.
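
An illustrative concat of two v8i1 masks (function name is mine):

  define <16 x i1> @concat_masks(<8 x i32> %a, <8 x i32> %b) {
    %lo = icmp sgt <8 x i32> %a, zeroinitializer
    %hi = icmp sgt <8 x i32> %b, zeroinitializer
    ; CONCAT_VECTORS of the two v8i1 values, expressed as a shuffle
    %c = shufflevector <8 x i1> %lo, <8 x i1> %hi, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
    ret <16 x i1> %c
  }
  ; on AVX-512 the halves can be combined in mask registers with shift/or
  ; (e.g. kshift + kor) rather than round-tripping through vector registers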

Differential Revision: http://reviews.llvm.org/D14815

llvm-svn: 253819
2015-11-22 13:57:38 +00:00
Sanjay Patel 8066d906f1 fix formatting; NFC
llvm-svn: 253802
2015-11-22 00:03:16 +00:00
Simon Pilgrim a9912617c8 [X86][SSE4A] Fix issue with EXTRQI shuffles not starting at the correct start index.
Found during stress testing.

llvm-svn: 253611
2015-11-19 22:13:56 +00:00
Hans Wennborg dcc2500452 X86: More efficient legalization of wide integer compares
In particular, this makes the code for 64-bit compares on 32-bit targets
much more efficient.

Example:

  define i32 @test_slt(i64 %a, i64 %b) {
  entry:
    %cmp = icmp slt i64 %a, %b
    br i1 %cmp, label %bb1, label %bb2
  bb1:
    ret i32 1
  bb2:
    ret i32 2
  }

Before this patch:

  test_slt:
          movl    4(%esp), %eax
          movl    8(%esp), %ecx
          cmpl    12(%esp), %eax
          setae   %al
          cmpl    16(%esp), %ecx
          setge   %cl
          je      .LBB2_2
          movb    %cl, %al
  .LBB2_2:
          testb   %al, %al
          jne     .LBB2_4
          movl    $1, %eax
          retl
  .LBB2_4:
          movl    $2, %eax
          retl

After this patch:

  test_slt:
          movl    4(%esp), %eax
          movl    8(%esp), %ecx
          cmpl    12(%esp), %eax
          sbbl    16(%esp), %ecx
          jge     .LBB1_2
          movl    $1, %eax
          retl
  .LBB1_2:
          movl    $2, %eax
          retl

Differential Revision: http://reviews.llvm.org/D14496

llvm-svn: 253572
2015-11-19 16:35:08 +00:00