Commit Graph

15063 Commits

Author SHA1 Message Date
Oren Ben Simhon fe34c5e429 Disable Callee Saved Registers
Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller.
Some CCs use additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list.
The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee.
The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee.
Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span).
The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments.
The patch will also assist to implement future no_caller_saved_regsiters attribute intended for interrupt handler CC.

Differential Revision: https://reviews.llvm.org/D28566

llvm-svn: 297715
2017-03-14 09:09:26 +00:00
Craig Topper 7a5ee1c5ed [AVX-512] Use iPTR instead of i64 in patterns for extract_subvector/insert_subvector index.
llvm-svn: 297707
2017-03-14 06:40:04 +00:00
Jonas Paulsson a48ea231c0 [TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.

Tests updates:

Analysis/CostModel/X86/arith-fp.ll
Transforms/LoopVectorize/AArch64/interleaved_cost.ll
Transforms/LoopVectorize/ARM/interleaved_cost.ll

The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.

Review: Hal Finkel
https://reviews.llvm.org/D29540

llvm-svn: 297705
2017-03-14 06:35:36 +00:00
Craig Topper 9d50e187cd [AVX-512] Pre-emptively fix more places in fastisel where we might copy a VK1 register into a AH/BH/CH/DH register.
llvm-svn: 297704
2017-03-14 04:18:25 +00:00
Craig Topper 784f241b59 [AVX-512] Fix another case where we are copying from a mask register using AH/BH/CH/DH with fastisel.
Fixes PR32256. Still planning to do an audit for other possible cases.

llvm-svn: 297678
2017-03-13 21:58:54 +00:00
Simon Pilgrim 9df7d08cb2 [X86][MMX] Fix folding of shift value loads to cover whole 64-bits
rL230225 made the assumption that only the lower 32-bits of an MMX register load is used as a shift value, when in fact the whole 64-bits are reloaded and treated as a i64 to determine the shift value.

This patch reverts rL230225 to ensure that the whole 64-bits of memory are folded and ensures that the upper 32-bit are zero'd for cases where the shift value has come from a scalar source.

Found during fuzz testing.

Differential Revision: https://reviews.llvm.org/D30833

llvm-svn: 297667
2017-03-13 21:23:29 +00:00
Andrew Kaylor a11d020699 Revert r295004 (Add MXCSR) due to errors reported by MachineVerifier
I am leaving the code in clang which filters mxcsr from the clobber list because that is still technically correct and will be useful again when the MXCSR register is reintroduced.

llvm-svn: 297664
2017-03-13 20:35:10 +00:00
Jessica Paquette c984e21394 [Outliner] Add tail call support
This commit adds tail call support to the MachineOutliner pass. This allows
the outliner to insert jumps rather than calls in areas where tail calling is
possible. Outlined tail calls include the return or terminator of the basic
block being outlined from.

Tail call support allows the outliner to take returns and terminators into
consideration while finding candidates to outline. It also allows the outliner
to save more instructions. For example, in the X86-64 outliner, a tail called
outlined function saves one instruction since no return has to be inserted.

llvm-svn: 297653
2017-03-13 18:39:33 +00:00
Craig Topper 616641632e [X86] Lower AVX2 gather intrinsics similar to AVX-512. Apply the same input source optimizations to break execution dependencies.
For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2.

llvm-svn: 297652
2017-03-13 18:34:46 +00:00
Craig Topper eb7ea28bdd [AVX-512] If gather mask is all ones, force the input to a zero vector.
We were already forcing undef inputs to become a zero vector, this now catches an all ones mask too.

Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today.

llvm-svn: 297651
2017-03-13 18:17:46 +00:00
Craig Topper 48ba1e2d66 [AVX-512] Add VEX_WIG to VEX vcvtsd2ss/vcvtss2sd intrinsic instructions so they can be correctly matched by EVEX2VEX table generation.
llvm-svn: 297601
2017-03-13 05:14:47 +00:00
Craig Topper 08b413acf2 [AVX-512] Use sse_loadf32/f64 for vcvtss2sd and vcvtsd2ss intrinsic patterns.
llvm-svn: 297600
2017-03-13 05:14:44 +00:00
Craig Topper 5a63ca2ad2 [AVX-512] Use sse_load_f64/f32 in VCVTSS2SI/VCVTSD2SI patterns.
llvm-svn: 297599
2017-03-13 03:59:06 +00:00
Craig Topper 111b2d6997 [X86] Remove unused SDTypeProfile. NFC
llvm-svn: 297594
2017-03-12 23:05:03 +00:00
Craig Topper 2b92542908 [X86] Lower SSE/AVX cmpps/pd intrinsics directly to X86ISD::CMPP SDNodes.
This allows us to remove a duplicate set of patterns.

llvm-svn: 297593
2017-03-12 23:05:00 +00:00
Craig Topper 7d56c8315b [AVX-512] Fix the valid immediates for the scatter/gather prefetch intrinsics.
The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly.

llvm-svn: 297591
2017-03-12 22:29:12 +00:00
Sanjay Patel f06b963a2b [x86] don't blindly transform SETB into SBB
I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. 
This happens because we were transforming any 'setb' - even when we only wanted a single-bit result.

This patch moves those transforms under visitAdd/visitSub, so we we're only creating sbb/adc when it
is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that
existing behavior in this patch.

Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files
where this transform still fires.

The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register
stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate
issue.

Differential Revision: https://reviews.llvm.org/D30611

llvm-svn: 297586
2017-03-12 18:28:48 +00:00
Craig Topper 58647b16e5 [AVX-512] Fix a bad use of a high GR8 register after copying from a mask register during fast isel. This ends up extracting from bits 15:8 instead of the lower bits of the mask.
I'm pretty sure there are more problems lurking here. But I think this fixes PR32241.

I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again.

llvm-svn: 297574
2017-03-12 03:37:37 +00:00
Craig Topper 6ab5edfa73 [AVX-512] Remove unused field in X86VectorVTInfo tablegen class.
llvm-svn: 297572
2017-03-12 03:37:32 +00:00
Simon Pilgrim 18debfa5b4 [X86][SSE] Improve extraction of elements from v16i8 (pre-SSE41)
Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte.

This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte.

Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate?

Differential Revision: https://reviews.llvm.org/D29841

llvm-svn: 297568
2017-03-11 20:42:31 +00:00
Simon Pilgrim 9ff5732c92 Remove unnecessary whitespace.
llvm-svn: 297567
2017-03-11 20:23:59 +00:00
Craig Topper 02b463270c [X86] Remove unnecessary commented out code. NFC
llvm-svn: 297563
2017-03-11 18:25:56 +00:00
Simon Pilgrim 128a10a41d [X86][SSE] Fix load folding for (V)CVTDQ2PD
This only requires a 64-bit memory source, not the whole 128-bits. But the 128-bit case is still supported via X86InstrInfo::foldMemoryOperandImpl

llvm-svn: 297523
2017-03-10 22:35:07 +00:00
Simon Pilgrim bfe263352a [X86] Fix Wunused-lambda-capture warning
llvm-svn: 297521
2017-03-10 22:10:34 +00:00
Eric Christopher f025a89b3c Sink accessing TII to fix release Werror builds.
llvm-svn: 297507
2017-03-10 21:20:17 +00:00
Evandro Menezes 8f70e249a7 [AArch64, X86] Additional debug information for MacroFusion
In order to make it easier to parse information about the performance of
MacroFusion, this patch adds the function and the instruction names to the
debug output of this pass.

llvm-svn: 297504
2017-03-10 20:20:04 +00:00
Simon Pilgrim b02667c469 [APInt] Add APInt::insertBits() method to insert an APInt into a larger APInt
We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64).

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target.

Differential Revision: https://reviews.llvm.org/D30780

llvm-svn: 297458
2017-03-10 13:44:32 +00:00
Simon Pilgrim e86b7e2256 [X86][SSE] Speed up constant pool shuffle mask decoding with direct copy (PR32037).
If the constants are already the correct size, we can copy them directly into the shuffle mask.

llvm-svn: 297381
2017-03-09 14:06:39 +00:00
Simon Pilgrim 836bcc689f [X86][SSE] combineX86ShufflesRecursively can handle shuffle masks up to 64 elements wide
By defining the mask types as SmallVector<int, 16> we were causing a lot of unnecessary heap usage.

llvm-svn: 297267
2017-03-08 09:36:39 +00:00
Daniel Sanders 52b4ce727a Recommit: [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 297241
2017-03-07 23:20:35 +00:00
Daniel Sanders 8ebec37d26 Revert r297177: Change LLT constructor string into an LLT-based object ...
More module problems. This time it only showed up in the stage 2 compile of
clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile.

Somehow, this change causes the build to need Attributes.gen before it's been
generated.

llvm-svn: 297188
2017-03-07 19:21:23 +00:00
Sanjoy Das c08a79fbf2 [X86] Add option to specify preferable loop alignment
Summary:
Loop alignment can cause a significant change of
the perfromance for short loops.
To be able to evaluate the impact of loop alignment this change
introduces the new option x86-experimental-pref-loop-alignment.
The alignment will be 2^Value bytes, the default value is 4.

Patch by Serguei Katkov!

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D30391

llvm-svn: 297178
2017-03-07 18:47:22 +00:00
Daniel Sanders 8612326a08 [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 297177
2017-03-07 18:32:25 +00:00
Ayman Musa 850fc977c8 [X86][AVX512] Adding new LLVM TableGen backend which generates the EVEX2VEX compressing tables.
X86EvexToVex machine instruction pass compresses EVEX encoded instructions by replacing them with their identical VEX encoded instructions when possible.
It uses manually supported 2 large tables that map the EVEX instructions to their VEX ideticals.
This TableGen backend replaces the tables by automatically generating them.

Differential Revision: https://reviews.llvm.org/D30451

llvm-svn: 297127
2017-03-07 08:11:19 +00:00
Ayman Musa ac5a2c43af [X86][AVX512] Add missing entries to EVEX2VEX tables
evex2vex pass defines 2 tables which maps EVEX instructions to their VEX identical when possible. Adding all missing entries.

Differential Revision: https://reviews.llvm.org/D30501

llvm-svn: 297126
2017-03-07 08:05:53 +00:00
Tim Northover c2c545b8f7 GlobalISel: restrict G_EXTRACT instruction to just one operand.
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.

llvm-svn: 297100
2017-03-06 23:50:28 +00:00
Jessica Paquette 596f483a5e [Outliner] Fixed Asan bot failure in r296418
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.

llvm-svn: 297081
2017-03-06 21:31:18 +00:00
Reid Kleckner 812191584f [X86] Fix arg copy elision for illegal types
Use the store size of the argument type, which will be a byte-sized
quantity, rather than dividing the size in bits by 8.

Fixes PR32136 and re-enables copy elision from i64 arguments.

Reverts the workaround in from r296950.

llvm-svn: 297045
2017-03-06 18:39:39 +00:00
Benjamin Kramer bb635e034c [X86] Silence GCC enum compare warning.
X86ISelLowering.cpp:26506:36: error: enumeral mismatch in conditional
expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType'
[-Werror=enum-compare]

llvm-svn: 296986
2017-03-05 12:53:20 +00:00
Simon Pilgrim 9f5c251d57 [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.

We're missing a couple of shuffle combines that will be added in a future patch for review.

Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.

Differential Revision: https://reviews.llvm.org/D30549

llvm-svn: 296985
2017-03-05 09:57:20 +00:00
Sanjay Patel b974be5ef4 [x86] don't require a zext when forming ADC/SBB
The larger goal is to move the ADC/SBB transforms currently in 
combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're 
creating ADC/SBB in lots of places where we shouldn't.

This was intended to be an NFC change, but avx-512 has something 
strange going on. It doesn't seem like any of the affected tests 
should really be using SET+TEST or ADC; a simple ADD could replace
several instructions. But that's another bug...

llvm-svn: 296978
2017-03-04 20:35:19 +00:00
Sanjay Patel 066f3208bf [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.

This is part of the ongoing process to obsolete D24480.  The motivation is to 
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
 
PowerPC is an obvious winner for this kind of transform because it has fast and complete 
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).

x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 
changes just shows that that code is a mess of special-case holes. We may be able to remove 
some of that logic now.

My guess is that other targets will want to enable this hook for most cases. The likely 
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to 
math/logic, but the general transform for any 2 constants needs one more instruction 
(multiply or 'and').

ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.

Differential Revision: https://reviews.llvm.org/D30537

llvm-svn: 296977
2017-03-04 19:18:09 +00:00
Simon Pilgrim 40a0e66b37 [X86][SSE] Enable post-legalize vXi64 shuffle combining on 32-bit targets
Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there doesn't appear to be any combines that can cause this any more as they all have their own legality checks.

Differential Revision: https://reviews.llvm.org/D30213

llvm-svn: 296966
2017-03-04 12:50:47 +00:00
Matthias Braun 21f340fd25 X86ISelLowering: Only perform copy elision on legal types.
This fixes cases where i1 types were not properly legalized yet and lead
to the creating of 0-sized stack slots.

This fixes http://llvm.org/PR32136

llvm-svn: 296950
2017-03-04 01:40:40 +00:00
Sanjay Patel a84fd041c6 [x86] check for commuted add pattern to find ADC/SBB
llvm-svn: 296933
2017-03-04 00:18:31 +00:00
Sanjay Patel 7ee83b41e0 [x86] refactor combineAddOrSubToADCOrSBB(); NFCI
The comments were wrong, and this is not an obvious transform.
This hopefully makes it clearer that we're missing the commuted
patterns for adds. It's less clear that this is actually a good
transform for all micro-arch.

This is prep work for trying to clean up the current adc/sbb 
codegen because it's definitely not happening optimally.

llvm-svn: 296918
2017-03-03 22:35:11 +00:00
Sanjay Patel 58e241896d [x86] clean up materializeSBB(); NFCI
This is producing SBB where it is obviously not necessary, so it needs to be limited.

llvm-svn: 296894
2017-03-03 17:58:39 +00:00
Sanjay Patel e8674825fe [x86] fix formatting; NFC
llvm-svn: 296875
2017-03-03 15:17:41 +00:00
Simon Pilgrim c37a32d2b9 Use APInt::getHighBitsSet instead of APInt::getBitsSet for upper bit mask creation
llvm-svn: 296874
2017-03-03 14:37:57 +00:00
Amjad Aboud 4f97751798 [X86] Generate VZEROUPPER for Skylake-avx512.
VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.

Differential Revision: https://reviews.llvm.org/D29874

llvm-svn: 296859
2017-03-03 09:03:24 +00:00
Igor Breger 321cf3c650 [GlobalISel][X86] Support float/double and vector types.
Summary: [GlobalISel][X86] Add support for f32/f64 and vector types in RegisterBank and InstructionSelector.

Reviewers: delena, zvi

Reviewed By: zvi

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30533

llvm-svn: 296856
2017-03-03 08:06:46 +00:00
Simon Pilgrim b3067dc374 [X86][MMX] Fixed i32 extraction on 32-bit targets
MMX extraction often ends up as extract_i32(bitcast_v2i32(extract_i64(bitcast_v1i64(x86mmx v), 0)), 0) which fails to simplify on 32-bit targets as i64 isn't legal

llvm-svn: 296782
2017-03-02 18:56:06 +00:00
Reid Kleckner f7c0980c10 Elide argument copies during instruction selection
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped.  This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.

This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.

Supersedes D28388

Fixes PR26328

Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29668

llvm-svn: 296683
2017-03-01 21:42:00 +00:00
Ayman Musa 9b802e4650 [X86] Fix creating vreg def after use.
llvm-svn: 296601
2017-03-01 10:20:48 +00:00
Daniel Sanders 983c9b98e9 Revert r296474 - [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it.
There's a circular dependency that's only revealed when LLVM_ENABLE_MODULES=1.

llvm-svn: 296478
2017-02-28 15:00:27 +00:00
Daniel Sanders a5afdefec6 [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 296474
2017-02-28 14:21:31 +00:00
Matthias Braun 81f68ec3a9 Revert "Add MIR-level outlining pass"
Revert Machine Outliner for now, as it breaks the asan bot.

This reverts commit r296418.

llvm-svn: 296426
2017-02-28 02:24:30 +00:00
Matthias Braun d36410945f Add MIR-level outlining pass
This is a patch for the outliner described in the RFC at:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html

The outliner is a code-size reduction pass which works by finding
repeated sequences of instructions in a program, and replacing them with
calls to functions. This is useful to people working in low-memory
environments, where sacrificing performance for space is acceptable.

This adds an interprocedural outliner directly before printing assembly.
For reference on how this would work, this patch also includes X86
target hooks and an X86 test.

The outliner is run like so:

clang -mno-red-zone -mllvm -enable-machine-outliner file.c

Patch by Jessica Paquette<jpaquette@apple.com>!

rdar://29166825

Differential Revision: https://reviews.llvm.org/D26872

llvm-svn: 296418
2017-02-28 00:33:32 +00:00
Simon Pilgrim 5c4efcdddf [X86][SSE] Attempt to extract vector elements through target shuffles
DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered.

This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it.

Differential Revision: https://reviews.llvm.org/D30176

llvm-svn: 296381
2017-02-27 21:01:57 +00:00
Craig Topper 7502119ce8 [X86] Use APInt instead of SmallBitVector tracking undef elements from getTargetConstantBitsFromNode and getConstVector.
Summary:
SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc.

APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30392

llvm-svn: 296355
2017-02-27 16:15:32 +00:00
Craig Topper 3917ca2af4 [X86] Use APInt instead of SmallBitVector for tracking Zeroable elements in shuffle lowering
Summary:
SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc.

APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30390

llvm-svn: 296354
2017-02-27 16:15:30 +00:00
Craig Topper e1be95c3d0 [X86] Fix SmallVector sizes in constant pool shuffle decoding to avoid heap allocation
Some of the vectors are under sized to avoid heap allocation. In one case the vector was oversized.

Differential Revision: https://reviews.llvm.org/D30387

llvm-svn: 296353
2017-02-27 16:15:27 +00:00
Craig Topper 53e5a38da9 [X86] Use APInt instead of SmallBitVector for tracking undef elements in constant pool shuffle decoding
Summary:
SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc.

APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. This will incur a minor increase in stack usage due to APInt storing the bit count separately from the data bits unlike SmallBitVector, but that should be ok.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30386

llvm-svn: 296352
2017-02-27 16:15:25 +00:00
Craig Topper ed0101a0b9 [X86] Check for less than 0 rather than explicit compare with -1. NFC
llvm-svn: 296321
2017-02-27 06:05:30 +00:00
Craig Topper 6028584d8c [X86] Fix execution domain for cmpss/sd instructions.
llvm-svn: 296293
2017-02-26 06:45:59 +00:00
Craig Topper 036693302b [AVX-512] Fix execution domain for scalar commutable min/max instructions.
llvm-svn: 296292
2017-02-26 06:45:56 +00:00
Craig Topper e70231be51 [AVX-512] Fix execution domain for vmovhpd/lpd/hps/lps.
llvm-svn: 296291
2017-02-26 06:45:54 +00:00
Craig Topper fe25988c68 [AVX-512] Fix the execution domain for AVX-512 integer broadcasts.
llvm-svn: 296290
2017-02-26 06:45:51 +00:00
Craig Topper 49ba3f5406 [AVX-512] Disable the redundant patterns in the VPBROADCASTBr_Alt and VPBROADCASTWr_Alt instructions. NFC
llvm-svn: 296289
2017-02-26 06:45:48 +00:00
Craig Topper 6bf9b809ce [AVX-512] Fix execution domain for VPMADD52 instructions.
llvm-svn: 296288
2017-02-26 06:45:45 +00:00
Craig Topper aa8e903150 [AVX-512] Fix the execution domain for VSCALEF instructions.
llvm-svn: 296286
2017-02-26 06:45:40 +00:00
Craig Topper cac5d698df [AVX-512] Fix execution domain of scalar VRANGE/REDUCE/GETMANT with sae.
llvm-svn: 296285
2017-02-26 06:45:37 +00:00
Craig Topper ed64904c74 [X86] Fix the execution domain for scalar SQRT intrinsic instruction.
llvm-svn: 296284
2017-02-26 06:45:35 +00:00
Simon Pilgrim 0f5fb5f549 [APInt] Add APInt::extractBits() method to extract APInt subrange (reapplied)
The current pattern for extract bits in range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336

llvm-svn: 296272
2017-02-25 20:01:58 +00:00
Craig Topper 2caa97c891 [AVX-512] Fix the execution domain for scalar FMA instructions.
llvm-svn: 296271
2017-02-25 19:36:28 +00:00
Craig Topper 176f3310b6 [AVX-512] Fix the execution domain on some instructions.
llvm-svn: 296270
2017-02-25 19:18:11 +00:00
Craig Topper d2011e3612 [AVX-512] Remove unnecessary masked versions of VCVTSS2SD and VCVTSD2SS using the scalar register class. We only have patterns for the masked intrinsics.
llvm-svn: 296264
2017-02-25 18:43:42 +00:00
Simon Pilgrim cdf2bd656a Revert: r296141 [APInt] Add APInt::extractBits() method to extract APInt subrange
The current pattern for extract bits in range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336

llvm-svn: 296147
2017-02-24 18:31:04 +00:00
Simon Pilgrim bd9fb2ae95 [APInt] Add APInt::extractBits() method to extract APInt subrange
The current pattern for extract bits in range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336

llvm-svn: 296141
2017-02-24 17:46:18 +00:00
Simon Pilgrim 7f6a7c97a7 [X86][SSE] Target shuffle combine can try to combine up to 16 vectors
Noticed while profiling PR32037, the target shuffle ops were being stored in SmallVector<*,8> types but the combiner could store as many as 16 ops at maximum depth (2 per depth).

llvm-svn: 296130
2017-02-24 15:35:52 +00:00
Sanjay Patel 9f0fa52aa2 [x86] use DAG.getAllOnesConstant(); NFCI
llvm-svn: 296128
2017-02-24 15:09:59 +00:00
Simon Pilgrim aed352273e [APInt] Add APInt::setBits() method to set all bits in range
The current pattern for setting bits in range is typically:

Mask |= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation memory for the temporary variable.

This is one of the key compile time issues identified in PR32037.

This patch adds the APInt::setBits() helper method which avoids the temporary memory allocation completely, this first implementation uses setBit() internally instead but already significantly reduces the regression in PR32037 (~10% drop). Additional optimization may be possible.

I investigated whether there is need for APInt::clearBits() and APInt::flipBits() equivalents but haven't seen these patterns to be particularly common, but reusing the code would be trivial.

Differential Revision: https://reviews.llvm.org/D30265

llvm-svn: 296102
2017-02-24 10:15:29 +00:00
Craig Topper 8783bbb598 [AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC
llvm-svn: 296094
2017-02-24 07:21:10 +00:00
Craig Topper f2529c188b [AVX-512] Remove lzcnt intrinsics and autoupgrade them to generic ctlz intrinsics with select.
Clang has been emitting cltz intrinsics for a while now.

llvm-svn: 296091
2017-02-24 05:35:04 +00:00
Petr Hosek a7d5916308 [Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack
The Fuchsia ABI defines slots from the thread pointer where the
stack-guard value for stack-protector, and the unsafe stack pointer
for safe-stack, are stored. This parallels the Android ABI support.

Patch by Roland McGrath

Differential Revision: https://reviews.llvm.org/D30237

llvm-svn: 296081
2017-02-24 03:10:10 +00:00
Evgeniy Stepanov ee2d77f6d6 Disable TLS for stack protector on Android API<17.
The TLS slot did not exist back then.

llvm-svn: 296014
2017-02-23 21:06:35 +00:00
Ayman Musa 4b2c968c43 [X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register class of their first input when creating node in fast-isel.
(Quick fix to buildbot failure after rL295940 commit).

llvm-svn: 295970
2017-02-23 13:15:44 +00:00
Ayman Musa 524dbdaa2b [X86][AVX512] Remove VCVTSS2SDZ & VCVTSD2SSZ from memory folding tables as they introduce new read dependency when folding.
(Quick fix to buildbot fail). 

llvm-svn: 295946
2017-02-23 08:13:36 +00:00
Ayman Musa 6e670cf44f [X86][AVX512] Change VCVTSS2SD and VCVTSD2SS node types to keep consistency between VEX/EVEX versions.
AVX versions of the converts work on f32/f64 types, while AVX512 version work on vectors.

Differential Revision: https://reviews.llvm.org/D29988

llvm-svn: 295940
2017-02-23 07:24:21 +00:00
Simon Pilgrim 13cdd57964 [X86][SSE] getTargetConstantBitsFromNode - insert constant bits directly into masks.
Minor optimization, don't create temporary mask APInts that are just going to be OR'd into the accumulate masks - insert directly instead.

llvm-svn: 295848
2017-02-22 15:38:13 +00:00
Simon Pilgrim 3a895c4873 [X86][SSE] Use APInt::getBitsSet() instead of APInt::getLowBitsSet().shl() separately. NFCI.
llvm-svn: 295845
2017-02-22 15:04:55 +00:00
Benjamin Kramer 5a7e0f8357 [GlobalISel] Fix compiler warnings and make assert assert something.
llvm-svn: 295827
2017-02-22 12:59:47 +00:00
Igor Breger f7359d893a [X86][GlobalISel] Initial implementation , select G_ADD gpr, gpr
Summary: Initial implementation for X86InstructionSelector. Handle selection COPY and G_ADD/G_SUB gpr, gpr .

Reviewers: qcolombet, rovka, zvi, ab

Reviewed By: rovka

Subscribers: mgorny, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29816

llvm-svn: 295824
2017-02-22 12:25:09 +00:00
Ayman Musa ceea56c705 [X86] Fix memory operands definition for some instructions.
Change integer memory operands to FP memory operands to some FP instructions.

Differential Revision: https://reviews.llvm.org/D30201

llvm-svn: 295813
2017-02-22 08:06:29 +00:00
Craig Topper 56d4022997 [AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available
This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION.

I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others.

Differential revision: https://reviews.llvm.org/D30186

llvm-svn: 295810
2017-02-22 06:54:18 +00:00
Evandro Menezes a8d3301ee1 [AArch64, X86] Add statistics for the MacroFusion pass
llvm-svn: 295777
2017-02-21 22:16:13 +00:00
Evandro Menezes b9b7f4b8d3 [AArch64, X86] Guard against both instrs being wild cards
If both instrs are wild cards, the result can be a crash.

llvm-svn: 295776
2017-02-21 22:16:11 +00:00
Geoff Berry 5d534b6a11 [CodeGenPrepare] Sink and duplicate more 'and' instructions.
Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64.  The new code is more general in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider.
This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.

The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact the
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used.  One minor complication from this change is
that optimizeLoadExt needed to be updated to always mark 'and's it has
determined should be in the same block as their feeding load in the
InsertedInsts set to avoid an infinite loop of hoisting and sinking the
same 'and'.

This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).

Reviewers: t.p.northover, qcolombet, MatzeB

Subscribers: aemerson, mcrosier, sebpop, llvm-commits

Differential Revision: https://reviews.llvm.org/D28813

llvm-svn: 295746
2017-02-21 18:53:14 +00:00
Simon Pilgrim 8eb515d8c4 [X86] EltsFromConsecutiveLoads SDLoc argument should be const&.
There appears never to have been a time that the reference was updated.

llvm-svn: 295739
2017-02-21 17:42:28 +00:00
Simon Pilgrim 791955819c [X86][AVX2] Fix VPBROADCASTQ folding on 32-bit targets.
As i64 isn't a value type on 32-bit targets, we need to fold the VZEXT_LOAD into VPBROADCASTQ.

llvm-svn: 295733
2017-02-21 16:41:44 +00:00
Simon Pilgrim 3546156122 [X86][SSE] Prefer to combine shuffles to VZEXT over VZEXT_MOVL.
This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns.

llvm-svn: 295723
2017-02-21 15:09:00 +00:00
Igor Breger 812f319794 [AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index.
Differential Revision: https://reviews.llvm.org/D30189

llvm-svn: 295718
2017-02-21 14:01:25 +00:00
Craig Topper d88389aa7e [X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs
Summary:
Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency.

This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do.

Reviewers: zvi

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30181

llvm-svn: 295697
2017-02-21 06:39:13 +00:00
Craig Topper 16d9730b86 [X86] Fix formatting. NFC
llvm-svn: 295695
2017-02-21 06:27:13 +00:00
Craig Topper d9fe664868 [AVX-512] Use sse_load_f32/f64 in place of scalar_to_vector and scalar load in some patterns.
llvm-svn: 295693
2017-02-21 04:26:10 +00:00
Craig Topper d890db6952 [AVX-512] Fix the ExeDomain for vcmpss/vcmpsd.
llvm-svn: 295691
2017-02-21 04:26:04 +00:00
Sanjoy Das 90208720e3 Add a wrapper around copy_if in STLExtras; NFC
I will add one more use for this in a later change.

llvm-svn: 295685
2017-02-21 00:38:44 +00:00
Craig Topper 2012dda9a0 [AVX-512] Add a few more patterns for selecting masked vpternlog with broadcast loads where the passthru operand is not operand 0.
llvm-svn: 295673
2017-02-20 17:44:09 +00:00
Simon Pilgrim 2967ed1c7e [X86] Tidyup combineExtractVectorElt. NFCI.
Pull out repeated code for extraction index operand and source vector value type.

Use isNullConstant helper to check for zero extraction index.

llvm-svn: 295670
2017-02-20 16:09:45 +00:00
Igor Breger fda32d266a [X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector.
Its more profitable to go through memory (1 cycles throughput)
than using VMOVD + VPERMV/PSHUFB sequence ( 2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index.
IACA tool was used to get performace estimation (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer)
For example for var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles. 
Removing the VINSERT node, we don't need it any more.

Differential Revision: https://reviews.llvm.org/D29690

llvm-svn: 295660
2017-02-20 14:16:29 +00:00
Simon Pilgrim 5910ebe720 [X86][AVX512] Add support for ASHR v2i64/v4i64 support without VLX
Use v8i64 ASHR instructions if we don't have VLX.

Differential Revision: https://reviews.llvm.org/D28537

llvm-svn: 295656
2017-02-20 12:16:38 +00:00
Ayman Musa 51ffeab8c8 [X86][AVX] Extend hasVEX_WPrefix bit to accept WIG value (W Ignore) + update all AVX instructions with the new value.
Add WIG value to all of AVX instructions which ignore the W-bit in their encoding, instead of giving them the default value of 0.
This patch is needed for a follow up work on EVEX2VEX pass (replacing EVEX encoded instructions with their corresponding VEX version when possible).

Differential Revision: https://reviews.llvm.org/D29876

llvm-svn: 295643
2017-02-20 08:27:54 +00:00
Craig Topper c6c68f5958 [AVX-512] Add more patterns to fold masked VPTERNLOG with load when the passthru isn't operand 0.
llvm-svn: 295640
2017-02-20 07:00:40 +00:00
Craig Topper a5fa2e40f9 [AVX-512] Fix mistake in the immediate swizzle for some of the VPTERNLOG patterns.
llvm-svn: 295638
2017-02-20 07:00:34 +00:00
Craig Topper 5b4e36aafa [AVX-512] Add more VPTERNLOG patterns to enable folding of broadcast loads that aren't in operand 2.
llvm-svn: 295634
2017-02-20 02:47:42 +00:00
Craig Topper c184b671d9 [X86] Use memory form of shift right by 1 when the rotl immediate is one less than the operation size.
An earlier commit already did this for the register form.

llvm-svn: 295626
2017-02-20 00:37:23 +00:00
Craig Topper 63801df251 [AVX-512] Remove AddedComplexity from masked operations. The size of the patterns already increases their priority.
llvm-svn: 295619
2017-02-19 21:44:35 +00:00
Simon Pilgrim 14a7eee0b4 [X86] Use peekThroughOneUseBitcasts helper. NFCI.
llvm-svn: 295618
2017-02-19 21:40:51 +00:00
Davide Italiano 16b476ffcc [X86] Prefer static_cast<> to C-style cast. NFCI.
llvm-svn: 295617
2017-02-19 21:35:41 +00:00
Craig Topper 489057715e [AVX-512] Disable peephole optimizations on the VPTERNLOG commute test. Add new patterns to enable isel to fold the loads on it own.
llvm-svn: 295616
2017-02-19 21:32:15 +00:00
Simon Pilgrim d590de2998 [X86][SSE] Use getTargetConstantBitsFromNode to find zeroable shuffle elements.
Replaces existing approach that could only search BUILD_VECTOR nodes.

Requires getTargetConstantBitsFromNode to discriminate cases with all/partial UNDEF bits in each element - this should also be useful when we get around to supporting getTargetShuffleMaskIndices with UNDEF elements. 

llvm-svn: 295613
2017-02-19 19:40:31 +00:00
Craig Topper 4e794c71a6 [AVX-512] Add patterns to recognize masked vpternlog when the passthrough operand is not operand 0.
This uses a SDNodeXForm to swizzle the appropriate immediate bits to allow this to be matched.

llvm-svn: 295612
2017-02-19 19:36:58 +00:00
Simon Pilgrim 4271186f9c [X86][SSE] Enable initial support for domain crossing at high shuffle combine depths.
As discussed on D27692, this permits another domain to be used to combine a shuffle at high depths.

We currently set the required depth at 4 or more combined shuffles, this is probably too high for most targets but is a good starting point and already helps avoid a number of costly variable shuffles.

llvm-svn: 295608
2017-02-19 17:19:38 +00:00
Simon Pilgrim 6d07d514de [X86][SSE] Generalize INSERTPS/SHUFPS/SHUFPD combines across domains.
Relax the INSERTPS/SHUFPS/SHUFPD combines to support integer inputs if permitted.

llvm-svn: 295606
2017-02-19 15:15:40 +00:00
Simon Pilgrim b4460cf5a9 [X86][SSE] Add domain crossing support for target shuffle combines.
Add the infrastructure to flag whether float and/or int domains are permitable.

A future patch will enable domain crossing based off shuffle depth and the value types of the source vectors.

llvm-svn: 295604
2017-02-19 14:12:25 +00:00
Craig Topper 218d1a020e [AVX-512] Add broadcast VPTERNLOG instructions to special case commuting switch.
The instructions are marked commutable, but without special handling we don't get the immediate correct.

While here also remove the masked memory forms that aren't commutable.

llvm-svn: 295602
2017-02-19 08:03:26 +00:00
Craig Topper 007c93b2b9 [X86] Remove patterns for MOVSD with v4i32 types. We don't appear to really need them and if we do we should just use a bitcast to a 64-bit element type.
llvm-svn: 295589
2017-02-19 02:08:48 +00:00
Craig Topper 06ae5e821c [X86] Tighten up some of the SDNode type constraints.
llvm-svn: 295588
2017-02-19 01:54:47 +00:00
Simon Pilgrim 599b872ca2 [X86] Fix enumeral/non-enumeral conditional expression warning.
gcc only allows you to mix enums / ints if they have the same signedness.

llvm-svn: 295586
2017-02-19 00:04:30 +00:00
Simon Pilgrim 2f2d8dc630 Fix signed/unsigned comparison warning.
llvm-svn: 295580
2017-02-18 22:56:17 +00:00
Craig Topper 811756b4dc [X86][XOP] Reduce the size of a multiclass by moving more stuff to parameters instead of doing 128-bit and 256-bit simultaneously.
This requires some instructions to be renamed to move the Y earlier in the instruction name. The new names are more consistent with other instructions.

llvm-svn: 295579
2017-02-18 22:53:43 +00:00
Simon Pilgrim 7a87eebcad [X86] Fix enumeral/non-enumeral comparison warning.
gcc only allows you to mix enums / ints if they have the same signedness.

llvm-svn: 295576
2017-02-18 22:40:58 +00:00
Simon Pilgrim 2e78c94ea5 [X86][SSE] Avoid repeated calls to SDValue::getValueType.
Added assertion to check input type of X86ISD::VZEXT during target known bits calculation.

llvm-svn: 295575
2017-02-18 22:25:27 +00:00
Craig Topper de10312bea Recommit "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR."
Clang has now been fixed to not use these intrinsics.

llvm-svn: 295571
2017-02-18 21:50:58 +00:00
Sanjay Patel 12c2093e1e [x86] fold sext (xor Bool, -1) --> sub (zext Bool), 1
This is the same transform that is current used for:
select Bool, 0, -1

llvm-svn: 295568
2017-02-18 21:03:28 +00:00
Craig Topper ba2a726cc6 Revert "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR."
This reverts r295564. I missed that clang was still using the intrinsics despite our half implemented autoupgrade support.

llvm-svn: 295565
2017-02-18 20:14:20 +00:00
Craig Topper 884db3f85d [X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR.
It seems we were already upgrading 128-bit VPCMOV, but the intrinsic was still defined and being used in isel patterns. While I was here I also simplified the tablegen multiclasses.

llvm-svn: 295564
2017-02-18 19:51:25 +00:00
Craig Topper a505169ca5 [AVX-512] Remove 128/256-bit masked fp max/min intrinsics. Upgrade them to legacy unmasked intrinsics and select instructions.
llvm-svn: 295543
2017-02-18 07:07:50 +00:00
Simon Pilgrim 7db8f42fe3 [X86] Simplify by pulling out valuetype. NFCI.
llvm-svn: 295502
2017-02-17 22:10:10 +00:00
Simon Pilgrim a4c350ff17 [X86][SSE] Add (V)MOVD folding pattern with zextloadi64i32 load node.
Fixes PRPR31309

llvm-svn: 295492
2017-02-17 20:43:32 +00:00
Hans Wennborg 35905d6a67 Re-apply r282920 "X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)"
The original commit was reverted in r283329 due to a miscompile in
Chromium. That turned out to be the same issue as PR31257, which was
fixed in r295262.

llvm-svn: 295357
2017-02-16 19:04:42 +00:00
Andrea Di Biagio 42f7712e23 x86 interrupt calling convention: only save xmm registers if the target supports SSE
The existing code always saves the xmm registers for 64-bit targets even if the
target doesn't support SSE (which is common for kernels). Thus, the compiler
inserts movaps instructions which lead to CPU exceptions when an interrupt
handler is invoked.

This commit fixes this bug by returning a register set without xmm registers
from getCalleeSavedRegs and getCallPreservedMask for such targets.

Patch by Philipp Oppermann.

Differential Revision: https://reviews.llvm.org/D29959

llvm-svn: 295347
2017-02-16 18:25:37 +00:00
Simon Pilgrim 2fe568c95e [X86] Remove local areOnlyUsersOf helper and use SDNode::areOnlyUsersOf instead.
llvm-svn: 295326
2017-02-16 15:11:49 +00:00
Craig Topper 715873ead3 [AVX-512] Remove masked packss/packus intrinsics and autoupgrade to unmasked intrinsics with select instructions. For 512-bit add new unmasked intrinsics.
The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO.

llvm-svn: 295290
2017-02-16 06:31:54 +00:00
Hans Wennborg a468601e0e [X86] Re-enable conditional tail calls and fix PR31257.
This reverts r294348, which removed support for conditional tail calls
due to the PR above. It fixes the PR by marking live registers as
implicitly used and defined by the now predicated tailcall. This is
similar to how IfConversion predicates instructions.

Differential Revision: https://reviews.llvm.org/D29856

llvm-svn: 295262
2017-02-16 00:04:05 +00:00
Simon Pilgrim 5b4c30fb32 [X86][SSE] Don't call EltsFromConsecutiveLoads if any element is missing.
Minor performance speedup - if any call to getShuffleScalarElt fails to get a result, don't both calling for the remaining elements as EltsFromConsecutiveLoads will fail anyhow.

llvm-svn: 295235
2017-02-15 21:09:00 +00:00
Simon Pilgrim da25d5c7b6 [X86][SSE] Propagate undef upper elements from scalar_to_vector during shuffle combining
Only do this for integer types currently - floats types (in particular insertps) load folding often fails with this.

llvm-svn: 295208
2017-02-15 17:41:33 +00:00
Simon Pilgrim 0f0e5bd3c6 [X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise ZERO inputs
Add support for specifying an UNPCK input as ZERO, particularly improves ZEXT cases with non-zero offsets

llvm-svn: 295169
2017-02-15 11:46:15 +00:00
Ayman Musa b8a4f255dd [X86][AVX] Remove REX_W from AVX instructions.
There is no meaning for REX_W in VEX encoded AVX instruction.

Differential Revision: https://reviews.llvm.org/D29894

llvm-svn: 295157
2017-02-15 08:12:16 +00:00
Craig Topper fbc7805e25 [X86] Don't create VBROADCAST nodes with 256-bit or 512-bit input types
Summary:
We don't seem to have great rules on what a valid VBROADCAST node looks like. And as a consequence we end up with a lot of patterns to try to catch everything. We have patterns with scalar inputs, 128-bit vector inputs, 256-bit vector inputs, and 512-bit vector inputs.

As you can see from the things improved here we are currently missing patterns for 128-bit loads being extended to 256-bit before the vbroadcast.

I'd like to propose that VBROADCAST should always take a 128-bit vector type as input. As a first step towards that this patch adds an EXTRACT_SUBVECTOR in front of VBROADCAST when the input is 256 or 512-bits. In the future I would like to add scalar_to_vector around all the scalar operations. And maybe we should consider adding a VBROADCAST+load node to avoid separating loads from the broadcasting operation when the load itself isn't foldable.

This requires an additional change in target shuffle combining to look for the extract subvector and look through it to find the original operand. I'm sure this change isn't perfect but was enough to fix a few test failures that were being caused.

Another interesting thing I noticed is that the changes in masked_gather_scatter.ll show cases were we don't remove a useless insert into element 1 before broadcasting element 0.

Reviewers: delena, RKSimon, zvi

Reviewed By: zvi

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D28747

llvm-svn: 295155
2017-02-15 06:58:47 +00:00
Craig Topper ec5df5f4aa [AVX-512] Add PACKSS/PACKUS instructions to load folding tables.
llvm-svn: 295154
2017-02-15 06:51:39 +00:00
Diego Novillo 8adfc8ef3a Remove unused variable.
llvm-svn: 295065
2017-02-14 16:39:54 +00:00
Simon Pilgrim 6f732e026d [X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise UNDEF inputs
Add support for specifying an UNPCK input as UNDEF

llvm-svn: 295061
2017-02-14 16:22:04 +00:00
Simon Pilgrim a0878dea9e [X86][SSE] Move unary inputs handling inside matchVectorShuffleWithUNPCK.
llvm-svn: 295053
2017-02-14 13:47:17 +00:00
Simon Pilgrim 3efdffcb27 [X86][SSE] Tidyup matchVectorShuffleWithUNPCK helper function call.
Don't bother setting the V1/V2 operands again for unary shuffles.

Don't bother legalizing the value type unless the match succeeds.

llvm-svn: 295051
2017-02-14 12:54:39 +00:00
Craig Topper d2d50cba2a [AVX-512] Add PAVGB/PAVGW to load folding tables.
llvm-svn: 295035
2017-02-14 06:54:57 +00:00
Andrew Kaylor 709f1c2a9b [X86] Add MXCSR register
This adds MXCSR to the set of recognized registers for X86 targets and updates the instructions that read or write it. I do not intend for all of the various floating point instructions that implicitly use the control bits or update the status bits of this register to ever have that usage modeled by default. However, when constrained floating point modes (such as strict FP exception status modeling or dynamic rounding modes) are enabled, implicit use/def information for MXCSR will be added to those instructions.

Until those additional updates are made this should cause (almost?) no functional changes. Theoretically, this will prevent instructions like LDMXCSR and STMXCSR from being moved past one another, but that should be prevented anyway and I haven't found a case where it is happening now.

Differential Revision: https://reviews.llvm.org/D29903

llvm-svn: 295004
2017-02-13 23:38:52 +00:00
Simon Pilgrim fd6a84fbaa Fix indentation. NFCI.
llvm-svn: 294959
2017-02-13 15:31:08 +00:00
Simon Pilgrim 828dee1f70 [X86][SSE] Create matchVectorShuffleWithUNPCK helper function.
Currently only used by target shuffle combining - will use it for lowering as well in a future patch.

llvm-svn: 294943
2017-02-13 11:52:58 +00:00
Ayman Musa f77219e035 [X86][AVX512] Fix operand classes for some AVX512 instructions to keep consistency between VEX/EVEX versions of the same instruction.
Differential Revision: https://reviews.llvm.org/D29873

llvm-svn: 294937
2017-02-13 09:55:48 +00:00
Craig Topper 680c73e7ab [X86] Genericize the handling of INSERT_SUBVECTOR from an EXTRACT_SUBVECTOR to support 512-bit vectors with 128-bit or 256-bit subvectors.
We now detect that both the extract and insert indices are non-zero and convert to a shuffle. This will be lowered as a blend for 256-bit vectors or as a vshuf operations for 512-bit vectors.

llvm-svn: 294931
2017-02-13 04:53:29 +00:00
Craig Topper 53eafa8ea4 [X86] Don't let LowerEXTRACT_SUBVECTOR call getNode for EXTRACT_SUBVECTOR.
This results in the simplifications inside of getNode running while we're legalizing nodes popped off the worklist during the final DAG combine. This basically makes a DAG combine like operation occur during this legalize step, but we don't handle something quite the same way. I think we don't recursively added the removed nodes to the DAG combiner worklist.

llvm-svn: 294929
2017-02-12 23:49:46 +00:00
Simon Pilgrim cc9242bd1c [X86] Fix typo in function name. NFCI.
convertBitVectorToUnsiged - convertBitVectorToUnsigned

llvm-svn: 294914
2017-02-12 20:53:44 +00:00
Craig Topper cfe8ce3a58 [AVX-512] Add various EVEX move instructions to load folding tables using the VEX equivalents as a guide.
llvm-svn: 294908
2017-02-12 18:47:46 +00:00
Craig Topper 5971b5488e [AVX-512] Add VMOV64toSDZrm CodeGenOnly instruction based on the same instruction from AVX/SSE.
I can't prove that we can select this instruction or the AVX/SSE version, but I'm adding it for consistency for now so I can continue matching the load folding tables.

llvm-svn: 294907
2017-02-12 18:47:44 +00:00
Craig Topper ec26801483 [X86] Fix a couple instruction names to use 'mr' instead of 'rm' to indicate they are stores. AVX-512 version was already named with 'mr'.
llvm-svn: 294906
2017-02-12 18:47:40 +00:00
Craig Topper 6eca3170a8 [AVX-512] Add VPEXTRD/Q to load folding tables.
llvm-svn: 294905
2017-02-12 18:47:37 +00:00
Simon Pilgrim 04ec0f2b2a [X86][SSE] Update argument names to match function name. NFCI.
The target shuffle match function arguments were using the term 'Ops' but the function names referred to them as 'Inputs' - use 'Inputs' consistently.

llvm-svn: 294900
2017-02-12 16:46:41 +00:00
Simon Pilgrim 4cd841757a [X86][AVX2] Add support for combining target shuffles to VPMOVZX
Initial 256-bit vector support - 512-bit support requires extra checks for AVX512BW support (PMOVZXBW) that will be handled in a future patch.

llvm-svn: 294896
2017-02-12 14:31:23 +00:00
Elena Demikhovsky 5d91ab46c0 AVX-512: Fixed DWARF register numbers for XMM16-31
The reference is here: 
https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf

llvm-svn: 294890
2017-02-12 07:56:50 +00:00
Craig Topper 1c37e991e6 [X86] Move code for using blendi for insert_subvector out to an isel pattern. This gives the DAG combiner more opportunity to optimize without needing to dig through the blend.
llvm-svn: 294876
2017-02-11 22:57:12 +00:00
Simon Pilgrim 755d9127f5 [X86][SSE] Use VSEXT/VZEXT constant folding for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG
Preparatory step for PR31712

llvm-svn: 294874
2017-02-11 22:47:06 +00:00
Simon Pilgrim 437d64c49e [X86][SSE] Improve VSEXT/VZEXT constant folding.
Generalize VSEXT/VZEXT constant folding to work with any target constant bits source not just BUILD_VECTOR .

llvm-svn: 294873
2017-02-11 21:55:24 +00:00
Simon Pilgrim 4ef9672f0f [X86][SSE] Add early-out when trying to match blend shuffle. NFCI.
llvm-svn: 294864
2017-02-11 18:06:24 +00:00
Amaury Sechet 58ce15aba1 Fix indentation in X86ISelLowering. NFC
llvm-svn: 294859
2017-02-11 17:48:48 +00:00
Craig Topper 255343483d [AVX-512] Add VPMINS/MINU/MAXS/MAXU instructions to load folding tables.
llvm-svn: 294858
2017-02-11 17:35:28 +00:00
Craig Topper b2fa216dd5 [X86] Improve alphabetizing of load folding tables. NFC
llvm-svn: 294857
2017-02-11 17:35:25 +00:00
Simon Pilgrim 0e6945e48a [X86][SSE] Convert getTargetShuffleMaskIndices to use getTargetConstantBitsFromNode.
Removes duplicate constant extraction code in getTargetShuffleMaskIndices.

getTargetConstantBitsFromNode - adds support for VZEXT_MOVL(SCALAR_TO_VECTOR) and fail if the caller doesn't support undef bits.

llvm-svn: 294856
2017-02-11 17:27:21 +00:00
Simon Pilgrim d59fa0e38a [X86] Merge repeated getScalarValueSizeInBits calls. NFCI.
llvm-svn: 294852
2017-02-11 16:42:07 +00:00
Simon Pilgrim 6411a0ebed [X86][3DNow!] Enable PFSUB<->PFSUBR commutation
llvm-svn: 294847
2017-02-11 13:51:14 +00:00
Simon Pilgrim 4ead1d4aa9 [X86][3DNow!] Enable commutation for PFADD/PFMUL/PFCMPEQ/PAVGUSB/PMULHRW
All commutations confirmed to give identical results - note PFMAX/PFMIN do not

PFSUB<->PFSUBR should be commutable as well

llvm-svn: 294846
2017-02-11 13:32:55 +00:00
Benjamin Kramer efcf06f5f2 Move symbols from the global namespace into (anonymous) namespaces. NFC.
llvm-svn: 294837
2017-02-11 11:06:55 +00:00
Craig Topper 1f6153bab4 [AVX-512] Add VPINSRB/W/D/Q instructions to load folding tables.
llvm-svn: 294830
2017-02-11 07:01:40 +00:00
Craig Topper a9818aadab [AVX-512] Fix apparent typo in instruction name VMOVSSDrr_REV->VMOVSDZrr_REV.
llvm-svn: 294829
2017-02-11 07:01:38 +00:00
Craig Topper 3afa777f10 [AVX-512] Add VPSADBW instructions to load folding tables.
llvm-svn: 294827
2017-02-11 06:24:03 +00:00
Craig Topper 464b8cb244 [X86] Don't base domain decisions on VEXTRACTF128/VINSERTF128 if only AVX1 is available.
Seems the execution dependency pass likes to use FP instructions when most of the consuming code is integer if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available.

This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported so that they are ignored by domain fixing. If AVX2 is supported we'll report the correct domain and allow them to switch between integer and fp.

Overall I think this produces better results in the modified test cases.

llvm-svn: 294824
2017-02-11 05:32:57 +00:00
Ahmed Bougacha 2e275e272f [X86] Bitcast subvector before broadcasting it.
Since r274013, we've been looking through bitcasts on broadcast inputs.
In the scalar-folding case (from a load, build_vector, or sc2vec),
the input type didn't matter, as we'd simply bitcast the resulting
scalar back.

However, when broadcasting a 128-bit-lane-aligned element, we create an
EXTRACT_SUBVECTOR.  Use proper types, by creating an extract_subvector
of the original input type.

llvm-svn: 294774
2017-02-10 19:51:47 +00:00
Simon Pilgrim 8c8b10389d [X86][SSE] Use SDValue::getConstantOperandVal helper. NFCI.
Also reordered an if statement to test low cost comparisons first

llvm-svn: 294748
2017-02-10 14:27:59 +00:00
Simon Pilgrim c371159aac [X86][SSE] Add support for extracting target constants from BUILD_VECTOR
In some cases we call getTargetConstantBitsFromNode for nodes that haven't been lowered from BUILD_VECTOR yet

Note: We're getting very close to being able to move most of the constant extraction code from getTargetShuffleMaskIndices into getTargetConstantBitsFromNode
llvm-svn: 294746
2017-02-10 14:04:11 +00:00
Simon Pilgrim 1140281413 [X86][SSE] Add missing comment describing combing to SHUFPS. NFCI
llvm-svn: 294745
2017-02-10 13:16:01 +00:00
Igor Breger 6677999e17 add #ifdef, fix compilation error in case LLVM_BUILD_GLOBAL_ISEL=OFF
llvm-svn: 294726
2017-02-10 07:33:14 +00:00
Igor Breger b4442f34cd [X86][GlobalISel] Add general-purpose Register Bank
Summary:
[X86][GlobalISel] Add general-purpose Register Bank.
Add trivial  handling of G_ADD legalization .
Add Regestry Bank selection for COPY and G_ADD  instructions

Reviewers: rovka, zvi, ab, t.p.northover, qcolombet

Reviewed By: qcolombet

Subscribers: qcolombet, mgorny, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29771

llvm-svn: 294723
2017-02-10 07:05:56 +00:00
Eric Christopher 0824096cc0 Temporarily revert "For X86-64 linux and PPC64 linux align int128 to 16 bytes."
until we can get better TargetMachine::isCompatibleDataLayout to compare - otherwise
we can't code generate existing bitcode without a string equality data layout.

This reverts commit r294702.

llvm-svn: 294709
2017-02-10 04:35:32 +00:00
Eric Christopher 42b9248803 For X86-64 linux and PPC64 linux align int128 to 16 bytes.
For other platforms we should find out what they need and likely
make the same change, however, a smaller additional change is easier
for platforms we know have it specified	in the ABI. As part of this
rewrite some of the handling in the backends for data layout and update
a bunch of testcases.

Based on a patch by Simonas Kazlauskas!

llvm-svn: 294702
2017-02-10 03:32:21 +00:00
Simon Pilgrim 7f0d7e08b2 [X86] Remove duplicate call to getValueType. NFCI.
llvm-svn: 294640
2017-02-09 22:35:59 +00:00
Peter Collingbourne ef089bdb4b X86: Introduce relocImm-based patterns for cmp.
Differential Revision: https://reviews.llvm.org/D28690

llvm-svn: 294636
2017-02-09 22:02:28 +00:00
Peter Collingbourne d7dd65ad7c X86: Teach X86InstrInfo::analyzeCompare to recognize compares of symbols.
This requires that we communicate to X86InstrInfo::optimizeCompareInstr
that the second operand is neither a register nor an immediate. The way we
do that is by setting CmpMask to zero.

Note that there were already instructions where the second operand was not a
register nor an immediate, namely X86::SUB*rm, so also set CmpMask to zero
for those instructions. This seems like a latent bug, but I was unable to
trigger it.

Differential Revision: https://reviews.llvm.org/D28621

llvm-svn: 294634
2017-02-09 21:58:24 +00:00
Simon Pilgrim e0b5c2acbd Convert to for-range loop. NFCI.
llvm-svn: 294610
2017-02-09 18:52:24 +00:00
Simon Pilgrim 6bf1bd3ed6 [X86][MMX] Remove the (long time) unused MMX_PINSRW ISD opcode.
llvm-svn: 294596
2017-02-09 17:08:47 +00:00
Pierre Gousseau 6953b32475 [X86][btver2] PR31902: Fix a crash in combineOrCmpEqZeroToCtlzSrl under fast math.
In combineOrCmpEqZeroToCtlzSrl, replace "getConstantOperand == 0" by "isNullConstant" to account for floating point constants.

Differential Revision: https://reviews.llvm.org/D29756

llvm-svn: 294588
2017-02-09 14:43:58 +00:00
Simon Pilgrim 563e23e66e [X86][SSE] Attempt to break register dependencies during lowerBuildVector
LowerBuildVectorv16i8/LowerBuildVectorv8i16 insert values into a UNDEF vector if the build vector doesn't contain any zero elements, resulting in register dependencies with a previous use of the register.

This patch attempts to break the register dependency by either always zeroing the vector before hand or (if we're inserting to the 0'th element) by using VZEXT_MOVL(SCALAR_TO_VECTOR(i32 AEXT(Elt))) which lowers to (V)MOVD and performs a similar function. Additionally (V)MOVD is a shorter instruction than PINSRB/PINSRW. We already do something similar for SSE41 PINSRD.

On pre-SSE41 LowerBuildVectorv16i8 we go a little further and use VZEXT_MOVL(SCALAR_TO_VECTOR(i32 ZEXT(Elt))) if the build vector contains zeros to avoid the vector zeroing at the cost of a scalar zero extension, which can probably be brought over to the other cases in a future patch in some cases (load folding etc.)

Differential Revision: https://reviews.llvm.org/D29720

llvm-svn: 294581
2017-02-09 11:50:19 +00:00
Craig Topper 3cac763532 [X86] Remove the HLE feature flag.
We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line.

llvm-svn: 294562
2017-02-09 06:51:02 +00:00
Craig Topper 86576bd921 [X86] Remove INVPCID and SMAP feature flags. They aren't currently used by any instructions and not tested.
If we implement intrinsics for their instructions in the future, the feature flags can be added back with proper testing.

llvm-svn: 294561
2017-02-09 06:50:59 +00:00
Craig Topper 50f3d1452c [X86] Clzero intrinsic and its addition under znver1
This patch does the following.

1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero
2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1)
3. Adds the clzero feature under znver1 architecture.
4. The custom inserter is added in Lowering.
5. A testcase is added to check the intrinsic.
6. The clzero instruction is added to assembler test.

Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me.

Differential revision: https://reviews.llvm.org/D29385

llvm-svn: 294558
2017-02-09 04:27:34 +00:00
Simon Pilgrim dcd10344a3 [X86][SSE] Tidyup LowerBuildVectorv16i8 and LowerBuildVectorv8i16. NFCI.
Run clang-format and standardized variable names between functions.

llvm-svn: 294456
2017-02-08 14:44:45 +00:00
Craig Topper 3fd463a15a [X86] Add test for clflushopt intrinsic and only enable it to be selected if the feature flag is set.
llvm-svn: 294407
2017-02-08 05:45:46 +00:00
Craig Topper 6c05192018 [X86] Remove the VMFUNC feature flag. It was only partially implemented and we have no support for codegening vmfunc instructions today.
If that support ever gets added, the full feature flag support should come along with it.

llvm-svn: 294406
2017-02-08 05:45:42 +00:00
Craig Topper e0ac7f3beb [X86] Remove PCOMMIT instruction support since Intel has deprecated this instruction with no plans to release products with it.
Intel's documentation for the deprecation https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction

llvm-svn: 294405
2017-02-08 05:45:39 +00:00
Hans Wennborg 819e3e02a9 [X86] Disable conditional tail calls (PR31257)
They are currently modelled incorrectly (as calls, which clobber
registers, confusing e.g. Machine Copy Propagation).

Reverting until we figure out the proper solution.

llvm-svn: 294348
2017-02-07 20:37:45 +00:00
Sanjay Patel b0cee9b273 [x86] improve comments for SHRUNKBLEND node creation; NFC
llvm-svn: 294344
2017-02-07 19:54:16 +00:00
Sanjoy Das 2f63cbcc0c [ImplicitNullCheck] Extend Implicit Null Check scope by using stores
Summary:
This change allows usage of store instruction for implicit null check.

Memory Aliasing Analisys is not used and change conservatively supposes
that any store and load may access the same memory. As a result
re-ordering of store-store, store-load and load-store is prohibited.

Patch by Serguei Katkov!

Reviewers: reames, sanjoy

Reviewed By: sanjoy

Subscribers: atrick, llvm-commits

Differential Revision: https://reviews.llvm.org/D29400

llvm-svn: 294338
2017-02-07 19:19:49 +00:00
Sanjay Patel ef6d573f67 [x86] use range-for loops; NFCI
llvm-svn: 294337
2017-02-07 19:18:25 +00:00
Sanjay Patel 633ecbf3c4 [x86] use getSignBit() for clarity; NFCI
llvm-svn: 294333
2017-02-07 19:01:35 +00:00
Simon Pilgrim 8c0f62d293 [X86][SSE] Ensure that vector shift-by-immediate inputs are correctly bitcast to the result type
vXi8/vXi64 vector shifts are often shifted as vYi16/vYi32 types but we weren't always remembering to bitcast the input.

Tested with a new assert as we don't currently manipulate these shifts enough for test cases to catch them.

llvm-svn: 294308
2017-02-07 14:22:25 +00:00
Craig Topper 9191c3324a [AVX-512] Add masked and unmasked shift by immediate instructions to load folding tables.
llvm-svn: 294287
2017-02-07 07:31:00 +00:00
Craig Topper 62304d80e3 [AVX-512] Add masked shift instructions to load folding tables.
This adds the masked versions of everything, but the shift by immediate instructions.

llvm-svn: 294286
2017-02-07 07:30:57 +00:00
Craig Topper 45d9ddc687 [AVX-512] Add some of the shift instructions to the load folding tables.
This includes unmasked forms of variable shift and shifting by the lower element of a register.

Still need to do shift by immediate which was not foldable prior to avx512 and all the masked forms.

llvm-svn: 294285
2017-02-07 07:30:54 +00:00
Craig Topper 39d86bb688 [X86] Change the Defs list for VZEROALL/VZEROUPPER back to not including YMM16-31.
llvm-svn: 294277
2017-02-07 04:10:57 +00:00
Eugene Zelenko 90562dfb50 [X86] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.(vlsj-clangbuild)[622]

llvm-svn: 294246
2017-02-06 21:55:43 +00:00
Simon Pilgrim bfd4495512 [X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined.
Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines.

We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree.

This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list.

Differential Revision: https://reviews.llvm.org/D29399

llvm-svn: 294183
2017-02-06 13:44:45 +00:00
Igor Breger 5c31a4c9a3 [X86][GlobalISel] Add limited ret lowering support to the IRTranslator.
Summary:
Support return lowering for i8/i16/i32/i64/float/double, vector type supported for 64bit platform only.
Support argument lowering for float/double types.

Reviewers: t.p.northover, zvi, ab, rovka

Reviewed By: zvi

Subscribers: dberris, kristof.beyls, delena, llvm-commits

Differential Revision: https://reviews.llvm.org/D29261

llvm-svn: 294173
2017-02-06 08:37:41 +00:00
Craig Topper 5d9ecd23e8 [AVX-512] Add VPSLLDQ/VPSRLDQ to load folding tables.
llvm-svn: 294170
2017-02-06 05:12:14 +00:00
Craig Topper f0eb60a6f3 [AVX-512] Add VPABSB/D/Q/W to load folding tables.
llvm-svn: 294169
2017-02-06 03:18:01 +00:00
Craig Topper 864b1a5376 [AVX-512] Add VSHUFPS/PD to load folding tables.
llvm-svn: 294168
2017-02-06 03:17:58 +00:00
Craig Topper 75218fb6b1 [AVX-512] Add VPMULLD/Q/W instructions to load folding tables.
llvm-svn: 294164
2017-02-06 01:19:26 +00:00
Craig Topper 452a7770e6 [AVX-512] Add all masked and unmasked versions of VPMULDQ and VPMULUDQ to load folding tables.
llvm-svn: 294163
2017-02-05 23:31:48 +00:00
Simon Pilgrim 380ce75687 [X86][SSE] Replace insert_vector_elt(vec, -1, idx) with shuffle
Similar to what we already do for zero elt insertion, we can quickly rematerialize 'allbits' vectors so to avoid a unnecessary gpr value and insertion into a vector

llvm-svn: 294162
2017-02-05 22:50:29 +00:00
Craig Topper 8eb1f315ac [AVX-512] Add scalar masked max/min intrinsic instructions to the load folding tables.
llvm-svn: 294153
2017-02-05 22:25:46 +00:00
Craig Topper cb4bc8be5b [AVX-512] Add scalar masked add/sub/mul/div intrinsic instructions to the load folding tables.
llvm-svn: 294152
2017-02-05 22:25:42 +00:00
Craig Topper 59af67206d [AVX-512] Add masked scalar FMA intrinsics to isNonFoldablePartialRegisterLoad to improve load folding of scalar loads.
llvm-svn: 294151
2017-02-05 22:25:40 +00:00
Kamil Rytarowski 5d2bd8dd54 Revamp llvm::once_flag to be closer to std::once_flag
Summary:
Make this interface reusable similarly to std::call_once and std::once_flag interface.

This makes porting LLDB to NetBSD easier as there was in the original approach a portable way to specify a non-static once_flag. With this change translating std::once_flag to llvm::once_flag is mechanical.

Sponsored by <The NetBSD Foundation>

Reviewers: mehdi_amini, labath, joerg

Reviewed By: mehdi_amini

Subscribers: emaste, clayborg

Differential Revision: https://reviews.llvm.org/D29566

llvm-svn: 294143
2017-02-05 21:13:06 +00:00
Craig Topper cac328f25e [X86] Fix printing of sha256rnds2 to include the implicit %xmm0 argument.
llvm-svn: 294132
2017-02-05 18:33:31 +00:00
Craig Topper d7ae9ab1fa [X86] Fix printing of blendvpd/blendvps/pblendvb to include the implicit %xmm0 argument. This makes codegen output more obvious about the %xmm0 usage.
llvm-svn: 294131
2017-02-05 18:33:24 +00:00
Craig Topper 6a35a81fc5 [X86] In LowerTRUNCATE, create an ISD::VECTOR_SHUFFLE instead of explicitly creating a PSHUFB. This will be lowered by regular shuffle lowering to a PSHUFB later.
Similar was already done for several other shuffles in this function.

The test changes are because the old code used explicity zeroing for elements that could have been undef.

While I was here I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes. getVectorShuffle can create the UNDEF for us.

llvm-svn: 294130
2017-02-05 18:33:14 +00:00
Craig Topper 978fdb75a4 [X86] Add support for folding (insert_subvector vec1, (extract_subvector vec2, idx1), idx1) -> (blendi vec2, vec1).
llvm-svn: 294112
2017-02-04 23:26:46 +00:00
Craig Topper 3d95228dbe [X86] Simplify the code that turns INSERT_SUBVECTOR into BLENDI. NFCI
llvm-svn: 294111
2017-02-04 23:26:42 +00:00
Simon Pilgrim 034c1bd32c [X86][SSE] Add support for combining scalar_to_vector(extract_vector_elt) into a target shuffle.
Correctly flagging upper elements as undef.

llvm-svn: 294020
2017-02-03 17:59:58 +00:00
Craig Topper bbb2b95ce5 [X86] Mark 256-bit and 512-bit INSERT_SUBVECTOR operations as legal and remove the custom lowering.
llvm-svn: 293969
2017-02-03 00:24:49 +00:00
Eugene Zelenko fbd13c5c12 [X86] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293949
2017-02-02 22:55:55 +00:00
Reid Kleckner 3c467e225e [X86] Avoid sorted order check in release builds
Effectively reverts r290248 and fixes the unused function warning with
ifndef NDEBUG.

llvm-svn: 293945
2017-02-02 22:06:30 +00:00
Craig Topper c45657375b [X86] Move turning 256-bit INSERT_SUBVECTORS into BLENDI from legalize to DAG combine.
On one test this seems to have given more chance for DAG combine to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. Looks like we can still improve more by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI.

llvm-svn: 293944
2017-02-02 22:02:57 +00:00
Michael Kuperstein e6d59fdca5 [X86] Add costs for non-AVX512 single-source permutation integer shuffles
Differential Revision: https://reviews.llvm.org/D29416

llvm-svn: 293932
2017-02-02 20:27:13 +00:00
Nirav Dave e14300e270 [X86,ISEL] Fix X86 increment chain dependence calculation
Merging Load-add-store pattern into a increment op previously dropped
the load's chain from the instructions dependence if the store is
chained to a TokenFactor.

llvm-svn: 293892
2017-02-02 14:39:26 +00:00
Simon Pilgrim 20ab6b875a [X86][SSE] Use MOVMSK for all_of/any_of reduction patterns
This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain).

So far this only matches patterns where we are reducing an all/none bits source vector (i.e. a comparison result) but we should be able to expand on this in conjunction with improvements to 'bool vector' handling both in the x86 backend as well as the vectorizers etc.

Differential Revision: https://reviews.llvm.org/D28810

llvm-svn: 293880
2017-02-02 11:52:33 +00:00
Craig Topper 047a8be18a [X86] Remove some unused DAGCombinerInfo parameters. NFC
llvm-svn: 293873
2017-02-02 08:03:23 +00:00
Craig Topper 94ed54b49a [X86] Move some INSERT_SUBVECTOR optimizations from legalize to DAG combine.
This moves creation of SUBV_BROADCAST and merging of adjacent loads that are being inserted together.

This is a step towards removing legalizing of INSERT_SUBVECTOR except for vXi1 cases.

llvm-svn: 293872
2017-02-02 08:03:20 +00:00
Craig Topper b81e6c48f8 [AVX-512] Fix the implicit defs for VZEROALL/VZEROUPPER to include YMM16-YMM31.
llvm-svn: 293862
2017-02-02 04:17:18 +00:00
Peter Collingbourne dc5e583687 X86: Produce @ABS8 symbol modifiers for absolute symbols in range [0,128).
Differential Revision: https://reviews.llvm.org/D28689

llvm-svn: 293844
2017-02-02 00:32:03 +00:00
Simon Pilgrim ca931efc21 [X86][SSE] Remove unused argument. NFCI.
llvm-svn: 293777
2017-02-01 16:34:50 +00:00
Simon Pilgrim 55a9c79bd1 [X86][SSE] Merge SSE2 PINSRW lowering with SSE41 PINSRB/PINSRW lowering. NFCI.
These are identical apart from the extra SSE41 guard for PINSRB.

llvm-svn: 293766
2017-02-01 13:32:19 +00:00
NAKAMURA Takumi 468487d71a *MacroFusion.cpp: Suppress warnings to eliminate \param(s). [-Wdocumentation]
llvm-svn: 293744
2017-02-01 07:30:46 +00:00
Craig Topper 0bcba19cdf [X86] For AVX1/AVX2 isel, don't use FP move instructions for 128-bit loads/stores of integer types.
For SSE we use fp because of the smaller encoding, but that doesn't apply to AVX. So just do the natural thing so we don't have to explain why we aren't. We can't do this for 256-bit loads/stores since integer loads and stores aren't available in AVX1 so we need fallback patterns since the integer types are legal.

This doesn't affect any tests because execution domain fixing freely converts the instructions anyway. Honestly, we could probably rely on it for the SSE size optimization too.

llvm-svn: 293743
2017-02-01 07:17:16 +00:00
Evandro Menezes 94edf02923 [CodeGen] Move MacroFusion to the target
This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.

In AArch64, it also expands the fusion to all instructions pairs in a
scheduling block, beyond just among the predecessors of the branch at the
end.

Differential revision: https://reviews.llvm.org/D28489

llvm-svn: 293737
2017-02-01 02:54:34 +00:00
Peter Collingbourne d763c4cc85 MC: Introduce the ABS8 symbol modifier.
@ABS8 can be applied to symbols which appear as immediate operands to
instructions that have a 8-bit immediate form for that operand. It causes
the assembler to use the 8-bit form and an 8-bit relocation (e.g. R_386_8
or R_X86_64_8) for the symbol.

Differential Revision: https://reviews.llvm.org/D28688

llvm-svn: 293667
2017-01-31 18:28:44 +00:00
Nirav Dave a7c041d147 [X86] Implement -mfentry
Summary: Insert calls to __fentry__ at function entry.

Reviewers: hfinkel, craig.topper

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D28000

llvm-svn: 293648
2017-01-31 17:00:27 +00:00
Simon Pilgrim 1b39d5db7b [X86][SSE] Add support for combining PINSRB into a target shuffle.
llvm-svn: 293637
2017-01-31 14:59:44 +00:00
Benjamin Kramer 94a833962c [X86] Silence unused variable warning in Release builds.
llvm-svn: 293631
2017-01-31 14:13:53 +00:00
Simon Pilgrim 4eab18f6b8 [X86][SSE] Detect unary PBLEND shuffles.
These can appear during shuffle combining.

llvm-svn: 293628
2017-01-31 13:58:01 +00:00
Simon Pilgrim c29eab52e8 [X86][SSE] Add support for combining PINSRW into a target shuffle.
Also add the ability to recognise PINSR(Vex, 0, Idx).

Targets shuffle combines won't replace multiple insertions with a bit mask until a depth of 3 or more, so we avoid codesize bloat.

The unnecessary vpblendw in clearupper8xi16a will be fixed in an upcoming patch.

llvm-svn: 293627
2017-01-31 13:51:10 +00:00
Craig Topper 2cfa2071bd [AVX-512] Don't both looking into the AVX512DQ execution domain fixing tables if AVX512DQ isn't supported since we can't do any conversion anyway.
llvm-svn: 293608
2017-01-31 06:49:55 +00:00
Craig Topper 797e32dd98 [X86] Add AVX and SSE2 version of MOVSDmr to execution domain fixing table. AVX-512 already did this for the EVEX version.
llvm-svn: 293607
2017-01-31 06:49:53 +00:00
Craig Topper 779e4c5bb4 [AVX-512] Fix copy and paste bug in execution domain fixing tables so that we can convert 256-bit movnt instructions.
llvm-svn: 293606
2017-01-31 06:49:50 +00:00
Craig Topper 06e038c6de [X86] Update the broadcast fallback patterns to use shuffle instructions from the appropriate execution domain.
llvm-svn: 293603
2017-01-31 05:18:29 +00:00
Craig Topper e9e84c8284 [AVX-512] Fix the ExeDomain for VMOVDDUP, VMOVSLDUP, and VMOVSHDUP.
llvm-svn: 293601
2017-01-31 05:18:24 +00:00
Craig Topper d064cc93b2 [X86] Remove patterns for X86VPermilpi with integer types. I don't think we've formed these since the shuffle lowering rewrite.
llvm-svn: 293592
2017-01-31 02:09:53 +00:00
Craig Topper 85935f69fb [X86] Remove duplicate patterns for X86VPermilpv that already exist in the instructions themselves.
llvm-svn: 293591
2017-01-31 02:09:51 +00:00
Craig Topper ced68315ce [X86] Remove patterns for selecting PSHUFD with FP types. We don't seem to do this anymore and the AVX case definitely should be using VPERMILPS anyway.
llvm-svn: 293590
2017-01-31 02:09:49 +00:00
Craig Topper b76494e017 [X86] Remove 'else' after 'return'. NFC
llvm-svn: 293589
2017-01-31 02:09:46 +00:00
Craig Topper f9d901f0ea [X86] Use integer broadcast instructions for integer broadcast patterns.
I'm not sure why we were using an FP instruction before and had to have a comment calling attention to it, but not justifying it.

llvm-svn: 293588
2017-01-31 02:09:43 +00:00
Simon Pilgrim 3905e03a47 [X86][SSE] Fix unsigned <= 0 warning in assert. NFCI.
Thanks to @mkuper

llvm-svn: 293561
2017-01-30 22:58:44 +00:00
Simon Pilgrim a80a47afef [X86][SSE] Generalize the number of decoded shuffle inputs. NFCI.
combineX86ShufflesRecursively can still only handle a maximum of 2 shuffle inputs but everything before it now supports any number of shuffle inputs.

This will be necessary for combining OR(SHUFFLE, SHUFFLE) patterns.

llvm-svn: 293560
2017-01-30 22:48:49 +00:00
Eli Friedman 2345733246 Fix line endings.
llvm-svn: 293554
2017-01-30 22:04:23 +00:00
Simon Pilgrim 098998aef0 [X86][SSE] Add support for combining PINSRW+ASSERTZEXT+PEXTRW patterns with target shuffles
llvm-svn: 293500
2017-01-30 16:58:34 +00:00
Asaf Badouh e11d2d73bf [X86][MCU] Minor bug fix for r293469 + test case
llvm-svn: 293478
2017-01-30 13:14:37 +00:00
Asaf Badouh 53713df0c2 [X86][MCU] replace select with bit manipulation instead of branches
Differential Revision: https://reviews.llvm.org/D28354


 

llvm-svn: 293469
2017-01-30 08:16:59 +00:00
Craig Topper f6df4a6978 [AVX-512] Remove duplicate CodeGenOnly patterns for scalar register broadcast. We can use COPY_TO_REGCLASS like AVX does.
This causes stack spill slots be oversized sometimes, but the same should already be happening with AVX.

llvm-svn: 293464
2017-01-30 06:59:06 +00:00
Craig Topper 0265a39472 [AVX-512] Remove KSET0B/KSET1B in favor of the patterns that select KSET0W/KSET1W for v8i1.
llvm-svn: 293458
2017-01-30 05:37:47 +00:00
Craig Topper 3b7e823f92 [AVX-512] Don't reuse VSHLI/VSRLI for mask register shifts. VSHLI/VSHRI shift within elements while KSHIFT moves whole elements.
llvm-svn: 293448
2017-01-30 00:06:01 +00:00
Chris Ray 30b3fafb94 [X86][Disassembler] Added SALC instruction
Reviewers: joe.abbey, craig.topper

Reviewed By: craig.topper

Subscribers: majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D29201

llvm-svn: 293447
2017-01-29 23:02:47 +00:00
Craig Topper db919caf1b [AVX-512] Fix lowering for mask register concatenation with undef in the lower half.
Previously this test case fired an assertion in getNode because we tried to create an insert_subvector with both input types the same size and the index pointing to half the vector width.

llvm-svn: 293446
2017-01-29 22:53:33 +00:00
Chris Ray ba3741cb2b [X86] Fixing flag usage for RCL and RCR
Summary: The RCL and RCR instructions use the carry flag.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29237

llvm-svn: 293441
2017-01-29 20:05:30 +00:00
Simon Pilgrim 76073f8d22 [X86][SSE] Lower scalar_to_vector(0) to zero vector
Replaces an xor+movd/movq with an xorps which will be shorter in codesize, avoid an int-fpu transfer, allow modern cores to fast path the result during decode and helps other combines recognise an all-zero vector.

The only reason I can think of that we'd want to keep scalar_to_vector in this case is to help recognise the upper elts are undef but this doesn't seem to be a problem.

Differential Revision: https://reviews.llvm.org/D29097

llvm-svn: 293438
2017-01-29 18:13:37 +00:00
Elena Demikhovsky 17fe27f1f2 [X86 Codegen] Fixed a bug in unsigned saturation
PACKUSWB converts Signed word to Unsigned byte, (the same about DW) and it can't be used for umin+truncate pattern.
AVX-512 VPMOVUS* instructions fit the pattern since they convert Unsigned to Unsigned.

See https://llvm.org/bugs/show_bug.cgi?id=31773

Differential Revision: https://reviews.llvm.org/D29196

llvm-svn: 293431
2017-01-29 13:18:30 +00:00
Igor Breger 9ea154d4ad [X86][GlobalISel] Add limited argument lowering support to the IRTranslator.
Summary:
Add limited (i8/i16/i32/i64)  argument lowering support to the IRTranslator.
Inspired by commit 289940.

Reviewers: t.p.northover, qcolombet, ab, zvi, rovka

Reviewed By: rovka

Subscribers: dberris, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D28987

llvm-svn: 293427
2017-01-29 08:35:42 +00:00
Craig Topper 6533e40e9d [X86] Fix vector ANDN matching to work correctly when both inputs to the AND are XORs.
llvm-svn: 293403
2017-01-28 23:52:09 +00:00
Matthias Braun 8c209aa877 Cleanup dump() functions.
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html

For reference:
- Public headers should just declare the dump() method but not use
  LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
  #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void MyClass::dump() {
    // print stuff to dbgs()...
  }
  #endif

llvm-svn: 293359
2017-01-28 02:02:38 +00:00
Chris Ray 535e7d1547 [X86] Adding FFREEP instruction.
Summary: Small change to get the FREEP instruction to decode properly.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29193

llvm-svn: 293314
2017-01-27 18:02:53 +00:00
Simon Pilgrim 027bb453d9 [X86][SSE] Add support for combining ANDNP byte masks with target shuffles
llvm-svn: 293178
2017-01-26 14:31:12 +00:00
Simon Pilgrim 3057fd53f9 [X86][SSE] Pull out target shuffle resolve code into helper. NFCI.
Pulled out code that removed unused inputs from a target shuffle mask into a helper function to allow it to be reused in a future commit.

llvm-svn: 293175
2017-01-26 13:06:02 +00:00
Craig Topper bad53cce26 [AVX-512] Move the combine that runs combineBitcastForMaskedOp to the last DAG combine phase where I had originally meant to put it.
llvm-svn: 293157
2017-01-26 07:17:58 +00:00
Craig Topper f0bab7b739 [X86] When bitcasting INSERT_SUBVECTOR/EXTRACT_SUBVECTOR to match masked operations, use the correct type for the immediate operand.
llvm-svn: 293156
2017-01-26 07:17:53 +00:00
Jonas Paulsson 8e2f948ef0 [TargetTransformInfo] Refactor and improve getScalarizationOverhead()
Refactoring to remove duplications of this method.

New method getOperandsScalarizationOverhead() that looks at the present unique
operands and add extract costs for them. Old behaviour was to just add extract
costs for one operand of the type always, which still happens in
getArithmeticInstrCost() if no operands are provided by the caller.

This is a good start of improving on this, but there are more places
that can be improved by using getOperandsScalarizationOverhead().

Review: Hal Finkel
https://reviews.llvm.org/D29017

llvm-svn: 293155
2017-01-26 07:03:25 +00:00
Mohammed Agabaria 20caee95e1 [X86] enable memory interleaving for X86\SLM arch.
Differential Revision: https://reviews.llvm.org/D28547

llvm-svn: 293040
2017-01-25 09:14:48 +00:00
Coby Tayree 77807d93af [X86]Enable the use of 'mov' with a 64bit GPR and a large immediate
Enable the next form (intel style):
"mov <reg64>, <largeImm>"
which is should be available,
where <largeImm> stands for immediates which exceed the range of a singed 32bit integer

Differential Revision: https://reviews.llvm.org/D28988

llvm-svn: 293030
2017-01-25 07:09:42 +00:00
Simon Pilgrim 893d2119ee [X86][AVX512] Remove unused argument from PMOVX tablegen patterns. NFCI.
Seems to be a copy+paste legacy from the AVX2 patterns.

llvm-svn: 292941
2017-01-24 16:16:29 +00:00
Martin Bohme 526299c81c [X86][SSE] Add explicit braces to avoid -Wdangling-else warning.
Reviewers: RKSimon

Subscribers: llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D29076

llvm-svn: 292924
2017-01-24 12:31:30 +00:00
Simon Pilgrim 0c45338961 Fix unused variable warning
llvm-svn: 292921
2017-01-24 11:54:27 +00:00
Simon Pilgrim e1ec9072f6 [X86][SSE] Add support for constant folding vector arithmetic shift by immediates
llvm-svn: 292919
2017-01-24 11:46:13 +00:00
Simon Pilgrim 6340e54861 [X86][SSE] Add support for constant folding vector logical shift by immediates
llvm-svn: 292915
2017-01-24 11:21:57 +00:00
Craig Topper fc8798fa1b [X86] Remove unnecessary peakThroughBitcasts call that's already take care of by the ISD::isBuildVectorAllOnes check below.
llvm-svn: 292894
2017-01-24 06:57:29 +00:00
Craig Topper b0cbd5b5b0 [AVX-512] Simplify multiclasses for integer logic operations. There were several inputs that didn't vary.
While there give them the same scheduling itinerary as the SSE/AVX versions.

llvm-svn: 292892
2017-01-24 06:25:34 +00:00
Craig Topper 993edc9db1 [X86] Don't split v8i32 all ones values if only AVX1 is available. Keep it intact and split it at isel.
This allows us to remove the check in ANDN combining that had to look through the extraction.

llvm-svn: 292881
2017-01-24 04:33:03 +00:00
Craig Topper eb440a14a5 [X86] Remove Undef handling from extractSubVector. This is now handled inside getNode.
llvm-svn: 292877
2017-01-24 02:43:54 +00:00
Simon Pilgrim 0218ce1080 [X86][SSE] Add missing X86ISD::ANDNP combines.
llvm-svn: 292767
2017-01-22 22:45:23 +00:00
Simon Pilgrim 7e1cc97513 [X86][SSE] Improve shuffle combining with zero insertions
Add support for handling shuffles with scalar_to_vector(0)

llvm-svn: 292766
2017-01-22 22:21:44 +00:00
Sanjay Patel 8f49aede82 [x86] avoid crashing with illegal vector type (PR31672)
https://llvm.org/bugs/show_bug.cgi?id=31672

llvm-svn: 292758
2017-01-22 17:06:12 +00:00
Craig Topper 8e0724d332 [X86] Don't allow commuting to form phsub operations.
Fixes PR31714.

llvm-svn: 292713
2017-01-21 06:59:38 +00:00
Simon Pilgrim 3e5b525699 Remove trailing whitespace. NFCI.
llvm-svn: 292613
2017-01-20 15:15:59 +00:00
Simon Pilgrim 0da4d2bc03 [CostModel][X86] Removed unused cost. NFCI.
SHL v8i32 is already handled in the SSE41 cost table

llvm-svn: 292612
2017-01-20 15:14:38 +00:00
Simon Pilgrim db101e4d57 [X86][SSE] Improve comments describing combineTruncatedArithmetic. NFCI.
llvm-svn: 292502
2017-01-19 18:18:32 +00:00
Simon Pilgrim 5f2f53b106 [X86][SSE] Attempt to pre-truncate arithmetic operations that have already been extended
As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further

llvm-svn: 292493
2017-01-19 16:25:02 +00:00
Elena Demikhovsky e01512cecf Recommiting unsigned saturation with a bugfix.
A test case that crached is added to avx512-trunc.ll.
(PR31589)

llvm-svn: 292479
2017-01-19 12:08:21 +00:00
Craig Topper 200ea31684 [AVX-512] Support ADD/SUB/MUL of mask vectors
Summary:
Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND.

We already do this for scalar i1 operations so I just extended it to vectors of i1.

Reviewers: zvi, delena

Reviewed By: delena

Subscribers: guyblank, llvm-commits

Differential Revision: https://reviews.llvm.org/D28888

llvm-svn: 292474
2017-01-19 07:12:35 +00:00
Craig Topper c227529105 [X86] Merge LowerADD and LowerSUB into a single LowerADD_SUB since they are identical.
llvm-svn: 292469
2017-01-19 03:49:29 +00:00
Craig Topper b561e66384 [AVX-512] Use VSHUF instructions instead of two inserts as fallback for subvector broadcasts that can't fold the load.
llvm-svn: 292466
2017-01-19 02:34:29 +00:00
Michael Kuperstein d3d2925933 Revert r291670 because it introduces a crash.
r291670 doesn't crash on the original testcase from PR31589,
but it crashes on a slightly more complex one.

PR31589 has the new reproducer.

llvm-svn: 292444
2017-01-18 23:05:58 +00:00
Kirill Bobyrev 6afbaf0944 Revert 292404 due to buildbot failures.
llvm-svn: 292407
2017-01-18 16:34:25 +00:00
Kirill Bobyrev 9ad06dbe17 [X86] Minor code cleanup to fix several clang-tidy warnings. NFC
llvm-svn: 292404
2017-01-18 16:15:47 +00:00
Michael Zuckerman 0c0240ce84 [X86] Improve mul combine for negative multiplayer (2^c - 1)
This patch improves the mul instruction combine function (combineMul) 
by adding new layer of logic. 
In this patch, we are adding the ability to fold (mul x, -((1 << c) -1)) 
or (mul x, -((1 << c) +1)) into (neg(X << c) -x) or (neg((x << c) + x) respective.

Differential Revision: https://reviews.llvm.org/D28232

llvm-svn: 292358
2017-01-18 09:31:13 +00:00
Marina Yatsina 197db00e3e [X86] Fix for bugzilla 31576 - add support for "data32" instruction prefix
This patch fixes bugzilla 31576 (https://llvm.org/bugs/show_bug.cgi?id=31576).

"data32" instruction prefix was not defined in the llvm.
An exception had to be added to the X86 tablegen and AsmPrinter because both "data16" and "data32" are encoded to 0x66 (but in different modes).

Differential Revision: https://reviews.llvm.org/D28468

llvm-svn: 292352
2017-01-18 08:07:51 +00:00
Joerg Sonnenberger 270dd41f75 Remove an overeager assert from r288844.
llvm-svn: 292244
2017-01-17 19:29:15 +00:00
Bob Wilson f2d0b68b3b Revert r291640 change to fold X86 comparison with atomic_load_add.
Even with the fix from r291630, this still causes problems. I get
widespread assertion failures in the Swift runtime's WeakRefCount::increment()
function. I sent a reduced testcase in reply to the commit.

llvm-svn: 292242
2017-01-17 19:18:57 +00:00
Craig Topper 729d30d0ae [AVX-512] Add support for taking a bitcast between a SUBV_BROADCAST and VSELECT and moving it to the input of the SUBV_BROADCAST if it will help with using a masked operation.
llvm-svn: 292201
2017-01-17 06:49:59 +00:00
Craig Topper fba613e407 [X86] Merge the disassemblers handling of the different TYPE_RELs by getting the size information from the ENCODING field. NFCI
llvm-svn: 292096
2017-01-16 06:49:09 +00:00
Craig Topper ad944a1cac [X86] Reduce the number of operand 'types' the disassembler needs to deal with. NFCI
We were frequently checking for a list of types and the different types
conveyed no real information. So lump them together explicitly.

llvm-svn: 292095
2017-01-16 06:49:03 +00:00
Craig Topper 3173a1f8ff [AVX-512] Teach the disassembler about all of the EVEX gather and scatter instructions.
llvm-svn: 292094
2017-01-16 05:44:33 +00:00
Craig Topper 33ac064137 [AVX-512] Begin giving the disassembler a way to recognize that VSIB is a different encoding than regular addressing modes.
This part first teaches it not to check error if EVEX.V2 is used by a VSIB instruction.

llvm-svn: 292093
2017-01-16 05:44:25 +00:00
Craig Topper 7dfd583644 [AVX-512] Correct memory operand size for VPGATHERQPS and VPGATHERQD
with ZMM index. Similar for SCATTER and the prefetch gather and scatter
instructions.

Fixes PR31618.

llvm-svn: 292088
2017-01-16 00:55:58 +00:00
Craig Topper 8be6ebce2b [AVX-512] Fix register class in one of the gather/scatter memory operands so that all 32 bit registers can be allowed.
llvm-svn: 292087
2017-01-16 00:55:50 +00:00
Simon Pilgrim 6ed996cdf0 [CostModel][X86] Fix AVX512BW vector shift costs for vXi16 types
We already have patterns in place to support 128/256-bit shifts without AVX512VL

llvm-svn: 292077
2017-01-15 20:44:00 +00:00
Michael Zuckerman 6baa3838e9 Fix blend mask by switch the side of the operand since Blend node uses opposite mask then Select NODE.
llvm-svn: 292066
2017-01-15 16:43:14 +00:00
Craig Topper f1388ef006 [AVX-512] Remove unnecessary duplicate broadcast patterns. NFC
llvm-svn: 292053
2017-01-15 06:15:45 +00:00
Craig Topper 52317e8b6e [AVX-512] Replicate some broadcast patterns to VLX and disable the AVX2 patterns when VLX is available.
llvm-svn: 292051
2017-01-15 05:47:45 +00:00
Craig Topper c294cff863 [X86] Remove untested MOVDDUP patterns.
These all involve bitcasts around the memory operands. This isn't
something we normally do for isel patterns. I suspect DAG combine should
convert the load type making this unnecessary.

llvm-svn: 292050
2017-01-15 05:21:29 +00:00
Simon Pilgrim d419b73a42 [CostModel][X86] Updated vXi64 ASHR costs on AVX512 targets now that D28604 has landed
llvm-svn: 292023
2017-01-14 19:24:23 +00:00
Simon Pilgrim 8e5ecf8ad1 [X86][XOP] Added support for VPMADCSWD 'extend+hadd' IFMA patterns
VPMADCSWD act as VPADDD( VPMADDWD( x, y ), z ) - multiply+extend+hadd and add to v4i32 accumulator

llvm-svn: 292021
2017-01-14 18:52:13 +00:00
Simon Pilgrim b290805e94 [X86][XOP] Added support for VPMACSDQH/VPMACSDQL 'extension' IFMA patterns
VPMACSDQH/VPMACSDQL act as VPADDQ( VPMULDQ( x, y ), z ) - multiply+extending either the odd/even 4i32 input elements and adding to v2i64 accumulator

llvm-svn: 292020
2017-01-14 18:08:54 +00:00
Simon Pilgrim a1631749f8 [X86][XOP] Added support for VPMACSWW/VPMACSDD 'lossy' IFMA patterns
VPMACSWW/VPMACSDD act as add( mul( x, y ), z ) - ignoring any upper bits from both the multiply and add stages

llvm-svn: 292019
2017-01-14 17:13:52 +00:00
Craig Topper 63e2cd6caa [AVX-512] Teach two address instruction pass to replace masked move instructions with blendm instructions when its beneficial.
Isel now selects masked move instructions for vselect instead of blendm. But sometimes it beneficial to register allocation to remove the tied register constraint by using blendm instructions.

This also picks up cases where the masked move was created due to a masked load intrinsic.

Differential Revision: https://reviews.llvm.org/D28454

llvm-svn: 292005
2017-01-14 07:50:52 +00:00
Craig Topper 09b7e0f01d [AVX-512] Replace V_SET0 in AVX-512 patterns with AVX512_128_SET0. Enhance AVX512_128_SET0 expansion to make this possible.
We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX available. Or if its not, but register allocation has selected a non-extended register we will use VEX VXORPS. And if its an extended register without VLX we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD.

This makes it possible for the register allocator to have all 32 registers available to work with.

llvm-svn: 292004
2017-01-14 07:29:24 +00:00
Craig Topper 9cc685a56e [X86] Simplify the code that calculates a scaled blend mask. We don't need a second loop.
llvm-svn: 291996
2017-01-14 04:29:15 +00:00
Craig Topper 9850210d03 [AVX-512] Change blend mask in lowerVectorShuffleAsBlend to a 64-bit value. Also add 32-bit mode command lines to the test case that exercises this just to make sure we sanely handle the 64-bit immediate there.
This fixes a undefined sanitizer failure from r291888.

llvm-svn: 291994
2017-01-14 04:19:35 +00:00
Benjamin Kramer 061f4a5fe6 Apply clang-tidy's performance-unnecessary-value-param to LLVM.
With some minor manual fixes for using function_ref instead of
std::function. No functional change intended.

llvm-svn: 291904
2017-01-13 14:39:03 +00:00
Simon Pilgrim 7f2a6d5e8c [X86][AVX512] Add support for variable ASHR v2i64/v4i64 support without VLX
Use v8i64 variable ASHR instructions if we don't have VLX.

This is a reduced version of D28537 that just adds support for variable shifts - I'll continue with that patch (for just constant/uniform shifts) once I've fixed the type legalization issue in avx512-cvt.ll.

Differential Revision: https://reviews.llvm.org/D28604

llvm-svn: 291901
2017-01-13 13:16:19 +00:00
Diana Picus 116bbab4e4 [CodeGen] Rename MachineInstrBuilder::addOperand. NFC
Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.

See https://reviews.llvm.org/D28057 for the whole discussion.

Differential Revision: https://reviews.llvm.org/D28556

llvm-svn: 291891
2017-01-13 09:58:52 +00:00
Michael Zuckerman 558a4d8419 [X86][AVX512] Adding missing shuffle lowering to blend mask instructions
Some shuffles can be lowered to blend mask instruction (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ) .
In this patch, I added new pattern match for this case.

Reviewers:
1. craig.topper
2. guyblank
3. RKSimon
4. igorb     

Differential Revision: https://reviews.llvm.org/D28483

llvm-svn: 291888
2017-01-13 09:06:00 +00:00
Craig Topper 1ec84c2a18 [AVX-512] Remove unmasked BLENDM instructions from the wrong load folding table. The unmasked versions read memory from operand 2, but were in the operand 3 table.
These aren't the most interesting set of blendm instructions as the unmasked version isn't useful. We were also missing the B and W forms. I'll add the masked versions of all sizes in a future patch.

llvm-svn: 291885
2017-01-13 07:28:56 +00:00
Craig Topper 46b6ecf41e [X86] Move some entries in the load folding tables to move appropriate grouping. NFC
llvm-svn: 291884
2017-01-13 07:28:53 +00:00
Nikolai Bozhenov f02ac0eeb2 [X86] Replace AND+IMM64 with SRL/SHL
Emit SHRQ/SHLQ instead of ANDQ with a 64 bit constant mask if the result
is unused and the mask has only higher/lower bits set. For example, with
this patch LLVM emits

  shrq $41, %rdi
  je

instead of

  movabsq $0xFFFFFE0000000000, %rcx
  testq   %rcx, %rdi
  je

This reduces number of instructions, code size and register pressure.
The transformation is applied only for cases where the mask cannot be
encoded as an immediate value within TESTQ instruction.

Differential Revision: https://reviews.llvm.org/D28198

llvm-svn: 291806
2017-01-12 19:54:27 +00:00
Nikolai Bozhenov 6bdf92cec7 [X86] Tune bypassing of slow division for Intel CPUs
64-bit integer division in Intel CPUs is extremely slow, much slower
than 32-bit division. On the other hand, 8-bit and 16-bit divisions
aren't any faster. The only important exception is Atom where DIV8
is fastest. Because of that, the patch
1) Enables bypassing of 64-bit division for Atom, Silvermont and
   all big cores.
2) Modifies 64-bit bypassing to use 32-bit division instead of
   16-bit one. This doesn't make the shorter division slower but
   increases chances of taking it. Moreover, it's much more likely
   to prove at compile-time that a value fits 32 bits and doesn't
   require a run-time check (e.g. zext i32 to i64).

Differential Revision: https://reviews.llvm.org/D28196

llvm-svn: 291800
2017-01-12 19:34:15 +00:00
Craig Topper 24c3a2395f [AVX-512] Improve lowering of zero_extend of v4i1 to v4i32 and v2i1 to v2i64 with VLX, but no DQ or BW support.
llvm-svn: 291747
2017-01-12 06:49:12 +00:00
Craig Topper 69ab67b279 [AVX-512] Improve lowering of sign_extend of v4i1 to v4i32 and v2i1 to v2i64 when avx512vl is available, but not avx512dq.
llvm-svn: 291746
2017-01-12 06:49:08 +00:00
Elad Cohen c5ba925ef2 [X86][AVX512] Fix PR31515 - Do not flip vselect condition if it's not a vXi1 mask
r289653 added a case where `vselect <cond> <vector1> <all-zeros>`
is transformed to:
`vselect xor(cond, DAG.getConstant(1, DL, CondVT) <all-zeros> <vector1>`
This was not aimed to catch cases where Cond is not a vXi1
mask but it does. Moreover, when Cond type is VxiN (N > 1)
then xor(cond, DAG.getConstant(1, DL, CondVT) != NOT(cond).
This patch changes the above to xor with allones, and avoids
entering the case for non-mask Conds.

llvm-svn: 291745
2017-01-12 06:49:03 +00:00
Peter Collingbourne 1b5f1cfdb4 X86: Remove dead code. NFC.
llvm-svn: 291721
2017-01-11 23:00:28 +00:00
Simon Pilgrim 0c1faf432b Remove trailing whitespace. NFCI.
llvm-svn: 291680
2017-01-11 16:38:20 +00:00
Elena Demikhovsky 9d0e7c33d3 X86 CodeGen: Optimized pattern for truncate with unsigned saturation.
DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512.
And VPACKUS* instructions on SEE* targets.

Differential Revision: https://reviews.llvm.org/D28216

llvm-svn: 291670
2017-01-11 12:59:32 +00:00
Simon Pilgrim 5a81fefad3 [X86][AVX512BW] Vectorize v64i8 vector shifts
Differential Revision: https://reviews.llvm.org/D28447

llvm-svn: 291665
2017-01-11 10:36:51 +00:00
Elad Cohen 0c2601073e [X86] Fix PR30926 - Add patterns for (v)cvtsi2s{s,d} and (v)cvtsd2s{s,d}
The code emiited by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd,
(v)cvtsd2ss and (v)cvtss2sd is lowered to a code sequence that includes
redundant (v)movss/(v)movsd instructions. This patch adds patterns for
optimizing these sequences.

Differential revision: https://reviews.llvm.org/D28455

llvm-svn: 291660
2017-01-11 09:11:48 +00:00
Mohammed Agabaria 2c96c43388 [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.

special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. 
In case if the real operands bitwidth <= 16.

Differential Revision: https://reviews.llvm.org/D28104 

llvm-svn: 291657
2017-01-11 08:23:37 +00:00
Hans Wennborg 6573976f57 Re-commit r289955: [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367)
This was reverted because it would miscompile code where the cmp had
multiple uses. That was due to a deficiency in the existing code, which
was fixed in r291630 (see the PR for details).

This re-commit includes an extra test for the kind of code that got
miscompiled: @test_sub_1_setcc_jcc.

llvm-svn: 291640
2017-01-11 01:36:57 +00:00
Hans Wennborg 12de693747 [X86] Dont run combineSetCCAtomicArith() when the cmp has multiple uses
We would miscompile the following:

  void g(int);
  int f(volatile long long *p) {
    bool b = __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST) < 0;
    g(b ? 12 : 34);
    return b ? 56 : 78;
  }

into

  pushq   %rax
  lock            incq    (%rdi)
  movl    $12, %eax
  movl    $34, %edi
  cmovlel %eax, %edi
  callq   g(int)
  testq   %rax, %rax   <---- Bad.
  movl    $56, %ecx
  movl    $78, %eax
  cmovsl  %ecx, %eax
  popq    %rcx
  retq

because the code failed to take into account that the cmp has multiple
uses, replaced one of them, and left the other one comparing garbage.

llvm-svn: 291630
2017-01-11 00:49:54 +00:00
Michael Zuckerman bcd03e7f3b [X86][AVX512]Improving shuffle lowering by using AVX-512 EXPAND* instructions
This patch fix PR31351: https://llvm.org/bugs/show_bug.cgi?id=31351

1.  This patch adds new type of shuffle lowering
2.  We can use the expand instruction, When the shuffle pattern is as following:
    { 0*a[0]0*a[1]...0*a[n] , n >=0 where a[] elements in a ascending order}.

Reviewers: 1. igorb  
           2. guyblank  
           3. craig.topper  
           4. RKSimon 

Differential Revision: https://reviews.llvm.org/D28352

llvm-svn: 291584
2017-01-10 18:57:17 +00:00
Craig Topper d55b83128b AMD family 17h (znver1) enablement
Summary:
This patch enables the following
1. AMD family 17h architecture using "znver1" tune flag (-march, -mcpu).
2. ISAs that are enabled for "znver1" architecture.
3. Checks ADX isa from cpuid to identify "znver1" flag when -march=native is used.
4. ISAs FMA4, XOP are disabled as they are dropped from amdfam17.
5. For the time being, it uses the btver2 scheduler model.
6. Test file is updated to check this flag.

This item is linked to clang review item https://reviews.llvm.org/D28018

Patch by Ganesh Gopalasubramanian

Reviewers: RKSimon, craig.topper

Subscribers: vprasad, RKSimon, ashutosh.nema, llvm-commits

Differential Revision: https://reviews.llvm.org/D28017

llvm-svn: 291543
2017-01-10 06:01:16 +00:00
Craig Topper 2ed461e5c4 [X86] When lowering uniform shifts, use X86ISD::VZEXT instead of using a ZERO_EXTEND_VECTOR_INREG. If we emit the ZERO_EXTEND_VECTOR_INREG too late it doesn't get lowered properly and makes it through to isel and fails.
Fixes PR31593.

llvm-svn: 291535
2017-01-10 04:12:24 +00:00
Michael Kuperstein 1559e8863e Revert r291092 because it introduces a crash.
See PR31589 for details.

llvm-svn: 291478
2017-01-09 21:04:46 +00:00
Vyacheslav Klochkov d497d36083 X86-specific path: Implemented the fusing of MUL+ADDSUB to FMADDSUB.
Differential Revision: https://reviews.llvm.org/D28087

llvm-svn: 291473
2017-01-09 20:26:17 +00:00
Simon Pilgrim 0f23b2ba1a [X86][AVX512] Enable v16i8/v32i8 vector shifts to use an extend+shift+truncate pattern.
Use the existing AVX2 v8i16 vector shift lowering for v16i8 (extending to v16i32) on AVX512 targets and v32i8 (extending to v32i16) on AVX512BW targets.

Cost model updates to follow.

llvm-svn: 291451
2017-01-09 17:20:03 +00:00
Simon Pilgrim d990cd371b [X86][AVX512DQ] Enable v16i16 vector shifts to use an extend+shift+truncate pattern.
Use the existing AVX2 v8i16 vector shift lowering for v16i16 on AVX512 targets (AVX512BW will have already have lowered with vpsravw).

Cost model updates to follow.

llvm-svn: 291445
2017-01-09 15:15:45 +00:00
Craig Topper 96ab6fd2eb [AVX-512] Change another pattern that was using BLENDM to use masked moves. A future patch will conver it back to BLENDM if its beneficial to register allocation.
llvm-svn: 291419
2017-01-09 04:19:34 +00:00
Craig Topper 6393afce97 [AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for vselects of all ones and all zeros.
Previously we emitted a VPTERNLOG and a separate masked move.

llvm-svn: 291415
2017-01-09 02:44:34 +00:00
Craig Topper f51ba1e3da [AVX-512] If avx512dq is available use vpmovm2d/vpmovm2q instead of vselect of zeroes/ones when handling sign extends of i1 without VLX.
llvm-svn: 291402
2017-01-08 21:32:30 +00:00
Simon Pilgrim e6d948b857 Strip trailing whitespace.
llvm-svn: 291395
2017-01-08 18:37:42 +00:00
Simon Pilgrim b2a80950fe Fix line endings and strip trailing whitespace.
llvm-svn: 291393
2017-01-08 16:45:39 +00:00
Sanjay Patel bf51c8a975 [x86] fix usage of stale operands when lowering select
I noticed this problem as part of the ongoing attempt to canonicalize min/max ops in IR.

The debug output shows nodes like this:

t4: i32 = xor t2, Constant:i32<-1>
    t21: i8 = setcc t4, Constant:i32<0>, setlt:ch
  t14: i32 = select t21, t4, Constant:i32<-1>

And because the select is holding onto the t4 (xor) node while EmitTest creates a new 
x86-specific xor node, the lowering results in:

  t4: i32 = xor t2, Constant:i32<-1>
  t25: i32,i32 = X86ISD::XOR t2, Constant:i32<-1>
t28: i32,glue = X86ISD::CMOV Constant:i32<-1>, t4, Constant:i8<15>, t25:1

Differential Revision: https://reviews.llvm.org/D28374

llvm-svn: 291392
2017-01-08 15:53:40 +00:00
Simon Pilgrim 9c58950eeb [CostModel][X86] Fixed vXi8 uniform shift costs.
The 'fast' costs should only work for shifts by uniform constants (uniform non-constant are lowered using the slow default implementation).

Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled.

Added missing AVX2/AVX512BW costs as well.

llvm-svn: 291391
2017-01-08 14:14:36 +00:00
Simon Pilgrim 1fa5487c05 [CostModel][X86] Moved legal uniform shift costs earlier.
XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts.

llvm-svn: 291390
2017-01-08 13:12:03 +00:00
Craig Topper 5c46c7526e [AVX-512] Remove redundant patterns that select unaligned moves with zero masking for patterns that already use the aligned form. NFC
llvm-svn: 291383
2017-01-08 05:46:21 +00:00
Simon Pilgrim 9681c407b4 [CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs
SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq.

llvm-svn: 291372
2017-01-07 22:27:43 +00:00
Craig Topper a74e3088df [AVX-512] Remove patterns from the other VBLENDM instructions. They are all redundant with masked move instructions.
We should probably teach the two address instruction pass to turn masked moves into BLENDM when its beneficial to the register allocator.

llvm-svn: 291371
2017-01-07 22:20:34 +00:00
Craig Topper 81f20aa336 [AVX-512] Remove patterns from masked broadcast versions of BLENDM instructions.
All but (v2f64 broadcast f64) are handled with VBROADCAST instructions. The v2f64 version can be handled with VMOVDDUP.

We may want to consider converting to BLENDM instructions in the two address instruction pass if its beneficial to register allocation.

llvm-svn: 291369
2017-01-07 22:20:26 +00:00
Craig Topper da84ff3ed4 [AVX-512] Add masked forms of the alternate MOVDDUP patterns.
I'm not too sure how to get isel to select even all of the unmasked forms, but at least we have a consistent set now.

llvm-svn: 291368
2017-01-07 22:20:23 +00:00
Simon Pilgrim a470296367 [CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs.
llvm-svn: 291366
2017-01-07 22:08:09 +00:00
Simon Pilgrim 82e3e05fe2 [CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above
We were matching against general vector shift costs before the uniform splat costs

llvm-svn: 291365
2017-01-07 21:47:10 +00:00
Simon Pilgrim e70644dab7 [CostModel][X86] Generalized cost calculation of SHL by constant -> MUL conversion.
llvm-svn: 291364
2017-01-07 21:33:00 +00:00
Simon Pilgrim 725997154d [CostModel][X86] Merge separate AVX1 cost LUTs. NFCI.
llvm-svn: 291355
2017-01-07 18:19:25 +00:00
Simon Pilgrim a4109d6433 [CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets.
llvm-svn: 291354
2017-01-07 17:54:10 +00:00
Simon Pilgrim df7de7a87e [CostModel][X86] Added missing AVX2 arithmetic costs.
Allows us to correctly fall through to the lower AVX1 costs if look up failed.

llvm-svn: 291353
2017-01-07 17:27:39 +00:00
Simon Pilgrim 100eae1ee0 [CostModel][X86] Reordered AVX1 arithmetic cost LUT into descending target order. NFCI.
llvm-svn: 291352
2017-01-07 17:03:51 +00:00
Simon Pilgrim a1b8e2c725 [X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470)
llvm-svn: 291347
2017-01-07 15:37:50 +00:00
Craig Topper 42b848a683 [X86] Disable load unfolding for 128-bit MOVDDUP instructions since the load size is smaller than the register size so unfolding would increase the load size.
llvm-svn: 291338
2017-01-07 06:56:54 +00:00
Simon Pilgrim 3128d6b520 [X86][SSE] Pass float domain flag to shuffle combine match functions. NFCI.
Early step towards ignoring domain above a certain shuffle depth.

llvm-svn: 291248
2017-01-06 17:34:30 +00:00
Simon Pilgrim bd3c6824d4 [X86][SSE] Simplify float domain requirement in unary shuffle matching.
The AVX1-only limit is never actually required in matchUnaryVectorShuffle

llvm-svn: 291244
2017-01-06 17:00:59 +00:00
Simon Pilgrim a08d7b9913 Remove trailing whitespace. NFCI.
llvm-svn: 291240
2017-01-06 15:31:52 +00:00
Simon Pilgrim 9b8c7caf4e [X86] Add X86Subtarget argument. NFCI.
All callers of getTargetVShiftNode have access to X86Subtarget already so pass it along instead of re-extracting it.

llvm-svn: 291239
2017-01-06 15:29:17 +00:00
Simon Pilgrim d8333372bc [CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs.
Set the costs on the lowest target that supports the type.

llvm-svn: 291229
2017-01-06 11:12:53 +00:00
Craig Topper e86fb932ea [AVX-512] Add EXTRACT_SUBVECTOR support to combineBitcastForMaskedOp.
llvm-svn: 291214
2017-01-06 05:18:48 +00:00
Simon Pilgrim aa186c632d [CostModel][X86] Tidyup arithmetic costs code. NFCI.
Remove unnecessary braces, remove one use variables and keep LUTs to similar naming convention.

llvm-svn: 291187
2017-01-05 22:48:02 +00:00
Simon Pilgrim 4c050c2190 [CostModel][X86] Move vXi32 MUL costs into existing tables. NFCI.
llvm-svn: 291165
2017-01-05 19:42:43 +00:00
Simon Pilgrim 6f72eba606 Remove trailing whitespace. NFCI.
llvm-svn: 291163
2017-01-05 19:24:25 +00:00
Simon Pilgrim 5b06e4d319 [CostModel][X86] Reordered SSE42 arithmetic cost LUT into descending order. NFCI.
llvm-svn: 291162
2017-01-05 19:19:39 +00:00
Simon Pilgrim a8bf97569a [CostModel][X86] Move vXi64 MUL costs into existing tables. NFCI.
Removes need for yet another LUT.

llvm-svn: 291158
2017-01-05 19:01:50 +00:00
Simon Pilgrim 430d34fc14 [CostModel][X86] Strip unused 256-bit vector shift costs. NFCI.
Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead.

llvm-svn: 291152
2017-01-05 18:36:48 +00:00
Simon Pilgrim b01e844241 [CostModel][X86] Include the cost of 256-bit upper subvector extract/insertion in AVX1 v4i64 MUL
Matches other MUL/ADD/SUB 256-bit case on AVX1

llvm-svn: 291149
2017-01-05 18:20:25 +00:00
Simon Pilgrim f74700aa8c [CostModel][X86] Merged SK_PermuteSingleSrc/SK_PermuteTwoSrc into common shuffle cost LUTs. NFCI.
llvm-svn: 291146
2017-01-05 17:56:19 +00:00
Sanjay Patel dea5a7bd53 less braces; NFC
llvm-svn: 291126
2017-01-05 16:47:32 +00:00
Simon Pilgrim bca02f9e20 [CostModel][X86] Add support for broadcast shuffle costs
Currently only for broadcasts with input and output of the same width.

Differential Revision: https://reviews.llvm.org/D27811

llvm-svn: 291122
2017-01-05 15:56:08 +00:00
Zvi Rackover 4b7d724d62 [X86] Optimize vector shifts with variable but uniform shift amounts
Summary:
For instructions such as PSLLW/PSLLD/PSLLQ a variable shift amount may be passed in an XMM register.
The lower 64-bits of the register are evaluated to determine the shift amount.
This patch improves the construction of the vector containing the shift amount.

Reviewers: craig.topper, delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28353

llvm-svn: 291120
2017-01-05 15:11:43 +00:00
Simon Pilgrim a62395a4bd [CostModel][X86] Pulled out common type legalization code
llvm-svn: 291109
2017-01-05 14:33:32 +00:00
Mohammed Agabaria 23599ba794 Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and need some extra cost for address computation handling.
This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.

Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general.

Differential Revision: https://reviews.llvm.org/D27518

llvm-svn: 291106
2017-01-05 14:03:41 +00:00
Mohammed Agabaria 189e2d29ba [Test Commit] fixing some format issue in X86TTI to match clang-format output.
llvm-svn: 291095
2017-01-05 09:51:02 +00:00
Elena Demikhovsky 143cbc425b AVX-512: Optimized pattern for truncate with unsigned saturation.
DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512.
Differential revision: https://reviews.llvm.org/D28216

llvm-svn: 291092
2017-01-05 08:21:09 +00:00
Eric Christopher 568c113ac0 Remove dead and unused variable NumSentinelElements.
Fixes PR31529.

llvm-svn: 290998
2017-01-04 20:05:18 +00:00
Simon Pilgrim bb895f3e9c [CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs
Actual codegen is much better than the extract+insert patterns that was assumed.

llvm-svn: 290962
2017-01-04 14:01:33 +00:00
Simon Pilgrim 939b8cd708 [X86] Merged Reverse/Alternate shuffle cost tables. NFCI.
As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode.

llvm-svn: 290956
2017-01-04 12:08:41 +00:00
Florian Hahn 5815f6c53c [framelowering] Skip dbg values when getting next/previous instruction.
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.

In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.

Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)

Reviewers: aprantl, MatzeB, mkuper

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D27688

llvm-svn: 290955
2017-01-04 12:08:35 +00:00
Ayman Musa 02f9533823 [X86][AVX512] Passing the appropriate memory operand class to INT_{U}COMIS{S|D} instructions
Replacing the memory operand in the intrinsic versions of the comis/ucomis instrucions from f128mem to ssmem/sdmem accordingly.

Differential Revision: https://reviews.llvm.org/D28138

llvm-svn: 290948
2017-01-04 08:21:54 +00:00
Simon Pilgrim c76ea4b638 [X86] Attempt to pre-truncate arithmetic operations if useful
In some cases its more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types.

This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or an one use constant we can fold).

Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs.

I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need.

Differential Revision: https://reviews.llvm.org/D28219

llvm-svn: 290947
2017-01-04 08:05:42 +00:00
Craig Topper d0aa53b9ae [AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit subvector insertion from the lowest subvector of one of the sources.
These are best handled with a vinsert32x4 or vinsert64x2 instruction.

llvm-svn: 290946
2017-01-04 07:32:03 +00:00
Craig Topper 83115a809f [AVX-512] Simplify code for creating 512-bit SHUF128 operations.
We don't need two loops and we can safely assume assume and hardcode the size of the widened mask.

llvm-svn: 290942
2017-01-04 07:31:51 +00:00
Craig Topper 48d232d3e7 [X86] Move 128-bit shuffle mask widening check into lowerV2X128VectorShuffle to reduce code duplication. Use the now available widened mask to simplify some code inside lowerV2X128VectorShuffle.
llvm-svn: 290872
2017-01-03 07:36:41 +00:00
Craig Topper 785e58fdc9 [AVX-512] Simplify the code added in r290870 to recognized 256-bit subvector inserts and avoid calling isShuffleEquivalent on a widened mask.
llvm-svn: 290871
2017-01-03 07:36:39 +00:00
Craig Topper 9496e3f916 [AVX-512] Teach shuffle lowering to use vinsert instructions for shuffles corresponding to 256-bit subvector inserts.
llvm-svn: 290870
2017-01-03 07:00:40 +00:00
Craig Topper fa875a1d3d [AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT instructions.
llvm-svn: 290869
2017-01-03 05:46:18 +00:00
Craig Topper be9ef55152 [X86] Remove trailing whitespace and an unnecessary line wrap. NFC
llvm-svn: 290867
2017-01-03 05:46:06 +00:00
Craig Topper 06bae884bd [X86] Fix header comment. NFC
llvm-svn: 290866
2017-01-03 05:46:05 +00:00
Craig Topper c849172105 [AVX-512] Add support for pushing bitcasts through INSERT_SUBVEC in order to select a masked operation.
llvm-svn: 290865
2017-01-03 05:46:02 +00:00
Craig Topper 0cda8bbf74 [AVX-512] Remove vinsert intrinsics and autoupgrade to native shufflevectors. There are some codegen problems here that I'll try to fix in future commits.
llvm-svn: 290864
2017-01-03 05:45:57 +00:00
Craig Topper 4d47c6ae57 [AVX-512] Remove vextract intrinsics and autoupgrade to native shufflevectors. This unfortunately generates some really terrible code without VLX support due to v2i1 and v4i1 not being legal.
Hopefully we can improve that in future patches.

llvm-svn: 290863
2017-01-03 05:45:46 +00:00
Dean Michael Berris f7e7b938ea [XRay] Merge instrumentation point table emission code into AsmPrinter.
Summary:
No need to have this per-architecture.  While there, unify 32-bit ARM's
behaviour with what changed elsewhere and start function names lowercase
as per the coding standards.  Individual entry emission code goes to the
entry's own class.

Fully tested on amd64, cross-builds on both ARMs and PowerPC.

Reviewers: dberris

Subscribers: aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28209

llvm-svn: 290858
2017-01-03 04:30:21 +00:00
Elena Demikhovsky d96200d60a Fixed shuffle-reverse cost on AVX-512.
(This changed was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately).

llvm-svn: 290812
2017-01-02 11:44:10 +00:00
Elena Demikhovsky 21706cbd24 AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.
X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost.

In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426).

* Shiffle-broadcast cost will be changed in Simon's upcoming patch.

Differential Revision: https://reviews.llvm.org/D28118

llvm-svn: 290810
2017-01-02 10:37:52 +00:00
Reid Kleckner cd46c1df80 Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64"
This reverts commit r290694. It broke sanitizer tests on Win64. I'll
probably bring this back, but the jump tables will just live in .text
like they do for MSVC.

llvm-svn: 290714
2016-12-29 17:07:10 +00:00
Reid Kleckner c9e0a153cf [COFF] Use 32-bit jump table entries in .rdata for Win64
Summary:
We were already using 32-bit jump table entries, but this was a
consequence of the default PIC model on Win64, and not an intentional
design decision. This patch ensures that we always use 32-bit label
difference jump table entries on Win64 regardless of the PIC model. This
is a good idea because it saves executable size and object file size.

Moving the jump tables to .rdata cleans up the disassembled object code
and reduces the available ROP targets, but it requires adding one more
RIP-relative lea to the code.  COFF doesn't have relocations to express
the difference between two arbitrary symbols, so we can't use the jump
table label in the label difference like we do elsewhere.

Fixes PR31488

Reviewers: majnemer, compnerd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28141

llvm-svn: 290694
2016-12-29 00:12:39 +00:00
Gadi Haber 19c4fc5e62 This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible.
There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers.
The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled.

Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky 
Differential Revision: https://reviews.llvm.org/D27901

llvm-svn: 290663
2016-12-28 10:12:48 +00:00
Craig Topper e77e901130 [AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load folding tables.
llvm-svn: 290591
2016-12-27 06:51:09 +00:00
Craig Topper 2da265b7bf [AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them to unmasked intrinsics plus a select.
llvm-svn: 290583
2016-12-27 05:30:14 +00:00
Craig Topper 89b3e0223f [AVX-512] Add 512-bit unmasked intrinsics for pmuldq and pmuludq so we can add them to InstCombine with the 128 and 256 bit versions.
The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will removed and autoupgraded.

llvm-svn: 290573
2016-12-27 03:46:05 +00:00
Craig Topper 83f2145c18 [AVX-512] Add isel patterns to turn native masked scalar add/sub/mul/div into masked instructions.
llvm-svn: 290564
2016-12-27 01:56:24 +00:00
Craig Topper 5ef13ba18b [AVX-512] Fix some patterns to use extended register classes.
llvm-svn: 290536
2016-12-26 07:26:07 +00:00
Craig Topper f56d985f77 [AVX-512] Don't assume that the rounding mode argument to intrinsics is a constant. While clang will guarantee this, nothing in the backend will.
A non-constant value will now result in an isel error instead of just asserting or crashing due to a bad cast during lowering.

llvm-svn: 290532
2016-12-26 01:40:17 +00:00
Michael Zuckerman 86602e85dd revert commit 290516
llvm-svn: 290517
2016-12-25 12:45:18 +00:00
Michael Zuckerman 45aa420640 Commit try added new empty line
llvm-svn: 290516
2016-12-25 12:01:34 +00:00
Florian Hahn 898127fe36 Revert r290423 because it broke the sanitizer-x86_64-linux-autoconf buildbot.
llvm-svn: 290425
2016-12-23 12:26:11 +00:00
Florian Hahn 1d6b1a7b79 [framelowering] Skip dbg values when getting next/previous instruction.
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.

In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.

Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)

Reviewers: mkuper, MatzeB, aprantl

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D27688

llvm-svn: 290423
2016-12-23 11:35:00 +00:00
Wei Mi f3f01aba48 Change the interface of TLI.isMultiStoresCheaperThanBitsMerge.
This is for splitMergedValStore in DAG Combine to share the target query interface
with similar logic in CodeGenPrepare.

Differential Revision: https://reviews.llvm.org/D24707

llvm-svn: 290363
2016-12-22 19:38:22 +00:00
Ayman Musa 9ff608cdc6 [X86][AVX2] Passing the appropriate memory operand class to VPMADDWD instruction.
Replacing the memory operand in the ymm version of VPMADDWD from i128mem to i256mem.

Differential Revision: https://reviews.llvm.org/D28024

llvm-svn: 290333
2016-12-22 08:42:46 +00:00
Simon Pilgrim 081abbb164 [X86][SSE] Improve lowering of vXi64 multiplies
As mentioned on PR30845, we were performing our vXi64 multiplication as:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32);

when we could avoid one of the upper shifts with:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi + AhiBlo, 32);

This matches the lowering on gcc/icc.

Differential Revision: https://reviews.llvm.org/D27756

llvm-svn: 290267
2016-12-21 20:00:10 +00:00
Michael Zuckerman 85e12d2851 revert first commit . removing empty line in X86.h
llvm-svn: 290255
2016-12-21 12:48:01 +00:00
Michael Zuckerman 58838cf29d First commit adding new line to X86.h
llvm-svn: 290254
2016-12-21 12:44:47 +00:00
Elena Demikhovsky 7c7bf1b432 Added a template for building target specific memory node in DAG.
I added API for creation a target specific memory node in DAG. Today, all memory nodes are common for all targets and their constructors are located in SelectionDAG.cpp.
There are some cases in X86 where we need to create a special node - truncation-with-saturation store, float-to-half-store. 
In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for truncation-with-saturation pattern.

Differential Revision: https://reviews.llvm.org/D27899

llvm-svn: 290250
2016-12-21 10:43:36 +00:00