Commit Graph

8129 Commits

Author SHA1 Message Date
Michael Kuperstein 2ee911e985 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 (AKA r274626) and its follow-ups (r276347, r277289),
due to miscompiles in the test suite. The FastISel change was left in, because
it apparently fixes an unrelated issue.

(Recommit of r279782 which was broken due to a bad merge.)

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279788
2016-08-25 22:48:11 +00:00
Michael Kuperstein 6e271f4ce8 Revert r279782 due to debug buildbot breakage.
llvm-svn: 279785
2016-08-25 22:14:45 +00:00
Michael Kuperstein a6ccc8d365 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 and its follow-ups (r276347, r277289), due to
miscompiles in the test suite. The FastISel change was left in, because it
apparently fixes an unrelated issue.

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279782
2016-08-25 21:55:41 +00:00
Michael Kuperstein 40887c5566 [X86] 512-bit VPAVG requires AVX512BW
Fix VPAVG detection to require AVX512BW, not AVX512F for 512-bit widths,
and change associated asserts to assert in the right direction...

This fixes PR29111.

llvm-svn: 279755
2016-08-25 17:17:46 +00:00
Simon Pilgrim 3125501bba [X86][AVX] Improved AVX512F/AVX512VL SubVectorBroadcast tests
llvm-svn: 279736
2016-08-25 12:50:13 +00:00
Simon Pilgrim 0ad9f3e93b [X86][AVX] Provide SubVectorBroadcast fallback if load fold fails (PR29133)
Fix for PR29133, matching the approach that was taken for AVX1 scalar broadcasts.

llvm-svn: 279735
2016-08-25 12:45:16 +00:00
Matthias Braun 1eb473680a MachineFunctionProperties/MIRParser: Rename AllVRegsAllocated->NoVRegs, compute it
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register and simply describes that no vregs are used in
a machine function. With that we can simply compute the property and do
not need to dump/parse it in .mir files.

Differential Revision: http://reviews.llvm.org/D23850

llvm-svn: 279698
2016-08-25 01:27:13 +00:00
Matthias Braun f1b20c5225 MachineRegisterInfo/MIR: Initialize tracksSubRegLiveness early, do not print/parser it
tracksSubRegLiveness only depends on the Subtarget and a cl::opt, there
is not need to change it or save/parse it in a .mir file.
Make the field const and move the initialization LiveIntervalAnalysis to the
MachineRegisterInfo constructor. Also cleanup some code and fix some
instances which better use MachineRegisterInfo::subRegLivenessEnabled() instead
of TargetSubtargetInfo::enableSubRegLiveness().

llvm-svn: 279676
2016-08-24 22:17:45 +00:00
Simon Pilgrim e14653e17d [X86][SSE] Add MINSD/MAXSD/MINSS/MAXSS intrinsic scalar load folding support
These are no different in load behaviour to the existing ADD/SUB/MUL/DIV scalar ops but were missing from isNonFoldablePartialRegisterLoad

llvm-svn: 279652
2016-08-24 18:40:53 +00:00
Simon Pilgrim 941bd6bbae [X86][SSE] Add support for combining VZEXT_MOVL target shuffles
Includes adding more general support for the pattern: VZEXT_MOVL(VZEXT_LOAD(ptr)) -> VZEXT_LOAD(ptr)

This has unearthed a couple of latent poor codegen issues (MINSS/MAXSS scalar load folding and MOVDDUP/BROADCAST load folding patterns), which will be fixed shortly.

Its also reduced a couple of tests so that they no longer reach the instruction threshold necessary to be combined to PSHUFB (see PR26183).

llvm-svn: 279646
2016-08-24 18:07:53 +00:00
Simon Pilgrim 2725217680 [X86][SSE] Regenerate scalar math load folding tests for 32 and 64 bit targets
llvm-svn: 279630
2016-08-24 15:07:11 +00:00
Simon Pilgrim 7a50c8c2ba [X86][AVX2] Ensure on 32-bit targets that we broadcast f64 types not i64 (PR29101)
llvm-svn: 279622
2016-08-24 12:42:31 +00:00
Simon Pilgrim 3c8cd3df5e [X86][F16C] Regenerated f16c tests
llvm-svn: 279621
2016-08-24 11:56:15 +00:00
Matthias Braun 733fe3676c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this patch, hopefully I will get away without any warnings
in the constructor now.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279602
2016-08-24 01:52:46 +00:00
Matthias Braun 79f85b3b8f MIRParser/MIRPrinter: Compute isSSA instead of printing/parsing it.
Specifying isSSA is an extra line at best and results in invalid MI at
worst. Compute the value instead.

Differential Revision: http://reviews.llvm.org/D22722

llvm-svn: 279600
2016-08-24 01:32:41 +00:00
Richard Smith 8c3fbdc6c4 Revert r279564. It introduces undefined behavior (binding a reference to a
dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes
-Werror builds (including several buildbots) to fail.

llvm-svn: 279580
2016-08-23 22:08:27 +00:00
Matthias Braun 4c1f1f120c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this commit with the deletion of a MachineFunction delegated to
a separate pass to avoid use after free when doing this directly in
AsmPrinter.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279564
2016-08-23 20:58:29 +00:00
Simon Pilgrim 95580d6ed2 [X86][SSE] Demonstrate inability to recognise that (v)cvtpd2dq & (v)cvttpd2dq intrinsics implicitly zeroes the upper half of the xmm
llvm-svn: 279527
2016-08-23 16:11:21 +00:00
Simon Pilgrim 04b99fcded [X86][AVX] Updated fptosi_2f64_to_4i32 test to show missed opportunity to implicit zero the upper elements
llvm-svn: 279521
2016-08-23 15:10:39 +00:00
Simon Pilgrim 22c415a696 [X86][AVX] Add v2i32 fp to int conversion tests
llvm-svn: 279520
2016-08-23 15:00:52 +00:00
Simon Pilgrim cb96142fb8 [X86][AVX] Add AVX2/AVX512 fp to int conversion tests
llvm-svn: 279518
2016-08-23 14:37:35 +00:00
Simon Pilgrim 2ed547513d [X86][SSE] Demonstrate inability to recognise that (v)cvtpd2ps intrinsics implicitly zeroes the upper half of the xmm
llvm-svn: 279511
2016-08-23 11:26:28 +00:00
Simon Pilgrim 9eb978b47b [X86][SSE] Demonstrate inability to recognise that (v)cvtpd2ps implicitly zeroes the upper half of the xmm
llvm-svn: 279508
2016-08-23 10:35:24 +00:00
Matthias Braun 7f66202d38 Revert "(HEAD -> master, origin/master, origin/HEAD) CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses"
Reverting while tracking down a use after free.

This reverts commit r279502.

llvm-svn: 279503
2016-08-23 05:17:11 +00:00
Matthias Braun fd936841eb CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279502
2016-08-23 03:20:09 +00:00
Simon Pilgrim c8ad5c069c [X86][AVX] Don't use SubVectorBroadcast if there are additional users of the chain (PR29088)
We could improve on this by making X86SubVBroadcast a full memory intrinsic similar to X86vzload

llvm-svn: 279441
2016-08-22 16:47:55 +00:00
Simon Pilgrim 2279e59573 [X86][SSE] Avoid specifying unused arguments in SHUFPD lowering
As discussed on PR26491, we are missing the opportunity to make use of the smaller MOVHLPS instruction because we set both arguments of a SHUFPD when using it to lower a single input shuffle.

This patch sets the lowered argument to UNDEF if that shuffle element is undefined. This in turn makes it easier for target shuffle combining to decode UNDEF shuffle elements, allowing combines to MOVHLPS to occur.

A fix to match against MOVHPD stores was necessary as well.

This builds on the improved MOVLHPS/MOVHLPS lowering and memory folding support added in D16956

Adding similar support for SHUFPS will have to wait until have better support for target combining of binary shuffles.

Differential Revision: https://reviews.llvm.org/D23027

llvm-svn: 279430
2016-08-22 12:56:54 +00:00
Guy Blank 9ae797a798 [AVX512][FastISel] Do not use K registers in TEST instructions
In some cases, FastIsel was emitting TEST instruction with K reg input, which is illegal.
Changed to using KORTEST when dealing with K regs.

Differential Revision: https://reviews.llvm.org/D23163

llvm-svn: 279393
2016-08-21 08:02:27 +00:00
Simon Pilgrim 636422a898 [X86][SSE] Regenerate 32-bit buildvector test
llvm-svn: 279389
2016-08-20 23:09:57 +00:00
Simon Pilgrim ead5076753 [X86][SSE] Regenerate subvector extraction widening test
llvm-svn: 279388
2016-08-20 22:00:53 +00:00
Simon Pilgrim b65d476549 [X86] Regenerate fp truncate tests
llvm-svn: 279387
2016-08-20 21:56:33 +00:00
Simon Pilgrim 8275583be6 Regenerate test
llvm-svn: 279386
2016-08-20 21:37:30 +00:00
Simon Pilgrim a1142579f1 Regenerate test
llvm-svn: 279385
2016-08-20 21:35:45 +00:00
Simon Pilgrim ae9b81e684 [X86][XOP] Tweak vpermil2pd test to stop it being combined away
llvm-svn: 279384
2016-08-20 21:07:41 +00:00
Simon Pilgrim cb0ba1067f [X86][SSE] Added vector interleave test (PR21281)
llvm-svn: 279375
2016-08-20 17:07:38 +00:00
Simon Pilgrim d7a3782ae4 [X86][SSE] Generalised combining to VZEXT_MOVL to any vector size
This doesn't change tests codegen as we already combined to blend+zero which is what we lower VZEXT_MOVL to on SSE41+ targets, but it does put us in a better position when we improve shuffling for optsize.

llvm-svn: 279273
2016-08-19 17:02:00 +00:00
Simon Pilgrim f1b8fdc074 [X86][SSE] Add support for matching commuted insertps patterns
INSERTPS doesn't fit well with our shuffle mask canonicalization, so we need to attempt both the original mask and the commuted mask to more likely get a match

llvm-svn: 279230
2016-08-19 10:31:53 +00:00
Dean Michael Berris 1dd1ca9727 [XRay] Synthesize a reference to the xray_instr_map
Without the synthesized reference to a symbol in the xray_instr_map,
linker section garbage collection will helpfully remove the whole
xray_instr_map section from the final executable (or archive). This will
cause the runtime to not be able to identify the sleds and hot-patch the
calls/jumps into the runtime trampolines.

This change adds a reference from the text section at the end of the
function to keep around the associated xray_instr_map section as well.

We also make sure that we catch this reference in the test.

Reviewers: chandlerc, echristo, majnemer, mehdi_amini

Subscribers: mehdi_amini, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D23398

llvm-svn: 279204
2016-08-19 04:44:30 +00:00
Simon Pilgrim 99fd9c5f56 [X86][SSE] Missed insertps shuffle patterns
llvm-svn: 279111
2016-08-18 18:19:28 +00:00
Simon Pilgrim ab7c46eccf [X86][SSE] Add SSE1 tests to make sure we don't merge loads on illegal types
llvm-svn: 279065
2016-08-18 13:41:26 +00:00
Matthias Braun c9130ea6a3 Testcase for r279022
llvm-svn: 279031
2016-08-18 02:21:54 +00:00
Marina Yatsina 53ce3f9d02 Fix for PR29010
This is a fix for https://llvm.org/bugs/show_bug.cgi?id=29010
Root cause of the bug is that the register class of the machine instruction operand does not fully reflect if this registers that can be allocated.
Both for i386 and x86_64 the operand's register class is VR128RegClass and thus contains xmm0-xmm15, though in i386 we can only use xmm0-xmm8.
In order to get the actual allocable registers of the class we need to use RegisterClassInfo.

Differential Revision: https://reviews.llvm.org/D23613

llvm-svn: 278954
2016-08-17 19:07:40 +00:00
Ayman Musa 71b43c5c1d Fix bug in DAGBuilder for getelementptr with expanded vector.
Replacing the usage of MVT with EVT in case the vector type is expanded.
Differential Revision: https://reviews.llvm.org/D23306

llvm-svn: 278913
2016-08-17 07:52:15 +00:00
Sanjay Patel 904cd39b05 [x86] Allow merging multiple instances of an immediate within a basic block for code size savings, for 64-bit constants.
This patch handles 64-bit constants which can be encoded as 32-bit immediates.

It extends the functionality added by https://reviews.llvm.org/D11363 for 32-bit constants to 64-bit constants.

Patch by Sunita Marathe!

Differential Revision: https://reviews.llvm.org/D23391

llvm-svn: 278857
2016-08-16 21:35:16 +00:00
Haicheng Wu 9780df5385 [BranchFolding] Change a test case of r278575.
Rename the operands to make the test less brittle.

llvm-svn: 278841
2016-08-16 20:06:25 +00:00
Sjoerd Meijer 15c81b05ea [MBP] do not reorder and move up loop latch block
Do not reorder and move up a loop latch block before a loop header
when optimising for size because this will generate an extra 
unconditional branch.

Differential Revision: https://reviews.llvm.org/D22521

llvm-svn: 278840
2016-08-16 19:50:33 +00:00
Wei Mi db68c9adbd Remove a stale comment from the test, NFC.
llvm-svn: 278821
2016-08-16 16:57:15 +00:00
Simon Pilgrim 25d2506029 [X86][AVX] Fixed typo in zero element insertion
llvm-svn: 278798
2016-08-16 13:33:33 +00:00
Simon Pilgrim cc316f013a [X86][SSE] Add support for combining v2f64 target shuffles to VZEXT_MOVL byte rotations
The combine was only matching v2i64 as it assumed lowering to MOVQ - but we have v2f64 patterns that match in a similar fashion

llvm-svn: 278794
2016-08-16 12:52:06 +00:00
Simon Pilgrim d2d3202532 [X86][AVX512BW] Updated tests to demonstrate AVX512BW's inability to vectorize v64i8 shifts
llvm-svn: 278790
2016-08-16 11:05:47 +00:00
Simon Pilgrim f16cd361d4 [X86][SSE] Add support for combining target shuffles to PALIGNR byte rotations
llvm-svn: 278787
2016-08-16 10:03:23 +00:00
Guy Blank 722caebdae [X86] Add xgetbv/xsetbv intrinsics to non-windows platforms
Differential Revision: https://reviews.llvm.org/D21958

llvm-svn: 278782
2016-08-16 06:41:00 +00:00
James Molloy 196ad0823e [LSR] Don't try and create post-inc expressions on non-rotated loops
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!

Motivating testcase:

    void f(float *a, float *b, float *c, int n) {
      while (n-- > 0)
        *c++ = *a++ + *b++;
    }

It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.

llvm-svn: 278658
2016-08-15 07:53:03 +00:00
Igor Breger 505f2cc468 [AVX512] Fix VFPCLASSSD/VFPCLASSSS intrinsic lowering. The i1 result should be zero extended according to SPEC.
Differential Revision: http://reviews.llvm.org/D23489

llvm-svn: 278626
2016-08-14 13:58:57 +00:00
Igor Breger 6fc00b0acf autogenerate checks
llvm-svn: 278624
2016-08-14 09:34:39 +00:00
Igor Breger 8672408db0 [AVX512] Fix insertelement i1 lowering.
1. Use shuffle to insert element i1 into vector. The previous implementation was incorrect ( dest_bit OR src_bit , it doesn't clear the bit if src_bit=0 )
2. Improve shuffle i1 vector, use CVT2MASK if supported instead TRUNCATE.

Differential Revision: http://reviews.llvm.org/D23347

llvm-svn: 278623
2016-08-14 05:25:07 +00:00
Sanjay Patel 08c876673e [x86] add tests to show missed 64-bit immediate merging
Tests are slightly modified versions of those written by
Sunita Marathe in D23391.

llvm-svn: 278599
2016-08-13 18:42:14 +00:00
Craig Topper 3f8126e6fa [AVX-512] Remove an AddedComplexity that was prioritizing basic vzmovl patterns over more complex ones that produce better code.
llvm-svn: 278593
2016-08-13 05:43:20 +00:00
Craig Topper 600685d510 [AVX-512] Add patterns to support VZEXT_MOVL from 512-bit vectors with 64-bit and 32-bit elements.
Fixes PR28961.

llvm-svn: 278592
2016-08-13 05:33:12 +00:00
Haicheng Wu 7c4535d1e7 Reapply [BranchFolding] Restrict tail merging loop blocks after MBP
Fixed a bug in the test case.

To fix PR28104, this patch restricts tail merging to blocks that belong to the
same loop after MBP.

llvm-svn: 278575
2016-08-12 23:13:38 +00:00
Artur Pilipenko 87e4038a91 [x86] X86ISelLowering zext(add_nuw(x, C)) --> add(zext(x), C_zext)
Currently X86ISelLowering has a similar transformation for sexts:
sext(add_nsw(x, C)) --> add(sext(x), C_sext)

In this change I extend this code to handle zexts as well.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D23359

llvm-svn: 278520
2016-08-12 16:08:30 +00:00
Simon Pilgrim 687d71e877 [X86][SSE] Add support for combining target shuffles to PSLLDQ/PSRLDQ byte shifts
llvm-svn: 278502
2016-08-12 11:24:34 +00:00
Simon Pilgrim ed96b9adfb [X86][SSE] Fixed PALIGNR target shuffle decode
The PALIGNR target shuffle decode was not taking into account that DecodePALIGNRMask (rather oddly) expects the operands to be in reverse order, nor was it detecting unary patterns, causing combines to combine with the incorrect input.

The cgbuiltin, auto upgrade and instruction comments code correctly swap the operands so are not affected.

llvm-svn: 278494
2016-08-12 10:10:51 +00:00
Haicheng Wu d9cbb1608f Revert "[BranchFolding] Restrict tail merging loop blocks after MBP"
This reverts commit r278463 because it hits the bot.

llvm-svn: 278484
2016-08-12 08:40:24 +00:00
Wei Mi 7e103d92cc Recommit 'Remove the restriction that MachineSinking is now stopped by
"insert_subreg, subreg_to_reg, and reg_sequence" instructions' after
adjusting some unittest checks.

This is to solve PR28852. The restriction was added at 2010 to make better register
coalescing. We assumed that it was not necessary any more. Testing results on x86
supported the assumption.

We will look closely to any performance impact it will bring and will be prepared
to help analyzing performance problem found on other architectures.

Differential Revision: https://reviews.llvm.org/D23210

llvm-svn: 278466
2016-08-12 03:33:22 +00:00
Haicheng Wu ea02372059 [BranchFolding] Restrict tail merging loop blocks after MBP
To fix PR28014, this patch restricts tail merging to blocks that belong to the
same loop after MBP.

Differential Revision: https://reviews.llvm.org/D23191

llvm-svn: 278463
2016-08-12 03:30:23 +00:00
Vyacheslav Klochkov 6daefcf626 X86-FMA3: Implemented commute transformation for EVEX/AVX512 FMA3 opcodes.
This helped to improved memory-folding and register coalescing optimizations.

Also, this patch fixed the tracker #17229.

Reviewer: Craig Topper.
Differential Revision: https://reviews.llvm.org/D23108

llvm-svn: 278431
2016-08-11 22:07:33 +00:00
Tim Northover da6f5f2d0a Remove empty file left by partial reversion.
llvm-svn: 278411
2016-08-11 21:01:15 +00:00
Wei Mi 3ab5816000 Revert rL278384 which caused several buildbot failures (like check failures in CodeGen/X86/clz.ll).
llvm-svn: 278402
2016-08-11 20:33:37 +00:00
Wei Mi ec19b35179 Remove the restriction that MachineSinking is now stopped by "insert_subreg,
subreg_to_reg, and reg_sequence" instructions.

This is to solve PR28852. The restriction was added at 2010 to make better register
coalescing. We assumed that it was not necessary any more. Testing results on x86
supported the assumption.

We will look closely to any performance impact it will bring and will be prepared
to help analyzing performance problem found on other architectures.

Differential Revision: https://reviews.llvm.org/D23210

llvm-svn: 278384
2016-08-11 18:42:56 +00:00
Michael Kuperstein e36d7716c3 Make TwoAddressInstructionPass::rescheduleMIBelowKill subreg-aware
This fixes PR28824.

Differential Revision: https://reviews.llvm.org/D23220

llvm-svn: 278370
2016-08-11 17:38:33 +00:00
Igor Breger a77b14d02c [AVX512] Fix extractelement i1 lowering.
The previous implementation (not custom) doesn't enforce zeroing off upper bits. The assumption is that i1 PRODUCER (truncate and extractelement) must zero all upper bits, so i1 CONSUMER instructions ( test, zext, save, etc) can be done without additional zeroing.
Make extractelement i1 lowering custom for all vector i1.

Differential Revision: http://reviews.llvm.org/D23246

llvm-svn: 278328
2016-08-11 12:13:46 +00:00
Marina Yatsina 88f0c31f13 Avoid false dependencies of undef machine operands
This patch helps avoid false dependencies on undef registers by updating the machine instructions' undef operand to use a register that the instruction is truly dependent on, or use a register with clearance higher than Pref.

Pseudo example:

loop:
xmm0 = ...
xmm1 = vcvtsi2sdl eax, xmm0<undef>
... = inst xmm0
jmp loop

In this example, selecting xmm0 as the undef register creates false dependency between loop iterations.
This false dependency cannot be solved by inserting an xor before vcvtsi2sdl because xmm0 is alive at the point of the vcvtsi2sdl instruction.
Selecting a different register instead of xmm0, especially a register that is not used in the loop, will eliminate this problem.

Differential Revision: https://reviews.llvm.org/D22466

llvm-svn: 278321
2016-08-11 07:32:08 +00:00
Craig Topper a78b768ed4 [AVX-512] Promote 512-bit integer loads to v8i64 similar to what is done for 128/256-bit vectors for overall consistency.
llvm-svn: 278318
2016-08-11 06:04:07 +00:00
Craig Topper 14aa2665d3 [AVX-512] Add patterns to allow EVEX encoded stores of v16i16/v8i16/v16i8/v32i8 even when BWI is not supported.
llvm-svn: 278317
2016-08-11 06:04:04 +00:00
Craig Topper 3563d0f622 [AVX-512] Fix the 128-bit and 256-bit nontemporal load patterns with elements type other than i64. These loads have all been promoted to v2i64/v4i64 loads so we need bitcasts or we end up selecting VMOVDQA32/VMOVDQU32 instead.
llvm-svn: 278316
2016-08-11 06:04:00 +00:00
Sanjay Patel 5ccc85fe83 [x86, AVX] allow FP vector select folding to bitwise logic ops (PR28895)
This handles the case in:
https://llvm.org/bugs/show_bug.cgi?id=28895

...but we are not getting all of the possibilities yet. 
Eg, we use 'X86::FANDN' for scalar FP select combines.

That enhancement is filed as:
https://llvm.org/bugs/show_bug.cgi?id=28925

Differential Revision: https://reviews.llvm.org/D23337

llvm-svn: 278270
2016-08-10 19:00:11 +00:00
Simon Pilgrim b204f03004 [X86][XOP] Tweak vpermil2pd test to stop it being combined away
The target shuffle combined to a BLENDPD pattern which we will shortly add support for.

llvm-svn: 278233
2016-08-10 15:15:56 +00:00
Simon Pilgrim f1f55198c1 [X86][SSE] Regenerate vector shift lowering tests
llvm-svn: 278232
2016-08-10 15:13:49 +00:00
Sanjay Patel 2c677a9306 use different comparison predicates for better test coverage
llvm-svn: 278229
2016-08-10 15:06:11 +00:00
Simon Pilgrim ac8fa6c2c6 [X86][SSE] Add support for combining target shuffles to MOVSS/MOVSD
Only do this on pre-SSE41 targets where we should be lowering to BLENDPS/BLENDPD instead

llvm-svn: 278228
2016-08-10 14:15:41 +00:00
Simon Pilgrim d99242c44d [X86][SSE] Regenerate SSE1 tests
Properly demonstrate the nasty codegen we get for vselect without integer vectors

llvm-svn: 278215
2016-08-10 12:26:40 +00:00
Simon Pilgrim cb5a189b90 Regenerate test
llvm-svn: 278214
2016-08-10 12:24:19 +00:00
Simon Pilgrim 85c7ea86ae [DAGCombine] Avoid INSERT_SUBVECTOR reinsertions (PR28678)
If the input vector to INSERT_SUBVECTOR is another INSERT_SUBVECTOR, and this inserted subvector replaces the last insertion, then insert into the common source vector.

i.e. 
INSERT_SUBVECTOR( INSERT_SUBVECTOR( Vec, SubOld, Idx ), SubNew, Idx ) --> INSERT_SUBVECTOR( Vec, SubNew, Idx )

Differential Revision: https://reviews.llvm.org/D23330

llvm-svn: 278211
2016-08-10 10:50:53 +00:00
Sanjay Patel d34b128fbc add test cases for missed vselect optimizations (PR28895)
llvm-svn: 278165
2016-08-09 21:07:17 +00:00
Sanjay Patel b61346b8b0 regenerate checks and remove 'opt' run dependency
llvm-svn: 278154
2016-08-09 20:09:16 +00:00
David Majnemer adc688ce9c [X86] Don't model UD2/UD2B as a terminator
A UD2 might make its way into the program via a call to @llvm.trap.
Obviously, calls are not terminators.  However, we modeled the X86
instruction, UD2, as a terminator.  Later on, this confuses the epilogue
insertion machinery which results in the epilogue getting inserted
before the UD2.  For some platforms, like x64, the result is a
violation of the ABI.

Instead, model UD2/UD2B as a side effecting instruction which may
observe memory.

llvm-svn: 278144
2016-08-09 17:55:12 +00:00
Simon Pilgrim 76964e3140 [DAGCombiner] Better support for shifting large value type by constants
As detailed on D22726, much of the shift combining code assume constant values will fit into a uint64_t value and calls ConstantSDNode::getZExtValue where it probably shouldn't (leading to asserts). Using APInt directly avoids this problem but we encounter other assertions if we attempt to compare/operate on 2 APInt of different bitwidths.

This patch adds a helper function to ensure that 2 APInt values are zero extended as required so that they can be safely used together. I've only added an initial example use for this to the '(SHIFT (SHIFT x, c1), c2) --> (SHIFT x, (ADD c1, c2))' combines. Further cases can easily be added as required.

Differential Revision: https://reviews.llvm.org/D23007

llvm-svn: 278141
2016-08-09 17:39:11 +00:00
Simon Pilgrim 27740d038c [X86][XOP] Add support for combining target shuffles to VPERMIL2PD/VPERMIL2PS
llvm-svn: 278120
2016-08-09 12:56:15 +00:00
Elena Demikhovsky 0e0e07f436 AVX-512: A new test for FMA intrinsic
A new test that explores sub-optimal sequence of FMA intrinsic and FNEG operation.
An upcoming patch will fix it.

llvm-svn: 278117
2016-08-09 11:54:14 +00:00
Simon Pilgrim aae7d4a1b6 [X86][XOP] Add support for combining target shuffles to VPPERM
llvm-svn: 278114
2016-08-09 10:56:29 +00:00
Dean Michael Berris 3a25d84a51 [XRay] Test for xray_instr_map in object file. (NFC)
This makes a trivial change in the emission of the per-function XRay
tables, and makes sure that the xray_instr_map section does show up in
the object file.

llvm-svn: 278113
2016-08-09 10:42:11 +00:00
Simon Pilgrim 54c32ddf55 [X86][SSE] Fix memory folding of (v)roundsd / (v)roundss
We only had partial memory folding support for the intrinsic definitions, and (as noted on PR27481) was causing FR32/FR64/VR128 mismatch errors with the machine verifier.

This patch adds missing memory folding support for both intrinsics and the ffloor/fnearbyint/fceil/frint/ftrunc patterns and in doing so fixes the failing machine verifier stack folding tests from PR27481.

Differential Revision: https://reviews.llvm.org/D23276

llvm-svn: 278106
2016-08-09 09:32:34 +00:00
Craig Topper 92a4ff1294 [AVX-512] Add support for execution domain switching masked logical ops between floating point and integer domain.
This switches PS<->D and PD<->Q.

llvm-svn: 278097
2016-08-09 05:26:07 +00:00
Craig Topper 9bd6241106 [X86] Remove the Fv packed logical operation alias instructions. Replace them with patterns to the regular instructions.
This enables execution domain fixing which is why the tests changed.

llvm-svn: 278090
2016-08-09 03:06:33 +00:00
Craig Topper de06b51d3d [X86] Remove unnecessary bitcast from the front of AVX1Only 256-bit logical operation patterns.
llvm-svn: 278088
2016-08-09 03:06:26 +00:00
Matthias Braun 7313ca6dbf X86InstrInfo: Update liveness in classifyLea()
We need to update liveness information when we create COPYs in
classifyLea().

This fixes http://llvm.org/28301

llvm-svn: 278086
2016-08-09 01:47:26 +00:00
Charles Davis e9c32c7ed3 Revert "[X86] Support the "ms-hotpatch" attribute."
This reverts commit r278048. Something changed between the last time I
built this--it takes awhile on my ridiculously slow and ancient
computer--and now that broke this.

llvm-svn: 278053
2016-08-08 21:20:15 +00:00
Charles Davis 0822aa118e [X86] Support the "ms-hotpatch" attribute.
Summary:
Based on two patches by Michael Mueller.

This is a target attribute that causes a function marked with it to be
emitted as "hotpatchable". This particular mechanism was originally
devised by Microsoft for patching their binaries (which they are
constantly updating to stay ahead of crackers, script kiddies, and other
ne'er-do-wells on the Internet), but is now commonly abused by Windows
programs to hook API functions.

This mechanism is target-specific. For x86, a two-byte no-op instruction
is emitted at the function's entry point; the entry point must be
immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding.
This padding is where the patch code is written. The two byte no-op is
then overwritten with a short jump into this code. The no-op is usually
a `movl %edi, %edi` instruction; this is used as a magic value
indicating that this is a hotpatchable function.

Reviewers: majnemer, sanjoy, rnk

Subscribers: dberris, llvm-commits

Differential Revision: https://reviews.llvm.org/D19908

llvm-svn: 278048
2016-08-08 21:01:39 +00:00
Craig Topper f44423120f [AVX-512] Improve lowering of inserting a single element into lowest element of a 512-bit vector of zeroes by using vmovq/vmovd/vmovss/vmovsd.
llvm-svn: 277965
2016-08-07 21:52:59 +00:00
Craig Topper 2c51c74d52 [AVX-512] Add 512-bit logical operations to load folding tables. Add avx512f stack folding test and move some tests from the avx512vl test.
llvm-svn: 277961
2016-08-07 17:14:09 +00:00
Craig Topper 938e7ab9e1 [AVX-512] Add EVEX encoded floating point MAX/MIN instructions to the load folding tables.
llvm-svn: 277960
2016-08-07 17:14:05 +00:00
Elena Demikhovsky dca03bebd3 AVX-512: Changed lowering of BITCAST between i1 vectors and i8/i16/i32 integer values
Optimized lowering of BITCAST node. The BITCAST node can be replaced with COPY_TO_REG instead of KMOV.
It allows to suppress two opposite BITCAST operations and avoid redundant "movs".

Differential Revision: https://reviews.llvm.org/D23247

llvm-svn: 277958
2016-08-07 13:05:58 +00:00
Simon Pilgrim 69f2299efc [X86][AVX512BW] Add sext/zext AVX512BW 512-bit vector tests
llvm-svn: 277957
2016-08-07 12:41:36 +00:00
Simon Pilgrim a23141eca7 [X86][AVX512] Add sext/zext to 512-bit vector tests
llvm-svn: 277956
2016-08-07 12:10:46 +00:00
Elena Demikhovsky 2fabdcc60a AVX-512: Added a test for cmp intrinsics
This is a new test that should explore a current suboptimal sequence in passing values between cmp and kor intrinsics.
The code will be optimized in an upcoming patch.

Submitted bug here:
https://llvm.org/bugs/show_bug.cgi?id=28839

llvm-svn: 277954
2016-08-07 09:29:34 +00:00
Craig Topper 49841c3812 [X86] Add commutable floating point max/min instructions to the load folding tables.
llvm-svn: 277949
2016-08-07 05:39:51 +00:00
Craig Topper 2c1f6706de [AVX-512] Add andnps/andnpd to the avx512vl stack folding test.
llvm-svn: 277948
2016-08-07 05:39:48 +00:00
Simon Pilgrim bc573ca1b8 [X86][AVX2] Improve sign/zero extension on AVX2 targets
Split extensions to large vectors into 256-bit chunks - the equivalent of what we do with pre-AVX2 into 128-bit chunks

llvm-svn: 277939
2016-08-06 21:21:12 +00:00
Craig Topper 19505bc354 [AVX-512] Add AVX-512 scalar CVT instructions to hasUndefRegUpdate.
llvm-svn: 277933
2016-08-06 19:31:50 +00:00
Craig Topper b0476fcc1f [AVX-512] Add AVX512 run line to a test and re-generate the checks. Future commits will refine some of the sequences.
llvm-svn: 277932
2016-08-06 19:31:47 +00:00
Simon Pilgrim 7d168e19e8 [X86][SSE] Enable commutation between MOVHLPS and UNPCKHPD
Assuming SSE2 is available then we can safely commute between these, removing some unnecessary register moves and improving memory folding opportunities.

VEX encoded versions don't benefit so I haven't added support to them.

llvm-svn: 277930
2016-08-06 18:40:28 +00:00
Simon Pilgrim ef10e922d8 [X86][SSE] Regenerate SSE1 shuffle tests
llvm-svn: 277925
2016-08-06 13:46:09 +00:00
Simon Pilgrim 69b6a70834 [X86][SSE] Add initial support for 2 input target shuffle combining.
At the moment only the INSERTPS matching can actually use 2 inputs but the plumbing is now in place.

llvm-svn: 277839
2016-08-05 17:36:14 +00:00
Nikolai Bozhenov f679530ba1 [X86] Heuristic to selectively build Newton-Raphson SQRT estimation
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.

Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.

Differential Revision: https://reviews.llvm.org/D21379

llvm-svn: 277725
2016-08-04 12:47:28 +00:00
Simon Pilgrim 8ae6dad49b [X86][SSE] Don't decide when to scalarize CTTZ/CTLZ for performance at lowering - this is what cost models are for
Improved CTTZ/CTLZ costings will be added shortly

llvm-svn: 277713
2016-08-04 10:14:39 +00:00
Dean Michael Berris 7e9abea2ae [XRay] Align entry and return sleds to 2 byte boundaries
This should ensure that we can atomically write two bytes (on top of the
retq and the one past it) and have those two bytes not straddle cache
lines.

We also move the label past the alignment instruction so that we can refer
to the actual first instruction, as opposed to potential padding before the
aligned instruction.

Update the tests to allow us to reflect the new order of assembly.

Reviewers: rSerge, echristo, majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23101

llvm-svn: 277701
2016-08-04 07:37:28 +00:00
Simon Pilgrim 898f030f70 [X86][SSE] Enable target shuffle combining to combine multiple shuffle inputs.
We currently only support combining target shuffles that consist of a single source input (plus elements known to be undef/zero).

This patch generalizes the recursive combining of the target shuffle to collect all the inputs, merging any duplicates along the way, into a full set of src ops and its shuffle mask.

We uncover a number of cases where we have failed to combine a unary shuffle because the input has been duplicated and separated during lowering.

This will allow us to combine to 2-input shuffles in a future patch.

Differential Revision: https://reviews.llvm.org/D22859

llvm-svn: 277631
2016-08-03 19:08:24 +00:00
Dean Michael Berris 0b8f6c8777 [XRay] Make the xray_instr_map section specification more correct
Summary:
We also add a test to show what currently happens when we create a
section per function and emit an xray_instr_map. This illustrates the
relationship (or lack thereof) between the per-function section and the
xray_instr_map section.

We also change the code generation slightly so that we don't always
create group sections, but rather only do so if a function where the
table is associated with is in a group.

Also in this change:

  - Remove the "merge" flag on the xray_instr_map section.
  - Test that we're generating the right table for comdat and non-comdat functions.

Reviewers: echristo, majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23104

llvm-svn: 277580
2016-08-03 07:21:55 +00:00
Craig Topper 05948fb36c [AVX-512] Correct ExeDomain for many AVX-512 instructions.
llvm-svn: 277416
2016-08-02 05:11:15 +00:00
Michael Kuperstein c97da7f3a4 [DAGCombine] Make sext(setcc) combine respect getBooleanContents
We used to combine "sext(setcc x, y, cc) -> (select (setcc x, y, cc), -1, 0)"
Instead, we should combine to (select (setcc x, y, cc), T, 0) where the value
of T is 1 or -1, depending on the type of the setcc, and getBooleanContents()
for the type if it is not i1.

This fixes PR28504.

llvm-svn: 277371
2016-08-01 19:39:49 +00:00
Simon Pilgrim 46f119a59f [X86] Use implicit masking of SHLD/SHRD shift double instructions
Similar to the regular shift instructions, SHLD/SHRD only use the bottom bits of the shift value

llvm-svn: 277341
2016-08-01 12:11:43 +00:00
Craig Topper c48c029610 [AVX-512] Fix duplicate column in AVX512 execution dependency table that was preventing VMOVDQU32/VMOVDQA32 from being recognized. Fix a bug in the code that stops execution dependency fix from turning operations on 32-bit integer element types into operations on 64-bit integer element types.
llvm-svn: 277327
2016-08-01 07:55:33 +00:00
Craig Topper ddc96cd33d [X86] Regenerate a test to pick up shuffle comments that were added at some point.
llvm-svn: 277326
2016-08-01 07:55:24 +00:00
Craig Topper 749a111f1e [AVX-512] Teach X86InstrInfo::getLargestLegalSuperClass to inflate to FR32X/FR64X if AVX512 is supported and VR128X/VR256X if VLX is supported.
Had to update a stack folding test to clobber the other 16 registers since this now made them get used instead of spilling.

llvm-svn: 277321
2016-08-01 05:31:50 +00:00
Craig Topper 9161e4ec22 [AVX512] Replace scalar fp arithmetic intrinsics with native IR in an AVX512 test. The intrinsics aren't lowered to AVX512 instructions.
The intrinsics really should be removed and autoupgraded.

llvm-svn: 277320
2016-08-01 04:29:16 +00:00
Simon Pilgrim 8ae4354df6 [X86][SSE] Regenerate frem tests
llvm-svn: 277311
2016-07-31 21:59:23 +00:00
Simon Pilgrim b089ba4c65 [X86][SSE] Regenerate fpext tests
llvm-svn: 277310
2016-07-31 21:55:33 +00:00
Craig Topper 7afdc0fb25 [AVX512] Always use EVEX encodings for 128/256-bit move instructions in getLoadStoreRegOpcode if VLX is supported.
llvm-svn: 277305
2016-07-31 20:20:05 +00:00
Craig Topper 4c53e60360 [AVX512] Add VLX packed move instructions to the execution dependency fix pass and update tests.
llvm-svn: 277304
2016-07-31 20:20:01 +00:00
Craig Topper 338ec9a0cb [AVX512] Stop treating VR512 specially in getLoadStoreRegOpcode and use the regular switch which already tried to handle it, but was unreachable. This has the added benefit of enabling aligned loads/stores if the stack is aligned.
llvm-svn: 277302
2016-07-31 20:19:53 +00:00
Craig Topper 2a6bbb8203 [AVX512] Add X86::VR512RegClassID to X86RegisterInfo::getLargestLegalSuperClass.
llvm-svn: 277301
2016-07-31 20:19:50 +00:00
Simon Pilgrim 6be48e4aa7 [X86] Improve 64-bit shifts on 32-bit targets (PR14593)
As discussed on PR14593, this patch adds support for lowering to SHLD/SHRD from the patterns generated by DAGTypeLegalizer::ExpandShiftWithKnownAmountBit.

Differential Revision: https://reviews.llvm.org/D23000

llvm-svn: 277299
2016-07-31 19:50:45 +00:00
Simon Pilgrim 64845bb8b4 [X86] Add tests for the lowering SHLD/SHRD from manual pattern similar to those generated by ExpandShiftWithKnownAmountBit
Test for add(v,v) as well as shl(v,1)

llvm-svn: 277293
2016-07-31 17:51:37 +00:00
Craig Topper 00d34ed64f [AVX-512] Don't let ExeDependencyFix pass convert VPANDD/Q to VPANDPS/PD unless DQI instructions are supported. Same for ANDN, OR, and XOR.
Thanks to Igor Breger for pointing out my mistake.

llvm-svn: 277292
2016-07-31 17:15:07 +00:00
Simon Pilgrim 1e096b3a7a [X86] Add tests for the lowering SHLD/SHRD from manual patterns
As discussed on D23000

llvm-svn: 277291
2016-07-31 17:11:49 +00:00
Simon Pilgrim fbe0fcb009 [X86][SSE] Regenerate vshift tests
llvm-svn: 277278
2016-07-30 20:28:02 +00:00
Simon Pilgrim 2e9de1cebb [X86][AVX] Added signum example test functions from PR13248
These are good examples of missed combine opportunities with zero/all bit vector compare results
 

llvm-svn: 277274
2016-07-30 16:29:19 +00:00
Simon Pilgrim b9e1f9c5c5 [X86][X87] Add vector arithmetic tests for targets with sse disabled
To make sure the X86_64 target isn't doing anything stupid

llvm-svn: 277272
2016-07-30 16:01:30 +00:00
Simon Pilgrim cf49fa3251 [X86][SSE] Let 64-bit targets use the fast 2i32-2f32 UINT_TO_FP conversion as well as 32-bit
The 2i32-2i64 legalization means that we can use the slightly quicker double bits + fptrunc approach for the same results

llvm-svn: 277271
2016-07-30 14:06:59 +00:00
Michael Kuperstein f396b4c40d [X86] Match PSADBW in straight-line code
Up until now, we only had code to match PSADBW patterns that look like what
comes out of the loop vectorizer - a partial reduction inside the loop body
that gets fed into a horizontal operation in a different basic block.

This adds support for straight-line patterns, like those generated by the
SLP vectorizer.

Differential Revision: https://reviews.llvm.org/D22889

llvm-svn: 277219
2016-07-29 21:45:51 +00:00
Simon Pilgrim f107ffa8f0 [X86][AVX] Fix VBROADCASTF128 selection bug (PR28770)
Support for lowering to VBROADCASTF128 etc. in D22460 was not correctly ensuring that the only users of the 128-bit vector load were the insertions of the vector into the lower/upper subvectors.

llvm-svn: 277214
2016-07-29 21:05:10 +00:00
Simon Pilgrim 455460f310 Fixed line endings
llvm-svn: 277199
2016-07-29 18:58:57 +00:00
Andrew Kaylor b99d1cc7ed Recommitting r275284: add support to inline __builtin_mempcpy
Patch by Sunita Marathe

Third try, now following fixes to MSan to handle mempcy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!)

llvm-svn: 277189
2016-07-29 18:23:18 +00:00
Kyle Butt 02d8d054ab Codegen: MachineBlockPlacement Improve probability layout.
The following pattern was being layed out poorly:

              A
             / \
            B   C
           / \ / \
          D   E   ? (Doesn't matter)

Where A->B is far more likely than A->C, and prob(B->D) = prob(B->E)

The current algorithm gives:
A,B,C,E (D goes on worklist)

It does this even if C has a frequency count of 0. This patch
adjusts the layout calculation so that if freq(B->E) >> freq(C->E)
then we go ahead and layout E rather than C. Fallthrough half the time
is better than fallthrough never, or fallthrough very rarely. The
resulting layout is:

A,B,E, (C and D are in a worklist)

llvm-svn: 277187
2016-07-29 18:09:28 +00:00
David L Kreitzer 8b959e5cfa Avoid unnecessary 32-bit to 64-bit zero extensions following
32-bit CMOV instructions on x86_64. The 32-bit CMOV implicitly
zero extends.

Differential Revision: https://reviews.llvm.org/D22941

llvm-svn: 277148
2016-07-29 15:09:54 +00:00
Simon Pilgrim cb780b32a3 [X86][SSE] Optimize the truncation of vector comparison results with PACKSS
We currently default to using either generic shuffles or MASK+PACKUS/PACKSS to truncate all integer vectors. For vector comparisons, we know that the result will be either all or zero bits in every element, which can be efficiently truncated by directly using PACKSS to repeatedly halve the size of each element.

Due to the limited input values (-1 or 0) we don't need to account for vector element size, so for simplicity we just use the PACKSS(vXi16,vXi16) implementation in all cases. Additionally for AVX2 PACKSS of 256bit data we must perform a PERMQ shuffle to reorder the data into the correct order. I did investigate performing a single shuffle after all the PACKSS calls but the need to cross 128bit lanes makes this difficult to achieve efficiently.

We avoid performing this on AVX512 as it should have better alternative truncation instructions.

Differential Revision: https://reviews.llvm.org/D22814

llvm-svn: 277132
2016-07-29 10:23:10 +00:00
Craig Topper e4f868ea16 [AVX512] Mark EVEX VMOVSSrm and VMOVSDrm as canFoldAsLoad and isReMaterializable.
llvm-svn: 277120
2016-07-29 06:06:04 +00:00
Craig Topper 07aa37039e [AVX512] Add AVX512 run lines to some tests for scalar fma/add/sub/mul/div and regenerate. Follow up commits will bring AVX512 code up to the same quality as AVX/SSE.
llvm-svn: 277118
2016-07-29 06:05:58 +00:00
Craig Topper c7de3a1018 [AVX512] Remove the intrinsic forms of VMOVSS/VMOVSD. We don't need two different forms of 'rr' and 'rm'. This matches SSE/AVX.
I'm not convinced the patterns for the rm_Int was correct anyway. It had a tied source that should't exist for the unmasked version. The load form of MOVSS always zeros the most significant bits. I've left the patterns off the masked load instructions as I'm not sure what the correct pattern should be and we don't have any tests currently. Nor do we implement masked scalar load intrinsics in clang currently.

llvm-svn: 277098
2016-07-29 02:49:08 +00:00