Commit Graph

12427 Commits

Author SHA1 Message Date
Craig Topper ec2ea4817e [X86] Replace getScalarType with getVectorElementType when the type is already known to be a vector. This should result in slightly less code. NFC
llvm-svn: 251751
2015-10-31 21:44:52 +00:00
Craig Topper 476be8f94a [X86] Convert to MVT instead of calling EVT functions since we already know the type is simple. NFC
llvm-svn: 251745
2015-10-31 18:14:17 +00:00
Craig Topper 0fec4d8ce7 [X86] Call getScalarSizeInBits() instead of getScalarType().getScalarSizeInBits(). NFC
llvm-svn: 251744
2015-10-31 18:14:15 +00:00
Craig Topper 0e7680da9f [X86] Remove two const references to the return value of a constructor and just use normal object creation syntax. NFC
llvm-svn: 251743
2015-10-31 17:28:02 +00:00
Craig Topper 7b1d3a8a6c [X86] Replace EVT with MVT in some more places. NFC
llvm-svn: 251742
2015-10-31 17:27:59 +00:00
Craig Topper 63c2925b87 [X86] Fix indentation of case statements in switch. NFC
llvm-svn: 251741
2015-10-31 17:27:56 +00:00
Craig Topper 5c8a378f48 [X86] Reduce math for index calculation for inserting and extracting subvectors and elements by exploiting the fact that all supported vector types have a power 2 number of elements.
llvm-svn: 251740
2015-10-31 17:27:52 +00:00
Craig Topper 9377f01f21 [X86] Use is128BitVector/is256BitVector/is512BitVector in place of getSizeInBits == in some places. NFC
llvm-svn: 251687
2015-10-30 04:31:18 +00:00
Craig Topper 62c3ed0ae3 [X86] Minor formatting fixes. NFC.
llvm-svn: 251686
2015-10-30 04:31:14 +00:00
Craig Topper 9ef327c962 [X86] Use MVT instead of EVT in some places. NFC
Prior to this the compiled code probably had extra checks for extended types that won't ever execute.

llvm-svn: 251682
2015-10-30 03:19:12 +00:00
Simon Pilgrim ca56a72af9 [X86][SSE] Shuffle blends with zero
This patch generalizes the zeroing of vector elements with the BLEND instructions. Currently a zero vector will only blend if the shuffled elements are correctly inline, this patch recognises when a vector input is zero (or zeroable) and modifies a local copy of the shuffle mask to support a blend. As a zeroable vector input may not be all zeroes, the zeroable vector is regenerated if necessary.

Differential Revision: http://reviews.llvm.org/D14050

llvm-svn: 251659
2015-10-29 22:11:28 +00:00
Benjamin Kramer 4e4ca38bcf Remove CRLF line endings.
llvm-svn: 251594
2015-10-29 02:33:05 +00:00
Cong Hou da4e8aeec6 [X86] A small fix in X86/X86TargetTransformInfo.cpp: check a value type is simple before calling getSimpleVT().
llvm-svn: 251538
2015-10-28 18:15:46 +00:00
Benjamin Kramer 039b10423a Put global classes into the appropriate namespace.
Most of the cases belong into an anonymous namespace. No
functionality change intended.

llvm-svn: 251515
2015-10-28 13:54:36 +00:00
Craig Topper 93d4a9e117 [X86] Make some for loops over MVTs more explicit (and shorter) by just mentioning all the relevant types in an initializer list. NFC
llvm-svn: 251500
2015-10-28 05:48:32 +00:00
Craig Topper 3a47587c41 Use range-based for loops and use initializer list to remove a small static array. NFC
llvm-svn: 251494
2015-10-28 04:53:27 +00:00
Craig Topper 4b27576001 Remove templates from CostTableLookup functions. All instantiations had the same type.
This also lets us remove the versions of the functions that took a statically sized array as we can rely on ArrayRef implicit conversion now.

llvm-svn: 251490
2015-10-28 04:02:12 +00:00
Asaf Badouh c7cb880669 [X86][AVX512] [X86][AVX512] add convert float to half
convert float to half with mask/maskz for the reg to reg version and mask for the reg to mem version (there is no maskz version for reg to mem).

Differential Revision: http://reviews.llvm.org/D14113

llvm-svn: 251409
2015-10-27 15:37:17 +00:00
Michael Kuperstein e1194bdb4f [X86] Make elfiamcu an OS, not an environment.
GNU tools require elfiamcu to take up the entire OS field, so, e.g.
i?86-*-linux-elfiamcu is not considered a legal triple.
Make us compatible.

Differential Revision: http://reviews.llvm.org/D14081

llvm-svn: 251390
2015-10-27 07:23:59 +00:00
Craig Topper ee0c859788 Convert cost table lookup functions to return a pointer to the entry or nullptr instead of the index.
This avoid mentioning the table name an extra time and allows the lookup to be done directly in the ifs by relying on the bool conversion of the pointer.

While there make use of ArrayRef and std::find_if.

llvm-svn: 251382
2015-10-27 04:14:24 +00:00
Sanjay Patel 309c4f93e5 [x86] replace integer logic ops with packed SSE FP logic ops
If we have an operand to a bitwise logic op that's already in
an XMM register and the result is going to be sent to an XMM
register, then use an SSE logic op to avoid moves between the
integer and vector register files.

Related commits:
http://reviews.llvm.org/rL248395
http://reviews.llvm.org/rL248399
http://reviews.llvm.org/rL248404
http://reviews.llvm.org/rL248409
http://reviews.llvm.org/rL248415

This should solve PR22428:
https://llvm.org/bugs/show_bug.cgi?id=22428

llvm-svn: 251378
2015-10-27 01:28:07 +00:00
Sanjay Patel e9b500f722 reorganize logic; NFCI (retry r251349)
This is a preliminary step before adding another optimization
to PerformBITCASTCombine().

..and I really hope it's NFC this time!

llvm-svn: 251357
2015-10-26 21:54:14 +00:00
Sanjay Patel f29fed423a revert r251349; it included code for a functional change
llvm-svn: 251350
2015-10-26 21:28:02 +00:00
Sanjay Patel fdf75452e4 reorganize logic; NFCI
This is a preliminary step before adding another optimization
to PerformBITCASTCombine().

llvm-svn: 251349
2015-10-26 21:24:09 +00:00
Evgeniy Stepanov d1aad26589 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

The previous iteration of this change was reverted in r250461. This
version leaves the generic, compiler-rt based implementation in
SafeStack.cpp instead of moving it to TargetLoweringBase in order to
allow testing without a TargetMachine.

llvm-svn: 251324
2015-10-26 18:28:25 +00:00
Igor Breger e4ddc3f4cd AVX512: Enabled VPBROADCASTB lowering for v64i8 vectors.
Differential Revision: http://reviews.llvm.org/D13896

llvm-svn: 251287
2015-10-26 13:01:02 +00:00
Igor Breger 684af8156c AVX-512: Use correct extract vector length.
Bug https://llvm.org/bugs/show_bug.cgi?id=25318

Differential Revision: http://reviews.llvm.org/D14062

llvm-svn: 251285
2015-10-26 12:26:34 +00:00
Igor Breger f8e461f920 AVX512: Add AVX-512 not materializable instructions.
Otherwise value can be reused , despite its value could be changed - produces incorrect assembler.

https://llvm.org/bugs/show_bug.cgi?id=25270

Differential Revision: http://reviews.llvm.org/D14057

llvm-svn: 251275
2015-10-26 08:37:12 +00:00
David Majnemer a375b26144 [MC] Don't crash when .word is given bogus values
We didn't validate that the .word directive was given a sane value,
leading to crashes when we attempt to write out the object file.

Instead, perform some validation and issue a diagnostic pointing at the
start of the diagnostic.

llvm-svn: 251270
2015-10-26 02:45:50 +00:00
Benjamin Kramer 8ceb323bb4 Convert assert(false) into llvm_unreachable where it makes sense.
llvm-svn: 251266
2015-10-25 22:28:27 +00:00
Simon Pilgrim ec6db262e0 [X86][SSE4A] Fix for EXTRQI shuffle lowering.
Incorrect range test - found during fuzz testing.

llvm-svn: 251245
2015-10-25 17:40:54 +00:00
Elena Demikhovsky 092858588a Scalarizer for masked.gather and masked.scatter intrinsics.
When the target does not support these intrinsics they should be converted to a chain of scalar load or store operations.
If the mask is not constant, the scalarizer will build a chain of conditional basic blocks.
I added isLegalMaskedGather() isLegalMaskedScatter() APIs.

Differential Revision: http://reviews.llvm.org/D13722

llvm-svn: 251237
2015-10-25 15:37:55 +00:00
Michael Kuperstein eaa16005af [X86] Use correct calling convention for MCU psABI libcalls
When using the MCU psABI, compiler-generated library calls should pass
some parameters in-register. However, since inreg marking for x86 is currently
done by the front end, it will not be applied to backend-generated calls.

This is a workaround for PR3997, which describes a similar issue for -mregparm.

Differential Revision: http://reviews.llvm.org/D13977

llvm-svn: 251223
2015-10-25 08:14:05 +00:00
Michael Kuperstein fe897623f3 [X86] Add support for elfiamcu triple
This adds support for the i?86-*-elfiamcu triple, which indicates the IAMCU psABI is used.

Differential Revision: http://reviews.llvm.org/D13977

llvm-svn: 251222
2015-10-25 08:07:37 +00:00
Craig Topper eda02a905e Remove two unnecessary conversions from MVT to EVT. NFC
llvm-svn: 251219
2015-10-25 03:15:29 +00:00
Simon Pilgrim 53c2bff5fe [X86][SSE] Use lowerVectorShuffleWithUNPCK instead of custom matches.
Most 128-bit and 256-bit shuffles were manually matching UNPCK patterns - use lowerVectorShuffleWithUNPCK to be more thorough.

llvm-svn: 251211
2015-10-24 22:45:04 +00:00
Simon Pilgrim fdfed5143c [X86][SSE] lowerVectorShuffleWithUNPCK - use equivalent shuffle mask test.
Use isShuffleEquivalent to match UNPCK shuffles - better support for build vector inputs.

llvm-svn: 251207
2015-10-24 20:48:08 +00:00
Hans Wennborg 34d40434a7 X86ISelLowering: Support tail calls to/from callee pop functions
This enables tail calls with thiscall, stdcall, vectorcall and
fastcall functions.

Differential Revision: http://reviews.llvm.org/D13999

llvm-svn: 251190
2015-10-24 16:47:10 +00:00
Simon Pilgrim e379fe0ddb Fix unused variable warning. NFC.
llvm-svn: 251189
2015-10-24 13:41:45 +00:00
Simon Pilgrim d5ef318b5b [X86][XOP] Add support for lowering vector rotations
This patch adds support for lowering to the XOP VPROT / VPROTI vector bit rotation instructions.

This has required changes to the DAGCombiner rotation pattern matching to support vector types - so far I've only changed it to support splat vectors, but generalising this further is feasible in the future.

Differential Revision: http://reviews.llvm.org/D13851

llvm-svn: 251188
2015-10-24 13:17:26 +00:00
Reid Kleckner f02e33ce42 [X86] Clean up the tail call eligibility logic
Summary:
The logic here isn't straightforward because our support for
TargetOptions::GuaranteedTailCallOpt.

Also fix a bug where we were allowing tail calls to cdecl functions from
fastcall and vectorcall functions. We were special casing thiscall and
stdcall callers rather than checking for any convention that requires
clearing stack arguments before returning.

Reviewers: hans

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14024

llvm-svn: 251137
2015-10-23 19:35:38 +00:00
Joseph Tremoulet 3d0fbf1d74 [CodeGen] Mark setjmp/catchret MBBs address-taken
Summary:
This ensures that BranchFolding (and similar) won't remove these blocks.

Also allow AsmPrinter::EmitBasicBlockStart to process MBBs which are
address-taken but do not have BBs that are address-taken, since otherwise
its call to getAddrLabelSymbolTableToEmit would fail an assertion on such
blocks.  I audited the other callers of getAddrLabelSymbolTableToEmit
(and getAddrLabelSymbol); they all have BBs known to be address-taken
except for the call through getAddrLabelSymbol from
WinException::create32bitRef; that call is actually now unreachable, so
I've removed it and updated the signature of create32bitRef.

This fixes PR25168.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13774

llvm-svn: 251113
2015-10-23 15:06:05 +00:00
Eric Christopher 227d71bba6 Remove the last traces of X86CompilationCallback as it is completely
unused.

llvm-svn: 251035
2015-10-22 17:55:35 +00:00
Asaf Badouh 7c52245660 [X86][AVX512] extend vcvtph2ps to support xmm/ymm and sae versions
Differential Revision: http://reviews.llvm.org/D13945

llvm-svn: 251018
2015-10-22 14:01:16 +00:00
Elena Demikhovsky 5c97dfdc9c AVX-512: Fixed a bug in select_cc for i1 type
Fixed faiure:
LLVM ERROR: Cannot select: t33: i1 = select_cc t25, Constant:i32<0>, t45, t42, seteq:ch

added a test

Differential Revision: http://reviews.llvm.org/D13943

llvm-svn: 250996
2015-10-22 07:10:29 +00:00
Elena Demikhovsky 7ad0d563a5 Partially reverted changes from r250686
Clang runtime failure was reported.
   Assertion failed: (isExtended() && "Type is not extended!"), function getTypeForEVT
I'll need to add a proper handling for PointerType in masked load/store intrinsics.

llvm-svn: 250995
2015-10-22 06:20:29 +00:00
Sanjay Patel efab8b0d08 [x86] move recursive add match for LEA to helper function; NFCI
llvm-svn: 250926
2015-10-21 18:56:06 +00:00
Craig Topper 896c267544 [X86] Add AMD mwaitx, monitorx, and clzero instructions to the assembly parser and disassembler.
llvm-svn: 250911
2015-10-21 17:26:45 +00:00
Mehdi Amini 4215236621 Do not use `dyn_cast<X>` after `isa<X>` (NFC)
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 250883
2015-10-21 06:11:01 +00:00
Igor Breger 21296d230a AVX512: Implemented encoding and intrinsics for VPBROADCASTB/W/D/Q instructions.
Differential Revision: http://reviews.llvm.org/D13884

llvm-svn: 250819
2015-10-20 11:56:42 +00:00
Duncan P. N. Exon Smith d77de6495e X86: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250741
2015-10-19 21:48:29 +00:00
Benjamin Kramer 335332329b Remove CRLF newlines. NFC.
llvm-svn: 250698
2015-10-19 13:05:25 +00:00
Elena Demikhovsky 20662e39f1 Removed parameter "Consecutive" from isLegalMaskedLoad() / isLegalMaskedStore().
Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case.

Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces.

Differential Revision: http://reviews.llvm.org/D13850

llvm-svn: 250686
2015-10-19 07:43:38 +00:00
Asaf Badouh 696e8e0bb7 [X86][AVX512DQ] add scalar fpclass
Differential Revision: http://reviews.llvm.org/D13769

llvm-svn: 250650
2015-10-18 11:04:38 +00:00
Igor Breger cbb9550537 AVX512: Lowering i8/i16 vector CTLZ using the dword LZCNT vector instruction
Differential Revision: http://reviews.llvm.org/D13632

llvm-svn: 250649
2015-10-18 09:56:39 +00:00
Simon Pilgrim 86c5e85e84 [X86][XOP] Add VPROT instruction opcodes
Added X86ISD opcodes for VPROT vector rotate by variable and by immediate.

llvm-svn: 250620
2015-10-17 19:04:24 +00:00
Craig Topper 9ff9bf4959 Replace a custom table sort check with std::is_sorted. Change a function to take ArrayRef instead of pointer and length. NFC
llvm-svn: 250615
2015-10-17 16:37:13 +00:00
Simon Pilgrim a18ae9bd70 [CostModel] Fixed AVX integer shift costs
Targets with AVX but without AVX2 were incorrectly reporting costs of 256-bit integer shifts.

llvm-svn: 250611
2015-10-17 13:23:38 +00:00
Simon Pilgrim 5b65f28fe7 [X86][FastISel] Teach how to select SSE4A nontemporal stores.
Add FastISel support for SSE4A scalar float / double non-temporal stores

Follow up to D13698

Differential Revision: http://reviews.llvm.org/D13773

llvm-svn: 250610
2015-10-17 13:04:42 +00:00
Reid Kleckner 28e490342b [WinEH] Fix stack alignment in funclets and ParentFrameOffset calculation
Our previous value of "16 + 8 + MaxCallFrameSize" for ParentFrameOffset
is incorrect when CSRs are involved. We were supposed to have a test
case to catch this, but it wasn't very rigorous.

The main effect here is that calling _CxxThrowException inside a
catchpad doesn't immediately crash on MOVAPS when you have an odd number
of CSRs.

llvm-svn: 250583
2015-10-16 23:43:27 +00:00
Sanjay Patel bbd524496c [x86] promote 'add nsw' to a wider type to allow more combines
The motivation for this patch starts with PR20134:
https://llvm.org/bugs/show_bug.cgi?id=20134

void foo(int *a, int i) {
  a[i] = a[i+1] + a[i+2];
}

It seems better to produce this (14 bytes):

movslq	%esi, %rsi
movl	0x4(%rdi,%rsi,4), %eax
addl	0x8(%rdi,%rsi,4), %eax
movl	%eax, (%rdi,%rsi,4)

Rather than this (22 bytes):

leal	0x1(%rsi), %eax
cltq             
leal	0x2(%rsi), %ecx      
movslq	%ecx, %rcx     
movl	(%rdi,%rcx,4), %ecx
addl	(%rdi,%rax,4), %ecx
movslq	%esi, %rax       
movl	%ecx, (%rdi,%rax,4)

The most basic problem (the first test case in the patch combines constants) should also be fixed in InstCombine, 
but it gets more complicated after that because we need to consider architecture and micro-architecture. For
example, AArch64 may not see any benefit from the more general transform because the ISA solves the sexting in
hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that: 
FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm 
also not sure if FeatureSlowLEA should also mean "slow complex addressing mode".

I see no perf differences on test-suite with this change running on AMD Jaguar, but I see small code size
improvements when building clang and the LLVM tools with the patched compiler.

A more general solution to the sext(add nsw(x, C)) problem that works for multiple targets is available
in CodeGenPrepare, but it may take quite a bit more work to get that to fire on all of the test cases that
this patch takes care of.

Differential Revision: http://reviews.llvm.org/D13757

llvm-svn: 250560
2015-10-16 22:14:12 +00:00
Andrew Kaylor 09b39acc03 Fix assertion failure with fp128 to unsigned i64 conversion
Patch by Mitch Bodart

Differential Revision: http://reviews.llvm.org/D13780

llvm-svn: 250550
2015-10-16 20:39:20 +00:00
Craig Topper 09b6598572 [X86] Add fxsr feature flag for fxsave/fxrestore instructions.
llvm-svn: 250497
2015-10-16 06:03:09 +00:00
Evgeniy Stepanov 9addbc9fc1 Revert "[safestack] Fast access to the unsafe stack pointer on AArch64/Android."
Breaks the hexagon buildbot.

llvm-svn: 250461
2015-10-15 21:26:49 +00:00
Evgeniy Stepanov 142947e9f0 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

llvm-svn: 250456
2015-10-15 20:50:16 +00:00
JF Bastien 2cdd5e4710 x86: preserve flags when folding atomic operations
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags. I fixed some of this issue in D13680 but had missed INC/DEC.

This patch adds the missing EFLAGS definition.

llvm-svn: 250438
2015-10-15 18:24:52 +00:00
JF Bastien 5b327712b0 x86 FP atomic codegen: don't drop globals, stack
Summary:
x86 codegen is clever about generating good code for relaxed
floating-point operations, but it was being silly when globals and
immediates were involved, forgetting where the global was and
loading/storing from/to the wrong place. The same applied to hard-coded
address immediates.

Don't let it forget about the displacement.

This fixes https://llvm.org/bugs/show_bug.cgi?id=25171

A very similar bug when doing floating-points atomics to the stack is
also fixed by this patch.

This fixes https://llvm.org/bugs/show_bug.cgi?id=25144

Reviewers: pete

Subscribers: llvm-commits, majnemer, rsmith

Differential Revision: http://reviews.llvm.org/D13749

llvm-svn: 250429
2015-10-15 16:46:29 +00:00
Benjamin Kramer 5dfcda73d5 [X86] Rip out orphaned method declarations and other dead code. NFC.
llvm-svn: 250406
2015-10-15 14:09:59 +00:00
Igor Breger d7bae451de AVX512: Implemented DAG lowering for shuff62x2/shufi62x2 instructions ( shuffle packed values at 128-bit granularity )
Differential Revision: http://reviews.llvm.org/D13648

llvm-svn: 250400
2015-10-15 13:29:07 +00:00
Igor Breger b4bb190eed AVX512: Implemented encoding and intrinsics for vpternlogd/q.
Differential Revision: http://reviews.llvm.org/D13768

llvm-svn: 250396
2015-10-15 12:33:24 +00:00
Elena Demikhovsky ecff21b297 AVX-512: Fixed a bug in shuffle lowering 32-bit mode
AVX-512 bit shuffle fails on 32 bit since we create a vector of 64-bit constants.
I split 8x64-bit const vector to 16x32 on 32-bit mode.

Differential Revision: http://reviews.llvm.org/D13644

llvm-svn: 250390
2015-10-15 11:35:33 +00:00
Craig Topper fd2cc7cd8a Add XSAVE/XSAVEOPT to KNL processor.
llvm-svn: 250362
2015-10-15 03:56:54 +00:00
Andrea Di Biagio c47edbef4c [x86][FastISel] Teach how to select nontemporal stores.
This patch teaches x86 fast-isel how to select nontemporal stores.

On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords.
Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal
vector stores.

Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti'
for doubleword/quadword nontemporal stores. In the case of nontemporal stores
of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of
movntps/movntpd/movntdq.

With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now
always get the expected (V)MOVNT instructions.
The lack of fast-isel support for nontemporal stores was spotted when analyzing
the -O0 codegen for nontemporal stores.

Differential Revision: http://reviews.llvm.org/D13698

llvm-svn: 250285
2015-10-14 10:03:13 +00:00
Craig Topper 0ee356951a [X86] Add XSAVE feature flags to their various processors.
llvm-svn: 250268
2015-10-14 05:37:38 +00:00
Sanjay Patel 85030aa1bd function names should start with a lower case letter; NFC
llvm-svn: 250174
2015-10-13 16:23:00 +00:00
Sanjay Patel b5723d0dbd don't repeat function/class/variable names in comments; NFC
llvm-svn: 250162
2015-10-13 15:12:27 +00:00
Michael Kuperstein af22dafc8b Fix line-ending issue. NFC.
llvm-svn: 250151
2015-10-13 06:22:30 +00:00
Craig Topper 24b56a62bb [X86] Mark the AAD and AAM aliases as not valid in 64-bit mode.
llvm-svn: 250148
2015-10-13 05:12:07 +00:00
Craig Topper 4f76372afc [X86] Change all the i8imm operands in XOP instructions to u8imm so the parser will check the size.
llvm-svn: 250147
2015-10-13 05:06:25 +00:00
JF Bastien 986ed68eed x86: preserve flags when folding atomic operations
Summary:
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags.

This patch adds the missing EFLAGS definition.

Floating point operations don't set flags, the subsequent fadd
optimization is therefore correct. The same applies for surrounding
load/store optimizations.

Reviewers: rsmith, rtrieu

Subscribers: llvm-commits, reames, morisset

Differential Revision: http://reviews.llvm.org/D13680

llvm-svn: 250135
2015-10-13 00:28:47 +00:00
Reid Kleckner 4a5f35c0ae Make Win64 localescape offsets FP relative instead of SP relative
We made them SP relative back in March (r233137) because that's the
value the runtime passes to EH functions. With the new cleanuppad IR,
funclets adjust their frame argument from SP to FP, so our offsets
should now be FP-relative.

llvm-svn: 250088
2015-10-12 19:43:34 +00:00
Andrea Di Biagio b0fe4eb199 [x86] Fix wrong lowering of vsetcc nodes (PR25080).
Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
assumption that for non-AVX512 targets, the source type and destination type
of a type-legalized setcc node were always the same type.

This assumption was unfortunately incorrect; the type legalizer is not always
able to promote the return type of a setcc to the same type as the first
operand of a setcc.

In the case of a vsetcc node, the legalizer firstly checks if the first input
operand has a legal type. If so, then it promotes the return type of the vsetcc
to that same type. Otherwise, the return type is promoted to the 'next legal
type', which, for vectors of MVT::i1 is always a 128-bit integer vector type.

Example (-mattr=+avx):

  %0 = trunc <8 x i32> %a to <8 x i23>
  %1 = icmp eq <8 x i23> %0, zeroinitializer

The initial selection dag for the code above is:

v8i1 = setcc t5, t7, seteq:ch
  t5: v8i23 = truncate t2
    t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
    t7: v8i32 = build_vector of all zeroes.

The type legalizer would firstly check if 't5' has a legal type. If so, then it
would reuse that same type to promote the return type of the setcc node.
Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
promote the return type of the setcc node. Consequently, the setcc return type
is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
following dag node:
  v8i16 = setcc t32, t25, seteq:ch

  where t32 and t25 are now values of type v8i32.

Before this patch, function LowerVSETCC would have wrongly expanded the setcc
to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
instruction. In our case, ISel would have matched a VPCMPEQWrr:
  t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25

However, t36 and t25 are both VR256, while the result type is instead of class
VR128. This inconsistency ended up causing the insertion of COPY instructions
like this:
  %vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3

Which is an invalid full copy (not a sub register copy).
Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
instruction" in the attempt to expand the malformed pseudo COPY instructions.

This patch fixes the problem adding the missing logic in LowerVSETCC to handle
the corner case of a setcc with 128-bit return type and 256-bit operand type.

This problem was originally reported by Dimitry as PR25080. It has been latent
for a very long time. I have added the minimal reproducible from that bugzilla
as test setcc-lowering.ll.

Differential Revision: http://reviews.llvm.org/D13660

llvm-svn: 250085
2015-10-12 19:22:30 +00:00
Sanjay Patel 0dc91b3143 combine predicates; NFCI
llvm-svn: 250075
2015-10-12 18:15:08 +00:00
Sanjay Patel b814ef1ad6 fix typos; NFC
llvm-svn: 250059
2015-10-12 16:09:59 +00:00
Sanjay Patel 53d1d8b731 fix capitalization; NFC
llvm-svn: 250049
2015-10-12 15:24:01 +00:00
Amjad Aboud 1db6d7af46 [X86] Add XSAVE intrinsic family
Add intrinsics for the
  XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64)
  XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64)
  XSAVEC instructions (XSAVEC/XSAVEC64)
  XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64)

Differential Revision: http://reviews.llvm.org/D13012

llvm-svn: 250029
2015-10-12 11:47:46 +00:00
Andrea Di Biagio a0922ed8fe [x86] PR24562: fix incorrect folding of PSHUFB nodes with a mask where all indices have the most significant bit set.
This patch fixes a problem in function 'combineX86ShuffleChain' that causes a
chain of shuffles to be wrongly folded away when the combined shuffle mask has
only one element.

We may end up with a combined shuffle mask of one element as a result of
multiple calls to function 'canWidenShuffleElements()'.
Function canWidenShuffleElements attempts to simplify a shuffle mask by widening
the size of the elements being shuffled.
For every pair of shuffle indices, function canWidenShuffleElements checks if
indices refer to adjacent elements. If all pairs refer to "adjacent" elements
then the shuffle mask is safely widened. As a consequence of widening, we end up
with a new shuffle mask which is half the size of the original shuffle mask.

The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero
indices. Function canWidenShuffleElements would combine each pair of
SM_SentinelZero indices into a single SM_SentinelZero index. So, in a
logarithmic number of steps (4 in this case), the pshufb mask is simplified to
a mask with only one index which is equal to SM_SentinelZero.

Before this patch, function combineX86ShuffleChain wrongly assumed that a mask
of size one is always equivalent to an identity mask. So, the entire shuffle
chain was just folded away as the combined shuffle mask was treated as a no-op
mask.

With this patch we know check if the only element of a combined shuffle mask is
SM_SentinelZero. In case, we propagate a zero vector.

Differential Revision: http://reviews.llvm.org/D13364

llvm-svn: 250027
2015-10-12 11:25:41 +00:00
Craig Topper 8d2e6bc25b [X86] Use u8imm for the immediate type for all shift and rotate instructions. This way the assembler will perform range checking. Believe this matches gas behavior.
llvm-svn: 250016
2015-10-12 06:23:10 +00:00
Craig Topper d6b661dbf0 [X86] Add support to assembler and MCInst lowering to use the other vmovq %xmmX, %xmmX encoding if it would be a shorter VEX encoding.
llvm-svn: 250014
2015-10-12 04:57:59 +00:00
Craig Topper 635e05df0a [X86] Cleanup formatting a bit. NFC
llvm-svn: 250013
2015-10-12 04:27:17 +00:00
Craig Topper 5be914eda1 [X86] Change the immediate for IN/OUT instructions to u8imm so the assembly parser will check the size.
llvm-svn: 250012
2015-10-12 04:17:55 +00:00
Craig Topper 95fffba227 [X86] Add some instruction aliases to get the assembly parser table to favor arithmetic instructions with 8-bit immediates over the forms that implicitly use the ax/eax/rax.
This allows us to remove the explicit code for working around the existing priority

llvm-svn: 250011
2015-10-12 03:39:57 +00:00
Craig Topper fcc34bdee0 [X86] Fix CMP and TEST with al/ax/eax/rax to not mark EFLAGS as a use or al/ax/eax/rax as a def. Probably doesn't have a functional affect since these aren't used in isel.
llvm-svn: 249994
2015-10-11 19:54:02 +00:00
Craig Topper 87990ee4ec [X86] Remove special validation for INT immediate operand from AsmParser. Instead mark its operand type as u8imm which will cause it to fail to match. This is more consistent with other instruction behavior.
This also fixes a bug where negative immediates below -128 were not being reported as errors.

llvm-svn: 249989
2015-10-11 18:27:24 +00:00
Craig Topper a71630729d [X86] Simplify immediate range checking code.
llvm-svn: 249979
2015-10-11 16:38:14 +00:00
Simon Pilgrim 52d47e5704 [X86][XOP] Added support for the lowering of 128-bit vector integer comparisons to XOP PCOM/PCOMU instructions.
The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646).

llvm-svn: 249976
2015-10-11 14:15:17 +00:00
Craig Topper 7143d8001a Use range-based for loops. NFC.
llvm-svn: 249941
2015-10-10 05:25:06 +00:00
Craig Topper 7d5b23101c Use emplace_back instead of a constructor call and push_back. NFC
llvm-svn: 249940
2015-10-10 05:25:02 +00:00
David Majnemer bfa5b98201 [WinEH] Remove more dead code
wineh-parent is dead, so is ValueOrMBB.

llvm-svn: 249920
2015-10-10 00:04:29 +00:00
Reid Kleckner 14e773500e [WinEH] Delete the old landingpad implementation of Windows EH
The new implementation works at least as well as the old implementation
did.

Also delete the associated preparation tests. They don't exercise
interesting corner cases of the new implementation. All the codegen
tests of the EH tables have already been ported.

llvm-svn: 249918
2015-10-09 23:34:53 +00:00
David Majnemer 35d27b21a1 [WinEH] Insert the catchpad return before CSR restoration
x64 catchpads use rax to inform the unwinder where control should go
next.  However, we must initialize rax before the epilogue sequence so
as to not perturb the unwinder.

llvm-svn: 249910
2015-10-09 22:18:45 +00:00
James Y Knight 5b8217bc05 Fix assert in X86 backend.
When running combine on an extract_vector_elt, it wants to look through
a bitcast to check if the argument to the bitcast was itself an
extract_vector_elt with particular operands.

However, it called getOperand() on the argument to the bitcast *before*
checking that the opcode was EXTRACT_VECTOR_ELT, assert-failing if there
were zero operands for the actual opcode.

Fix, and add trivial test.

llvm-svn: 249891
2015-10-09 20:10:14 +00:00
Reid Kleckner ae44e871cd Revert "Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"""
This reverts commit r249794.

Apparently my checkouts are full of unexpected surprises today.

llvm-svn: 249796
2015-10-09 01:13:17 +00:00
Reid Kleckner b510401785 Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64""
This reverts commit r249032.

TODO write commit msg

llvm-svn: 249794
2015-10-09 01:11:37 +00:00
Evgeniy Stepanov 5fe279e727 Add Triple::isAndroid().
This is a simple refactoring that replaces Triple.getEnvironment()
checks for Android with Triple.isAndroid().

llvm-svn: 249750
2015-10-08 21:21:24 +00:00
Eric Christopher 11e5983658 Move the MMX subtarget feature out of the SSE set of features and into
its own variable.

This is needed so that we can explicitly turn off MMX without turning
off SSE and also so that we can diagnose feature set incompatibilities
that involve MMX without SSE.

Rationale:

// sse3
__m128d test_mm_addsub_pd(__m128d A, __m128d B) {
  return _mm_addsub_pd(A, B);
}

// mmx
void shift(__m64 a, __m64 b, int c) {
  _mm_slli_pi16(a, c);
  _mm_slli_pi32(a, c);
  _mm_slli_si64(a, c);
  _mm_srli_pi16(a, c);
  _mm_srli_pi32(a, c);
  _mm_srli_si64(a, c);
  _mm_srai_pi16(a, c);
  _mm_srai_pi32(a, c);
}

clang -msse3 -mno-mmx file.c -c

For this code we should be able to explicitly turn off MMX
without affecting the compilation of the SSE3 function and then
diagnose and error on compiling the MMX function.

This matches the existing gcc behavior and follows the spirit of
the SSE/MMX separation in llvm where we can (and do) turn off
MMX code generation except in the presence of intrinsics.

Updated a couple of tests, but primarily tested with a couple of tests
for turning on only mmx and only sse.

This is paired with a patch to clang to take advantage of this behavior.

llvm-svn: 249731
2015-10-08 20:10:06 +00:00
Reid Kleckner b2244cb8f0 [WinEH] Relax assertion in the presence of stack realignment
The code is correct as is, but we should test it.

llvm-svn: 249715
2015-10-08 18:41:52 +00:00
Frederic Riss 263b772bda [X86] Disable X86CallFrameOptimization on Darwin in presence of EH
We emit 1 compact unwind encoding per function, and this can’t represent
the varying stack pointer that will be generated by X86CallFrameOptimization.
Disable the optimization on Darwin.

(It might be possible to split the function into multiple ranges
and emit 1 compact unwind info per range. The compact unwind emission
code isn’t ready for that and this kind of info certainly isn’t
tested/used anywhere. It might be worth exploring this path if we want
to get the space savings at some point though)

llvm-svn: 249694
2015-10-08 15:45:08 +00:00
Igor Breger defab3c1ef AVX512: vpextrb/w/d/q and vpinsrb/w/d/q implementation.
This instructions doesn't have intrincis.
Added tests for lowering and encoding.

Differential Revision: http://reviews.llvm.org/D12317

llvm-svn: 249688
2015-10-08 12:55:01 +00:00
Michael Kuperstein 04e79329d0 [X86] Fix wrong treatment of multi-lane blends in BUILD_VECTORtoBlendMask()
This fixes two separate bugs:
1) The mask for the high lane was not set correctly. That fixes PR24532.
2) The transformation should bail out if it believes it involves more than
2 lanes, as it does not currently do anything sensible in this case.

Differential Revision: http://reviews.llvm.org/D13505

llvm-svn: 249669
2015-10-08 08:13:02 +00:00
Reid Kleckner 97797419e6 [WinEH] Fix 32-bit funclet epilogues in the presence of dynamic allocas
In particular, passing non-trivially copyable objects by value on win32
uses a dynamic alloca (inalloca). We would clobber ESP in the epilogue
and end up returning to outer space.

llvm-svn: 249637
2015-10-07 23:55:01 +00:00
Kevin B. Smith 9c7408807f Test commit access. Fixed comment to have correct input parameter name and
period termination.

llvm-svn: 249571
2015-10-07 17:24:25 +00:00
Michael Kuperstein 259f1508f0 [X86] Emit .cfi_escape GNU_ARGS_SIZE when adjusting the stack before calls
When outgoing function arguments are passed using push instructions, and EH
is enabled, we may need to indicate to the stack unwinder that the stack
pointer was adjusted before the call.

This should fix the exception handling issues in PR24792.

Differential Revision: http://reviews.llvm.org/D13132

llvm-svn: 249522
2015-10-07 07:01:31 +00:00
Igor Breger 1a6fd1cc0f AVX512: Change encoding of vpshuflw and vpshufhw instructions. Implement WIG as W0 and not W1, like all other instruction have been implemented.
Add encoding tests.

Differential Revision: http://reviews.llvm.org/D13471

llvm-svn: 249521
2015-10-07 06:31:18 +00:00
Hans Wennborg 083ca9bb32 Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups.
Patch by Eugene Zelenko!

Differential Revision: http://reviews.llvm.org/D13321

llvm-svn: 249482
2015-10-06 23:24:35 +00:00
Joseph Tremoulet 2afea5438f [WinEH] Recognize CoreCLR personality function
Summary:
 - Add CoreCLR to if/else ladders and switches as appropriate.
 - Rename isMSVCEHPersonality to isFuncletEHPersonality to better
   reflect what it captures.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, AndyAyers, llvm-commits

Differential Revision: http://reviews.llvm.org/D13449

llvm-svn: 249455
2015-10-06 20:28:16 +00:00
Craig Topper 79dd1bf094 [X86] Teach constant hoisting that ANDs with 64-bit immediates in the range 0x80000000-0xffffffff can be handled cheaply and don't need to be hoisted.
Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister.

llvm-svn: 249370
2015-10-06 02:50:24 +00:00
Craig Topper d69d495333 [X86] Remove unnecessary AddComplexity directive. The instruction is already wrapped in the equivalent earlier. NFC
llvm-svn: 249369
2015-10-06 02:50:21 +00:00
David Majnemer e4f9b09b51 [WinEH] Update CATCHRET's operand to match its successor
The CATCHRET operand did not match the MachineFunction's CFG.  This
mismatch happened because FrameLowering created a new MachineBasicBlock
and updated the CFG but forgot to update the CATCHRET operand.

Let's make sure this doesn't happen again by strengthing the funclet
membership analysis: it can now reason about the membership of all basic
blocks, not just those inside of funclets.

llvm-svn: 249344
2015-10-05 20:09:16 +00:00
Igor Breger 78741a1b1e AVX512: Implemented encoding and intrinsics for VPERMILPS/PD instructions.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12690

llvm-svn: 249261
2015-10-04 07:20:41 +00:00
Simon Pilgrim bc707d04a4 [X86] Lower SEXTLOAD using SIGN_EXTEND_VECTOR_INREG. NCI.
The custom lowering in LowerExtendedLoad is doing the equivalent shuffle, so make use of existing lowering code to reduce duplication.

llvm-svn: 249243
2015-10-03 18:55:43 +00:00
Andrea Di Biagio 77f62652c1 Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types."
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Originally reviewed here: http://reviews.llvm.org/D13347

llvm-svn: 249147
2015-10-02 16:08:05 +00:00
Andrea Di Biagio 45874e67a1 Revert: [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
r249121 caused a Clang test failure (avx2-buitins.c).
Revert r249121 while I keep investigating on the reason why that test failed.

llvm-svn: 249124
2015-10-02 13:06:19 +00:00
Andrea Di Biagio cb33456122 [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Differential Revision: http://reviews.llvm.org/D13347

llvm-svn: 249121
2015-10-02 12:45:37 +00:00
Reid Kleckner fc64fae6e3 [WinEH] Emit __C_specific_handler tables for the new IR
We emit denormalized tables, where every range of invokes in the same
state gets a complete list of EH action entries. This is significantly
simpler than trying to infer the correct nested scoping structure from
the MI. Fortunately, for SEH, the nesting structure is really just a
size optimization.

With this, some basic __try / __except examples work.

llvm-svn: 249078
2015-10-01 21:38:24 +00:00
David Majnemer f828a0ccc7 [WinEH] Make FuncletLayout more robust against catchret
Catchret transfers control from a catch funclet to an earlier funclet.
However, it is not completely clear which funclet the catchret target is
part of.  Make this clear by stapling the catchret target's funclet
membership onto the CATCHRET SDAG node.

llvm-svn: 249052
2015-10-01 18:44:59 +00:00
NAKAMURA Takumi 1ed20db720 Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"
It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll

llvm-svn: 249032
2015-10-01 17:00:56 +00:00
Ahmed Bougacha 23a0d1a1d6 [X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math.
The custom code produces incorrect results if later reassociated.

Since r221657, on x86, vNi32 uitofp is lowered using an optimized
sequence:

  movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...]
  pand %xmm0, %xmm1
  por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...]
  psrld $16, %xmm0
  por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...]
  addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...]
  addps %xmm1, %xmm0

Since r240361, the machine combiner opportunistically reassociates
2-instruction sequences (with -ffast-math). In the new code sequence,
the ADDPS' are eligible. In isolation, for simple examples (without
reassociable users), this makes no performance difference (the goal
being to enable reassociation of longer chains).

In the trivial example (just one uitofp), the reassociation doesn't
happen, because (I think) it would require the emission of a separate
movaps for a constantpool load (instead of folding it into addps).

However, when we have multiple uitofp sequences, and the constantpool
loads are CSE'd earlier, the machine combiner can do the reassociation.

When the ADDPS' are reassociated, the resulting sequence isn't correct
anymore, as we'd be adding large (2**39) constants with comparatively
smaller values (~2**23). Given that two of the three inputs are powers
of 2 larger than 2**16, and that ulp(2**39) == 2**(39-24) == 2**15,
the reassociated chain will produce 0 for any input in [0, 2**14[.
In my testing, it also produces wrong results for 99.5% of [0, 2**32[.

Avoid this by disabling the new lowering when -ffast-math. It does
mean that we'll get slower code than without it, but at least we
won't get egregiously incorrect code.

One might argue that, considering -ffast-math is all but meaningless,
uitofp producing wrong results isn't a compiler bug. But it really is.

Fixes PR24512.

...though this is really more of a workaround.
Ideally, we'd have some sort of Machine FMF, but that's a problem
that's not worth tackling until we do more with machine IR.

llvm-svn: 248965
2015-10-01 00:11:07 +00:00
Reid Kleckner 6dec87a8a0 [WinEH] Emit int3 after noreturn calls on Win64
The Win64 unwinder disassembles forwards from each PC to try to
determine if this PC is in an epilogue. If so, it skips calling the EH
personality function for that frame. Typically, this means you cannot
catch an exception in the same frame that you threw it, because 'throw'
calls a noreturn runtime function.

Previously we avoided this problem with the TrapUnreachable
TargetOption, but that's a much bigger hammer than we need. All we need
is a 1 byte non-epilogue instruction right after the call.  Instead,
what we got was an unconditional branch to a shared block containing the
ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which
added TrapUnreachable, and replaces it with something better.

The new code pattern matches for invoke/call followed by unreachable and
inserts an int3 into the DAG. To be 100% watertight, we would need to
insert SEH_Epilogue instructions into all basic blocks ending in a call
with no terminators or successors, but in practice this is unlikely to
come up.

llvm-svn: 248959
2015-09-30 23:09:23 +00:00
Sanjay Patel a114a10bbe [x86] enable machine combiner reassociations for 256-bit vector logical integer insts
llvm-svn: 248955
2015-09-30 22:25:55 +00:00
David Blaikie 757908e545 Fix -Wsign-compare warning
llvm-svn: 248942
2015-09-30 20:37:48 +00:00
Simon Pilgrim 3d11c994f7 [X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions
The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.

Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.

Differential Revision: http://reviews.llvm.org/D8690

llvm-svn: 248878
2015-09-30 08:17:50 +00:00
Reid Kleckner a13dfd539b [WinEH] Setup RBP correctly in Win64 funclet prologues
Previously local variable captures just didn't work in 64-bit. Now we
can access local variables more or less correctly.

llvm-svn: 248857
2015-09-29 23:32:01 +00:00
David Majnemer 91b0ab9172 [WinEH] Ensure that funclets obey the x64 ABI
The x64 ABI requires that epilogues do not contain code other than stack
adjustments and some limited control flow.  However, we'd insert code to
initialize the return address after stack adjustments.  Instead, insert
EAX/RAX with the current value before we create the stack adjustments in
the epilogue.

llvm-svn: 248839
2015-09-29 22:33:36 +00:00
Maksim Panchenko cce239c45d HHVM calling conventions.
HHVM calling convention, hhvmcc, is used by HHVM JIT for
functions in translated cache. We currently support LLVM back end to
generate code for X86-64 and may support other architectures in the
future.

In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.

When we enter translation cache via hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.

One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with an exception of first argument which is passed in RBP
(before we use RDI, RSI, etc.)

Differential Revision: http://reviews.llvm.org/D12681

llvm-svn: 248832
2015-09-29 22:09:16 +00:00
David Majnemer a80c151286 [WinEH] Teach AsmPrinter about funclets
Summary:
Funclets have been turned into functions by the time they hit the object
file.  Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.

Differential Revision: http://reviews.llvm.org/D13261

llvm-svn: 248824
2015-09-29 20:12:33 +00:00
Jeroen Ketema 740f9d79ca Arguments spilled on the stack before a function call may have
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.

Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have

%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)

the x86 back-end emits:

movaps  32(%ecx), %xmm2
movaps  (%ecx), %xmm0
movaps  16(%ecx), %xmm1
movaps  48(%ecx), %xmm3
subl    $20, %esp       <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards 
movaps  %xmm3, (%esp)   <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl    $0, 16(%esp)
calll   __bar

To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple af the maximal alignment seen
during its computation. With this change we get proper alignment:

subl    $32, %esp
movaps  %xmm3, (%esp)

Differential Revision: http://reviews.llvm.org/D12337

llvm-svn: 248786
2015-09-29 10:12:57 +00:00
NAKAMURA Takumi 0c12a3949e [CMake] X86AsmParser: Prune redundant LINK_LIBS.
It is described in LLVMBuild.txt.

llvm-svn: 248771
2015-09-29 01:25:01 +00:00
Sanjay Patel 3a14f1a338 add a FIXME for a CPU model check that should have an attribute instead
llvm-svn: 248746
2015-09-28 22:00:24 +00:00
Andrew Kaylor 16c4da03d5 Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing.
Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com)

Differential Revision: http://reviews.llvm.org/D11370

llvm-svn: 248735
2015-09-28 20:33:22 +00:00
Yaron Keren e5a9dc2f5b Silence clang warning: variable ‘Status’ set but not used.
llvm-svn: 248691
2015-09-27 21:31:33 +00:00
Simon Pilgrim 68d0050c6a [X86][SSE2] Fix zero/any extension shuffles that don't start from the first element
Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H

llvm-svn: 248536
2015-09-24 21:02:17 +00:00
Sanjay Patel 1a6534661b [x86] replace integer 'xor' ops with packed SSE FP 'xor' ops when operating on FP scalars
Turn this:

movd %xmm0, %eax
movd %xmm1, %ecx
xorl %eax, %ecx
movd %ecx, %xmm0

into this:

xorps %xmm1, %xmm0

This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

This is an extension of:
http://reviews.llvm.org/rL248395

llvm-svn: 248415
2015-09-23 18:33:42 +00:00
Sanjay Patel aba37553c4 [x86] replace integer 'or' ops with packed SSE FP 'or' ops when operating on FP scalars
Turn this:

movd %xmm0, %eax
movd %xmm1, %ecx
orl %eax, %ecx
movd %ecx, %xmm0

into this:

orps %xmm1, %xmm0

This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

This is an extension of:
http://reviews.llvm.org/rL248395

llvm-svn: 248409
2015-09-23 18:19:07 +00:00
Evgeniy Stepanov a2002b08f7 Android support for SafeStack.
Add two new ways of accessing the unsafe stack pointer:

* At a fixed offset from the thread TLS base. This is very similar to
  StackProtector cookies, but we plan to extend it to other backends
  (ARM in particular) soon. Bionic-side implementation here:
  https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
  neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
  not emutls).

This is a re-commit of a change in r248357 that was reverted in
r248358.

llvm-svn: 248405
2015-09-23 18:07:56 +00:00
Sanjay Patel b14ecd34f7 move call to convertIntLogicToFPLogic up; NFCI
The BEXTR comments didn't make sense before, we may want to extend the
FP logic transform to work on vectors, and this way is more beautiful.

llvm-svn: 248404
2015-09-23 18:03:37 +00:00
Sanjay Patel ade3abd2d9 [x86] move code for converting int logic to FP logic to a helper function; NFCI
This is a follow-on to:
http://reviews.llvm.org/rL248395

so we can add the call to the or/xor combines too.

llvm-svn: 248399
2015-09-23 17:39:41 +00:00
Sanjay Patel df2495f331 [x86] replace integer 'and' ops with packed SSE FP 'and' ops when operating on FP scalars
Turn this:
   movd %xmm0, %eax
   movd %xmm1, %ecx
   andl %eax, %ecx
   movd %ecx, %xmm0

into this:
   andps %xmm1, %xmm0


This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

Differential Revision: http://reviews.llvm.org/D13065

llvm-svn: 248395
2015-09-23 17:00:06 +00:00
Simon Pilgrim 9cb018b6b6 [X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR
This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead.

LLVM counterpart to D12835

Differential Revision: http://reviews.llvm.org/D13002

llvm-svn: 248368
2015-09-23 08:48:33 +00:00
Evgeniy Stepanov 8d0e3011d8 Revert "Android support for SafeStack."
test/Transforms/SafeStack/abi.ll breaks when target is not supported;
needs refactoring.

llvm-svn: 248358
2015-09-23 01:23:22 +00:00
Evgeniy Stepanov ce2e16f00c Android support for SafeStack.
Add two new ways of accessing the unsafe stack pointer:

* At a fixed offset from the thread TLS base. This is very similar to
  StackProtector cookies, but we plan to extend it to other backends
  (ARM in particular) soon. Bionic-side implementation here:
  https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
  neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
  not emutls).

llvm-svn: 248357
2015-09-23 01:03:51 +00:00
Stephen Canon 8216d88511 Don't raise inexact when lowering ceil, floor, round, trunc.
The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions *did* raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior.  This also lets us fold some logic into TD patterns, which is nice.

Differential Revision: http://reviews.llvm.org/D12969

llvm-svn: 248266
2015-09-22 11:43:17 +00:00
NAKAMURA Takumi 0a7d0ad95f Untabify.
llvm-svn: 248264
2015-09-22 11:15:07 +00:00
NAKAMURA Takumi a9cb538a74 Reformat blank lines.
llvm-svn: 248263
2015-09-22 11:14:39 +00:00
NAKAMURA Takumi 84965031a7 Reformat comment lines.
llvm-svn: 248262
2015-09-22 11:14:12 +00:00
NAKAMURA Takumi 70ad98aca4 Reformat.
llvm-svn: 248261
2015-09-22 11:13:55 +00:00
Simon Pilgrim 1cad0cd3ce [X86][SSE] Match zero/any extension shuffles that don't start from the first element
This patch generalizes the lowering of shuffles as zero extensions to allow extensions that don't start from the first element. It now recognises extensions starting anywhere in the lower 128-bits or at the start of any higher 128-bit lane.

The motivation was to reduce the number of high cost pshufb calls, but it also improves the SSE2 case as well.

Differential Revision: http://reviews.llvm.org/D12561

llvm-svn: 248250
2015-09-22 08:16:08 +00:00
Chad Rosier 03a47305ec [Machine Combiner] Refactor machine reassociation code to be target-independent.
No functional change intended.
Patch by Haicheng Wu <haicheng@codeaurora.org>!

http://reviews.llvm.org/D12887
PR24522

llvm-svn: 248164
2015-09-21 15:09:11 +00:00
Asaf Badouh eaf2da14bf [X86][AVX512] add masked version for RSQRT14 & RCP14 Scalar FP
Differential Revision: http://reviews.llvm.org/D12524

llvm-svn: 248147
2015-09-21 10:23:53 +00:00
Igor Breger b7e1f9d680 AVX512: Implemented encoding and intrinsics for vcmpss/sd.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12593

llvm-svn: 248121
2015-09-20 15:15:10 +00:00
Asaf Badouh 2744d21fb8 [X86][AVX512] extend support in Scalar conversion
add scalar FP to Int conversion with truncation intrinsics
add scalar conversion FP32 from/to FP64 intrinsics
add rounding mode and SAE mode encoding for these intrinsics

Differential Revision: http://reviews.llvm.org/D12665

llvm-svn: 248117
2015-09-20 14:31:19 +00:00
Igor Breger 4c4cd789c9 AVX512: vsqrtss/sd encoding and intrinsics implementation.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12102

llvm-svn: 248116
2015-09-20 09:13:41 +00:00
Asaf Badouh 572bbceecc [X86][AVX512DQ] Add fpclass instruction
Differential Revision: http://reviews.llvm.org/D12931

llvm-svn: 248115
2015-09-20 08:46:07 +00:00
Michael Kuperstein 58e86bc893 [X86] Fix sitofp and uitofp instruction matching failures with long double and avx512
The operation action for i32 and i64 cannot be set to legal, as long double 
needs custom lowering.

Patch by: mitch.l.bodart@intel.com
Differential Revision: http://reviews.llvm.org/D12372

llvm-svn: 248114
2015-09-20 08:12:17 +00:00
Igor Breger 1d55f20bee AVX512: Implemented intrinsics for vshuff32x4, vshuff64x2, vshufi64x2, vshufi32x4
Added tests for intrinsics.

Differential Revision: http://reviews.llvm.org/D12525

llvm-svn: 248113
2015-09-20 07:18:53 +00:00
Igor Breger 0ede3cbb5c AVX512: Implement instructions encoding, lowering and intrinsics
vinserti64x4, vinserti64x2, vinserti32x8, vinserti32x4, vinsertf64x4, vinsertf64x2, vinsertf32x8, vinsertf32x4
Added tests for encoding, lowering and intrinsics.

Differential Revision: http://reviews.llvm.org/D11893

llvm-svn: 248111
2015-09-20 06:52:42 +00:00
Simon Pilgrim d0448ee59f [X86][SSE] Vectorize CTTZ + CTTZ_ZERO_UNDEF
Now that we have fast vector CTPOP implementations we can use this to speed up vector CTTZ using the pattern (cttz(x) = ctpop((x & -x) - 1))

Additionally, for AVX512CD that provides lzcnt instructions we can use the pattern (cttz_undef(x) = (width - 1) - ctlz(x & -x))

Differential Revision: http://reviews.llvm.org/D12663

llvm-svn: 248091
2015-09-19 13:22:57 +00:00
Reid Kleckner 5b8a46e771 [WinEH] Make funclet return instrs pseudo instrs
This makes catchret look more like a branch, and less like a weird use
of BlockAddress. It also lets us get away from
llvm.x86.seh.restoreframe, which relies on the old parentfpoffset label
arithmetic.

llvm-svn: 247936
2015-09-17 20:43:47 +00:00
Elena Demikhovsky 702a6adfaa AVX-512: shufflevector for i1 vectors <2 x i1> .. <64 x i1>
AVX-512 does not provide an instruction that shuffles mask register. So I do the following way:

mask-2-simd , shuffle simd , simd-2-mask

Differential Revision: http://reviews.llvm.org/D12727

llvm-svn: 247876
2015-09-17 06:53:12 +00:00
Eric Christopher a4e5d3cf8e constify the Function parameter to the TTI creation callback and
propagate to all callers/users/etc.

llvm-svn: 247864
2015-09-16 23:38:13 +00:00
Reid Kleckner 813f1b65bc [WinEH] Rip out the landingpad-based C++ EH state numbering code
It never really worked, and the new code is working better every day.

llvm-svn: 247860
2015-09-16 22:14:46 +00:00
Reid Kleckner b005d281c3 [WinEH] Pull Adjectives and CatchObj out of the catchpad arg list
Clang now passes the adjectives as an argument to catchpad.

Getting the CatchObj working is simply a matter of threading another
static alloca through codegen, first as an alloca, then as a frame
index, and finally as a frame offset.

llvm-svn: 247844
2015-09-16 20:16:27 +00:00
Reid Kleckner 84ebff4a5e [WinEH] Skip state numbering when no EH pads are present
Otherwise we'd try to emit the thunk that passes the LSDA to
__CxxFrameHandler3. We don't emit the LSDA if there were no landingpads,
so we'd end up with an assembler error when trying to write the COFF
object.

llvm-svn: 247820
2015-09-16 17:19:44 +00:00
Sanjay Patel a260701bbb propagate fast-math-flags on DAG nodes
After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, 
so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: 
if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is
one test case in this patch to prove that point.

This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I 
did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF
( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes.

This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as
FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the
current global settings.

Differential Revision: http://reviews.llvm.org/D12095

llvm-svn: 247815
2015-09-16 16:31:21 +00:00
Michael Kuperstein d926465342 [X86] Do not generate 64-bit pops of 32-bit GPRs.
When trying emit a stack adjustments using pops, frame lowering selects an
arbitrary free GPR. It should always select one from an appropriate class...
This fixes PR24649.

Patch by: amjad.aboud@intel.com
Differential Revision: http://reviews.llvm.org/D12609

llvm-svn: 247785
2015-09-16 11:27:20 +00:00
Michael Kuperstein 098cd9fba7 [X86] Fix emitEpilogue() to make less assumptions about pops
This is the mirror image of r242395.
When X86FrameLowering::emitEpilogue() looks for where to insert the %esp addition that
deallocates stack space used for local allocations, it assumes that any sequence of pop
instructions from function exit backwards consists purely of restoring callee-save registers.

This may be false, since from some point backward, the pops may be clean-up of stack space
allocated for arguments to a call.

Patch by: amjad.aboud@intel.com
Differential Revision: http://reviews.llvm.org/D12688

llvm-svn: 247784
2015-09-16 11:18:25 +00:00
Daniel Sanders 50f17235dd Revert r247692: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Eric has replied and has demanded the patch be reverted.

llvm-svn: 247702
2015-09-15 16:17:27 +00:00
Daniel Sanders 153010c52d Re-commit r247683: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change. Thanks go to Pavel Labath for fixing LLDB for me.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247692
2015-09-15 14:08:28 +00:00
Daniel Sanders c40de48041 Revert r247684 - Replace Triple with a new TargetTuple ...
LLDB needs to be updated in the same commit.

llvm-svn: 247686
2015-09-15 13:46:21 +00:00
Daniel Sanders 18d4b0dab7 Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247683
2015-09-15 13:17:40 +00:00
Daniel Sanders c8cd6e95d2 Fix namespace indentation and missing blank lines before 'public:' in *MCAsmInfo.h. NFC.
This is to reduce noise in a following commit.

Also fixes a couple missing spaces before the reference operator.

llvm-svn: 247679
2015-09-15 12:27:06 +00:00
Simon Pilgrim f8f86ab176 [X86][MMX] Added shuffle decodes for MMX/3DNow! shuffles.
Added shuffle decodes for MMX PUNPCK + PSHUFW shuffles.
Added shuffle decodes for 3DNow! PSWAPD shuffles.

llvm-svn: 247526
2015-09-13 11:28:45 +00:00
Elena Demikhovsky 8671fcbbd6 AVX-512: Fixed a bug in OR/XOR operations for 512-bit FP values on KNL.
KNL does not have VXORPS, VORPS for 512-bit values.
I use integer VPXOR, VPOR that actually do the same.

X86ISD::FXOR/FOR are generated as a result of FSUB combining.

Differential Revision: http://reviews.llvm.org/D12753

llvm-svn: 247523
2015-09-13 08:15:15 +00:00
Sanjay Patel 8b960d22ad [x86] enable machine combiner reassociations for 128-bit vector logical integer insts (2nd try)
The changes in:
test/CodeGen/X86/machine-cp.ll
are just due to scheduling differences after some logic instructions were reassociated.

llvm-svn: 247516
2015-09-12 19:47:50 +00:00
Simon Pilgrim 5253b7b4a7 [X86] Renamed lowerVectorShuffleAsUnpack NFCI.
Renamed to lowerVectorShuffleAsPermuteAndUnpack to make it clear that it lowers to more than just a UNPCK instruction.

llvm-svn: 247513
2015-09-12 18:26:47 +00:00
Simon Pilgrim 2fcfef542a [X86] Moved lowerVectorShuffleWithUNPCK earlier to make reuse easier. NFCI.
llvm-svn: 247511
2015-09-12 16:03:06 +00:00
Sanjay Patel 99f7370a79 revert r247506; need to verify changes in existing tests
llvm-svn: 247507
2015-09-12 15:27:31 +00:00
Sanjay Patel 08755c7dbc [x86] enable machine combiner reassociations for 128-bit vector logical integer insts
llvm-svn: 247506
2015-09-12 14:58:04 +00:00
Akira Hatanaka bc497c93f5 Use function attribute "stackrealign" to decide whether stack
realignment should be forced.

With this commit, we can now force stack realignment when doing LTO and
do so on a per-function basis. Also, add a new cl::opt option
"stackrealign" to CommandFlags.h which is used to force stack
realignment via llc's command line.

Out-of-tree projects currently using -force-align-stack to force stack
realignment should make changes to attach the attribute to the functions
in the IR.

Differential Revision: http://reviews.llvm.org/D11814

llvm-svn: 247450
2015-09-11 18:54:38 +00:00
Ahmed Bougacha 5246867384 [CodeGen] Refactor TLI/AtomicExpand interface to make LLSC explicit.
We used to have this magic "hasLoadLinkedStoreConditional()" callback,
which really meant two things:
- expand cmpxchg (to ll/sc).
- expand atomic loads using ll/sc (rather than cmpxchg).

Remove it, and, instead, introduce explicit callbacks:
- bool shouldExpandAtomicCmpXchgInIR(inst)
- AtomicExpansionKind shouldExpandAtomicLoadInIR(inst)

Differential Revision: http://reviews.llvm.org/D12557

llvm-svn: 247429
2015-09-11 17:08:28 +00:00
Ahmed Bougacha 9d677131c4 [CodeGen] Rename AtomicRMWExpansionKind to AtomicExpansionKind.
This lets us generalize its usage to the other atomic instructions.

llvm-svn: 247428
2015-09-11 17:08:17 +00:00
Reid Kleckner da6dcc5d92 [WinEH] Push and pop EBP for 32-bit funclets
The Win32 EH runtime caller does not preserve EBP, even though it does
preserve the CSRs (EBX, ESI, EDI) for us. The result was that each
finally funclet call would leave the frame pointer off by 12 bytes.

llvm-svn: 247348
2015-09-10 22:00:02 +00:00
Hans Wennborg aa15bffa1f Re-commit r247216: "Fix Clang-tidy misc-use-override warnings, other minor fixes"
Except the changes that defined virtual destructors as =default, because that
ran into problems with GCC 4.7 and overriding methods that weren't noexcept.

llvm-svn: 247298
2015-09-10 16:49:58 +00:00
Igor Breger 7f69a99c54 AVX512: Implemented encoding and intrinsics for
vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11802

llvm-svn: 247276
2015-09-10 12:54:54 +00:00
Hans Wennborg d2799a963f Revert r247216: "Fix Clang-tidy misc-use-override warnings, other minor fixes"
This caused build breakges, e.g.
http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu-gdb-75/builds/24926

llvm-svn: 247226
2015-09-10 00:57:26 +00:00
Ahmed Bougacha 37bffd83f0 [CodeGen] Make x86 nontemporal store patfrags generic. NFC.
To be used by other targets.

llvm-svn: 247225
2015-09-10 00:53:15 +00:00
Reid Kleckner 7878391208 [WinEH] Add codegen support for cleanuppad and cleanupret
All of the complexity is in cleanupret, and it mostly follows the same
codepaths as catchret, except it doesn't take a return value in RAX.

This small example now compiles and executes successfully on win32:
  extern "C" int printf(const char *, ...) noexcept;
  struct Dtor {
    ~Dtor() { printf("~Dtor\n"); }
  };
  void has_cleanup() {
    Dtor o;
    throw 42;
  }
  int main() {
    try {
      has_cleanup();
    } catch (int) {
      printf("caught it\n");
    }
  }

Don't try to put the cleanup in the same function as the catch, or Bad
Things will happen.

llvm-svn: 247219
2015-09-10 00:25:23 +00:00
Hans Wennborg 6fa09455ed Fix Clang-tidy misc-use-override warnings, other minor fixes
Patch by Eugene Zelenko!

Differential Revision: http://reviews.llvm.org/D12740

llvm-svn: 247216
2015-09-10 00:12:56 +00:00
Reid Kleckner 94b704c469 [SEH] Emit 32-bit SEH tables for the new EH IR
The 32-bit tables don't actually contain PC range data, so emitting them
is incredibly simple.

The 64-bit tables, on the other hand, use the same table for state
numbering as well as label ranges. This makes things more difficult, so
it will be implemented later.

llvm-svn: 247192
2015-09-09 21:10:03 +00:00
Renato Golin db7ea86bf4 Revert "AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding."
This reverts commit r247149, as it was breaking numerous buildbots of varied architectures.

llvm-svn: 247177
2015-09-09 19:44:40 +00:00