Commit Graph

62 Commits

Author SHA1 Message Date
Reid Kleckner 66e7717b46 Revert "[X86] Add xgetbv/x[X86] Add xgetbv xsetbv intrinsics to non-windows platforms"
This reverts commit r278783.  It breaks usage of _xgetbv on Windows.

llvm-svn: 278814
2016-08-16 16:04:14 +00:00
Marina Yatsina 197b65f833 [X86] Add xgetbv/x[X86] Add xgetbv xsetbv intrinsics to non-windows platforms
commit on behalf of guyblank

Differential Revision: https://reviews.llvm.org/D21959

llvm-svn: 278783
2016-08-16 08:13:36 +00:00
Simon Pilgrim e3b9ee0645 [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.

It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).

This patch changes both scalar and packed versions back to using x86-specific builtins.

It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.

Differential Revision: https://reviews.llvm.org/D22105

llvm-svn: 276102
2016-07-20 10:18:01 +00:00
Craig Topper a1bee4398c [X86] Remove dead builtins that don't exist in the backend intrinsic file and don't have custom handling in CGBuiltins.cpp either.
llvm-svn: 274825
2016-07-08 05:11:47 +00:00
Simon Pilgrim beca5f295c [Clang][X86] Convert non-temporal store builtins to generic __builtin_nontemporal_store in headers
We can now use __builtin_nontemporal_store instead of target specific builtins for naturally aligned nontemporal stores which avoids the need for handling in CGBuiltin.cpp

The scalar integer nontemporal (unaligned) store builtins will have to wait as __builtin_nontemporal_store currently assumes natural alignment and doesn't accept the 'packed struct' trick that we use for normal unaligned load/stores.

The nontemporal loads require further backend support before we can safely convert them to __builtin_nontemporal_load

Differential Revision: http://reviews.llvm.org/D21272

llvm-svn: 272540
2016-06-13 09:57:52 +00:00
Simon Pilgrim 00880511b1 [X86][SSE] Replace (V)CVTTPS2DQ and VCVTTPD2DQ truncating (round to zero) f32/f64 to i32 with generic IR (clang)
The 'cvtt' truncation (round to zero) conversions can be safely represented as generic __builtin_convertvector (fptosi) calls instead of x86 intrinsics. We already do this (implicitly) for the scalar equivalents.

Note: I looked at updating _mm_cvttpd_epi32 as well but this still requires a lot more backend work to correctly lower (both for debug and optimized builds).

Differential Revision: http://reviews.llvm.org/D20859

llvm-svn: 271436
2016-06-01 21:46:51 +00:00
Craig Topper 09175dab31 [X86] Replace unaligned store builtins in SSE/AVX intrinsic files with code that will compile to a native unaligned store. Remove the builtins since they are no longer used.
Intrinsics will be removed from llvm in a future commit.

llvm-svn: 271214
2016-05-30 17:10:30 +00:00
Simon Pilgrim 91b77ceaed [X86][SSE] Replace VPMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (clang)
The VPMOVSX and (V)PMOVZX sign/zero extension intrinsics can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics.

This patch removes the clang builtins and their use in the sse2/avx headers - a companion patch will remove/auto-upgrade the llvm intrinsics.

Note: We already did this for SSE41 PMOVSX sometime ago.

Differential Revision: http://reviews.llvm.org/D20684

llvm-svn: 271106
2016-05-28 08:12:45 +00:00
Simon Pilgrim 90770c7c76 [X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR
Both the (V)CVTDQ2PD(Y) (i32 to f64) and (V)CVTPS2PD(Y) (f32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.

This patch removes the clang builtins and their use in the sse2/avx headers - a future patch will deal with removing the llvm intrinsics, but that will require a bit more work.

Differential Revision: http://reviews.llvm.org/D20528

llvm-svn: 270499
2016-05-23 22:13:02 +00:00
Ashutosh Nema 51c9dd0081 Add new intrinsic support for MONITORX and MWAITX instructions
Summary:
MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT
pair while adding a timer function, such that another termination of the MWAITX
instruction occurs when the timer expires. The presence of the MONITORX and 
MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29.

The MONITORX and MWAITX instructions are intercepted by the same bits that
intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be
monitored. MWAITX instruction causes the processor to stop instruction
execution and enter an implementation-dependent optimized state until
occurrence of a class of events.

Opcode of MONITORX instruction is "0F 01 FA". Opcode of MWAITX instruction is
"0F 01 FB". These opcode information is used in adding tests for the
disassembler.

These instructions are enabled for AMD's bdver4 architecture.

Patch by Ganesh Gopalasubramanian!

Reviewers: echristo, craig.topper

Subscribers: RKSimon, joker.eph, llvm-commits, cfe-commits

Differential Revision: http://reviews.llvm.org/D19796

llvm-svn: 269907
2016-05-18 11:56:23 +00:00
Andrea Di Biagio 8bb12d0a77 [x86] Fix maskload/store intrinsic definitions in avxintrin.h
According to the Intel documentation, the mask operand of a maskload and
maskstore intrinsics is always a vector of packed integer/long integer values.
This patch introduces the following two changes:
 1. It fixes the avx maskload/store intrinsic definitions in avxintrin.h.
 2. It changes BuiltinsX86.def to match the correct gcc definitions for avx
    maskload/store (see D13861 for more details).

Differential Revision: http://reviews.llvm.org/D13861

llvm-svn: 250816
2015-10-20 11:19:54 +00:00
Craig Topper e33f51fa91 [X86] Add fxsr feature name for fxsave/fxrestore builtins.
llvm-svn: 250498
2015-10-16 06:22:36 +00:00
Eric Christopher 4fb4fbc5d6 Add the minimum target features that these tests depend upon.
llvm-svn: 250448
2015-10-15 20:04:40 +00:00
Amjad Aboud 2b9b8a5921 [X86] Add XSAVE intrinsic family
Add intrinsics for the
  XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64)
  XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64)
  XSAVEC instructions (XSAVEC/XSAVEC64)
  XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64)

Differential Revision: http://reviews.llvm.org/D13014

llvm-svn: 250158
2015-10-13 12:29:35 +00:00
Simon Pilgrim 12919f7e49 [X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR
128-bit vector integer sign extensions correctly lower to the pmovsx instructions even for debug builds.

This patch removes the builtins and reimplements the _mm_cvtepi*_epi* intrinsics __using builtin_shufflevector (to extract the bottom most subvector) and __builtin_convertvector (to actually perform the sign extension).

Differential Revision: http://reviews.llvm.org/D12835

llvm-svn: 248092
2015-09-19 15:12:38 +00:00
Simon Pilgrim a79dfb3223 [X86] Add __builtin_ia32_undef* intrinsics to test
Minor tweak to rL246083

llvm-svn: 246200
2015-08-27 20:29:13 +00:00
Michael Kuperstein a3c7b74208 [X86] Add FXSR intrinsics
Add intrinsics for the FXSR instructions (FXSAVE/FXSAVE64/FXRSTOR/FXRSTOR64)

These were previously declared in Intrin.h for MSVC compatibility, but now
that we have them implemented, these declarations can be removed.

llvm-svn: 241053
2015-06-30 09:45:38 +00:00
Sanjay Patel 0c351aba25 [X86, AVX] replace vextractf128 intrinsics with generic shuffles
This is very much like D8088 (checked in at r231792).

Now that we've replaced the vinsertf128 intrinsics,
do the same for their extract twins.

Differential Revision: http://reviews.llvm.org/D8275

llvm-svn: 232052
2015-03-12 15:50:36 +00:00
Sanjay Patel 7f6aa52e93 [X86, AVX] Replace vinsertf128 intrinsics with generic shuffles.
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.

This is the sibling patch for the LLVM half of this change:
http://reviews.llvm.org/D8086

Differential Revision: http://reviews.llvm.org/D8088

llvm-svn: 231792
2015-03-10 15:19:26 +00:00
Craig Topper b1bc5cf4bc [X86] Remove pblendw and pblendd builtins that aren't being used by the intrinsic headers.
llvm-svn: 230738
2015-02-27 06:54:25 +00:00
Craig Topper ac0d58bc4c [X86] Remove the blendps/blendpd builtins. They aren't used by the intrinsic headers. We use appropriate shuffle vector instead.
llvm-svn: 230616
2015-02-26 08:09:05 +00:00
Craig Topper 1601525f1c [X86] Add range checking to the immediate arguments of many of the SSE/AVX builtins.
llvm-svn: 227674
2015-01-31 06:31:23 +00:00
Chandler Carruth 2949e548f4 [x86] Clean up the x86 builtin specs to reflect r217310 in LLVM which
made the 8-bit masks actually 8-bit arguments to these intrinsics.

These builtins are a mess. Many were missing the I qualifier which
I added where obviously correct. Most aren't tested, but I've updated
the relevant tests. I've tried to catch all the things that should
become 'c' in this round.

It's also frustrating because the set of these is really ad-hoc and
doesn't really map that cleanly to the set supported by either GCC or
LLVM. Oh well...

llvm-svn: 217311
2014-09-06 10:30:51 +00:00
Andrea Di Biagio eb606a3c27 [x86] Add Clang support for intrinsic __rdpmc.
This patch adds intrinsic __rdpmc to header file 'ia32intrin.h'.
Intrinsic __rdmpc can be used to read performance monitoring counters. It is
implemented as a direct call to __builtin_ia32_rdpmc.

It takes as input a value representing the index of the performance counter to
read. The value of the performance counter is then returned as a unsigned
64-bit quantity.

llvm-svn: 212053
2014-06-30 18:23:58 +00:00
Adam Nemet 286ae08e7d Implement AVX1 vbroadcast intrinsics with vector initializers
These intrinsics are special because they directly take a memory operand (AVX2
adds the register counterparts).  Typically, other non-memop intrinsics take
registers and then it's left to isel to fold memory operands.

In order to LICM intrinsics directly reading memory, we require that no stores
are in the loop (LICM) or that the folded load accesses constant memory
(MachineLICM).  When neither is the case we fail to hoist a loop-invariant
broadcast.

We can work around this limitation if we expose the load as a regular load and
then just implement the broadcast using the vector initializer syntax.  This
exposes the load to LICM and other optimizations.

At the IR level this is translated into a series of insertelements.  The
sequence is already recognized as a broadcast so there is no impact on the
quality of codegen.

_mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch
because right now we lack the DAG-combiner smartness to recover the broadcast
instructions.  This will be tackled in a follow-on.

There will be completing changes on the LLVM side to remove the LLVM
intrinsics and to auto-upgrade bitcode files.

Fixes <rdar://problem/16494520>

llvm-svn: 209846
2014-05-29 20:47:29 +00:00
Andrea Di Biagio 7ceec07cf6 [X86] Add Clang support for intrinsics __rdtsc and __rdtscp.
This patch:
 1. Adds a definition for two new GCCBuiltins in BuiltinsX86.def:
   __builtin_ia32_rdtsc;
   __builtin_ia32_rdtscp;

 2. Replaces the already existing definition of intrinsic __rdtsc in
    ia32intrin.h with a simple call to the new GCC builtin __builtin_ia32_rdtsc.

 3. Adds a definition for the new intrinsic __rdtscp in ia32intrin.h

llvm-svn: 207132
2014-04-24 18:26:35 +00:00
Eli Friedman f9d8c6cebb Add _mm_stream_si64 intrinsic.
While I'm here, also fix the alignment computation for the whole family of
intrinsics.

PR17298.

llvm-svn: 191243
2013-09-23 23:38:39 +00:00
Ben Langmuir 58078d0103 Add C intrinsics for Intel SHA Extensions
Intrinsics added shaintrin.h, which is included from x86intrin.h if __SHA__ is
enabled. SHA implies SSE2, which is needed for the __m128i type.

Also add the -msha/-mno-sha option.

llvm-svn: 190999
2013-09-19 13:22:04 +00:00
Chad Rosier 87622b8b84 Get rid of storelv4si builtin as it can be expressed directly. This is general
goodness because it provides opportunites to cleanup things.  For example,

uint64_t t1(__m128i vA)
{
  uint64_t Alo;
  _mm_storel_epi64((__m128i*)&Alo, vA);
  return Alo;
}

was generating 

	movq	%xmm0, -8(%rbp)
	movq	-8(%rbp), %rax

and now generates

	movd	%xmm0, %rax

rdar://11282581

llvm-svn: 155924
2012-05-01 18:11:51 +00:00
Craig Topper 26e74e50b6 Convert vperm2f128 and vperm2i128 intrinsics back to using llvm intrinsics. Unfortunately, these instructions have behavior that can't be modeled with shuffle vector.
llvm-svn: 154906
2012-04-17 05:16:56 +00:00
Craig Topper e5ea3b0239 Remove vperm2f* and vperm2i builtins. Same effect can be achieved with builtin_shufflevector.
llvm-svn: 150064
2012-02-08 07:33:36 +00:00
Craig Topper fec9f8edb7 Remove vpermilp* builtins. Same effect can be achieved with builtin_shufflevector.
llvm-svn: 150056
2012-02-08 05:16:54 +00:00
Craig Topper d6d3a05b4f Cleanup 3dnow builtin handling. Most of them were already handled by LLVM connecting intrinsics and builtins in IntrinsicsX86.td.
llvm-svn: 149233
2012-01-30 08:18:19 +00:00
Craig Topper 80df922f2f Re-enable test that was broken by r148919
llvm-svn: 148932
2012-01-25 06:23:23 +00:00
Chris Lattner f07b612313 disable this test for now.
llvm-svn: 148928
2012-01-25 05:38:06 +00:00
Craig Topper a89747dd1e Add AVX2 intrinsics for pavg, pblend, and pcmp instructions. Also remove unneeded builtins for SSE pcmp. Change SSE pcmpeqq and pcmpgtq to not use builtins and just use vector == and >.
llvm-svn: 146969
2011-12-20 09:55:26 +00:00
Chad Rosier 060d03be1c Fix _mm256_round_pd, _mm256_round_ps, _mm_permute_pd and _mm256_permute_pd AVX
intrinsics to use "I" (ICE) markings.  Fix avxintrin.h to take them into 
account.
Part of rdar://10595450

llvm-svn: 146791
2011-12-17 00:15:26 +00:00
Bill Wendling bb455154a1 Remove the 'unaligned load' builtins now that they're no longer used in the *mmintrin.h files.
llvm-svn: 131300
2011-05-13 18:52:28 +00:00
Bill Wendling e106c34817 LLVM doesn't always optimize away the four loads from this:
(__m128){ p[0], p[1], p[2], p[3] }

which produces really bad code. This could be done in instcombine, but it's
probably better to do it in the front-end instead.
<rdar://problem/9424836>

llvm-svn: 131237
2011-05-12 19:02:15 +00:00
Michael J. Spencer 6826eb816a Add 3DNow! Intrinsics.
llvm-svn: 129570
2011-04-15 15:07:13 +00:00
Bill Wendling a865185ad6 Removing the unaligned load tests from builtins-x86.c since they're generated by a regular 'load' now.
llvm-svn: 129464
2011-04-13 20:17:22 +00:00
Argyrios Kyrtzidis 073c9cb592 Implement __builtin_ia32_vec_ext_v2si function (required by Qt).
llvm-svn: 116162
2010-10-10 03:19:11 +00:00
Bruno Cardoso Lopes 762e401911 Remove rsqrtps_nr256 and sqrtps_nr256 builtins, at least until we need them
llvm-svn: 110844
2010-08-11 19:18:36 +00:00
Bruno Cardoso Lopes 65954ffc69 Remove 256-bit cast built-ins and make the AVX intrinsic call llvm __builtin_shufflevector with the appropriate arguments
llvm-svn: 110771
2010-08-11 02:14:38 +00:00
Bruno Cardoso Lopes a4f1930b75 Remove 256-bit unpack built-ins and make the AVX intrinsic call llvm __builtin_shufflevector with the appropriate arguments
llvm-svn: 110768
2010-08-11 01:43:24 +00:00
Bruno Cardoso Lopes e712a135b7 Remove 256-bit shuffle built-ins and make the AVX intrinsic call llvm __builtin_shufflevector with the appropriate arguments
llvm-svn: 110766
2010-08-11 01:17:34 +00:00
Bruno Cardoso Lopes 3d3fc1d075 Make replicate intrinsics use shufflevector instead of dup builtins, also remove the dup builtins
llvm-svn: 110646
2010-08-10 02:23:54 +00:00
Bruno Cardoso Lopes e2538c4ecf We don't want to support built-ins which aren't needed by the intrinsics. Remove them
llvm-svn: 110399
2010-08-05 23:47:43 +00:00
Bruno Cardoso Lopes 6586724f71 Add more AVX 256-bit intrinsics and test cases for them
llvm-svn: 110178
2010-08-04 01:11:26 +00:00
Bruno Cardoso Lopes 1f927ccaa2 Support x86 AVX 256-bit instructions built-ins. Right now support all of them, but
as soon as we properly codegen the simple vector operations, remove the
unnecessary built-ins/intrinsics from clang and llvm. Also add tests for the new
built-ins

llvm-svn: 110096
2010-08-03 01:57:18 +00:00