llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	f8e0e5db48	[X86] Enable constexpr on _cast fp<-> uint intrinsics (PR31446) As suggested by @rsmith on PR47267, by replacing the builtin_memcpy bitcast pattern with builtin_bit_cast we can use _castf32_u32, _castu32_f32, _castf64_u64 and _castu64_f64 inside constant expresssions (constexpr). Although __builtin_bit_cast was added for c++20 it works on all clang c/c++ modes. Differential Revision: https://reviews.llvm.org/D86398	2020-08-23 10:27:46 +01:00
Simon Pilgrim	42b993d97d	[X86] ia32intrin.h - pull out common attributes used in cast helpers into define. NFCI.	2020-08-22 15:25:15 +01:00
Simon Pilgrim	9ffc412e1a	[X86] Enable constexpr on BITSCAN intrinsics (PR31446) This enables constexpr BSF/BSR intrinsics defined in ia32intrin.h	2020-08-21 11:44:20 +01:00
Simon Pilgrim	c8e6bf0a65	[X86] Enable constexpr on BSWAP intrinsics (PR31446) This enables constexpr BSWAP intrinsics defined in ia32intrin.h	2020-08-21 10:55:15 +01:00
Simon Pilgrim	c6863a4ab8	[X86] Enable constexpr on POPCNT intrinsics (PR31446) Followup to D86229, this enables constexpr on the alternative (which fallback to generic code) POPCNT intrinsics defined in ia32intrin.h	2020-08-21 10:20:37 +01:00
Simon Pilgrim	33bb80bc7a	[X86] ia32intrin.h - pull out common attributes into defines. NFCI. Matches what we do in most other x86 headers	2020-08-21 10:03:28 +01:00
Simon Pilgrim	cff0db0876	[X86] Enable constexpr on POPCNT intrinsics (PR31446) This is a first step patch to enable constexpr support and testing to a large number of x86 intrinsics. All I've done here is provide a DEFAULT_FN_ATTRS_CONSTEXPR variant to our existing DEFAULT_FN_ATTRS tag approach that adds constexpr on c++ builds. The clang cuda headers do something similar. I've started with POPCNT mainly as its tiny and are wrappers to generic __builtin_* intrinsics which already act as constexpr. Differential Revision: https://reviews.llvm.org/D86229	2020-08-20 21:38:04 +01:00
Amy Kwan	c7ec3a7e33	[PowerPC] Implement Vector Extract Mask builtins in LLVM/Clang This patch implements the vec_extractm function prototypes in altivec.h in order to utilize the vector extract with mask instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D82675	2020-08-17 21:14:17 -05:00
Albion Fung	3136cbe29e	[PowerPC] Implement Vector Shift Builtins This patch implements the builtins for the vector shifts (shl, srl, sra), and adds the appropriate test cases for these builtins. The builtins utilize the vector shift instructions introduced within ISA 3.1. Differential Revision: https://reviews.llvm.org/D83338	2020-08-12 18:26:58 -05:00
Yaxun (Sam) Liu	ac3e720dc1	Make clang HIP headers compatible with C++98 Automation to detect compiler features, such as CMake's target_compile_features, would attempt to detect compiler features by explicitly using langugage flags. This change ensures that the HIP headers would still work with C++98. Patch by Siu Chi Chan Differential Revision: https://reviews.llvm.org/D85471 Change-Id: I304e964b18a525b0fde55efd841da74b6c4dc8ed	2020-08-07 13:50:22 -04:00
biplmish	cce1b0e891	[PowerPC] Implement Vector Extract Low/High Order Builtins in LLVM/Clang This patch implements the function prototypes vec_extractl and vec_extracth in altivec.h to utilize the vector extract double element instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D84622	2020-08-07 01:02:29 -05:00
Thomas Lively	f496950001	[WebAssembly] Fix types in wasm_simd128.h and add tests `47f7174ffa` changed the types used in the Wasm SIMD builtin functions, but not all of their uses in wasm_simd128.h were updated. This commit fixes wasm_simd128.h and adds tests to make sure similar problems do not pass uncaught in the future. Differential Revision: https://reviews.llvm.org/D85347	2020-08-05 14:00:01 -07:00
Artem Belevich	7d057efddc	[CUDA] Work around a bug in rint/nearbyint caused by a broken implementation provided by CUDA. Normally math functions are forwarded to __nv_* counterparts provided by CUDA's libdevice bitcode. However, __nv_rint()/__nv_nearbyint() functions there have a bug -- they use round() which rounds up instead of rounding towards the nearest integer, so we end up with rint(2.5f) producing 3.0 instead of expected 2.0. The broken bitcode is not actually used by NVCC itself, which has both a work-around in CUDA headers and, in recent versions, uses correct implementations in NVCC's built-ins. This patch implements equivalent workaround and directs rint/nearbyint to __builtin_* variants that produce correct results. Differential Revision: https://reviews.llvm.org/D85236	2020-08-05 13:13:48 -07:00
Dan Gohman	47f7174ffa	[WebAssembly] Use "signed char" instead of "char" in SIMD intrinsics. This allows people to use `int8_t` instead of `char`, -funsigned-char, and generally decouples SIMD from the specialness of `char`. And it makes intrinsics like `__builtin_wasm_add_saturate_s_i8x16` and `__builtin_wasm_add_saturate_u_i8x16` use signed and unsigned element types, respectively. Differential Revision: https://reviews.llvm.org/D85074	2020-08-04 12:48:40 -07:00
Amy Kwan	c4e5743232	[PowerPC] Implement low-order Vector Modulus Builtins, and add Vector Multiply/Divide/Modulus Builtins Tests Power10 introduces new instructions for vector multiply, divide and modulus. These instructions can be exploited by the builtin functions: vec_mul, vec_div, and vec_mod, respectively. This patch aims adds the function prototype, vec_mod, as vec_mul and vec_div been previously implemented in altivec.h. This patch also adds the following front end tests: vec_mul for v2i64 vec_div for v4i32 and v2i64 vec_mod for v4i32 and v2i64 Differential Revision: https://reviews.llvm.org/D82576	2020-07-31 10:58:07 -05:00
Thomas Lively	11bb7eef41	[WebAssembly] Remove intrinsics for SIMD widening ops Instead, pattern match extends of extract_subvectors to generate widening operations. Since extract_subvector is not a legal node, this is implemented via a custom combine that recognizes extract_subvector nodes before they are legalized. The combine produces custom ISD nodes that are later pattern matched directly, just like the intrinsic was. Also removes the clang builtins for these operations since the instructions can now be generated from portable code sequences. Differential Revision: https://reviews.llvm.org/D84556	2020-07-28 18:25:55 -07:00
Amy Kwan	74790a5dde	[PowerPC] Implement Truncate and Store VSX Vector Builtins This patch implements the `vec_xst_trunc` function in altivec.h in order to utilize the Store VSX Vector Rightmost [byte \| half \| word \| doubleword] Indexed instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D82467	2020-07-24 19:22:39 -05:00
Jon Chesterfield	679158e662	Make hip math headers easier to use from C Summary: Make hip math headers easier to use from C Motivation is a step towards using the hip math headers to implement math.h for openmp, which needs to work with C as well as C++. NFC for C++ code. Reviewers: yaxunl, jdoerfert Reviewed By: yaxunl Subscribers: sstefan1, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D84476	2020-07-24 20:50:46 +01:00
Yaxun (Sam) Liu	ce04d4e39c	Fix pow and ldexp in HIP header	2020-07-21 17:39:46 -04:00
Amy Kwan	62f5ba624b	[PowerPC][Power10] Implement Test LSB by Byte Builtins in LLVM/Clang This patch implements builtins for the Test LSB by Byte instruction introduced in Power10. Differential Revision: https://reviews.llvm.org/D82431	2020-07-13 22:47:47 -05:00
Johannes Doerfert	b5667d00e0	[OpenMP][CUDA] Fix std::complex in GPU regions The old way worked to some degree for C++-mode but in C mode we actually tried to introduce variants of macros (e.g., isinf). To make both modes work reliably we get rid of those extra variants and directly use NVIDIA intrinsics in the complex implementation. While this has to be revisited as we add other GPU targets which want to reuse the code, it should be fine for now. Reviewed By: tra, JonChesterfield, yaxunl Differential Revision: https://reviews.llvm.org/D83591	2020-07-11 00:40:05 -05:00
Johannes Doerfert	7f1e6fcff9	[OpenMP] Use __OPENMP_NVPTX__ instead of _OPENMP in wrapper headers Due to recent changes we cannot use OpenMP in CUDA files anymore (PR45533) as the math handling of CUDA is different when _OPENMP is defined. We actually want this different behavior only if we are offloading with OpenMP to NVIDIA, thus generating NVPTX. With this patch we do not interfere with the CUDA math handling except if we are in NVPTX offloading mode, as indicated by the presence of __OPENMP_NVPTX__. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D78155	2020-07-10 18:53:34 -05:00
Johannes Doerfert	e3e47e8035	[OpenMP] Make complex soft-float functions on the GPU weak definitions To avoid linkage errors we have to ensure the linkage allows multiple definitions of these compiler inserted functions. Since they are on the cold path of complex computations, we want to avoid `inline`. Instead, we opt for `weak` and `noinline` for now.	2020-07-09 01:06:55 -05:00
Johannes Doerfert	d999cbc988	[OpenMP] Initial support for std::complex in target regions This simply follows the scheme we have for other wrappers. It resolves the current link problem, e.g., `__muldc3 not found`, when std::complex operations are used on a device. This will not allow complex make math function calls to work properly, e.g., sin, but that is more complex (pan intended) anyway. Reviewed By: tra, JonChesterfield Differential Revision: https://reviews.llvm.org/D80897	2020-07-08 17:33:59 -05:00
Craig Topper	82206e7fb4	[X86] Enabled a bunch of 64-bit Interlocked* functions intrinsics on 32-bit Windows to match recent MSVC This enables _InterlockedAnd64/_InterlockedOr64/_InterlockedXor64/_InterlockedDecrement64/_InterlockedIncrement64/_InterlockedExchange64/_InterlockedExchangeAdd64/_InterlockedExchangeSub64 on 32-bit Windows The backend already knows how to expand these to a loop using cmpxchg8b on 32-bit targets. Fixes PR46595 Differential Revision: https://reviews.llvm.org/D83254	2020-07-08 10:39:56 -07:00
Xiang1 Zhang	939d8309db	[X86-64] Support Intel AMX Intrinsic INTEL ADVANCED MATRIX EXTENSIONS (AMX). AMX is a new programming paradigm, it has a set of 2-dimensional registers (TILES) representing sub-arrays from a larger 2-dimensional memory image and operate on TILES. These intrinsics use direct TMM register number as its params. Spec can be found in Chapter 3 here https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D83111	2020-07-07 10:13:40 +08:00
Biplob Mishra	0c6b6e28e7	[PowerPC] Implement Vector Splat Immediate Builtins in Clang Implements builtins for the following prototypes: vector signed int vec_splati (const signed int); vector float vec_splati (const float); vector double vec_splatid (const float); vector signed int vec_splati_ins (vector signed int, const unsigned int, const signed int); vector unsigned int vec_splati_ins (vector unsigned int, const unsigned int, const unsigned int); vector float vec_splati_ins (vector float, const unsigned int, const float); Differential Revision: https://reviews.llvm.org/D82520	2020-07-06 20:29:33 -05:00
Lei Huang	e359ab1eca	[PowerPC][NFC] Fix indentation	2020-07-03 16:47:24 -05:00
Biplob Mishra	0939e04e41	[PowerPC] Implement Vector Insert Builtins in LLVM/Clang Implements vec_insertl() and vec_inserth(). Differential Revision: https://reviews.llvm.org/D82365	2020-07-03 15:30:41 -05:00
Biplob Mishra	ca464639a1	[PowerPC] Implement Vector Blend Builtins in LLVM/Clang Implements vec_blendv() Differential Revision: https://reviews.llvm.org/D82774	2020-07-02 16:52:52 -05:00
Biplob Mishra	286073484f	[PowerPC]Implement Vector Permute Extended Builtin Implements vector permute builtin: vec_permx() Differential Revision: https://reviews.llvm.org/D82869	2020-07-02 14:53:18 -05:00
Biplob Mishra	88874f0746	[PowerPC]Implement Vector Shift Double Bit Immediate Builtins Implement Vector Shift Double Bit Immediate Builtins in LLVM/Clang. * vec_sldb (); * vec_srdb (); Differential Revision: https://reviews.llvm.org/D82440	2020-07-01 20:34:53 -05:00
Amy Kwan	e0c02dc980	[PowerPC][Power10] Implement centrifuge, vector gather every nth bit, vector evaluate Builtins in LLVM/Clang This patch implements builtins for the following prototypes: unsigned long long __builtin_cfuged (unsigned long long, unsigned long long); vector unsigned long long vec_cfuge (vector unsigned long long, vector unsigned long long); unsigned long long vec_gnb (vector unsigned __int128, const unsigned int); vector unsigned char vec_ternarylogic (vector unsigned char, vector unsigned char, vector unsigned char, const unsigned int); vector unsigned short vec_ternarylogic (vector unsigned short, vector unsigned short, vector unsigned short, const unsigned int); vector unsigned int vec_ternarylogic (vector unsigned int, vector unsigned int, vector unsigned int, const unsigned int); vector unsigned long long vec_ternarylogic (vector unsigned long long, vector unsigned long long, vector unsigned long long, const unsigned int); vector unsigned __int128 vec_ternarylogic (vector unsigned __int128, vector unsigned __int128, vector unsigned __int128, const unsigned int); Differential Revision: https://reviews.llvm.org/D80970	2020-06-25 21:34:41 -05:00
Amy Kwan	d82f26cc4b	[PowerPC][Power10] Implement Count Leading/Trailing Zeroes Builtins under bit Mask in LLVM/Clang This patch implements builtins for the following prototypes: unsigned long long __builtin_cntlzdm (unsigned long long, unsigned long long) unsigned long long __builtin_cnttzdm (unsigned long long, unsigned long long) vector unsigned long long vec_cntlzm (vector unsigned long long, vector unsigned long long) vector unsigned long long vec_cnttzm (vector unsigned long long, vector unsigned long long) Differential Revision: https://reviews.llvm.org/D80941	2020-06-24 16:03:45 -05:00
Amy Kwan	19df9e2959	[PowerPC][Power10] Implement VSX PCV Generate Operations in LLVM/Clang This patch implements builtins for the following prototypes for the VSX Permute Control Vector Generate with Mask Instructions: vector unsigned char vec_genpcvm (vector unsigned char, const int); vector unsigned short vec_genpcvm (vector unsigned short, const int); vector unsigned int vec_genpcvm (vector unsigned int, const int); vector unsigned long long vec_genpcvm (vector unsigned long long, const int); Differential Revision: https://reviews.llvm.org/D81774	2020-06-22 21:09:34 -05:00
Amy Kwan	cc95635b1b	[PowerPC][Power10] Implement Vector Clear Left/Rightmost Bytes Builtins in LLVM/Clang This patch implements builtins for the following prototypes: ``` vector signed char vec_clrl (vector signed char a, unsigned int n); vector unsigned char vec_clrl (vector unsigned char a, unsigned int n); vector signed char vec_clrr (vector signed char a, unsigned int n); vector signed char vec_clrr (vector unsigned char a, unsigned int n); ``` Differential Revision: https://reviews.llvm.org/D81707	2020-06-20 18:29:16 -05:00
Amy Kwan	c45c161130	[PowerPC][Power10] Implement Parallel Bits Deposit/Extract Builtins in LLVM/Clang This patch implements builtins for the following prototypes: vector unsigned long long vec_pdep(vector unsigned long long, vector unsigned long long); vector unsigned long long vec_pext(vector unsigned long long, vector unsigned long long __b); unsigned long long __builtin_pdepd (unsigned long long, unsigned long long); unsigned long long __builtin_pextd (unsigned long long, unsigned long long); Revision Depends on D80758 Differential Revision: https://reviews.llvm.org/D80935	2020-06-18 16:23:56 -05:00
Thomas Lively	d2c394e74f	[WebAssembly] Add intrinsic for i64x2.mul Summary: This instruction was implemented in `3181273be7`, but that commit did not add an intrinsic for it. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D81757	2020-06-12 14:08:18 -07:00
Yaxun (Sam) Liu	af00eb25f8	Fix __clang_cuda_math_forward_declares.h Recent change from `#if !defined(__CUDA__)` to `#if !__CUDA__` caused regression on ROCm 3.5 since there is `#define __CUDA__` before inclusion of the header file, which causes `#if !__CUDA__` to be invalid. Change `#if !__CUDA__` back to `#if !defined(__CUDA__)` for backward compatibility.	2020-06-10 23:47:13 -04:00
Yaxun (Sam) Liu	4615abc11f	Rename arg name in __clang_hip_math.h __sptr is a keyword for ms-extension. Change it from __sptr to __sinptr.	2020-06-08 14:13:32 -04:00
Yaxun (Sam) Liu	8422bc9efc	recommit "[HIP] Add default header and include path" recommit `11d06b9511` with fix for lit tests.	2020-06-06 14:21:22 -04:00
Nico Weber	2920348063	Revert "recommit "[HIP] Add default header and include path"" This reverts commit `1fa43e0b34`. Still breaks tests on several bots, see https://reviews.llvm.org/D81176	2020-06-05 21:50:04 -04:00
Yaxun (Sam) Liu	1fa43e0b34	recommit "[HIP] Add default header and include path" recommit `11d06b9511` with fix for lit tests.	2020-06-05 20:41:15 -04:00
Yaxun (Sam) Liu	8a8c6913a9	Revert "[HIP] Add default header and include path" This reverts commit `11d06b9511`.	2020-06-05 15:42:57 -04:00
Yaxun (Sam) Liu	11d06b9511	[HIP] Add default header and include path To support std::complex and some other standard C/C++ functions in HIP device code, they need to be forced to be __host__ __device__ functions by pragmas. This is done by some clang standard C++ wrapper headers which are shared between cuda-clang and hip-Clang. For these standard C++ wapper headers to work properly, specific include path order has to be enforced: clang C++ wrapper include path standard C++ include path clang include path Also, these C++ wrapper headers require device version of some standard C/C++ functions must be declared before including them. This needs to be done by including a default header which declares or defines these device functions. The default header is always included before any other headers are included by users. This patch adds the the default header and include path for HIP. Differential Revision: https://reviews.llvm.org/D81176	2020-06-05 12:44:57 -04:00
Ties Stuij	a6fcf5ca03	[clang][BFloat] add NEON emitter for bfloat Summary: This patch adds the bfloat16_t struct typedefs (e.g. bfloat16x8x2_t) to arm_neon.h This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a The bfloat type, and its properties are specified in the Arm Architecture Reference Manual: https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile The following people contributed to this patch: - Luke Cheeseman - Simon Tatham - Ties Stuij Reviewers: t.p.northover, fpetrogalli, sdesmalen, az, LukeGeeson Reviewed By: fpetrogalli Subscribers: SjoerdMeijer, LukeGeeson, pbarrio, mgorny, kristof.beyls, ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D79708	2020-06-05 14:11:51 +01:00
Anastasia Stulova	4a4402f0d7	[OpenCL] Add cl_khr_extended_subgroup extensions. Added extensions and their function declarations into the standard header. Patch by Piotr Fusik! Tags: #clang Differential Revision: https://reviews.llvm.org/D79781	2020-06-04 13:29:30 +01:00
Thomas Lively	237be3404b	[WebAssembly] Improve macro hygiene in wasm_simd128.h Summary: The shuffle intrinsic macros did not parenthesize usages of their constant parameters, which could lead to incorrect results due to operator precedence issues. This patch fixes the problem by adding the missing paretheses. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D80968	2020-06-02 12:55:06 -07:00
Craig Topper	1b02db52b7	[X86] Update some av512 shift intrinsics to use "unsigned int" parameter instead of int to match Intel documentation There are 65 that take a scalar shift amount. Intel documentation shows 60 of them taking unsigned int. There are 5 versions of srli_epi16 that use int, the 512-bit maskz and 128/256 mask/maskz. Fixes PR45931 Differential Revision: https://reviews.llvm.org/D80251	2020-05-22 20:12:57 -07:00
Thomas Lively	3181273be7	[WebAssembly] Implement i64x2.mul and remove i8x16.mul Summary: This reflects changes in the spec proposal made since basic arithmetic was first implemented. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D80174	2020-05-19 12:50:44 -07:00
Xiang1 Zhang	bcc0c894f3	Add cet.h for writing CET-enabled assembly code Summary: Add x86 feature with IBT and/or SHSTK bits to ELF program property if they are enabled. Otherwise, contents in this header file are unused. This file is mainly design for assembly source code which want to enable CET Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith Reviewed By: LuoYuanke Subscribers: cfe-commits, mgorny Tags: #clang Differential Revision: https://reviews.llvm.org/D79617	2020-05-19 14:03:17 +08:00
Xiang1 Zhang	62a9eca859	Test asm-cet.S fail for window clang This reverts commit `e7e84ff24a`.	2020-05-19 13:18:05 +08:00
Xiang1 Zhang	e7e84ff24a	Add cet.h for writing CET-enabled assembly code Summary: Add x86 feature with IBT and/or SHSTK bits to ELF program property if they are enabled. Otherwise, contents in this header file are unused. This file is mainly design for assembly source code which want to enable CET Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith Reviewed By: LuoYuanke Subscribers: mgorny Differential Revision: https://reviews.llvm.org/D79617	2020-05-19 10:37:46 +08:00
Thomas Lively	c702d4bf41	[WebAssembly] Update latest implemented SIMD instructions Summary: Move instructions that have recently been implemented in V8 from the `unimplemented-simd128` target feature to the `simd128` target feature. The updated instructions match the update at https://github.com/WebAssembly/simd/pull/223. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D79973	2020-05-15 10:53:02 -07:00
Thomas Lively	3d49d1cfa7	[WebAssembly] Implement pseudo-min/max SIMD instructions Summary: As proposed in https://github.com/WebAssembly/simd/pull/122. Since these instructions are not yet merged to the SIMD spec proposal, this patch makes them entirely opt-in by surfacing them only through LLVM intrinsics and clang builtins. If these instructions are made official, these intrinsics and builtins should be replaced with simple instruction patterns. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D79742	2020-05-12 09:39:01 -07:00
Thomas Lively	8e3e56f2a3	[WebAssembly] Add wasm-specific vector shuffle builtin and intrinsic Summary: Although using `__builtin_shufflevector` and the `shufflevector` instruction works fine, they are not opaque to the optimizer. As a result, DAGCombine can potentially reduce the number of shuffles and change the shuffle masks. This is unexpected behavior for users of the WebAssembly SIMD intrinsics who have crafted their shuffles to optimize the code generated by engines. This patch solves the problem by adding a new shuffle intrinsic that is opaque to the optimizers in line with the decision of the WebAssembly SIMD contributors at https://github.com/WebAssembly/simd/issues/196#issuecomment-622494748. In the future we may implement custom DAG combines to properly optimize shuffles and replace this solution. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D66983	2020-05-11 10:01:55 -07:00
Thomas Lively	e0f52842c8	[WebAssembly] Renumber SIMD opcodes Summary: As described in https://github.com/WebAssembly/simd/pull/209. This is the final reorganization of the SIMD opcode space before standardization. It has been landed in concert with corresponding changes in other projects in the WebAssembly SIMD ecosystem. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79224	2020-05-01 17:20:49 -07:00
Craig Topper	af28e02e74	[clang] Add vendor identity for Hygon Dhyana processor to cpuid.h The vendor id is used to determine whether the processor supports hardware CRC32 in the Scudo code. The previous discussion about the patch is in [1], and more information about Hygon Dhyana processor is in[2]. [1]: https://reviews.llvm.org/D62368 [2]: https://git.kernel.org/torvalds/c/c9661c1e80b609cd038db7c908e061f0535804ef Patch by fanjinke (Jinke Fan) Differential Revision: https://reviews.llvm.org/D78874	2020-04-30 18:17:01 -07:00
Douglas Yung	046130490f	Add header guards for header files that should not be included on the PS4 platform. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D79194	2020-04-30 16:17:34 -07:00
Ulrich Weigand	095ccf4455	[SystemZ] Avoid __INTPTR_TYPE__ conversions in vecintrin.h Some intrinsics in vecintrin.h are currently implemented by performing address arithmetic in __INTPTR_TYPE__ and converting the result to some pointer type. While this works correctly, it leads to suboptimal code generation since many optimizers cannot trace the provenance of the resulting pointers. Fixed by using "char *" pointer arithmetic instead.	2020-04-28 18:49:49 +02:00
Ulrich Weigand	c90e09b13c	[SystemZ] Use reserved keywords in vecintrin.h System headers should avoid using the "vector" and "bool" keywords since those might be redefined by user code. For example, using <stdbool.h> before <vecintrin.h> will currently lead to compiler errors. Fixed by using the reserved "__vector" and "__bool" keywords instead. NFC otherwise.	2020-04-28 18:49:48 +02:00
Raul Tambre	8e20516540	[CUDA] Define __CUDACC__ before standard library headers libstdc++ since version 7 when GNU extensions are enabled (e.g. -std=gnu++11) use it to avoid defining overloads using `__float128`. This fixes compiling with GNU extensions failing due to `__float128` being used. Discovered at https://gitlab.kitware.com/cmake/cmake/-/merge_requests/4442#note_737136. Differential Revision: https://reviews.llvm.org/D78392	2020-04-17 12:56:13 -07:00
Johannes Doerfert	17d8334223	[OpenMP] Allow <math.h> to go first in C++-mode in target regions If we are in C++ mode and include <math.h> (not <cmath>) first, we still need to make sure <cmath> is read first. The problem otherwise is that we haven't seen the declarations of the math.h functions when the system math.h includes our cmath overlay. However, our cmath overlay, or better the underlying overlay, e.g. CUDA, uses the math.h functions. Since we haven't declared them yet we get errors. CUDA avoids this by eagerly declaring all math functions (in the __device__ space) but we cannot do this. Instead we break the dependence by forcing cmath to go first. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D77774	2020-04-09 22:10:31 -05:00
WangTianQing	a3dc949000	[X86] Add TSXLDTRK instructions. Summary: For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference Reviewers: craig.topper, RKSimon, LuoYuanke Reviewed By: craig.topper Subscribers: mgorny, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D77205	2020-04-09 13:17:29 +08:00
Johannes Doerfert	f85ae058f5	[OpenMP] Provide math functions in OpenMP device code via OpenMP variants For OpenMP target regions to piggy back on the CUDA/AMDGPU/... implementation of math functions, we include the appropriate definitions inside of an `omp begin/end declare variant match(device={arch(nvptx)})` scope. This way, the vendor specific math functions will become specialized versions of the system math functions. When a system math function is called and specialized version is available the selection logic introduced in D75779 instead call the specialized version. In contrast to the code path we used so far, the system header is actually included. This means functions without specialized versions are available and so are macro definitions. This should address PR42061, PR42798, and PR42799. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D75788	2020-04-07 23:33:24 -05:00
Pierre Gousseau	08fab9ebec	[X86] Fix implicit sign conversion warnings in X86 headers. Warnings in emmintrin.h and xmmintrin.h are reported by -fsanitize=implicit-integer-sign-change. Reviewed By: RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D77393	2020-04-07 11:25:08 +01:00
WangTianQing	d08fadd662	[X86] Add SERIALIZE instruction. Summary: For more details about this instruction, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference Reviewers: craig.topper, RKSimon, LuoYuanke Reviewed By: craig.topper Subscribers: mgorny, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D77193	2020-04-02 16:19:23 +08:00
Johannes Doerfert	b0b5f0416b	[OpenMP][FIX] Undo changes accidentally already introduced in NFC commit In `d1705c1196` (D77238) we accidentally included subsequent changes and did not only move the code into a new file (which was the intention). We undo the changes now and re-introduce them with the appropriate test changes later.	2020-04-02 01:33:39 -05:00
Johannes Doerfert	410cfc478f	[OpenMP][FIX] Add second include after header was split in `d1705c1196` The math wrapper handling is going to be replaced shortly and `d1705c1196` was actually a precursor for that.	2020-04-02 00:20:23 -05:00
Johannes Doerfert	d1705c1196	[CUDA][NFC] Split math.h functions out of __clang_cuda_device_functions.h This is not supported to change anything but allow us to reuse the math functions separately from the device functions, e.g., source them at different times. This will be used by the OpenMP overlay. This also adds two `return` keywords that were missing. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D77238	2020-04-01 23:46:27 -05:00
Thomas Lively	95fac2e46b	[WebAssembly] Rename SIMD min/max/avgr intrinsics for consistency Summary: The convention for the wasm_simd128.h intrinsics is to have the integer sign in the lane interpretation rather than as a suffix. This PR changes the names of the integer min, max, and avgr intrinsics to match this convention. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D77185	2020-04-01 09:38:41 -07:00
Thomas Lively	5074776de4	[WebAssembly] Import wasm_simd128.h from Emscripten Summary: As the WebAssembly SIMD proposal nears stabilization, there is desire to use it with toolchains other than Emscripten. Moving the intrinsics header to clang will make it available to WASI toolchains as well. Reviewers: aheejin, sunfish Subscribers: dschuff, mgorny, sbc100, jgravelle-google, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76959	2020-03-30 17:04:18 -07:00
Michael Liao	5be9b8cbe2	[cuda][hip] Add CUDA builtin surface/texture reference support. Summary: - Re-commit after fix Sema checks on partial template specialization. Reviewers: tra, rjmccall, yaxunl, a.sidorin Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76365	2020-03-27 17:18:49 -04:00
Artem Belevich	fe8063e1a0	Revert "[cuda][hip] Add CUDA builtin surface/texture reference support." This reverts commit `6a9ad5f3f4`. The patch breaks CUDA copmilation. Differential Revision: https://reviews.llvm.org/D76365	2020-03-27 10:01:38 -07:00
Michael Liao	6a9ad5f3f4	[cuda][hip] Add CUDA builtin surface/texture reference support. Summary: - Even though the bindless surface/texture interfaces are promoted, there are still code using surface/texture references. For example, [PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports the compilation issue for code using `tex2D` with texture references. For better compatibility, this patch proposes the support of surface/texture references. - Due to the absent documentation and magic headers, it's believed that `nvcc` does use builtins for texture support. From the limited NVVM documentation[^nvvm] and NVPTX backend texture/surface related tests[^test], it's believed that surface/texture references are supported by replacing their reference types, which are annotated with `device_builtin_surface_type`/`device_builtin_texture_type`, with the corresponding handle-like object types, `cudaSurfaceObject_t` or `cudaTextureObject_t`, in the device-side compilation. On the host side, that global handle variables are registered and will be established and updated later when corresponding binding/unbinding APIs are called[^bind]. Surface/texture references are most like device global variables but represented in different types on the host and device sides. - In this patch, the following changes are proposed to support that behavior: + Refine `device_builtin_surface_type` and `device_builtin_texture_type` attributes to be applied on `Type` decl only to check whether a variable is of the surface/texture reference type. + Add hooks in code generation to replace that reference types with the correponding object types as well as all accesses to them. In particular, `nvvm.texsurf.handle.internal` should be used to load object handles from global reference variables[^texsurf] as well as metadata annotations. + Generate host-side registration with proper template argument parsing. --- [^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf [^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll [^bind]: See section 3.2.11.1.2 ``Texture reference API` in [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf). [^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used. But, the current backend doesn't have that supported. We may revise that later. Reviewers: tra, rjmccall, yaxunl, a.sidorin Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76365	2020-03-26 14:44:52 -04:00
Sander de Smalen	5087ace651	[Clang][SVE] Parse builtin type string for scalable vectors This patch adds 'q' to mean 'scalable vector' in the builtin type string, and for SVE will return the matching builtin type as defined in the C/C++ language extensions for SVE. This patch also adds some scaffolding to generate the arm_sve.h header file, and some builtin definitions (+CodeGen) to be able to implement some simple masked load intrinsics that use the ACLE types, such as: svint8_t test_svld1_s8(svbool_t pg, const int8_t *base) { return svld1_s8(pg, base); } Reviewers: efriedma, rjmccall, rovka, rsandifo-arm, rengolin Reviewed By: efriedma Tags: #clang Differential Revision: https://reviews.llvm.org/D75298	2020-03-15 14:34:52 +00:00
Shengchen Kan	214d24e1f8	[X86] Support intrinsic _mm_broadcastsi128_si256 Reviewers: LuoYuanke, craig.topper, RKSimon, pengfei Reviewed By: craig.topper Subscribers: cfe-commits, llvm-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D75897	2020-03-12 10:56:39 +08:00
Shengchen Kan	ab69cd0779	[X86] Support intrinsic _mm_cldemote Reviewers: LuoYuanke, craig.topper, RKSimon, pengfei Reviewed By: craig.topper Subscribers: cfe-commits, llvm-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D75896	2020-03-12 10:03:41 +08:00
Shengchen Kan	560aa53f8f	[X86] Support intrinsics _bextr2* Reviewers: LuoYuanke, craig.topper, RKSimon, pengfei Reviewed By: craig.topper Subscribers: cfe-commits, llvm-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D75894	2020-03-12 09:26:51 +08:00
Mikhail Maltsev	47edf5bafb	[ARM,CDE] Generalize MVE intrinsics infrastructure to support CDE Summary: This patch generalizes the existing code to support CDE intrinsics which will share some properties with existing MVE intrinsics (some of the intrinsics will be polymorphic and accept/return values of MVE vector types). Specifically the patch: * Adds new tablegen backends -gen-arm-cde-builtin-def, -gen-arm-cde-builtin-codegen, -gen-arm-cde-builtin-sema, -gen-arm-cde-builtin-aliases, -gen-arm-cde-builtin-header based on existing MVE backends. * Renames the '__clang_arm_mve_alias' attribute into '__clang_arm_builtin_alias' (it will be used with CDE intrinsics as well as MVE intrinsics) * Implements semantic checks for the coprocessor argument of the CDE intrinsics as well as the existing coprocessor intrinsics. * Adds one CDE intrinsic __arm_cx1 to test the above changes Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen Reviewed By: simon_tatham Subscribers: sdesmalen, mgorny, kristof.beyls, danielkiss, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75850	2020-03-10 14:03:16 +00:00
Michael Spencer	16af23fae8	[clang][Headers] Use __has_builtin instead of _MSC_VER. arm_acle.h relied on `_MSC_VER` to determine if a given function was already defined as a builtin. This was incorrect because `-fms-extensions` enables these builtins, but is not responsible for defining `_MSC_VER` on any target. The next closest thing is `_MSC_EXTENSIONS`, which is only defined on Windows targets, but even this is suboptimal. What this conditional is actually trying to determine is if the given functions are defined as builtins, so just check that directly. I also attempted to do this for `__nop`, but in that case intrin.h, which is only includable if `_MSC_VER` is defined, has its own definition. So in that case `_MSC_VER` is correct. Differential Revision: https://reviews.llvm.org/D75719 rdar://60102353	2020-03-06 13:48:09 -08:00
Sven van Haastregt	8a37b9e617	[OpenCL] Remove spurious atomic_fetch_min/max builtins These declarations use a mix of unsigned and signed argument and return types. This is not in accordance with OpenCL v2.0 s6.13.11. Differential Revision: https://reviews.llvm.org/D74910	2020-03-02 15:56:48 +00:00
Mirko Brkusanin	5ba931a84a	[Mips] Add intrinsics for 4-byte and 8-byte MSA loads/stores. New intrinisics are implemented for when we need to port SIMD code from other arhitectures and only load or store portions of MSA registers. Following intriniscs are added which only load/store element 0 of a vector: v4i32 __builtin_msa_ldrq_w (const void , imm_n2048_2044); v2i64 __builtin_msa_ldr_d (const void , imm_n4096_4088); void __builtin_msa_strq_w (v4i32, void , imm_n2048_2044); void __builtin_msa_str_d (v2i64, void , imm_n4096_4088); Differential Revision: https://reviews.llvm.org/D73644	2020-02-11 11:47:30 +01:00
Alexey Bataev	fd3437a4f7	[OPENMP][NVPTX]Add NVPTX specific definitions for new/delete operators. Summary: To use new/delete in NVPTX code we need to define them. Implementation copied from CUDA wrappers. Reviewers: hfinkel, jdoerfert Subscribers: mgorny, guansong, kkwli0, caomhin, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D73128	2020-02-05 09:57:53 -05:00
Alexey Sotkin	f780e15caf	[OpenCL] Fix support for cl_khr_mipmap_image_writes Text of the extension is available here: https://github.com/KhronosGroup/OpenCL-Docs/blob/master/ext/cl_khr_mipmap_image.asciidoc Patch by Ilya Mashkov Differential Revision: https://reviews.llvm.org/D71460	2020-02-05 14:55:32 +03:00
Artem Belevich	12fefeef20	[CUDA] Assume the latest known CUDA version if we've found an unknown one. This makes clang somewhat forward-compatible with new CUDA releases without having to patch it for every minor release without adding any new function. If an unknown version is found, clang issues a warning (can be disabled with -Wno-cuda-unknown-version) and assumes that it has detected the latest known version. CUDA releases are usually supersets of older ones feature-wise, so it should be sufficient to keep released clang versions working with minor CUDA updates without having to upgrade clang, too. Differential Revision: https://reviews.llvm.org/D73231	2020-01-28 10:11:42 -08:00
Artem Belevich	cc14de88da	[CUDA] Fix order of memcpy arguments in __shfl_*(<64-bit type>). Wrong argument order resulted in broken shfl ops for 64-bit types.	2020-01-23 13:17:52 -08:00
Craig Topper	16b9410caa	[X86] Cast to __v4hi instead of __m64 in the implementation of _mm_extract_pi16 and _mm_insert_pi16. __m64 is a vector of 1 long long. But the builtins these intrinsics are calling expect a vector of 4 shorts. Fixes PR44589	2020-01-22 16:00:23 -06:00
Ulrich Weigand	cebba7ce39	[SystemZ] Avoid unnecessary conversions in vecintrin.h Use floating-point instead of integer zero constants to avoid creating implicit conversions, which currently cause suboptimal code to be generated with -ffp-exception-behavior=strict. NFC otherwise.	2020-01-16 18:58:14 +01:00
Richard Smith	388eaa1270	Work around PR43337: don't try to use the vec_sel overloads for vector long long, since clang's <altivec.h> doesn't provide it yet!	2020-01-15 13:14:57 -08:00
Warren Ristow	7fcd9e3f70	[X86] Mark various pointer arguments in builtins as const Enabling `-Wcast-qual` identified many casts in various system headers that were dropping the `const` qualifier. Fixing those missing qualifiers pointed out that a few of the definitions of the builtins did not properly identify their arguments as `const` pointers. This commit fixes those builtin definitions, and the system header files so that they no longer drop the qualifier. Differential Revision: https://reviews.llvm.org/D71718	2019-12-19 11:42:11 -08:00
Momchil Velikov	600d123c6f	[ARM][CMSE] Add CMSE header and builtins This is patch C2 as mentioned in RFC http://lists.llvm.org/pipermail/cfe-dev/2019-March/061834.html This adds CMSE builtin functions, and introduces arm_cmse.h header which has useful macros, functions, and data types for end-users of CMSE. Patch by Javed Absar. Diferential Revision: https://reviews.llvm.org/D70817	2019-12-12 15:01:14 +00:00
Craig Topper	890c6ef1fb	[X86] Remove forward declaration of _invpcid from intrin.h. Rely on inline version from immintrin.h The forward declaration had a cdecl calling convention, but the inline version did not. This leads to a conflict if the default calling convention is not cdecl. Fix this by just removing the forward declaration. Fixes PR41503	2019-11-25 16:27:39 -08:00
Craig Topper	3cec2a17de	[X86] Fix the implementation of __readcr3/__writecr3 to work in 64-bit mode We need to use a 64-bit type in 64-bit mode so a 64-bit register will get used in the generated assembly. I've also changed the constraints to just use "r" intead of "q". "q" forces to a only an a/b/c/d register in 32-bit mode, but I see no reason that would matter here. Fixes Nico's note in PR19301 over 4 years ago. Differential Revision: https://reviews.llvm.org/D70101	2019-11-14 13:21:36 -08:00
Nemanja Ivanovic	e0407f5496	[PowerPC][Altivec] Fix offsets for vec_xl and vec_xst As we currently have it implemented in altivec.h, the offsets for these two intrinsics are element offsets. The documentation in the ABI (as well as the implementation in both XL and GCC) states that these should be byte offsets. Differential revision: https://reviews.llvm.org/D63636	2019-11-07 20:58:11 -06:00
Nemanja Ivanovic	070e4027b0	[PowerPC][Altivec] Emit correct builtin for single precision vec_all_ne We currently emit a double precision comparison instruction for this, whereas we need to emit the single precision version. Differential revision: https://reviews.llvm.org/D64024	2019-11-07 20:40:32 -06:00
Eli Friedman	98286b569d	[Headers] Fix compatibility between arm_acle.h and intrin.h Make sure they don't both define __nop. Differential Revision: https://reviews.llvm.org/D69012	2019-10-29 14:52:56 -07:00
vhscampos	f6e11a36c4	[ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE Summary: Writing support for three ACLE functions: unsigned int __cls(uint32_t x) unsigned int __clsl(unsigned long x) unsigned int __clsll(uint64_t x) CLS stands for "Count number of leading sign bits". In AArch64, these two intrinsics can be translated into the 'cls' instruction directly. In AArch32, on the other hand, this functionality is achieved by implementing it in terms of clz (count number of leading zeros). Reviewers: compnerd Reviewed By: compnerd Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69250	2019-10-28 11:06:58 +00:00
vhscampos	5d35b7d9e1	[ARM][AArch64] Implement __arm_rsrf, __arm_rsrf64, __arm_wsrf & __arm_wsrf64 Summary: Adding support for ACLE intrinsics. Patch by Michael Platings. Reviewers: chill, t.p.northover, efriedma Reviewed By: chill Subscribers: kristof.beyls, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D69297	2019-10-28 10:59:18 +00:00
Greg Bedwell	d4758d4a8d	Fix a spelling mistake in a couple of intrinsic description comments. NFC	2019-10-27 09:42:14 +00:00
Simon Tatham	08074cc965	[clang,ARM] Initial ACLE intrinsics for MVE. This commit sets up the infrastructure for auto-generating <arm_mve.h> and doing clang-side code generation for the builtins it relies on, and demonstrates that it works by implementing a representative sample of the ACLE intrinsics, more or less matching the ones introduced in LLVM IR by D67158,D68699,D68700. Like NEON, that header file will provide a set of vector types like uint16x8_t and C functions with names like vaddq_u32(). Unlike NEON, the ACLE spec for <arm_mve.h> includes a polymorphism system, so that you can write plain vaddq() and disambiguate by the vector types you pass to it. Unlike the corresponding NEON code, I've arranged to make every user- facing ACLE intrinsic into a clang builtin, and implement all the code generation inside clang. So <arm_mve.h> itself contains nothing but typedefs and function declarations, with the latter all using the new `__attribute__((__clang_builtin))` system to arrange that the user- facing function names correspond to the right internal BuiltinIDs. So the new MveEmitter tablegen system specifies the full sequence of IRBuilder operations that each user-facing ACLE intrinsic should translate into. Where possible, the ACLE intrinsics map to standard IR operations such as vector-typed `add` and `fadd`; where no standard representation exists, I call down to the sample IR intrinsics introduced in an earlier commit. Doing it like this means that you get the polymorphism for free just by using __attribute__((overloadable)): the clang overload resolution decides which function declaration is the relevant one, and _then_ its BuiltinID is looked up, so by the time we're doing code generation, that's all been resolved by the standard system. It also means that you get really nice error messages if the user passes the wrong combination of types: clang will show the declarations from the header file and explain why each one doesn't match. (The obvious alternative approach would be to have wrapper functions in <arm_mve.h> which pass their arguments to the underlying builtins. But that doesn't work in the case where one of the arguments has to be a constant integer: the wrapper function can't pass the constantness through. So you'd have to do that case using a macro instead, and then use C11 `_Generic` to handle the polymorphism. Then you have to add horrible workarounds because `_Generic` requires even the untaken branches to type-check successfully, and //then// if the user gets the types wrong, the error message is totally unreadable!) Reviewers: dmgreen, miyuki, ostannard Subscribers: mgorny, javed.absar, kristof.beyls, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D67161	2019-10-24 16:33:13 +01:00
Craig Topper	282eff3847	[X86] Always define the tzcnt intrinsics even when _MSC_VER is defined. These intrinsics use llvm.cttz intrinsics so are always available even without the bmi feature. We already don't check for the bmi feature on the intrinsics themselves. But we were blocking the include of the header file with _MSC_VER unless BMI was enabled on the command line. Fixes PR30506. llvm-svn: 374516	2019-10-11 06:07:53 +00:00
Pengfei Wang	1f3a15c397	[x86] Adding support for some missing intrinsics: _castf32_u32, _castf64_u64, _castu32_f32, _castu64_f64 Summary: Adding support for some missing intrinsics: _castf32_u32, _castf64_u64, _castu32_f32, _castu64_f64 Reviewers: craig.topper, LuoYuanke, RKSimon, pengfei Reviewed By: RKSimon Subscribers: llvm-commits Patch by yubing (Bing Yu) Differential Revision: https://reviews.llvm.org/D67212 llvm-svn: 372802	2019-09-25 02:24:05 +00:00
Richard Smith	5b2ba5afa9	Fix reliance on -flax-vector-conversions in AVX intrinsics headers and corresponding tests. llvm-svn: 372063	2019-09-17 03:56:30 +00:00
Richard Smith	a50884abad	Remove reliance on lax vector conversions from altivec.h in VSX mode. llvm-svn: 372061	2019-09-17 03:56:26 +00:00
Richard Smith	aeb279dd88	Remove reliance on lax vector conversions from altivec.h and its test. llvm-svn: 371814	2019-09-13 05:19:12 +00:00
Jinsong Ji	5309189d9b	[PowerPC][Altivec] Fix constant argument for vec_dss Summary: This is similar to vec_ct* in https://reviews.llvm.org/rL304205. The argument must be a constant, otherwise instruction selection will fail. always_inline is not enough for isel to always fold everything away at -O0. The fix is to turn the function into macros in altivec.h. Fixes https://bugs.llvm.org/show_bug.cgi?id=43072 Reviewers: nemanjai, hfinkel, #powerpc, wuzish Reviewed By: #powerpc, wuzish Subscribers: wuzish, kbarton, MaskRay, shchenz, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D66699 llvm-svn: 370902	2019-09-04 14:01:47 +00:00
Artem Belevich	ce94ec661f	[CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+ vote.ballot instruction is gone in recent CUDA versions and vote.sync.ballot can not be used because it needs a thread mask parameter. Fortunately PTX 6.2 (introduced with CUDA-9.2) provides activemask.b32 instruction for this. Differential Revision: https://reviews.llvm.org/D66665 llvm-svn: 370792	2019-09-03 17:31:58 +00:00
Pengfei Wang	dea9cad10e	[x86] Fix bugs of some intrinsic functions in CLANG : _mm512_stream_ps, _mm512_stream_pd, _mm512_stream_si512 Reviewers: craig.topper, pengfei, LuoYuanke, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Patch by Bing Yu (yubing) Differential Revision: https://reviews.llvm.org/D66786 llvm-svn: 370691	2019-09-03 02:06:15 +00:00
Pengfei Wang	caac097fbf	[x86] Adding support for some missing intrinsics: _mm512_cvtsi512_si32 Summary: Adding support for some missing intrinsics: _mm512_cvtsi512_si32 Reviewers: craig.topper, pengfei, LuoYuanke, spatel, RKSimon Reviewed By: craig.topper Subscribers: llvm-commits Patch by Bing Yu (yubing) Differential Revision: https://reviews.llvm.org/D66785 llvm-svn: 370297	2019-08-29 06:18:34 +00:00
Yaxun Liu	af478e240b	[OpenCL] Fix declaration of enqueue_marker Differential Revision: https://reviews.llvm.org/D66512 llvm-svn: 369641	2019-08-22 11:18:59 +00:00
Anastasia Stulova	ef58804ebc	[OpenCL] Fix lang mode predefined macros for C++ mode. In C++ mode we should only avoid adding __OPENCL_C_VERSION__, all other predefined macros about the language mode are still valid. This change also fixes the language version check in the headers accordingly. Differential Revision: https://reviews.llvm.org/D65941 llvm-svn: 368552	2019-08-12 10:44:07 +00:00
Raphael Isemann	c4b5b66a05	[clang] Fixed x86 cpuid NSC signature Summary: The signature "Geode by NSC" for NSC vendor is wrong. In lib/Headers/cpuid.h, signature_NSC_edx and signature_NSC_ecx constants are inverted (cpuid signature order is ebx # edx # ecx). Reviewers: teemperor, rsmith, craig.topper Reviewed By: teemperor, craig.topper Subscribers: craig.topper, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D65978 llvm-svn: 368510	2019-08-10 10:14:01 +00:00
Qiu Chaofan	e9efaf3529	[PowerPC] [Clang] Port SSE3, SSSE3 and SSE4 intrinsics to PowerPC Port existing headers which include x86 intrinsics implementation to PowerPC platform (using Altivec), along with tests. Also, tests about including these intrinsic headers are combined. The headers are mainly developed by Steven Munroe, with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu. Reviewed By: Jinsong Ji Differential Revision: https://reviews.llvm.org/D65630 llvm-svn: 368392	2019-08-09 03:39:55 +00:00
Momchil Velikov	a36d31478c	[AArch64] Add support for Transactional Memory Extension (TME) Re-commit r366322 after some fixes TME is a future architecture technology, documented in https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools https://developer.arm.com/docs/ddi0601/a More about the future architectures: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and TCANCEL and the target feature/arch extension "tme". It also implements TME builtin functions, defined in ACLE Q2 2019 (https://developer.arm.com/docs/101028/latest) Differential Revision: https://reviews.llvm.org/D64416 Patch by Javed Absar and Momchil Velikov llvm-svn: 367428	2019-07-31 12:52:17 +00:00
Qiu Chaofan	852d444671	[PowerPC] [Clang] Add platform guards to PPC vector intrinsics headers Move the platform check out of PPC Linux toolchain code and add platform guards to the intrinsic headers, since they are supported currently only on 64-bit PowerPC targets. Reviewed By: Jinsong Ji Differential Revision: https://reviews.llvm.org/D64849 llvm-svn: 367281	2019-07-30 02:18:11 +00:00
Paul Robinson	d2c0eefd5c	[X86] Remove const from some intrinsics that shouldn't have them llvm-svn: 366699	2019-07-22 16:14:09 +00:00
Sven van Haastregt	e9e59ad79f	[OpenCL] Define CLK_NULL_EVENT without cast Defining CLK_NULL_EVENT with a `(void*)` cast has the (unintended?) side-effect that the address space will be fixed (as generic in OpenCL 2.0 mode). The consequence is that any target specific address space for the clk_event_t type will not be applied. It is not clear why the void pointer cast was needed in the first place, and it seems we can do without it. Differential Revision: https://reviews.llvm.org/D63876 llvm-svn: 366546	2019-07-19 09:11:48 +00:00
Qiu Chaofan	03aaef8e72	[PowerPC][Clang] Remove use of malloc in mm_malloc Remove dependency of malloc in implementation of mm_malloc function in PowerPC intrinsics and alignment assumption on glibc. Reviewed By: Hal Finkel Differential Revision: https://reviews.llvm.org/D64850 llvm-svn: 366406	2019-07-18 06:20:12 +00:00
Momchil Velikov	0e2b74a2b0	Revert [AArch64] Add support for Transactional Memory Extension (TME) This reverts r366322 (git commit `4b8da3a503`) llvm-svn: 366355	2019-07-17 17:43:32 +00:00
Momchil Velikov	4b8da3a503	[AArch64] Add support for Transactional Memory Extension (TME) TME is a future architecture technology, documented in https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools https://developer.arm.com/docs/ddi0601/a More about the future architectures: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and TCANCEL and the target feature/arch extension "tme". It also implements TME builtin functions, defined in ACLE Q2 2019 (https://developer.arm.com/docs/101028/latest) Patch by Javed Absar and Momchil Velikov Differential Revision: https://reviews.llvm.org/D64416 llvm-svn: 366322	2019-07-17 13:23:27 +00:00
Kyrylo Tkachov	eb72138340	[AArch64] Implement __jcvt intrinsic from Armv8.3-A The jcvt intrinsic defined in ACLE [1] is available when ARM_FEATURE_JCVT is defined. This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function. The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used. I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example). make check-all didn't show any new failures. [1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics Differential Revision: https://reviews.llvm.org/D64495 llvm-svn: 366197	2019-07-16 09:27:39 +00:00
Ulrich Weigand	b98bf60ef7	[SystemZ] Add support for new cpu architecture - arch13 This patch series adds support for the next-generation arch13 CPU architecture to the SystemZ backend. This includes: - Basic support for the new processor and its features. - Support for low-level builtins mapped to new LLVM intrinsics. - New high-level intrinsics in vecintrin.h. - Indicate support by defining __VEC__ == 10303. Note: No currently available Z system supports the arch13 architecture. Once new systems become available, the official system name will be added as supported -march name. llvm-svn: 365933	2019-07-12 18:14:51 +00:00
Craig Topper	caf6b71ab2	[X86] Change the IR sequence for _mm_storeh_pi and _mm_storel_pi to perform the store as a <2 x float> instead of i64. This is similar to what we do for loadl_pi and loadh_pi. llvm-svn: 365669	2019-07-10 17:11:29 +00:00
Sven van Haastregt	b502a44110	[OpenCL] Restore ATOMIC_VAR_INIT We accidentally lost the ATOMIC_VAR_INIT and ATOMIC_FLAG_INIT macros in r363794. Also put the `memory_order` typedef back inside a `>= CL2.0` guard. llvm-svn: 364174	2019-06-24 10:06:40 +00:00
Sven van Haastregt	853dfab799	[OpenCL] Remove more duplicates from opencl-c.h Identified the duplicate declarations using sort lib/Headers/opencl-c.h \| uniq -c \| grep ' 2' llvm-svn: 364173	2019-06-24 10:06:34 +00:00
Anastasia Stulova	999f676d75	[OpenCL][PR41963] Add generic addr space to old atomics in C++ mode Add overloads with generic address space pointer to old atomics. This is currently only added for C++ compilation mode. Differential Revision: https://reviews.llvm.org/D62335 llvm-svn: 364071	2019-06-21 16:19:16 +00:00
Sven van Haastregt	772a7a7680	[OpenCL] Remove duplicate read_image declarations Patch by Pierre Gondois. llvm-svn: 364020	2019-06-21 10:26:10 +00:00
Craig Topper	6d9fb68c53	[X86] Make _mm_mask_cvtps_ph, _mm_maskz_cvtps_ph, _mm256_mask_cvtps_ph, and _mm256_maskz_cvtps_ph aliases for their corresponding cvt_roundps_ph intrinsic. These intrinsics should always take an immediate for the rounding mode. The base instruction comes from before EVEX embdedded rounding. The user should always provide the immediate rather than us assuming CUR_DIRECTION. Make the 512-bit versions also explicit aliases instead of copy pasting the code. llvm-svn: 363961	2019-06-20 18:24:29 +00:00
Xing Xue	ab4bcd844a	AIX system headers need stdint.h and inttypes.h to be re-enterable Summary: AIX system headers need stdint.h and inttypes.h to be re-enterable when macro _STD_TYPES_T is defined so that limit macro definitions such as UINT32_MAX can be found. This patch attempts to allow that on AIX. Reviewers: hubert.reinterpretcast, jasonliu, mclow.lists, EricWF Reviewed by: hubert.reinterpretcast, mclow.lists Subscribers: jfb, jsji, christof, cfe-commits, libcxx-commits, llvm-commits Tags: #LLVM, #clang, #libc++ Differential Revision: https://reviews.llvm.org/D59253 llvm-svn: 363939	2019-06-20 15:36:32 +00:00
Craig Topper	24151619a0	[X86] Correct the __min_vector_width__ attribute on a few intrinsics. llvm-svn: 363890	2019-06-19 23:27:04 +00:00
Sven van Haastregt	af1c230e70	[OpenCL] Split type and macro definitions into opencl-c-base.h Using the -fdeclare-opencl-builtins option will require a way to predefine types and macros such as `int4`, `CLK_GLOBAL_MEM_FENCE`, etc. Move these out of opencl-c.h into opencl-c-base.h such that the latter can be shared by -fdeclare-opencl-builtins and -finclude-default-header. This changes the behaviour of -finclude-default-header when -fdeclare-opencl-builtins is specified: instead of including the full header, it will include the header with only the base definitions. Differential revision: https://reviews.llvm.org/D63256 llvm-svn: 363794	2019-06-19 12:48:22 +00:00
Zi Xuan Wu	cc12f68fff	[PowerPC] [Clang] Port SSE2 intrinsics to PowerPC Port emmintrin.h which include Intel SSE2 intrinsics implementation to PowerPC platform (using Altivec). The new headers containing those implemenations are located into a directory named ppc_wrappers which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe, with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu. It's a follow-up patch of D62121. Patched by: Qiu Chaofan <qiucf@cn.ibm.com> Differential Revision: https://reviews.llvm.org/D62569 llvm-svn: 363122	2019-06-12 05:25:40 +00:00
Pengfei Wang	244062eece	[X86] Enable intrinsics that convert float and bf16 data to each other Scalar version : _mm_cvtsbh_ss , _mm_cvtness_sbh Vector version: _mm512_cvtpbh_ps , _mm256_cvtpbh_ps _mm512_maskz_cvtpbh_ps , _mm256_maskz_cvtpbh_ps _mm512_mask_cvtpbh_ps , _mm256_mask_cvtpbh_ps Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62363 llvm-svn: 363018	2019-06-11 01:17:28 +00:00
Pengfei Wang	3a29f7c99c	[X86] Add ENQCMD instructions For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Patch by Tianqing Wang (tianqing) Differential Revision: https://reviews.llvm.org/D62282 llvm-svn: 362685	2019-06-06 08:28:42 +00:00
Andrew Savonichev	9ed325e463	[OpenCL] Undefine cl_intel_planar_yuv extension Summary: Remove unnecessary definition (otherwise the extension will be defined where it's not supposed to be defined). Consider the code: #pragma OPENCL EXTENSION cl_intel_planar_yuv : begin // some declarations #pragma OPENCL EXTENSION cl_intel_planar_yuv : end is enough for extension to become known for clang. Patch by: Dmitry Sidorov <dmitry.sidorov@intel.com> Reviewers: Anastasia, yaxunl Reviewed By: Anastasia Tags: #clang Differential Revision: https://reviews.llvm.org/D58666 llvm-svn: 362398	2019-06-03 13:02:43 +00:00
Pengfei Wang	cc3629d545	[X86] Add VP2INTERSECT instructions Support intel AVX512 VP2INTERSECT instructions in clang Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D62367 llvm-svn: 362196	2019-05-31 06:09:35 +00:00
Zi Xuan Wu	fc3ed1ec50	re-commit r361928: [PowerPC] [Clang] Port SSE intrinsics to PowerPC Port xmmintrin.h which include Intel SSE intrinsics implementation to PowerPC platform (using Altivec). The new headers containing those implemenations are located into a directory named ppc_wrappers which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe, with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu. Patched by: Qiu Chaofan <qiucf@cn.ibm.com> Reviewed By: Jinsong Ji Differential Revision: https://reviews.llvm.org/D62121 llvm-svn: 362190	2019-05-31 04:42:13 +00:00
Zi Xuan Wu	48061cd999	revert rC361928: [PowerPC] [Clang] Port SSE intrinsics to PowerPC Because test fails in other targets rather than PowerPC llvm-svn: 361930	2019-05-29 07:09:54 +00:00
Zi Xuan Wu	b3bcbb5b66	[PowerPC] [Clang] Port SSE intrinsics to PowerPC Port xmmintrin.h which include Intel SSE intrinsics implementation to PowerPC platform (using Altivec). The new headers containing those implemenations are located into a directory named ppc_wrappers which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe, with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu. Patched by: Qiu Chaofan <qiucf@cn.ibm.com> Reviewed By: Jinsong Ji Differential Revision: https://reviews.llvm.org/D62121 llvm-svn: 361928	2019-05-29 05:17:03 +00:00
Kevin Petit	aa7754cc90	[OpenCL] Add support for the cl_arm_integer_dot_product extensions The specification is available in the Khronos OpenCL registry: https://www.khronos.org/registry/OpenCL/extensions/arm/cl_arm_integer_dot_product.txt Signed-off-by: Kevin Petit <kevin.petit@arm.com> llvm-svn: 361641	2019-05-24 14:53:52 +00:00
Craig Topper	3d7ecc4618	[X86] Remove semicolons at the end of intrinsics implemented as macros so they can be used as arguments to other intrinsics. Also fix one intrinsic that was using variable names without underscores. Fixes PR41932 llvm-svn: 361109	2019-05-19 01:01:52 +00:00
Gheorghe-Teodor Bercea	144291e14c	[OpenMP][bugfix] Add missing math functions variants for log and abs. Summary: When including the random header in C++, some of the math functions it relies on are not present in the CUDA headers. We include this variants in this case. Reviewers: jdoerfert, hfinkel, tra, caomhin Reviewed By: tra Subscribers: efriedma, guansong, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D62046 llvm-svn: 361066	2019-05-17 19:15:53 +00:00
Craig Topper	20040db9a6	[X86] Stop implicitly enabling avx512vl when avx512bf16 is enabled. Previously we were doing this so that the 256 bit selectw builtin could be used in the implementation of the 512->256 bit conversion intrinsic. After this commit we now use a masked convert builtin that will emit the intrinsic call and the 256-bit select from custom code in CGBuiltin. Then the header only needs to call that one intrinsic. llvm-svn: 360924	2019-05-16 18:28:17 +00:00
Craig Topper	58964566e0	[X86] Update doxygen comments for AVX512BF16 to not refer to masks as 'immediates'. Refer to parameter names instead of 'src', 'src1', 'src2'. NFC llvm-svn: 360918	2019-05-16 17:34:35 +00:00
Gheorghe-Teodor Bercea	9392bd6987	[OpenMP][Bugfix] Move double and float versions of abs under c++ macro Summary: This is a fix for the reported bug: [[ https://bugs.llvm.org/show_bug.cgi?id=41861 \| 41861 ]] abs functions need to be moved under the c++ macro to avoid conflicts with included headers. Reviewers: tra, jdoerfert, hfinkel, ABataev, caomhin Reviewed By: jdoerfert Subscribers: guansong, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D61959 llvm-svn: 360809	2019-05-15 20:28:23 +00:00
Gheorghe-Teodor Bercea	7641f310d7	[OpenMP][bugfix] Fix issues with C++ 17 compilation when handling math functions Summary: In OpenMP device offloading we must ensure that unde C++ 17, the inclusion of cstdlib will works correctly. Reviewers: ABataev, tra, jdoerfert, hfinkel, caomhin Reviewed By: jdoerfert Subscribers: Hahnfeld, guansong, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D61949 llvm-svn: 360804	2019-05-15 20:18:21 +00:00
Volodymyr Sapsai	51e79f0634	[X86] Make `x86intrin.h`, `immintrin.h` includable with `-fno-gnu-inline-asm`. Currently `immintrin.h` includes `pconfigintrin.h` and `sgxintrin.h` which contain inline assembly. It causes failures when building with the flag `-fno-gnu-inline-asm`. Fix by excluding functions with inline assembly when this extension is disabled. So far there was no need to support `_pconfig_u32`, `_enclu_u32`, `_encls_u32`, `_enclv_u32` on platforms that require `-fno-gnu-inline-asm`. But if developers start using these functions, they'll have compile-time undeclared identifier errors which is preferrable to runtime errors. rdar://problem/49540880 Reviewers: craig.topper, GBuella, rnk, echristo Reviewed By: rnk Subscribers: jkorous, dexonsmith, cfe-commits Differential Revision: https://reviews.llvm.org/D61621 llvm-svn: 360630	2019-05-13 22:40:11 +00:00
Gheorghe-Teodor Bercea	946957189d	[OpenMP][Clang][BugFix] Split declares and math functions inclusion. Summary: This patches fixes an issue in which the __clang_cuda_cmath.h header is being included even when cmath or math.h headers are not included. Reviewers: jdoerfert, ABataev, hfinkel, caomhin, tra Reviewed By: tra Subscribers: tra, mgorny, guansong, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D61765 llvm-svn: 360626	2019-05-13 22:11:44 +00:00
Reid Kleckner	55fab1ff48	Revert Include corecrt.h in stddef.h and vcruntime.h in stdarg.h to improve MS compatibility. This reverts r360271 (git commit `a0933bd8ec`) There are concerns on the review that this breaks EFI builds and that the transitive includes (sal.h) are actually heavy enough that we might care. llvm-svn: 360291	2019-05-08 22:01:20 +00:00
Mike Rice	a0933bd8ec	Include corecrt.h in stddef.h and vcruntime.h in stdarg.h to improve MS compatibility. This allows some applications developed with MSVC to compile with clang without any extra changes. Fixes: llvm.org/PR40789 Differential Revision: https://reviews.llvm.org/D61646 llvm-svn: 360271	2019-05-08 17:15:21 +00:00
Gheorghe-Teodor Bercea	e62c693c8e	[OpenMP][Clang] Support for target math functions Summary: In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang. We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism. Authors: @gtbercea @jdoerfert Reviewers: hfinkel, caomhin, ABataev, tra Reviewed By: hfinkel, ABataev, tra Subscribers: JDevlieghere, mgorny, guansong, cfe-commits, jdoerfert Tags: #clang Differential Revision: https://reviews.llvm.org/D61399 llvm-svn: 360265	2019-05-08 15:52:33 +00:00
Jonas Devlieghere	fe608c938c	Revert "[OpenMP][Clang] Support for target math functions" This commit appears to be breaking stage-2 builds on GreenDragon. The OpenMP wrappers for cmath and math.h are copied into the root of the resource directory and cause a cyclic dependency in module 'Darwin': Darwin -> std -> Darwin. This blows up when CMake is testing for modules support and breaks all stage 2 module builds, including the ThinLTO bot and all LLDB bots. CMake Error at cmake/modules/HandleLLVMOptions.cmake:497 (message): LLVM_ENABLE_MODULES is not supported by this compiler llvm-svn: 360192	2019-05-07 21:08:15 +00:00
Gheorghe-Teodor Bercea	1e28a668bc	[OpenMP][Clang] Support for target math functions Summary: In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang. We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism. Authors: @gtbercea @jdoerfert Reviewers: hfinkel, caomhin, ABataev, tra Reviewed By: hfinkel, ABataev, tra Subscribers: mgorny, guansong, cfe-commits, jdoerfert Tags: #clang Differential Revision: https://reviews.llvm.org/D61399 llvm-svn: 360063	2019-05-06 18:19:15 +00:00
Fangrui Song	041c377a59	[X86] Move files to correct directories after D60552 llvm-svn: 360022	2019-05-06 09:24:36 +00:00
Luo, Yuanke	844f662932	Enable intrinsics of AVX512_BF16, which are supported for BFLOAT16 in Cooper Lake Summary: 1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake; 2. Enable intrinsics for VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision. For more details about BF16 intrinsic, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference Patch by LiuTianle Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, spatel, RKSimon Reviewed By: craig.topper Subscribers: mgorny, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D60552 llvm-svn: 360018	2019-05-06 08:25:11 +00:00
Tom Stellard	dbe1c4aa6f	lib/Header: Fix Visual Studio builds try #2 Summary: This is a follow up to r355253 and a better fix than the first attempt which was r359257. We can't install anything from ${CMAKE_CFG_INTDIR}, because this value is only defined at build time, but we still must make sure to copy the headers into ${CMAKE_CFG_INTDIR}/lib/clang/$VERSION/include, because the lit tests look for headers there. So for this fix we revert to the old behavior of copying the headers to ${CMAKE_CFG_INTDIR}/lib/clang/$VERSION/include during the build and then installing them from the source tree. Reviewers: smeenai, vzakhari, phosek Reviewed By: smeenai, vzakhari Subscribers: mgorny, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D61220 llvm-svn: 359654	2019-05-01 06:18:03 +00:00
Javed Absar	18b0c40bc5	[AArch64] Add support for MTE intrinsics This provides intrinsics support for Memory Tagging Extension (MTE), which was introduced with the Armv8.5-a architecture. These intrinsics are available when __ARM_FEATURE_MEMORY_TAGGING is defined. Each intrinsic is described in detail in the ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest Reviewed By: Tim Nortover, David Spickett Differential Revision: https://reviews.llvm.org/D60485 llvm-svn: 359348	2019-04-26 21:08:11 +00:00
Tom Stellard	0184819e81	Revert lib/Header: Fix Visual Studio builds This reverts r359257 (git commit `00d9789509`) This broke check-clang. llvm-svn: 359258	2019-04-26 01:43:59 +00:00
Tom Stellard	00d9789509	lib/Header: Fix Visual Studio builds Summary: This is a follow up to r355253, which inadvertently broke Visual Studio builds by trying to copy files from CMAKE_CFG_INTDIR. See https://reviews.llvm.org/D58537#inline-532492 Reviewers: smeenai, vzakhari, phosek Reviewed By: smeenai Subscribers: mgorny, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D61054 llvm-svn: 359257	2019-04-26 01:18:59 +00:00
Jinsong Ji	12450d51a2	[PowerPC][NFC]Update licence to Apache 2 llvm-svn: 359164	2019-04-25 02:40:06 +00:00
Qiu Chaofan	19828e399b	[PowerPC] [Clang] Port MMX intrinsics and basic test cases to Power Port mmintrin.h which include x86 MMX intrinsics implementation to PowerPC platform (using Altivec). To make the include process correct, PowerPC's toolchain class is overrided to insert new headers directory (named ppc_wrappers) into the path. Basic test cases for several intrinsic functions are added. The header is mainly developed by Steven Munroe, with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu. Reviewed By: Jinsong Ji Differential Revision: https://reviews.llvm.org/D59924 llvm-svn: 358949	2019-04-23 05:50:24 +00:00
Evgeny Mankov	88aa3d7237	[CUDA][Windows] Restrict long double device functions declarations to Windows As agreed in D60220, make long double declarations unobservable on non-windows platforms. [Testing] {Windows 10, Ubuntu 16.04.5}/{Visual C++ 2017 15.9.11 & 2019 16.0.1, gcc+ 5.4.0}/CUDA {8.0, 9.0, 9.1, 9.2, 10.0, 10.1} Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D60818 llvm-svn: 358654	2019-04-18 10:08:55 +00:00
Craig Topper	8e364c680f	[X86] Restore the pavg intrinsics. The pattern we replaced these with may be too hard to match as demonstrated by PR41496 and PR41316. This patch restores the intrinsics and then we can start focusing on the optimizing the intrinsics. I've mostly reverted the original patch that removed them. Though I modified the avx512 intrinsics to not have masking built in. Differential Revision: https://reviews.llvm.org/D60674 llvm-svn: 358427	2019-04-15 17:17:35 +00:00
Chandler Carruth	4cf5743b77	Move the builtin headers to use the new license file header. Summary: These all had somewhat custom file headers with different text from the ones I searched for previously, and so I missed them. Thanks to Hal and Kristina and others who prompted me to fix this, and sorry it took so long. Reviewers: hfinkel Subscribers: mcrosier, javed.absar, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D60406 llvm-svn: 357941	2019-04-08 20:51:30 +00:00
Evgeny Mankov	66a8b07cd9	[CUDA][Windows] Last fix for the clang Bug 38811 "Clang fails to compile with CUDA-9.x on Windows" (https://bugs.llvm.org/show_bug.cgi?id=38811 ). [IMPORTANT] With that last fix, CUDA has just started being compiling by clang on Windows after nearly a year and two clangâ€™s major releases (7 and 8). As long as the last LLVM release, in which clang was compiling CUDA on Windows successfully, was 6.0.1, this fix and two previous have to be included into upcoming 7.1.0 and 8.0.1 releases. [How to repro] clang++.exe -x cuda "c:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\0_Simple\simplePrintf\simplePrintf.cu" -I"c:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\common\inc" --cuda-gpu-arch=sm_50 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0" -L"c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64" -lcudart.lib -v [Output] In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:390:11: error: no matching function for call to '__isinfl' return (__isinfl(a) != 0); ^~~~~~~~ C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2662:14: note: candidate function not viable: call to __host__ function from __device__ function __func__(int __isinfl(long double a)) ^ In file included from <built-in>:1: In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:438:11: error: no matching function for call to '__isnanl' return (__isnanl(a) != 0); ^~~~~~~~ C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2672:14: note: candidate function not viable: call to __host__ function from __device__ function __func__(int __isnanl(long double a)) ^ In file included from <built-in>:1: In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:486:11: error: no matching function for call to '__finitel' return (__finitel(a) != 0); ^~~~~~~~~ C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2652:14: note: candidate function not viable: call to __host__ function from __device__ function __func__(int __finitel(long double a)) ^ 3 errors generated when compiling for sm_50. [Solution] Add missing long double device functions' declarations. Provide only declarations to prevent any use of long double on the device side, because CUDA does not support long double on the device side. [Testing] {Windows 10, Ubuntu 16.04.5}/{Visual C++ 2017 15.9.9, gcc+ 5.4.0}/CUDA {8.0, 9.0, 9.1, 9.2, 10.0, 10.1} Reviewed by: Artem Belevich Differential Revision: http://reviews.llvm.org/D60220 llvm-svn: 357779	2019-04-05 16:51:10 +00:00
Craig Topper	6af0363857	[X86] Make _bswap intrinsic a function instead of a macro to hopefully fix the chromium build. This intrinsic was added in r356848 but was implemented as a macro to match gcc. llvm-svn: 356862	2019-03-24 18:00:20 +00:00
Craig Topper	88f4054f48	[X86] Add BSR/BSF/BSWAP intrinsics to ia32intrin.h to match gcc. Summary: These are all implemented by icc as well. I made bit_scan_forward/reverse forward to the __bsfd/__bsrq since we also have __bsfq/__bsrq. Note, when lzcnt is enabled the bsr intrinsics generates lzcnt+xor instead of bsr. Reviewers: RKSimon, spatel Subscribers: cfe-commits, llvm-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D59682 llvm-svn: 356848	2019-03-24 00:56:52 +00:00
Craig Topper	1383340422	[X86] Add __popcntd and __popcntq to ia32intrin.h to match gcc and icc. Remove popcnt feature flag from _popcnt32/_popcnt64 and move to ia32intrin.h to match gcc gcc and icc both implement popcntd and popcntq which we did not. gcc doesn't seem to require a feature flag for the _popcnt32/_popcnt64 spelling and will use a libcall if its not supported. Differential Revision: https://reviews.llvm.org/D59567 llvm-svn: 356689	2019-03-21 17:43:53 +00:00
Craig Topper	e0941cb326	[X86] Add __crc32b/__crc32w/__crc32d/__crc32q intrinsics to match gcc and icc. gcc has these intrinsics in ia32intrin.h as well. And icc implements them though they aren't documented in the Intel Intrinsics Guide. Differential Revision: https://reviews.llvm.org/D59533 llvm-svn: 356609	2019-03-20 20:25:28 +00:00
Craig Topper	8b653d0308	[X86] Add gcc rotate intrinsics to ia32intrin.h This is another attempt at what Erich Keane tried to do in r355322. This adds rolb, rolw, rold, rolq and their ror equivalent as always_inline wrappers around __builtin_rotate* which will lower to funnel shift intrinsics in IR. Additionally, when _MSC_VER is not defined we will define _rotl, _lrotl, _rotr, _lrotr as macros to one of the always_inline intrinsics mentioned above. Making sure that _lrotl/_lrotr use either 32 or 64 bit based on the size of long. These need to be macros because we have builtins with the same name for MS compatibility, but _MSC_VER isn't always defined when those builtins are enabled. We also define _rotwl and _rotwr as macros aliasing to rolw/rorw just like gcc to complete the set. These don't need to be gated with _MSC_VER because these aren't MS builtins. I've added tests both for non-MS and -ms-extensions with and without _MSC_VER being defined. Differential Revision: https://reviews.llvm.org/D59346 llvm-svn: 356423	2019-03-18 22:25:57 +00:00
Evgeny Mankov	177301f048	[CUDA][Windows] Partial fix for bug 38811 (Step 2 of 3) Partial fix for the clang Bug 38811 "Clang fails to compile with CUDA-9.x on Windows". [Synopsis] __sptr is a new Microsoft specific modifier (https://docs.microsoft.com/en-us/cpp/cpp/sptr-uptr?view=vs-2017). [Solution] Replace all `__sptr` occurrences with `__s` (and all `__cptr` with `__c` as well) to eliminate the below clang compilation error on Windows. In file included from C:\GIT\LLVM\trunk\llvm-64-release-vs2017-15.9.5\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:162: C:\GIT\LLVM\trunk\llvm-64-release-vs2017-15.9.5\dist\lib\clang\9.0.0\include\__clang_cuda_device_functions.h:524:33: error: expected expression return __nv_fast_sincosf(__a, __sptr, __cptr); ^ Reviewed by: Artem Belevich Differential Revision: http://reviews.llvm.org/D59423 llvm-svn: 356291	2019-03-15 19:04:46 +00:00
Evgeny Mankov	04188fc0c6	[CUDA][Windows] Partial fix for bug #38811 (Step 1 of 3) Partial fix for the clang Bug https://bugs.llvm.org/show_bug.cgi?id=38811 "Clang fails to compile with CUDA-9.x on Windows". Adding defined(_WIN64) check along with existing #if defined(__LP64__) eliminates the below clang (64-bit) compilation error on Windows. C:/GIT/LLVM/trunk/llvm-64-release-vs2017/dist/lib/clang/9.0.0\include\__clang_cuda_device_functions.h(1609,45): error GEF7559A7: no matching function for call to 'roundf' __DEVICE__ long lroundf(float __a) { return roundf(__a); } Reviewed by: Artem Belevich Differential Revision: http://reviews.llvm.org/D59361 llvm-svn: 356255	2019-03-15 12:05:36 +00:00
Shoaib Meenai	5be71faf4b	[build] Rename clang-headers to clang-resource-headers Summary: The current install-clang-headers target installs clang's resource directory headers. This is different from the install-llvm-headers target, which installs LLVM's API headers. We want to introduce the corresponding target to clang, and the natural name for that new target would be install-clang-headers. Rename the existing target to install-clang-resource-headers to free up the install-clang-headers name for the new target, following the discussion on cfe-dev [1]. I didn't find any bots on zorg referencing install-clang-headers. I'll send out another PSA to cfe-dev to accompany this rename. [1] http://lists.llvm.org/pipermail/cfe-dev/2019-February/061365.html Reviewers: beanz, phosek, tstellar, rnk, dim, serge-sans-paille Subscribers: mgorny, javed.absar, jdoerfert, #sanitizers, openmp-commits, lldb-commits, cfe-commits, llvm-commits Tags: #clang, #sanitizers, #lldb, #openmp, #llvm Differential Revision: https://reviews.llvm.org/D58791 llvm-svn: 355340	2019-03-04 21:19:53 +00:00
Tom Stellard	4f076000c6	lib/Header: Simplify CMakeLists.txt Summary: Replace cut and pasted code with cmake macros and reduce the number of install commands. This fixes an issue where the headers were being installed twice. This clean up should also make future modifications easier, like adding a cmake option to install header files into a custom resource directory. Reviewers: chandlerc, smeenai, mgorny, beanz, phosek Reviewed By: smeenai Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D58537 llvm-svn: 355253	2019-03-02 00:50:13 +00:00
Louis Dionne	c2d95792d6	[clang] Only provide C11 features in <float.h> starting with C++17 Summary: In r353970, I enabled those features in C++11 and above. To be strictly conforming, those features should only be enabled in C++17 and above. Reviewers: jfb, eli.friedman Subscribers: jkorous, dexonsmith, libcxx-commits Differential Revision: https://reviews.llvm.org/D58289 llvm-svn: 354691	2019-02-22 20:48:54 +00:00
Shoaib Meenai	defb5a383b	[clang] Switch to LLVM_ENABLE_IDE r344555 switched LLVM to guarding install targets with LLVM_ENABLE_IDE instead of CMAKE_CONFIGURATION_TYPES, which expresses the intent more directly and can be overridden by a user. Make the corresponding change in clang. LLVM_ENABLE_IDE is computed by HandleLLVMOptions, so it should be available for both standalone and integrated builds. Differential Revision: https://reviews.llvm.org/D58284 llvm-svn: 354525	2019-02-20 23:08:43 +00:00
Louis Dionne	defa9f8f85	[clang] Make sure C99/C11 features in <float.h> are provided in C++11 Summary: Previously, those #defines were only provided in C or when GNU extensions were enabled. We need those #defines in C++11 and above, too. Reviewers: jfb, eli.friedman Subscribers: jkorous, dexonsmith, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D58149 llvm-svn: 353970	2019-02-13 19:08:01 +00:00
Simon Atanasyan	4c22a57414	[Headers][mips] Add `__attribute__((__mode__(__unwind_word__)))` to the _Unwind_Word / _Unwind_SWord definitions The rationale of this change is to fix _Unwind_Word / _Unwind_SWord definitions for MIPS N32 ABI. This ABI uses 32-bit pointers, but _Unwind_Word and _Unwind_SWord types are eight bytes long. # The __attribute__((__mode__(__unwind_word__))) is added to the type definitions. It makes them equal to the corresponding definitions used by GCC and allows to override types using `getUnwindWordWidth` function. # The `getUnwindWordWidth` virtual function override in the `MipsTargetInfo` class and provides correct type size values. Differential revision: https://reviews.llvm.org/D58165 llvm-svn: 353965	2019-02-13 18:27:09 +00:00
Reid Kleckner	79d7f4114d	[X86] Use __m128_u for _mm_loadu_ps after r353555 Add secondary triple to existing SSE test for it. I audited other uses of __attribute__((__packed__)) in the intrinsic headers, and this seemed to be the only missing one. llvm-svn: 353878	2019-02-12 21:04:21 +00:00
Craig Topper	4390c721cb	[X86] Use the new unaligned vector typedefs for the loadu/storeu intrinsics pointer arguments. This matches what gcc does and what was suggested by rnk in PR20670. llvm-svn: 353802	2019-02-12 07:44:40 +00:00
Tom Tan	42b2424e4f	[COFF, ARM64] Remove definitions for _byteswap library functions _byteswap_* functions are are implemented in below file as normal function from libucrt.lib and declared in stdlib.h. Define them in intrin.h triggers lld error "conflicting comdat type" and "duplicate symbols" which was just added to LLD (https://reviews.llvm.org/D57324). C:\Program Files (x86)\Windows Kits\10\Source\10.0.17763.0\ucrt\stdlib\byteswap.cpp Differential Revision: https://reviews.llvm.org/D57915 llvm-svn: 353740	2019-02-11 20:04:02 +00:00
Craig Topper	be4cbe8726	[X86] Add explicit alignment to __m128/__m128i/__m128d/etc. to allow matching of MSVC behavior with #pragma pack. Summary: With MSVC, #pragma pack is ignored when there is explicit alignment. This differs from gcc. Clang emulates this difference when compiling for Windows. It appears that MSVC and its headers consider the __m128/__m128i/__m128d/etc. types to be explicitly aligned and ignores #pragma pack for them. Since we don't have explicit alignment on them in our headers, we don't match the MSVC behavior here. This patch adds explicit alignment to match this behavior. I'm hoping this won't cause any problems when we're not emulating MSVC. But if someone knows of something that would be different we can swith to conditionally adding the alignment based on _MSC_VER. I had to add explicitly unaligned types as well so we could use them in the loadu/storeu intrinsics which use __attribute__(__packed__). Using the now explicitly aligned types wouldn't produce align 1 accesses when targeting Windows. Reviewers: rnk, erichkeane, spatel, RKSimon Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D57961 llvm-svn: 353555	2019-02-08 19:45:08 +00:00
Eli Friedman	3189d5f48c	[COFF, ARM64] Fix types for _ReadStatusReg, _WriteStatusReg r344765 added those intrinsics, but used the wrong types. Patch by Mike Hommey Differential Revision: https://reviews.llvm.org/D57636 llvm-svn: 353493	2019-02-08 01:17:49 +00:00
Artem Belevich	4071763bb8	Basic CUDA-10 support. Differential Revision: https://reviews.llvm.org/D57771 llvm-svn: 353232	2019-02-05 22:38:58 +00:00
Artem Belevich	c62214da3d	[CUDA] add support for the new kernel launch API in CUDA-9.2+. Instead of calling CUDA runtime to arrange function arguments, the new API constructs arguments in a local array and the kernels are launched with __cudaLaunchKernel(). The old API has been deprecated and is expected to go away in the next CUDA release. Differential Revision: https://reviews.llvm.org/D57488 llvm-svn: 352799	2019-01-31 21:34:03 +00:00
Matt Arsenault	58fc8082a8	OpenCL: Use length modifier for warning on vector printf arguments Re-enable format string warnings on printf. The warnings are still incomplete. Apparently it is undefined to use a vector specifier without a length modifier, which is not currently warned on. Additionally, type warnings appear to not be working with the hh modifier, and aren't warning on all of the special restrictions from c99 printf. llvm-svn: 352540	2019-01-29 20:49:54 +00:00
Matt Arsenault	297afb14ec	Revert "OpenCL: Extend argument promotion rules to vector types" This reverts r348083. This was based on a misreading of the spec for printf specifiers. Also revert r343653, as without a subsequent patch, a correctly specified format for a vector will incorrectly warn. Fixes bug 40491. llvm-svn: 352539	2019-01-29 20:49:47 +00:00
Craig Topper	8de5abc4c8	[X86] Remove mask and passthru arguments from vpconflict builtins. Use select in IR instead. llvm-svn: 352173	2019-01-25 07:08:22 +00:00
Craig Topper	9fddc3fd00	[X86] Remove the cvtuqq2ps256/cvtqq2ps256 mask builtins. Replace with uitofp/sitofp and select. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D56965 llvm-svn: 351694	2019-01-20 19:04:56 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Craig Topper	d08d90ce51	[X86] Only define _XCR_XFEATURE_ENABLED_MASK in xsaveintrin.h when _MSC_VER is defined. Remove from intrin.h. I think this was my intention when I added it xsaveintrin.h llvm-svn: 351568	2019-01-18 17:51:51 +00:00
Craig Topper	931779761e	Recommit r351160 "[X86] Make _xgetbv/_xsetbv on non-windows platforms" V8 has been fixed now. llvm-svn: 351391	2019-01-16 22:56:25 +00:00
Benjamin Kramer	9c53890833	Revert "[X86] Make _xgetbv/_xsetbv on non-windows platforms" This reverts commit r351160. Breaks building v8. llvm-svn: 351210	2019-01-15 17:23:36 +00:00
Roman Lebedev	cce1c2eb0e	[OpenCL] opencl-c.h: read_image(): sampler-less, and image{1,2}d_array_t variants are OpenCL-1.2+, mark them as such Summary: Refer to [[ https://www.khronos.org/registry/OpenCL/specs/opencl-1.1.pdf#page=242 \| `6.11.13.2 Built-in Image Functions` ]], and [[ https://www.khronos.org/registry/OpenCL/specs/opencl-1.1.pdf#page=306 \| `9.6.8 Image Read and Write Functions` ]] of the OpenCL 1.1 spec. There is no mention of `image1d_array_t` and `image2d_array_t` anywhere in the OpenCL 1.1 spec. * All the `read_image{f,i,ui,h}()` functions, as of OpenCL 1.1 spec, have a second required parameter `sampler_t sampler` Should have prevented the following regression: https://redmine.darktable.org/issues/12493 Reviewers: yaxunl, Anastasia, echuraev, asavonic Reviewed By: Anastasia Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D56646 llvm-svn: 351188	2019-01-15 11:20:02 +00:00
Craig Topper	69aed7c364	[X86] Make _xgetbv/_xsetbv on non-windows platforms Summary: This patch attempts to redo what was tried in r278783, but was reverted. These intrinsics should be available on non-windows platforms with "xsave" feature check. But on Windows platforms they shouldn't have feature check since that's how MSVC behaves. To accomplish this I've added a MS builtin with no feature check. And a normal gcc builtin with a feature check. When _MSC_VER is not defined _xgetbv/_xsetbv will be macros pointing to the gcc builtin name. I've moved the forward declarations from intrin.h to immintrin.h to match the MSDN documentation and used that as the header file for the MS builtin. I'm not super happy with this implementation, and I'm open to suggestions for better ways to do it. Reviewers: rnk, RKSimon, spatel Reviewed By: rnk Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D56686 llvm-svn: 351160	2019-01-15 05:03:18 +00:00
Mandeep Singh Grang	1b9337f16b	[COFF, ARM64] Add __byteswap intrinsics Reviewers: rnk, efriedma, ssijaric, TomTan, haripul Reviewed By: efriedma Subscribers: javed.absar, cfe-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D56685 llvm-svn: 351147	2019-01-15 01:26:26 +00:00
Mandeep Singh Grang	0055f80b3f	[COFF, ARM64] Add __nop intrinsic Reviewers: rnk, efriedma, TomTan, haripul, ssijaric Reviewed By: rnk, efriedma Subscribers: javed.absar, kristof.beyls, cfe-commits Differential Revision: https://reviews.llvm.org/D56671 llvm-svn: 351135	2019-01-14 23:26:01 +00:00
Craig Topper	49488407aa	[X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select in IR instead. Fixes PR40259 llvm-svn: 351036	2019-01-14 08:46:51 +00:00
Craig Topper	bdbe5c7dc7	[X86] Make the pointer arguments to avx512 gather/scatter intrinsics 'void' to match gcc and Intel's documentation. The avx2 gather intrinsics are documented to use 'int', 'long long', 'float', or 'double' . So I'm leaving those. This matches gcc. llvm-svn: 350696	2019-01-09 07:36:01 +00:00
Craig Topper	cd9e232a4d	Recommit r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins." The MSVC limit hit in AutoUpgrade.cpp has been worked around for now. llvm-svn: 350568	2019-01-07 21:00:41 +00:00
Craig Topper	33c9088783	Revert r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins." Had to revert the LLVM patch this depends on to fix a MSVC compiler limit in AutoUpgrade.cpp llvm-svn: 350563	2019-01-07 19:39:25 +00:00
Craig Topper	e34f2bb807	[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins. Differential Revision: https://reviews.llvm.org/D56365 llvm-svn: 350555	2019-01-07 19:10:22 +00:00
Ulrich Weigand	22ca9c628a	[SystemZ] Fix wrong codegen caused by typos in vecintrin.h The following two bugs in SystemZ high-level vector intrinsics are fixes by this patch: - The float case of vec_insert_and_zero should generate a VLLEZF pattern, but currently erroneously generates VLLEZLF. - The float and double versions of vec_orc erroneously generate and-with-complement instead of or-with-complement. The patch also fixes a couple of typos in the associated test. llvm-svn: 349751	2018-12-20 13:09:09 +00:00
Craig Topper	1f2b181689	[Builltins][X86] Provide implementations of __lzcnt16, __lzcnt, __lzcnt64 for MS compatibility. Remove declarations from intrin.h and implementations from lzcntintrin.h intrin.h had forward declarations for these and lzcntintrin.h had implementations that were only available with -mlzcnt or a -march that supported the lzcnt feature. For MS compatibility we should always have these builtins available regardless of X86 being the target or the CPU support the lzcnt instruction. The backends should be able to gracefully fallback to something support even if its just shifts and bit ops. Unfortunately, gcc also implements 2 of the 3 function names here on X86 when lzcnt feature is enabled. This patch adds builtins for these for MSVC compatibility and drops the forward declarations from intrin.h. To keep the gcc compatibility the two intrinsics that collided have been turned into macros that use the X86 specific builtins with the lzcnt feature check. These macros are only defined when _MSC_VER is not defined. Without them being macros we can get a redefinition error because -ms-extensions doesn't seem to set _MSC_VER but does make the MS builtins available. Should fix PR40014 Differential Revision: https://reviews.llvm.org/D55677 llvm-svn: 349098	2018-12-14 00:21:02 +00:00
Craig Topper	6d7a7ef9eb	[X86] Remove the addcarry builtins. Leaving only the addcarryx builtins since that matches gcc. The addcarry and addcarryx builtins do the same thing. The only difference is that addcarryx previously required adx feature. This commit removes the adx feature check from addcarryx and removes the addcarry builtin. This matches the builtins that gcc has. We don't guarantee compatibility in builtins, but we generally try to be consistent if its not a burden. llvm-svn: 348738	2018-12-10 06:07:59 +00:00
Artem Belevich	43cfce88b4	[CUDA] Added missing 'inline' for functions defined in a header. llvm-svn: 348662	2018-12-07 22:20:53 +00:00
Stefan Granitz	5c1ec2e9cb	[CMake] Store path to vendor-specific headers in clang-headers target property Summary: LLDB.framework wants a copy these headers. With this change LLDB can easily glob for the list of files: ``` get_target_property(clang_include_dir clang-headers RUNTIME_OUTPUT_DIRECTORY) file(GLOB_RECURSE clang_vendor_headers RELATIVE ${clang_include_dir} "${clang_include_dir}/*") ``` By default `RUNTIME_OUTPUT_DIRECTORY` is unset for custom targets like `clang-headers`. Reviewers: aprantl, JDevlieghere, davide, friss, dexonsmith Reviewed By: JDevlieghere Subscribers: mgorny, #lldb, cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D55128 llvm-svn: 348116	2018-12-03 10:34:25 +00:00
Nemanja Ivanovic	2447baff84	[PowerPC] Vector load/store builtins overstate alignment of pointers A number of builtins in altivec.h load/store vectors from pointers to scalar types. Currently they just cast the pointer to a vector pointer, but expressions like that have the alignment of the target type. Of course, the input pointer did not have that alignment so this triggers UBSan (and rightly so). This resolves https://bugs.llvm.org/show_bug.cgi?id=39704 Differential revision: https://reviews.llvm.org/D54787 llvm-svn: 347556	2018-11-26 14:35:38 +00:00
Zi Xuan Wu	71c35e13c3	[PowerPC] [Clang] [AltiVec] The second parameter of vec_sr function should be modulo the number of bits in the element The second parameter of vec_sr function is representing shift bits and it should be modulo the number of bits in the element like what vec_sl does now. This is actually required by the ABI: Each element of the result vector is the result of logically right shifting the corresponding element of ARG1 by the number of bits specified by the value of the corresponding element of ARG2, modulo the number of bits in the element. The bits that are shifted out are replaced by zeros. Differential Revision: https://reviews.llvm.org/D54087 llvm-svn: 346471	2018-11-09 03:35:32 +00:00
Andrew Savonichev	3fee351867	[OpenCL] Add support of cl_intel_device_side_avc_motion_estimation extension Summary: Documentation can be found at https://www.khronos.org/registry/OpenCL/extensions/intel/cl_intel_device_side_avc_motion_estimation.txt Patch by Kristina Bessonova Reviewers: Anastasia, yaxunl, shafik Reviewed By: Anastasia Subscribers: arphaman, sidorovd, AlexeySotkin, krisb, bader, asavonic, cfe-commits Differential Revision: https://reviews.llvm.org/D51484 llvm-svn: 346392	2018-11-08 11:25:41 +00:00
Andrew Savonichev	3b12b7e702	Revert r346326 [OpenCL] Add support of cl_intel_device_side_avc_motion_estimation This patch breaks Index/opencl-types.cl LIT test: Script: -- : 'RUN: at line 1'; stage1/bin/c-index-test -test-print-type llvm/tools/clang/test/Index/opencl-types.cl -cl-std=CL2.0 \| stage1/bin/FileCheck llvm/tools/clang/test/Index/opencl-types.cl -- Command Output (stderr): -- llvm/tools/clang/test/Index/opencl-types.cl:3:26: warning: unsupported OpenCL extension 'cl_khr_fp16' - ignoring [-Wignored-pragmas] llvm/tools/clang/test/Index/opencl-types.cl:4:26: warning: unsupported OpenCL extension 'cl_khr_fp64' - ignoring [-Wignored-pragmas] llvm/tools/clang/test/Index/opencl-types.cl:8:9: error: use of type 'double' requires cl_khr_fp64 extension to be enabled llvm/tools/clang/test/Index/opencl-types.cl:11:8: error: declaring variable of type 'half' is not allowed llvm/tools/clang/test/Index/opencl-types.cl:15:3: error: use of type 'double' requires cl_khr_fp64 extension to be enabled llvm/tools/clang/test/Index/opencl-types.cl:16:3: error: use of type 'double4' (vector of 4 'double' values) requires cl_khr_fp64 extension to be enabled llvm/tools/clang/test/Index/opencl-types.cl:26:26: warning: unsupported OpenCL extension 'cl_khr_gl_msaa_sharing' - ignoring [-Wignored-pragmas] llvm/tools/clang/test/Index/opencl-types.cl:35:44: error: use of type '__read_only image2d_msaa_t' requires cl_khr_gl_msaa_sharing extension to be enabled llvm/tools/clang/test/Index/opencl-types.cl:36:49: error: use of type '__read_only image2d_array_msaa_t' requires cl_khr_gl_msaa_sharing extension to be enabled llvm/tools/clang/test/Index/opencl-types.cl:37:49: error: use of type '__read_only image2d_msaa_depth_t' requires cl_khr_gl_msaa_sharing extension to be enabled llvm/tools/clang/test/Index/opencl-types.cl:38:54: error: use of type '__read_only image2d_array_msaa_depth_t' requires cl_khr_gl_msaa_sharing extension to be enabled llvm-svn: 346338	2018-11-07 18:34:19 +00:00
Andrew Savonichev	35dfce723c	[OpenCL] Add support of cl_intel_device_side_avc_motion_estimation extension Summary: Documentation can be found at https://www.khronos.org/registry/OpenCL/extensions/intel/cl_intel_device_side_avc_motion_estimation.txt Patch by Kristina Bessonova Reviewers: Anastasia, yaxunl, shafik Reviewed By: Anastasia Subscribers: arphaman, sidorovd, AlexeySotkin, krisb, bader, asavonic, cfe-commits Differential Revision: https://reviews.llvm.org/D51484 llvm-svn: 346326	2018-11-07 15:44:01 +00:00
Reid Kleckner	902fa41188	[MS] Zero out ECX in __cpuid in intrin.h Summary: Some CPUID leafs depend on the value of ECX as well as EAX, but we left it uninitialized. Originally reported as https://crbug.com/901547 Reviewers: craig.topper, hans Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D54171 llvm-svn: 346265	2018-11-06 20:45:26 +00:00
Mandeep Singh Grang	574caddc0d	[COFF, ARM64] Implement InterlockedDecrement_ builtins This is eight in a series of patches to move intrinsic definitions out of intrin.h. Differential: https://reviews.llvm.org/D54068 llvm-svn: 346208	2018-11-06 05:07:43 +00:00
Mandeep Singh Grang	fdf74d9751	[COFF, ARM64] Implement InterlockedIncrement_ builtins This is seventh in a series of patches to move intrinsic definitions out of intrin.h. Differential: https://reviews.llvm.org/D54067 llvm-svn: 346207	2018-11-06 05:05:32 +00:00
Mandeep Singh Grang	c89157b5c1	[COFF, ARM64] Implement InterlockedAnd_ builtins This is sixth in a series of patches to move intrinsic definitions out of intrin.h. Differential: https://reviews.llvm.org/D54066 llvm-svn: 346206	2018-11-06 05:03:13 +00:00
Mandeep Singh Grang	806f10701b	[COFF, ARM64] Implement InterlockedXor_ builtins This is fifth in a series of patches to move intrinsic definitions out of intrin.h. Note: This was reviewed and approved in D54065 but somehow that diff was messed up. Committing this again with the proper diff. llvm-svn: 346205	2018-11-06 04:55:20 +00:00
Mandeep Singh Grang	d9f70b1495	Revert "[COFF, ARM64] Implement InterlockedXor_ builtins" This reverts commit cc3d3cd0fbeb88412d332354c261ff139c4ede6b. llvm-svn: 346192	2018-11-06 01:14:24 +00:00
Mandeep Singh Grang	d8a4455d97	[COFF, ARM64] Implement InterlockedXor_ builtins Summary: This is fifth in a series of patches to move intrinsic definitions out of intrin.h. Reviewers: rnk, efriedma, mstorsjo, TomTan Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D54065 llvm-svn: 346191	2018-11-06 01:12:29 +00:00
Mandeep Singh Grang	ec62b31e2c	[COFF, ARM64] Implement InterlockedOr_ builtins This is fourth in a series of patches to move intrinsic definitions out of intrin.h. llvm-svn: 346190	2018-11-06 01:11:25 +00:00
Mandeep Singh Grang	6b880689f0	[COFF, ARM64] Implement InterlockedCompareExchange_ builtins Summary: This is third in a series of patches to move intrinsic definitions out of intrin.h. Reviewers: rnk, efriedma, mstorsjo, TomTan Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D54062 llvm-svn: 346189	2018-11-06 00:36:48 +00:00
Mandeep Singh Grang	7fa07e554d	[COFF, ARM64] Implement InterlockedExchange_ builtins Summary: Windows SDK needs these intrinsics to be proper builtins. This is second in a series of patches to move intrinsic defintions out of intrin.h. Reviewers: rnk, mstorsjo, efriedma, TomTan Reviewed By: rnk, efriedma Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D54046 llvm-svn: 346044	2018-11-02 21:18:23 +00:00
Eli Friedman	b262d1631e	[ARM64] [Windows] Implement _InterlockedExchangeAdd_ builtins. These apparently need to be proper builtins to handle the Windows SDK. Differential Revision: https://reviews.llvm.org/D53916 llvm-svn: 345779	2018-10-31 21:31:09 +00:00
Andrew Savonichev	70b6bbe2bb	[OpenCL] Remove PIPE_RESERVE_ID_VALID_BIT from opencl-c.h Summary: PIPE_RESERVE_ID_VALID_BIT is implementation defined, so lets not keep it in the header. Previously the topic was discussed here: https://reviews.llvm.org/D32896 Reviewers: Anastasia, yaxunl Reviewed By: Anastasia Subscribers: cfe-commits, asavonic, bader Differential Revision: https://reviews.llvm.org/D52658 llvm-svn: 345051	2018-10-23 17:05:29 +00:00
Andrew Savonichev	700c3bea9e	[OpenCL] Add cl_intel_planar_yuv extension Just adding a preprocessor #define for the extension. Patch by Alexey Sotkin and Dmitry Sidorov Phabricator review: https://reviews.llvm.org/D51402 llvm-svn: 345044	2018-10-23 16:13:16 +00:00
Craig Topper	eae26bf737	[X86] Add more intrinsics to match icc. This adds _mm_loadu_epi8, _mm256_loadu_epi8, _mm512_loadu_epi8 _mm_loadu_epi16, _mm256_loadu_epi16, _mm512_loadu_epi16 _mm_storeu_epi8, _mm256_storeu_epi8, _mm512_storeu_epi8 _mm_storeu_epi16, _mm256_storeu_epi16, _mm512_storeu_epi16 llvm-svn: 344862	2018-10-20 19:28:52 +00:00
Craig Topper	58508be3c0	[X86] Add missing intrinsics to match icc. This adds _mm_and_epi32, _mm_and_epi64 _mm_andnot_epi32, _mm_andnot_epi64 _mm_or_epi32, _mm_or_epi64 _mm_xor_epi32, _mm_xor_epi64 _mm256_and_epi32, _mm256_and_epi64 _mm256_andnot_epi32, _mm256_andnot_epi64 _mm256_or_epi32, _mm256_or_epi64 _mm256_xor_epi32, _mm256_xor_epi64 _mm_loadu_epi32, _mm_loadu_epi64 _mm_load_epi32, _mm_load_epi64 _mm256_loadu_epi32, _mm256_loadu_epi64 _mm256_load_epi32, _mm256_load_epi64 _mm512_loadu_epi32, _mm512_loadu_epi64 _mm512_load_epi32, _mm512_load_epi64 _mm_storeu_epi32, _mm_storeu_epi64 _mm_store_epi32, _mm_load_epi64 _mm256_storeu_epi32, _mm256_storeu_epi64 _mm256_store_epi32, _mm256_load_epi64 _mm512_storeu_epi32, _mm512_storeu_epi64 _mm512_store_epi32,V _mm512_load_epi64 llvm-svn: 344861	2018-10-20 19:28:50 +00:00
Mandeep Singh Grang	2147b1af95	[COFF, ARM64] Add _ReadStatusReg and_WriteStatusReg intrinsics Reviewers: rnk, compnerd, mstorsjo, efriedma, TomTan, haripul, javed.absar Reviewed By: efriedma Subscribers: dmajor, kristof.beyls, chrib, cfe-commits Differential Revision: https://reviews.llvm.org/D53115 llvm-svn: 344765	2018-10-18 23:35:35 +00:00
Mandeep Singh Grang	df7929676d	[COFF, ARM64] Add _InterlockedAdd intrinsic Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar, efriedma Reviewed By: efriedma Subscribers: efriedma, kristof.beyls, chrib, jfb, cfe-commits Differential Revision: https://reviews.llvm.org/D52811 llvm-svn: 343894	2018-10-05 21:57:41 +00:00
Mandeep Singh Grang	ecc82ef0c2	[COFF, ARM64] Add __getReg intrinsic Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar, efriedma Reviewed By: efriedma Subscribers: peter.smith, efriedma, kristof.beyls, chrib, cfe-commits Differential Revision: https://reviews.llvm.org/D52838 llvm-svn: 343824	2018-10-04 22:32:42 +00:00
Matt Arsenault	a01151294a	OpenCL: Mark printf format string argument Fixes not warning on format string errors. llvm-svn: 343653	2018-10-03 02:01:19 +00:00
Craig Topper	716e8e6858	[X86] Add more of the icc unaligned load/store to/from 128 bit vector intrinsics Summary: This patch adds _mm_loadu_si32 _mm_loadu_si16 _mm_storeu_si64 _mm_storeu_si32 _mm_storeu_si16 We already had _mm_load_si64. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D52665 llvm-svn: 343388	2018-09-29 17:49:42 +00:00
Craig Topper	6ad9220067	[X86] Add the movbe instruction intrinsics from icc. These intrinsics exist in icc. They can be found on the Intel Intrinsics Guide website. All the backend support is in place to pattern match a load+bswap or a bswap+store pattern to the MOVBE instructions. So we just need to get the frontend to emit the correct IR. The pointer arguments in icc are declared as void so I had to jump through a packed struct to forcing a specific alignment on the load/store. Same trick we use in the unaligned vector load/store intrinsics Differential Revision: https://reviews.llvm.org/D52586 llvm-svn: 343343	2018-09-28 17:09:51 +00:00
Craig Topper	fb5d9f2849	[X86] For lzcnt/tzcnt intrinsics use cttz/ctlz intrinsics with zero_undef flag set to false. Previously we used a select and the zero_undef=true intrinsic. In -O2 this pattern will get optimized to zero_undef=false. But in -O0 this optimization won't happen. This results in a compare and cmov being wrapped around a tzcnt/lzcnt instruction. By using the zero_undef=false intrinsic directly without the select, we can improve the -O0 codegen to just an lzcnt/tzcnt instruction. Differential Revision: https://reviews.llvm.org/D52392 llvm-svn: 343126	2018-09-26 17:01:44 +00:00
Artem Belevich	44ecb0e3c2	[CUDA] Added basic support for compiling with CUDA-10.0 llvm-svn: 342924	2018-09-24 23:10:44 +00:00
Craig Topper	d88f76a891	[X86] Add ktest intrinsics to match gcc and icc. These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc. Includes these intrinsics: _ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8 _ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8 _ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8 _ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8 llvm-svn: 341265	2018-08-31 22:29:56 +00:00
Craig Topper	42a4d0822e	[X86] Add k-mask conversion and load/store instrinsics to match gcc and icc. This adds: _cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64 _cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64 _load_mask8, _load_mask16, _load_mask32, _load_mask64 _store_mask8, _store_mask16, _store_mask32, _store_mask64 These are currently missing from the Intel Intrinsics Guide webpage. llvm-svn: 341251	2018-08-31 20:41:06 +00:00
Craig Topper	2aa8efc820	[X86] Add kshift intrinsics to match gcc and icc. This adds the following intrinsics: _kshiftli_mask8 _kshiftli_mask16 _kshiftli_mask32 _kshiftli_mask64 _kshiftri_mask8 _kshiftri_mask16 _kshiftri_mask32 _kshiftri_mask64 llvm-svn: 341234	2018-08-31 18:22:52 +00:00
Craig Topper	a65bf65e0b	[X86] Add kadd intrinsics to match gcc and icc. This adds the following intrinsics: _kadd_mask64 _kadd_mask32 _kadd_mask16 _kadd_mask8 These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc. llvm-svn: 340879	2018-08-28 22:32:14 +00:00
Craig Topper	cb5fd56c7f	[X86] Add kortest intrinsics for 8, 32, and 64 bit masks. Add new intrinsic names for 16 bit masks. This matches gcc and icc despite not being documented in the Intel Intrinsics Guide. llvm-svn: 340798	2018-08-28 06:28:25 +00:00
Craig Topper	c330ca8611	[X86] Add intrinsics for kand/kandn/knot/kor/kxnor/kxor with 8, 32, and 64-bit mask registers. This also adds a second intrinsic name for the 16-bit mask versions. These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide so I only recently found they existed. llvm-svn: 340719	2018-08-27 06:20:22 +00:00
Craig Topper	9a022280b5	[X86] Remove min_vector_width 512 from some intrinsics that operate only on k-registers. llvm-svn: 340718	2018-08-27 06:20:20 +00:00
Craig Topper	e0b5d4cd9d	[X86] Rename __DEFAULT_FN_ATTRS to a__DEFAULT_FN_ATTRS512 in avx512dqintrin.h and avx512bwintrin.h. This is preparation for adding removing min_vector_width 512 from some intrinsics. llvm-svn: 340717	2018-08-27 06:20:19 +00:00
Craig Topper	266b858705	[X86] Undef __DEFAULT_FN_ATTRS in avx512fintrin.h. Fixes test failure after r340713 llvm-svn: 340714	2018-08-27 05:44:45 +00:00
Craig Topper	f821f5314e	[X86] Don't set min_vector_width to 512 on intrinsics that only operate on k registers. llvm-svn: 340713	2018-08-27 05:27:15 +00:00
Nico Weber	b2c53d3393	Make __shiftleft128 / __shiftright128 real compiler built-ins. r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h. Microsoft's STL plans on using these functions, and they're using intrin0.h which just has declarations of built-ins to not pull in the huge intrin.h header in the standard library headers. That requires that these functions are real built-ins. https://reviews.llvm.org/D50907 llvm-svn: 340048	2018-08-17 17:19:06 +00:00
Craig Topper	72a7606433	[X86] Remove masking from the 512-bit paddus/psubus builtins. Use a select builtin instead. llvm-svn: 339845	2018-08-16 07:28:06 +00:00
Craig Topper	0609d1e211	[X86] Remove masking from the 512-bit padds and psubs builtins. Use select builtin instead. llvm-svn: 339843	2018-08-16 06:20:29 +00:00
Pirama Arumuga Nainar	3c1a7bc290	[Headers] Define _HAS_SUBNORM for FLT, DBL, LDBL Summary: These macros are defined in the C11 standard and can be defined based on the ___HAS_DENORM__ default macros. Reviewers: bruno, rsmith, doug.gregor Subscribers: llvm-commits, enh, srhines Differential Revision: https://reviews.llvm.org/D37302 llvm-svn: 339284	2018-08-08 20:38:38 +00:00

... 3 4 5 6 7 ...

1875 Commits