llvm-project

Commit Graph

Author	SHA1	Message	Date
Thomas Lively	fd3bd63df2	[WebAssembly] Make bitmask instructions return unsigned ints Since they are bitmasks, it will be more common for them to be used and potentially extended to 64-bit integers as unsigned values rather than signed values. Differential Revision: https://reviews.llvm.org/D108401	2021-08-19 16:23:47 -07:00
Martin Storsjö	cc3affd8b0	[clang] [MSVC] Implement __mulh and __umulh builtins for aarch64 The code is based on the same __mulh and __umulh intrinsics for x86. This should fix PR51128. Differential Revision: https://reviews.llvm.org/D106721	2021-08-19 11:29:55 +03:00
Jon Chesterfield	dbd7bad9ad	[openmp] Annotate tmp variables with omp_thread_mem_alloc Fixes miscompile of calls into ocml. Bug 51445. The stack variable `double __tmp` is moved to dynamically allocated shared memory by CGOpenMPRuntimeGPU. This is usually fine, but when the variable is passed to a function that is explicitly annotated address_space(5) then allocating the variable off-stack leads to a miscompile in the back end, which cannot decide to move the variable back to the stack from shared. This could be fixed by removing the AS(5) annotation from the math library or by explicitly marking the variables as thread_mem_alloc. The cast to AS(5) is still a no-op once IR is reached. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D107971	2021-08-19 02:22:11 +01:00
Wang, Pengfei	2379949aad	[X86] AVX512FP16 instructions enabling 3/6 Enable FP16 conversion instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105265	2021-08-18 09:03:41 +08:00
Craig Topper	705b1191aa	[X86] Add parentheses around casts in X86 intrinsic headers. Fixes PR51324.	2021-08-14 18:14:44 -07:00
Wang, Pengfei	f1de9d6dae	[X86] AVX512FP16 instructions enabling 2/6 Enable FP16 binary operator instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105264	2021-08-15 08:56:33 +08:00
Craig Topper	d2cb189184	[X86] Use a do {} while (0) in the _MM_EXTRACT_FLOAT implementation. Previously we just used {}, but that doesn't work in situations like this. if (1) _MM_EXTRACT_FLOAT(d, x, n); else ... The semicolon would terminate the if.	2021-08-14 16:41:55 -07:00
Craig Topper	73c4c32767	[X86] Use __builtin_bit_cast in _mm_extract_ps instead of type punning through a union. NFC	2021-08-14 16:35:55 -07:00
Craig Topper	4190d99dfc	[X86] Add parentheses around casts in some of the X86 intrinsic headers. This covers the SSE and AVX/AVX2 headers. AVX512 has a lot more macros due to rounding mode. Fixes part of PR51324. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D107843	2021-08-13 09:36:16 -07:00
Jon Chesterfield	6a8e5120ab	Revert "[openmp] Annotate tmp variables with omp_thread_mem_alloc" This reverts commit `b6113548c9`.	2021-08-12 17:44:36 +01:00
Jon Chesterfield	b6113548c9	[openmp] Annotate tmp variables with omp_thread_mem_alloc Fixes miscompile of calls into ocml. Bug 51445. The stack variable `double __tmp` is moved to dynamically allocated shared memory by CGOpenMPRuntimeGPU. This is usually fine, but when the variable is passed to a function that is explicitly annotated address_space(5) then allocating the variable off-stack leads to a miscompile in the back end, which cannot decide to move the variable back to the stack from shared. This could be fixed by removing the AS(5) annotation from the math library or by explicitly marking the variables as thread_mem_alloc. The cast to AS(5) is still a no-op once IR is reached. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D107971	2021-08-12 17:30:22 +01:00
Freddy Ye	6c1468854d	[X86] Reverse _set_ph and _setr_ph's set order. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D107946	2021-08-12 16:27:04 +08:00
Wang, Pengfei	6f7f5b54c8	[X86] AVX512FP16 instructions enabling 1/6 1. Enable FP16 type support and basic declarations used by following patches. 2. Enable new instructions VMOVW and VMOVSH. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105263	2021-08-10 12:46:01 +08:00
Dave Airlie	1854db74c5	opencl-c.h: add 3.0 optional extension support for a few more bits These 3 are fairly simple, pipes, workgroups and subgroups. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D105858	2021-08-07 09:25:00 +10:00
Justas Janickas	a5a2f05dcc	[C++4OpenCL] Introduces __remove_address_space utility This change provides a way to conveniently declare types that have address space qualifiers removed. Since OpenCL adds address spaces implicitly even when they are not specified in source, it is useful to allow deriving address space unqualified types. Fixes llvm.org/PR45326 Differential Revision: https://reviews.llvm.org/D106785	2021-08-06 10:40:22 +01:00
Jon Chesterfield	509854b69c	[clang] Replace asm with __asm__ in cuda header Asm is a gnu extension for C, so at present -fopenmp -std=c99 and similar fail to compile on nvptx, bug 51344 Changing to `__asm__` or `__asm` works for openmp, all three appear to work for cuda. Suggesting `__asm__` here as `__asm` is used by MSVC with different syntax, so this should make for better error diagnostics if the header is passed to a compiler other than clang. Reviewed By: tra, emankov Differential Revision: https://reviews.llvm.org/D107492	2021-08-05 18:46:57 +01:00
Dave Airlie	14cb67862a	[OpenCL] allow generic address and non-generic defs for CL3.0 This allows both sets of definitions to exist on CL 3.0 Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D107318	2021-08-05 07:32:45 +10:00
Pushpinder Singh	f3eb5f900d	[AMDGPU][OpenMP] Wrap amdgcn declare variant inside ifdef This fixes the issue https://bugs.llvm.org/show_bug.cgi?id=51337 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107468	2021-08-04 15:24:46 +00:00
Pushpinder Singh	713a5d12cd	[OpenMP][AMDGCN] Initial math headers support With this patch, OpenMP on AMDGCN will use the math functions provided by ROCm ocml library. Linking device code to the ocml will be done in the next patch. Reviewed By: JonChesterfield, jdoerfert, scchan Differential Revision: https://reviews.llvm.org/D104904	2021-08-02 14:38:52 +00:00
Hans Wennborg	12dc13b73c	prfchwintrin.h: Make _m_prefetchw take a pointer to volatile (PR49124) For some reason, Microsoft declares _m_prefetch to take a const void *, but _m_prefetchw to take a /volatile/ const void *. Do the same for compatibility. Differential revision: https://reviews.llvm.org/D106790	2021-08-02 15:16:04 +02:00
Jon Chesterfield	7f97ddaf8a	Revert "[OpenMP][AMDGCN] Initial math headers support" Broke nvptx compilation on files including <complex> This reverts commit `12da97ea10`.	2021-07-30 22:07:00 +01:00
Nemanja Ivanovic	9019b55b60	[PowerPC] Fix byte ordering of ld/st with length on BE The builtins vec_xl_len_r and vec_xst_len_r actually use the wrong side of the vector on big endian Power9 systems. We never spotted this before because there was no such thing as a big endian distro that supported Power9. Now we have AIX and the elements are in the wrong part of the vector. This just fixes it so the elements are loaded to and stored from the right side of the vector.	2021-07-30 14:37:24 -05:00
Pushpinder Singh	12da97ea10	[OpenMP][AMDGCN] Initial math headers support With this patch, OpenMP on AMDGCN will use the math functions provided by ROCm ocml library. Linking device code to the ocml will be done in the next patch. Reviewed By: JonChesterfield, jdoerfert, scchan Differential Revision: https://reviews.llvm.org/D104904	2021-07-30 14:52:41 +00:00
Dave Airlie	3c7d2f1b67	[OpenCL] opencl-c.h: add CL 3.0 non-generic address space atomics CL 2.0 introduced atomics and generic address space, so there was only one set of APIs for doing atomics. However, since CL 3.0 makes generic address space optional, there have to be new sets of atomic interfaces to handle that case. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D106778	2021-07-30 14:46:47 +10:00
Thomas Lively	33786576fd	[WebAssembly] Codegen for extmul SIMD instructions Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions with normal codegen patterns. Differential Revision: https://reviews.llvm.org/D106724	2021-07-27 08:41:30 -07:00
Anastasia Stulova	e5f47eedeb	[OpenCL] NULL redefined as nullptr in C++ mode. Redefines NULL as nullptr instead of ((void*)0) in C++ for OpenCL. Such internal representation of NULL provides compatibility with C++11 and later language standards. Patch by Topotuna (Justas Janickas)! Differential Revision: https://reviews.llvm.org/D105987	2021-07-27 16:33:50 +01:00
Nemanja Ivanovic	1c50a5da36	[PowerPC] Implement partial vector ld/st builtins for XL compatibility XL provides functions __vec_ldrmb/__vec_strmb for loading/storing a sequence of 1 to 16 bytes in big endian order, right justified in the vector register (regardless of target endianness). This is equivalent to vec_xl_len_r/vec_xst_len_r which are only available on Power9. This patch simply uses the Power9 functions when compiled for Power9, but provides a more general implementation for Power8. Differential revision: https://reviews.llvm.org/D106757	2021-07-26 13:19:52 -05:00
Qiu Chaofan	240dde9482	[PowerPC] Change altivec indexed load/store builtins argument type This patch changes the index argument of lvxl?/lve[bhw]x and stvxl?/stve[bhw]x builtins from int to long, because on 64-bit subtargets an extra extsw will always be generated, which is incorrect. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D106530	2021-07-27 00:26:50 +08:00
Ulrich Weigand	8cd8120a7b	[SystemZ] Add support for new cpu architecture - arch14 This patch adds support for the next-generation arch14 CPU architecture to the SystemZ backend. This includes: - Basic support for the new processor and its features. - Detection of arch14 as host processor. - Assembler/disassembler support for new instructions. - New LLVM intrinsics for certain new instructions. - Support for low-level builtins mapped to new LLVM intrinsics. - New high-level intrinsics in vecintrin.h. - Indicate support by defining __VEC__ == 10304. Note: No currently available Z system supports the arch14 architecture. Once new systems become available, the official system name will be added as supported -march name.	2021-07-26 16:57:28 +02:00
Dave Airlie	9451403c5f	[OPENCL] opencl-c.h: add initial CL 3.0 conditionals for atomic operations. This adds the optional wrappers around things, however this isn't sufficient yet for CL 3.0 without generic address space, I've got one more additional patch to add all those APIs, but this is an easier to review precursor. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D106111	2021-07-26 11:06:33 +10:00
Thomas Lively	85157c0079	[WebAssembly] Codegen for pmin and pmax Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax} with standard codegen patterns. Since wasm_simd128.h uses an integer vector as the standard single vector type, the IR for the pmin and pmax intrinsic functions contains bitcasts that would not be there otherwise. Add extra codegen patterns that can still select the pmin and pmax instructions in the presence of these bitcasts. Differential Revision: https://reviews.llvm.org/D106612	2021-07-23 14:49:21 -07:00
Anastasia Stulova	5c63bf3abd	[OpenCL] Add NULL to standards prior to v2.0. NULL was undefined in OpenCL prior to version 2.0. However, the language specification states that "macro names defined by the C99 specification but not currently supported by OpenCL are reserved for future use". Therefore, application developers cannot redefine NULL. The change is supposed to resolve inconsistency between language versions. Currently there is no apparent reason why NULL should be kept undefined. Patch by Topotuna (Justas Janickas)! Differential Revision: https://reviews.llvm.org/D105988	2021-07-23 11:54:36 +01:00
Sven van Haastregt	989bedec7a	[OpenCL] Add cl_khr_integer_dot_product Add the builtins defined by Section 42 "Integer dot product" in the OpenCL Extension Specification. Differential Revision: https://reviews.llvm.org/D106434	2021-07-23 10:10:16 +01:00
namazso	91bc85b1eb	[MS] Preserve base register %esi around movs[bwl] fix for behavior reported in https://bugs.llvm.org/show_bug.cgi?id=51100 workaround for root cause https://bugs.llvm.org/show_bug.cgi?id=16830 similar to https://reviews.llvm.org/D101338 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106210	2021-07-23 16:28:32 +08:00
Aaron En Ye Shi	9ce931bd71	[HIP] Fix no matching constructor for init of shared_ptr and malloc Allow standard header versions of malloc and free to be defined before introducing the device versions. Fixes: SWDEV-295901 Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D106463	2021-07-22 14:32:41 +00:00
Thomas Lively	db7efcab7d	[WebAssembly] Remove clang builtins for extract_lane and replace_lane These builtins were added to capture the fact that the underlying Wasm instructions return i32s and implicitly sign or zero extend the extracted lanes in the case of the i8x16 and i16x8 variants. But we do sufficient optimizations during code gen that these low-level details do not need to be exposed to users. This commit replaces the use of the builtins in wasm_simd128.h with normal target-independent vector code. As a result, we can switch the relevant intrinsics to use functions rather than macros and can use more user-friendly return types rather than trying to precisely expose the underlying Wasm types. Note, however, that the generated LLVM IR is no different after this change. Differential Revision: https://reviews.llvm.org/D106500	2021-07-21 16:11:00 -07:00
Yaxun (Sam) Liu	db5f100fe4	[HIP] Remove workaround in __clang_hip_runtime_wrapper.h Remove the workaround for -fopenmp in __clang_hip_runtime_wrapper.h since it causes device functions in HIP wrapper headers to be disabled when compiling a HIP program with -fopenmp. Reviewed by: Aaron Enye Shi, Jon Chesterfield Differential Revision: https://reviews.llvm.org/D106070	2021-07-21 15:16:28 -04:00
Jon Chesterfield	d71062fbda	Revert "[OpenMP][AMDGCN] Initial math headers support" This reverts commit `968899ad9c`.	2021-07-21 17:35:40 +01:00
Pushpinder Singh	968899ad9c	[OpenMP][AMDGCN] Initial math headers support With this patch, OpenMP on AMDGCN will use the math functions provided by ROCm ocml library. Linking device code to the ocml will be done in the next patch. Reviewed By: JonChesterfield, jdoerfert, scchan Differential Revision: https://reviews.llvm.org/D104904	2021-07-21 16:15:39 +01:00
Sven van Haastregt	724f0e2abb	[OpenCL] Add cl_khr_extended_bit_ops Add the builtins defined by Section 40 "Extended Bit Operations" in the OpenCL Extension Specification. Differential Revision: https://reviews.llvm.org/D106267	2021-07-21 10:01:19 +01:00
Jon Chesterfield	3e649f8ef1	[openmp][nfc] Simplify macros guarding math complex headers The `__CUDA__` macro is already defined for openmp/nvptx and is not used by `__clang_cuda_complex_builtins.h`, so dropping that macro slightly simplifies nvptx and avoids defining it on amdgcn (where it is likely to be harmful). Also dropped a cplusplus test from a C++ header as compilation will have failed on cmath earlier if it was included from C. Reviewed By: jdoerfert, fodinabor Differential Revision: https://reviews.llvm.org/D105221	2021-07-18 23:30:35 +01:00
Stefan Pintilie	0bf4b81d57	[Clang] Add an empty builtins.h file. On Power PC some legacy compilers included a number of builtins in a builtins.h header file. While this header file is not required to hold builtins for clang some legacy code does try to include this file and so this patch provides an empty version of that file. Differential Revision: https://reviews.llvm.org/D106065	2021-07-16 12:50:04 -05:00
Dave Airlie	de79ba9f9a	[OpenCL] opencl-c.h: CL3.0 generic address space This is one of the easier pieces of adding CL3.0 support. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D105526	2021-07-15 10:51:04 +10:00
Dave Airlie	090f007e34	[OpenCL][NFC] opencl-c.h: reorder atomic operations This just reorders the atomics, it doesn't change anything except their layout in the header. This is a prep patch for adding some conditionals around these for CL3.0 but that patch is much easier to review if all the atomic operations are grouped together like this. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D105601	2021-07-15 10:48:44 +10:00
Thomas Lively	4a4229f70f	[WebAssembly] Codegen for v128.storeX_lane instructions Replace the experimental clang builtins and LLVM intrinsics for these instructions with normal codegen patterns. Resolves PR50435. Differential Revision: https://reviews.llvm.org/D106019	2021-07-14 16:15:25 -07:00
Thomas Lively	970e090010	[WebAssembly] Codegen for v128.loadX_lane instructions Replace the experimental clang builtin and LLVM intrinsics for these instructions with normal codegen patterns. Resolves PR50433. Differential Revision: https://reviews.llvm.org/D105950	2021-07-14 11:31:53 -07:00
Thomas Lively	cbabfc63b1	[WebAssembly] Custom combines for f32x4.demote_zero_f64x2 Replace the clang builtin function and LLVM intrinsic for f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern. Differential Revision: https://reviews.llvm.org/D105755	2021-07-12 10:32:18 -07:00
Bardia Mahjour	2071ce9d45	[Altivec] Use signed comparison for vec_all_* and vec_any_* interfaces We are currently being inconsistent in using signed vs unsigned comparisons for vec_all_* and vec_any_* interfaces that use vector bool types. For example we use signed comparison for vec_all_ge(vector signed char, vector bool char) but unsigned comparison for when the arguments are swapped. GCC and XL use signed comparison instead. This patch makes clang consistent with itself and with XL and GCC. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D105666	2021-07-12 11:41:16 -04:00
Nemanja Ivanovic	84e429693f	[PowerPC] Fix rounding mode for vec_round in altivec.h The function is supposed to be the equivalent of rint() (as in round to nearest, ties to even) rather than round() (round to nearest, ties away from zero). In fact, the instruction we emit without VSX is vrfin which is correct. However, with VSX we emit xvrspi which is the equivalent of round() and therefore incorrect. Since there is no equivalent VSX instruction, simply use vrfin regardless of availability of VSX.	2021-07-12 06:11:27 -05:00
Nemanja Ivanovic	41ce5ec5f6	[PowerPC] Remove unnecessary 64-bit guards from altivec.h A number of functions in the header have guards for 64-bit only that were presumably added as some of the functions in the blocks use vector __int128 which is only available in 64-bit mode. A more appropriate guard (__SIZEOF_INT128__) has been added for those functions since, making the 64-bit guards redundant. This patch removes those guards as they inadvertently guard code that uses vector long long which does not actually require 64-bit mode.	2021-07-12 04:59:00 -05:00
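
Commit d2cb189184 in the table above switches a statement-like macro from a bare `{}` body to `do {} while (0)` so that it can be followed by a semicolon inside an `if`/`else`. The following is a minimal sketch of that idiom, not the actual `_MM_EXTRACT_FLOAT` definition: the macro `STORE_SQUARE` and the function `pick` are hypothetical names invented for illustration.

```c
/* Hypothetical statement-like macro (not from the X86 headers).
 * Wrapping the body in do { ... } while (0) makes the expansion a
 * single statement that still requires the caller's trailing ';'. */
#define STORE_SQUARE(dst, x) \
  do {                       \
    (dst) = (x) * (x);       \
  } while (0)

/* With a bare { ... } body, the ';' after the macro call would end
 * the if statement and leave the 'else' dangling, which is a compile
 * error. With do/while (0), the if/else below parses as intended. */
static int pick(int flag) {
  int out = 0;
  if (flag)
    STORE_SQUARE(out, 3);
  else
    out = -1;
  return out;
}
```

Here `pick(1)` takes the macro branch and `pick(0)` takes the `else` branch. Replacing the macro body with plain braces should make the same function fail to compile, which is the situation the commit message describes.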

1875 Commits