Commit Graph

1732 Commits

Author SHA1 Message Date
Michael Liao bb8d20d9f3 [cuda][hip] Fix typoes in header wrappers. 2020-12-21 13:02:47 -05:00
Simon Pilgrim 4855a1004d [X86] Convert fadd/fmul _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506)
Followup to D87604, having confirmed on PR47506 that we can use the llvm codegen expansion for fadd/fmul as well.

Differential Revision: https://reviews.llvm.org/D92940
2020-12-13 15:37:35 +00:00
Anastasia Stulova a84599f177 [OpenCL] Implement extended subgroups fully in headers.
Extended subgroups are library style extensions and therefore
they require no changes in the frontend. This commit:
1. Moves extension macro definitions to the internal headers.
2. Removes extension pragmas because they are not needed.

Tags: #clang

Differential Revision: https://reviews.llvm.org/D92231
2020-12-10 16:40:15 +00:00
Luo, Yuanke f80b29878b [X86] AMX programming model.
This patch implements amx programming model that discussed in llvm-dev
 (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html).
 Thank Hal for the good suggestion in the RA. The fast RA is not in the patch yet.
 This patch implemeted 7 components.

1. The c interface to end user.
2. The AMX intrinsics in LLVM IR.
3. Transform load/store <256 x i32> to AMX intrinsics or split the
   type into two <128 x i32>.
4. The Lowering from AMX intrinsics to AMX pseudo instruction.
5. Insert psuedo ldtilecfg and build the def-use between ldtilecfg to amx
   intruction.
6. The register allocation for tile register.
7. Morph AMX pseudo instruction to AMX real instruction.

Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0

Differential Revision: https://reviews.llvm.org/D87981
2020-12-10 17:01:54 +08:00
Masoud Ataei fc750f609d [PPC] Fixing a typo in altivec.h. Commenting out an unnecessary macro 2020-12-08 19:21:02 +00:00
Artem Belevich 4326792942 [CUDA] Another attempt to fix early inclusion of <new> from libstdc++
Previous patch (9a465057a6) did not fix the problem.
https://bugs.llvm.org/show_bug.cgi?id=48228

If the <new> is included too early, before CUDA-specific defines are available,
just include-next the standard <new> and undo the include guard.  CUDA-specific
variants of operator new/delete will be declared if/when <new> is used from the
CUDA source itself, when all CUDA-related macros are available.

Differential Revision: https://reviews.llvm.org/D91807
2020-12-04 12:03:35 -08:00
Martin Storsjö c17fdca188 [clang] [Headers] Use the corresponding _aligned_free or __mingw_aligned_free in _mm_free
Differential Revision: https://reviews.llvm.org/D92570
2020-12-04 11:34:12 +02:00
Aaron En Ye Shi ba2612ce01 [HIP] cmath demote long double args to double
Since there is no ROCm Device Library support for
long double, demote them to double, and use the fp64
math functions.

Differential Revision: https://reviews.llvm.org/D92130
2020-12-03 23:00:14 +00:00
Reid Kleckner 1e843a987d [MS] Add more 128bit cmpxchg intrinsics for AArch64
The MSVC STL for requires this on ARM64.
Requested in https://llvm.org/pr47099

Depends on D92061

Differential Revision: https://reviews.llvm.org/D92062
2020-11-25 12:07:28 -08:00
Artem Belevich 9a465057a6 [CUDA] Unbreak CUDA compilation with -std=c++20
Standard libc++ headers in stdc++ mode include <new> which picks up
cuda_wrappers/new before any of the CUDA macros have been defined.

We can not include CUDA headers that early, so the work-around is to define
__device__ in the wrapper header itself.

Differential Revision: https://reviews.llvm.org/D91807
2020-11-19 10:35:47 -08:00
Sven van Haastregt f0c690018a [OpenCL] Stop opencl-c-base.h leaking extension enabling
opencl-c.h disables all extensions at its end, but opencl-c-base.h
does not, and that causes any inclusion of only opencl-c-base.h to
leave some extensions (such as cl_khr_fp16) enabled.  This affects the
-fdeclare-opencl-builtins option for example.

This violates the OpenCL Extension Specification which specifies that
"The initial state of the compiler is as if the directive #pragma
OPENCL EXTENSION all : disable was issued".

Fix by disabling all extensions at the end of opencl-c-base.h and
enable extensions inside opencl.h which relied on opencl-c-base.h
enabling the cl_khr_fp16/64 extensions.

Differential Revision: https://reviews.llvm.org/D91429
2020-11-17 12:07:40 +00:00
Roland McGrath cf36142d34 [clang] Add missing header guard in <cpuid.h>
This header has long lacked a standard multiple inclusion guard
like other headers have, for no apparent reason.  The GCC header
of the same name likewise lacks one up through release 10.1, but
trunk GCC (release 11, and perhaps future 10.x) has fixed it
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96238).

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D91226
2020-11-10 19:34:25 -08:00
Qiu Chaofan 979a4d268a [PowerPC] [Clang] Port SSE4.1-compatible insert intrinsics
This patch adds three intrinsics compatible to x86's SSE 4.1 on PowerPC
target, with tests:

- _mm_insert_epi8
- _mm_insert_epi32
- _mm_insert_epi64

The intrinsics implementation is contributed by Paul Clarke.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D89242
2020-11-10 10:52:13 +08:00
Freddy Ye 5e312e0041 [X86] use macros to split GFNI intrinsics into different kinds
Tremont microarchitecture only has GFNI(SSE) version, not AVX and
AVX512 version. This patch is to avoid compiling fail on Windows when
using -march=tremont to invoke one of GFNI(SSE) intrinsic.

Differential Revision: https://reviews.llvm.org/D90822
2020-11-06 16:03:38 +08:00
Albion Fung 1af037f643 [PowerPC] Correct cpsgn's behaviour on PowerPC to match that of the ABI
This patch fixes the reversed behaviour exhibited by cpsgn on PPC. It now matches the ABI.

Differential Revision: https://reviews.llvm.org/D84962
2020-11-05 15:35:14 -05:00
Aaron En Ye Shi ca5b31502c [HIP] Math Headers to use type promotion
Similar to libcxx implementation of cmath function
overloads, use type promotion templates to determine
return types of multi-argument math functions.

Fixes: SWDEV-256825

Reviewed By: tra, yaxunl

Differential Revision: https://reviews.llvm.org/D90409
2020-11-03 18:40:26 +00:00
Liu, Chen3 756f597841 [X86] Support Intel avxvnni
This patch mainly made the following changes:

1. Support AVX-VNNI instructions;
2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpbusds/vpdpbusds instructions only use vex-encoding when user explicity add {vex} prefix.

Differential Revision: https://reviews.llvm.org/D89105
2020-10-31 12:39:51 +08:00
Joachim Meyer eaee608448 [OpenMP] Use __OPENMP_NVPTX__ instead of _OPENMP in complex wrapper headers.
This is very similar to 7f1e6fcff9, just fixing a left-over.
With this, it should be possible to use both, -x cuda and -fopenmp in the same invocation,
enabling to use both OpenMP, targeting CPU, and CUDA, targeting the GPU.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90415
2020-10-29 23:24:49 +01:00
Johannes Doerfert 17c8251bca [OpenMP][CUDA][FIX] Use the new `remquo` overload only for OpenMP
CUDA buildbots complained about a redefinition when I landed D89971.
This is odd and I fail to understand where in the CUDA headers the other
definition is supposed to be. For now, given that CUDA doesn't need the
overload (AFAIKT), we simply restrict it to the OpenMP mode.
2020-10-27 23:52:59 -05:00
Johannes Doerfert b1a90e1599 [OpenMP][CUDA] Add missing overload for `remquo(float,float,int*)`
Reported by Colleen Bertoni <bertoni@anl.gov> after running the OvO test
suite: https://github.com/TApplencourt/OvO/

The template overload is still hidden behind an ifdef for OpenMP. In the
future we probably want to remove the ifdef but that requires further
testing.

Reviewed By: JonChesterfield, tra

Differential Revision: https://reviews.llvm.org/D89971
2020-10-27 19:12:51 -05:00
Aaron En Ye Shi 3700556ecb [HIP][NFC] Use correct max in cuda_complex_builtins
Update the clang complex builtins for OpenMP to use the
correct max function from either __nv_* or __ocml_*.
2020-10-27 19:35:09 +00:00
Aaron En Ye Shi b2524eb944 [HIP] Fix HIP rounding math intrinsics
The __ocml_*_rte_f32 and __ocml_*_rte_f64 functions are not
available if OCML_BASIC_ROUNDED_OPERATIONS is not defined.

Reviewed By: b-sumner, yaxunl

Fixes: SWDEV-257235

Differential Revision: https://reviews.llvm.org/D89966
2020-10-22 15:57:09 +00:00
Tianqing Wang be39a6fe6f [X86] Add User Interrupts(UINTR) instructions
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D89301
2020-10-22 17:33:07 +08:00
Albion Fung d30155feaa [PowerPC] Implementation of 128-bit Binary Vector Rotate builtins
This patch implements 128-bit Binary Vector Rotate builtins for PowerPC10.

Differential Revision: https://reviews.llvm.org/D86819
2020-10-16 18:03:22 -04:00
Simon Pilgrim 6c23cbc560 [X86] Convert integer _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506)
Emit the equivalent integer reduction intrinsics in IR instead of expanding to shuffle+arithmetic sequences.

The fadd/fmul reductions might be trickier as they assume a similar bisection reduction while the generic intrinsics assume a sequential reduction (intel docs are ambiguous on the correct approach) - I'm not sure if we want to always tag them with reassoc? Anyway, that issue can wait until a separate fp patch along with the fmin/fmax reductions.

Differential Revision: https://reviews.llvm.org/D87604
2020-10-13 09:28:39 +01:00
Wang, Pengfei 412cdcf2ed [X86] Add HRESET instruction.
For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D89102
2020-10-13 08:47:26 +08:00
Aaron En Ye Shi 8d2a0c115e [HIP] NFC Add comments to cmath functions
Add missing comments to cmath functions.

Differential Revision: https://reviews.llvm.org/D88837
2020-10-06 15:26:56 +00:00
Aaron En Ye Shi aa2b593f14 [HIP] Restructure hip headers to add cmath
Separate __clang_hip_math.h header into __clang_hip_cmath.h
and __clang_hip_math.h. Improve the math function definition,
and add missing definitions or declarations. Add missing
overloads.

Reviewed By: tra, JonChesterfield

Differential Review: https://reviews.llvm.org/D88837
2020-10-06 14:48:53 +00:00
Craig Topper a02b449bb1 [X86] Sync AESENC/DEC Key Locker builtins with gcc.
For the wide builtins, pass a single input and output pointer to
the builtins. Emit the GEPs and input loads from CGBuiltin.
2020-10-04 12:09:41 -07:00
Craig Topper 230c57b0bd [X86] Synchronize the encodekey builtins with gcc. Don't assume void* is 16 byte aligned.
We were taking multiple pointer arguments in the builtin.
gcc accepts a single void*.

The cast from void* to _m128i* caused the IR generation to assume
the pointer was aligned.

Instead make the builtin take a single void*, emit i8* GEPs to
adjust then cast to <2 x i64>* and perform a store with align of 1.
2020-10-04 12:09:35 -07:00
Craig Topper 28595cbbeb [X86] Synchronize the loadiwkey builtin operand order with gcc version. 2020-10-04 12:09:29 -07:00
Craig Topper 6c6cd5f8a9 [X86] Consolidate wide Key Locker intrinsics into the same header as the other Key Locker intrinsics. 2020-10-04 12:09:21 -07:00
Esme-Yi e3475f5b91 [PowerPC] Add builtins for xvtdiv(dp|sp) and xvtsqrt(dp|sp).
Summary: This patch implements the builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp.
The instructions correspond to the following builtins:
int vec_test_swdiv(vector double v1, vector double v2);
int vec_test_swdivs(vector float v1, vector float v2);
int vec_test_swsqrt(vector double v1);
int vec_test_swsqrts(vector float v1);
This patch depends on D88274, which fixes the bug in copying from CRRC to GPRC/G8RC.

Reviewed By: steven.zhang, amyk

Differential Revision: https://reviews.llvm.org/D88278
2020-10-04 16:24:20 +00:00
Xiang1 Zhang 413577a879 [X86] Support Intel Key Locker
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D88398
2020-09-30 18:08:45 +08:00
Yaxun (Sam) Liu d04775e16b Add remquo, frexp and modf overload functions to HIP header 2020-09-29 20:57:56 -04:00
Artem Belevich 30514f0afa [CUDA] Added conversion functions to builtin vars.
This is needed to compile some headers in CUDA-11 that assume that threadIdx is
implicitly convertible to dim3. With NVCC, threadIdx is uint3 and there's
dim3(uint3) constructor. Clang uses a special type for the builtin variables, so
that path does not work. Instead, this patch adds conversion function to the
builtin variable classes. that will allow them to be converted to dim3 and uint3.

Differential Revision: https://reviews.llvm.org/D88250
2020-09-24 14:33:04 -07:00
Amy Kwan 6b136b19cb [Power10] Implement custom codegen for the vec_replace_elt and vec_replace_unaligned builtins.
This patch implements custom codegen for the vec_replace_elt and
vec_replace_unaligned builtins.

These builtins map to the @llvm.ppc.altivec.vinsw and @llvm.ppc.altivec.vinsd
intrinsics depending on the arguments. The main motivation for doing custom
codegen for these intrinsics is because there are float and double versions of
the builtin. Normally, the converting the float to an integer would be done via
fptoui in the IR. This is incorrect as fptoui truncates the value and we must
ensure the value is not truncated. Therefore, we provide custom codegen to utilize
bitcast instead as bitcasts do not truncate.

Differential Revision: https://reviews.llvm.org/D83500
2020-09-23 22:55:25 -05:00
Amy Kwan 2e7117f847 [PowerPC] Implement the 128-bit vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins in Clang/LLVM
This patch implements the vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins for vector signed/unsigned __int128.

Differential Revision: https://reviews.llvm.org/D87910
2020-09-23 16:49:40 -04:00
Albion Fung 88cdbeab41 [PowerPC] Implement Vector signed/unsigned __int128 overloads for the comparison builtins
This patch implements Vector signed/unsigned __int128 overloads for the comparison builtins.

Differential Revision: https://reviews.llvm.org/D87804
2020-09-23 16:49:40 -04:00
Albion Fung d7eb917a7c [PowerPC] Implementation of 128-bit Binary Vector Mod and Sign Extend builtins
This patch implements 128-bit Binary Vector Mod and Sign Extend builtins for PowerPC10.

Differential: https://reviews.llvm.org/D87394#inline-815858
2020-09-23 01:18:14 -05:00
Amy Kwan 079757b551 [PowerPC] Implement Vector String Isolate Builtins in Clang/LLVM
This patch implements the vector string isolate (predicate and non-predicate
versions) builtins. The predicate builtins are custom selected within PPCISelDAGToDAG.

Differential Revision: https://reviews.llvm.org/D87671
2020-09-22 11:31:44 -05:00
Amy Kwan b3147058de [PowerPC] Implement the 128-bit Vector Divide Extended Builtins in Clang/LLVM
This patch implements the 128-bit vector divide extended builtins in Clang/LLVM.
These builtins map to the vdivesq and vdiveuq instructions respectively.

Differential Revision: https://reviews.llvm.org/D87729
2020-09-22 11:31:44 -05:00
Amy Kwan 37e7673c21 [PowerPC] Implement Move to VSR Mask builtins in LLVM/Clang
This patch implements the vec_gen[b|h|w|d|q]m function prototypes in altivec.h
in order to utilize the move to VSR with mask instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D82725
2020-09-18 18:16:14 -05:00
Amy Kwan 2c3bc918db [PowerPC] Implement Vector Count Mask Bits builtins in LLVM/Clang
This patch implements the vec_cntm function prototypes in altivec.h in order to
utilize the vector count mask bits instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D82726
2020-09-17 18:20:53 -05:00
Raul Tambre e09107ab80 [Sema] Introduce BuiltinAttr, per-declaration builtin-ness
Instead of relying on whether a certain identifier is a builtin, introduce BuiltinAttr to specify a declaration as having builtin semantics.

This fixes incompatible redeclarations of builtins, as reverting the identifier as being builtin due to one incompatible redeclaration would have broken rest of the builtin calls.
Mostly-compatible redeclarations of builtins also no longer have builtin semantics. They don't call the builtin nor inherit their attributes.
A long-standing FIXME regarding builtins inside a namespace enclosed in extern "C" not being recognized is also addressed.

Due to the more correct handling attributes for builtin functions are added in more places, resulting in more useful warnings.
Tests are updated to reflect that.

Intrinsics without an inline definition in intrin.h had `inline` and `static` removed as they had no effect and caused them to no longer be recognized as builtins otherwise.

A pthread_create() related test is XFAIL-ed, as it relied on it being recognized as a builtin based on its name.
The builtin declaration syntax is too restrictive and doesn't allow custom structs, function pointers, etc.
It seems to be the only case and fixing this would require reworking the current builtin syntax, so this seems acceptable.

Fixes PR45410.

Reviewed By: rsmith, yutsumi

Differential Revision: https://reviews.llvm.org/D77491
2020-09-17 19:28:57 +03:00
Johannes Doerfert 56069b5c71 [OpenMP] Support `std::complex` math functions in target regions
The last (big) missing piece to get "math" working in OpenMP target
regions (that I know of) was complex math functions, e.g.,
`std::sin(std::complex<double>)`. With this patch we overload the system
template functions for these operations with versions that have been
distilled from `libcxx/include/complex`. We use the same
  `omp begin/end declare variant`
mechanism we use for other math functions before, except that we this
time overload templates (via D85735).

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D85777
2020-09-16 13:37:10 -05:00
Johannes Doerfert 5c1084e884 [OpenMP] Context selector extensions for template functions
With this extension the effects of `omp begin declare variant` will be
applied to template function declarations. The behavior is opt-in and
controlled by the `extension(allow_templates)` trait. While generally
useful, this will enable us to implement complex math function calls by
overloading the templates of the standard library with the ones in
libc++.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D85735
2020-09-16 13:37:10 -05:00
Johannes Doerfert 97652202d1 [OpenMP] Overload `std::isnan` and friends multiple times for the GPU
`std::isnan` and friends can be found in two variants in the wild, one
returns `bool`, as the standard defines it, one returns `int`, as the C
macros do. So far we kinda hoped the system versions of these functions
will work for people, e.g. they are definitions that can be compiled for
the target. We know that is not the case always so we leverage the
`disable_implicit_base` OpenMP context extension to specialize both
versions of these functions without causing an invalid redeclaration.

Reviewed By: JonChesterfield, tra

Differential Revision: https://reviews.llvm.org/D85879
2020-09-16 13:37:09 -05:00
Albion Fung 05aa997d51 [PowerPC] Implement __int128 vector divide operations
This patch implements __int128 vector divide operations for ISA3.1.

Differential Revision: https://reviews.llvm.org/D85453
2020-09-15 15:19:35 -04:00
Amy Kwan efa57f9a7a [PowerPC] Implement Vector Expand Mask builtins in LLVM/Clang
This patch implements the vec_expandm function prototypes in altivec.h in order
to utilize the vector expand with mask instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D82727
2020-09-06 17:13:21 -05:00