Commit Graph

1875 Commits

Author SHA1 Message Date
Thomas Lively fd3bd63df2 [WebAssembly] Make bitmask instructions return unsigned ints
Since they are bitmasks, it will be more common for them to be used and
potentially extended to 64-bit integers as unsigned values rather than signed
values.

Differential Revision: https://reviews.llvm.org/D108401
2021-08-19 16:23:47 -07:00
Martin Storsjö cc3affd8b0 [clang] [MSVC] Implement __mulh and __umulh builtins for aarch64
The code is based on the same __mulh and __umulh intrinsics for
x86.

This should fix PR51128.

Differential Revision: https://reviews.llvm.org/D106721
2021-08-19 11:29:55 +03:00
Jon Chesterfield dbd7bad9ad [openmp] Annotate tmp variables with omp_thread_mem_alloc
Fixes miscompile of calls into ocml. Bug 51445.

The stack variable `double __tmp` is moved to dynamically allocated shared
memory by CGOpenMPRuntimeGPU. This is usually fine, but when the variable
is passed to a function that is explicitly annotated address_space(5) then
allocating the variable off-stack leads to a miscompile in the back end,
which cannot decide to move the variable back to the stack from shared.

This could be fixed by removing the AS(5) annotation from the math library
or by explicitly marking the variables as thread_mem_alloc. The cast to
AS(5) is still a no-op once IR is reached.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D107971
2021-08-19 02:22:11 +01:00
Wang, Pengfei 2379949aad [X86] AVX512FP16 instructions enabling 3/6
Enable FP16 conversion instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105265
2021-08-18 09:03:41 +08:00
Craig Topper 705b1191aa [X86] Add parentheses around casts in X86 intrinsic headers.
Fixes PR51324.
2021-08-14 18:14:44 -07:00
Wang, Pengfei f1de9d6dae [X86] AVX512FP16 instructions enabling 2/6
Enable FP16 binary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105264
2021-08-15 08:56:33 +08:00
Craig Topper d2cb189184 [X86] Use a do {} while (0) in the _MM_EXTRACT_FLOAT implementation.
Previously we just used {}, but that doesn't work in situations
like this.

if (1)
  _MM_EXTRACT_FLOAT(d, x, n);
else
  ...

The semicolon would terminate the if.
2021-08-14 16:41:55 -07:00
Craig Topper 73c4c32767 [X86] Use __builtin_bit_cast _mm_extract_ps instead of type punning through a union. NFC 2021-08-14 16:35:55 -07:00
Craig Topper 4190d99dfc [X86] Add parentheses around casts in some of the X86 intrinsic headers.
This covers the SSE and AVX/AVX2 headers. AVX512 has a lot more macros
due to rounding mode.

Fixes part of PR51324.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D107843
2021-08-13 09:36:16 -07:00
Jon Chesterfield 6a8e5120ab Revert "[openmp] Annotate tmp variables with omp_thread_mem_alloc"
This reverts commit b6113548c9.
2021-08-12 17:44:36 +01:00
Jon Chesterfield b6113548c9 [openmp] Annotate tmp variables with omp_thread_mem_alloc
Fixes miscompile of calls into ocml. Bug 51445.

The stack variable `double __tmp` is moved to dynamically allocated shared
memory by CGOpenMPRuntimeGPU. This is usually fine, but when the variable
is passed to a function that is explicitly annotated address_space(5) then
allocating the variable off-stack leads to a miscompile in the back end,
which cannot decide to move the variable back to the stack from shared.

This could be fixed by removing the AS(5) annotation from the math library
or by explicitly marking the variables as thread_mem_alloc. The cast to
AS(5) is still a no-op once IR is reached.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D107971
2021-08-12 17:30:22 +01:00
Freddy Ye 6c1468854d [X86] Reverse *_set_ph and *_setr_ph 's set order.
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D107946
2021-08-12 16:27:04 +08:00
Wang, Pengfei 6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Dave Airlie 1854db74c5 opencl-c.h: add 3.0 optional extension support for a few more bits
These 3 are fairly simple, pipes, workgroups and subgroups.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D105858
2021-08-07 09:25:00 +10:00
Justas Janickas a5a2f05dcc [C++4OpenCL] Introduces __remove_address_space utility
This change provides a way to conveniently declare types that have
address space qualifiers removed.

Since OpenCL adds address spaces implicitly even when they are not
specified in source, it is useful to allow deriving address space
unqualified types.

Fixes llvm.org/PR45326

Differential Revision: https://reviews.llvm.org/D106785
2021-08-06 10:40:22 +01:00
Jon Chesterfield 509854b69c [clang] Replace asm with __asm__ in cuda header
Asm is a gnu extension for C, so at present -fopenmp -std=c99
and similar fail to compile on nvptx, bug 51344

Changing to `__asm__` or `__asm` works for openmp, all three appear to work
for cuda. Suggesting `__asm__` here as `__asm` is used by MSVC with different
syntax, so this should make for better error diagnostics if the header is
passed to a compiler other than clang.

Reviewed By: tra, emankov

Differential Revision: https://reviews.llvm.org/D107492
2021-08-05 18:46:57 +01:00
Dave Airlie 14cb67862a [OpenCL] allow generic address and non-generic defs for CL3.0
This allows both sets of definitions to exist on CL 3.0

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D107318
2021-08-05 07:32:45 +10:00
Pushpinder Singh f3eb5f900d [AMDGPU][OpenMP] Wrap amdgcn declare variant inside ifdef
This fixes the issue https://bugs.llvm.org/show_bug.cgi?id=51337

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D107468
2021-08-04 15:24:46 +00:00
Pushpinder Singh 713a5d12cd [OpenMP][AMDGCN] Initial math headers support
With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.

Reviewed By: JonChesterfield, jdoerfert, scchan

Differential Revision: https://reviews.llvm.org/D104904
2021-08-02 14:38:52 +00:00
Hans Wennborg 12dc13b73c prfchwintrin.h: Make _m_prefetchw take a pointer to volatile (PR49124)
For some reason, Microsoft declares _m_prefetch to take a const void*,
but _m_prefetchw to take a /volatile/ const void*.

Do the same for compatibility.

Differential revision: https://reviews.llvm.org/D106790
2021-08-02 15:16:04 +02:00
Jon Chesterfield 7f97ddaf8a Revert "[OpenMP][AMDGCN] Initial math headers support"
Broke nvptx compilation on files including <complex>

This reverts commit 12da97ea10.
2021-07-30 22:07:00 +01:00
Nemanja Ivanovic 9019b55b60 [PowerPC] Fix byte ordering of ld/st with length on BE
The builtins vec_xl_len_r and vec_xst_len_r actually use the
wrong side of the vector on big endian Power9 systems. We never
spotted this before because there was no such thing as a big
endian distro that supported Power9. Now we have AIX and the
elements are in the wrong part of the vector. This just fixes
it so the elements are loaded to and stored from the right
side of the vector.
2021-07-30 14:37:24 -05:00
Pushpinder Singh 12da97ea10 [OpenMP][AMDGCN] Initial math headers support
With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.

Reviewed By: JonChesterfield, jdoerfert, scchan

Differential Revision: https://reviews.llvm.org/D104904
2021-07-30 14:52:41 +00:00
Dave Airlie 3c7d2f1b67 [OpenCL] opencl-c.h: add CL 3.0 non-generic address space atomics
CL 2.0 introduced atomics and generic address space so there were
only one set of APIs for doing atomics, however since CL 3.0
makes generic address space optional, there has to be new sets
of atomic interfaces to handle that cases.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D106778
2021-07-30 14:46:47 +10:00
Thomas Lively 33786576fd [WebAssembly] Codegen for extmul SIMD instructions
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.

Differential Revision: https://reviews.llvm.org/D106724
2021-07-27 08:41:30 -07:00
Anastasia Stulova e5f47eedeb [OpenCL] NULL redefined as nullptr in C++ mode.
Redefines NULL as nullptr instead of ((void*)0)
in C++ for OpenCL.

Such internal representation of NULL provides
compatibility with C++11 and later language
standards.

Patch by Topotuna (Justas Janickas)!

Differential Revision: https://reviews.llvm.org/D105987
2021-07-27 16:33:50 +01:00
Nemanja Ivanovic 1c50a5da36 [PowerPC] Implement partial vector ld/st builtins for XL compatibility
XL provides functions __vec_ldrmb/__vec_strmb for loading/storing a
sequence of 1 to 16 bytes in big endian order, right justified in the
vector register (regardless of target endianness).
This is equivalent to vec_xl_len_r/vec_xst_len_r which are only
available on Power9.

This patch simply uses the Power9 functions when compiled for Power9,
but provides a more general implementation for Power8.

Differential revision: https://reviews.llvm.org/D106757
2021-07-26 13:19:52 -05:00
Qiu Chaofan 240dde9482 [PowerPC] Change altivec indexed load/store builtins argument type
This patch changes the index argument of lvxl?/lve[bhw]x and
stvxl?/stve[bhw]x builtins from int to long. Because on 64-bit
subtargets, an extra extsw will always been generated, which is
incorrect.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D106530
2021-07-27 00:26:50 +08:00
Ulrich Weigand 8cd8120a7b [SystemZ] Add support for new cpu architecture - arch14
This patch adds support for the next-generation arch14
CPU architecture to the SystemZ backend.

This includes:
- Basic support for the new processor and its features.
- Detection of arch14 as host processor.
- Assembler/disassembler support for new instructions.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining  __VEC__ == 10304.

Note: No currently available Z system supports the arch14
architecture.  Once new systems become available, the
official system name will be added as supported -march name.
2021-07-26 16:57:28 +02:00
Dave Airlie 9451403c5f [OPENCL] opencl-c.h: add initial CL 3.0 conditionals for atomic operations.
This adds the optional wrappers around things, however this isn't sufficient yet for CL 3.0 without generic address space, I've got one more additional patch to add all those APIs, but this is an easier to review precursor.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D106111
2021-07-26 11:06:33 +10:00
Thomas Lively 85157c0079 [WebAssembly] Codegen for pmin and pmax
Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax}
with standard codegen patterns. Since wasm_simd128.h uses an integer vector as
the standard single vector type, the IR for the pmin and pmax intrinsic
functions contains bitcasts that would not be there otherwise. Add extra codegen
patterns that can still select the pmin and pmax instructions in the presence of
these bitcasts.

Differential Revision: https://reviews.llvm.org/D106612
2021-07-23 14:49:21 -07:00
Anastasia Stulova 5c63bf3abd [OpenCL] Add NULL to standards prior to v2.0.
NULL was undefined in OpenCL prior to version 2.0. However, the
language specification states that "macro names defined by the C99
specification but not currently supported by OpenCL are reserved
for future use". Therefore, application developers cannot redefine
NULL.

The change is supposed to resolve inconsistency between language
versions. Currently there is no apparent reason why NULL should
be kept undefined.

Patch by Topotuna (Justas Janickas)!

Differential Revision: https://reviews.llvm.org/D105988
2021-07-23 11:54:36 +01:00
Sven van Haastregt 989bedec7a [OpenCL] Add cl_khr_integer_dot_product
Add the builtins defined by Section 42 "Integer dot product" in
the OpenCL Extension Specification.

Differential Revision: https://reviews.llvm.org/D106434
2021-07-23 10:10:16 +01:00
namazso 91bc85b1eb [MS] Preserve base register %esi around movs[bwl]
fix for behavior reported in https://bugs.llvm.org/show_bug.cgi?id=51100 workaround for root cause https://bugs.llvm.org/show_bug.cgi?id=16830

similar to https://reviews.llvm.org/D101338

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106210
2021-07-23 16:28:32 +08:00
Aaron En Ye Shi 9ce931bd71 [HIP] Fix no matching constructor for init of shared_ptr and malloc
Allow standard header versions of malloc and free to be defined
before introducing the device versions.

Fixes: SWDEV-295901

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D106463
2021-07-22 14:32:41 +00:00
Thomas Lively db7efcab7d [WebAssembly] Remove clang builtins for extract_lane and replace_lane
These builtins were added to capture the fact that the underlying Wasm
instructions return i32s and implicitly sign or zero extend the extracted lanes
in the case of the i8x16 and i16x8 variants. But we do sufficient optimizations
during code gen that these low-level details do not need to be exposed to users.

This commit replaces the use of the builtins in wasm_simd128.h with normal
target-independent vector code. As a result, we can switch the relevant
intrinsics to use functions rather than macros and can use more user-friendly
return types rather than trying to precisely expose the underlying Wasm types.
Note, however, that the generated LLVM IR is no different after this change.

Differential Revision: https://reviews.llvm.org/D106500
2021-07-21 16:11:00 -07:00
Yaxun (Sam) Liu db5f100fe4 [HIP] Remove workaround in __clang_hip_runtime_wrapper.h
Remove the workaround for -fopenmp in __clang_hip_runtime_wrapper.h
since it causes device functions in HIP wrapper headers disabled when
compiling HIP program with -fopenmp.

Reviewed by: Aaron Enye Shi, Jon Chesterfield

Differential Revision: https://reviews.llvm.org/D106070
2021-07-21 15:16:28 -04:00
Jon Chesterfield d71062fbda Revert "[OpenMP][AMDGCN] Initial math headers support"
This reverts commit 968899ad9c.
2021-07-21 17:35:40 +01:00
Pushpinder Singh 968899ad9c [OpenMP][AMDGCN] Initial math headers support
With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.

Reviewed By: JonChesterfield, jdoerfert, scchan

Differential Revision: https://reviews.llvm.org/D104904
2021-07-21 16:15:39 +01:00
Sven van Haastregt 724f0e2abb [OpenCL] Add cl_khr_extended_bit_ops
Add the builtins defined by Section 40 "Extended Bit Operations" in
the OpenCL Extension Specification.

Differential Revision: https://reviews.llvm.org/D106267
2021-07-21 10:01:19 +01:00
Jon Chesterfield 3e649f8ef1 [openmp][nfc] Simplify macros guarding math complex headers
The `__CUDA__` macro is already defined for openmp/nvptx and is not used by
`__clang_cuda_complex_builtins.h`, so dropping that macro slightly simplifies
nvptx and avoids defining it on amdgcn (where it is likely to be harmful).

Also dropped a cplusplus test from a C++ header as compilation will have
failed on cmath earlier if it was included from C.

Reviewed By: jdoerfert, fodinabor

Differential Revision: https://reviews.llvm.org/D105221
2021-07-18 23:30:35 +01:00
Stefan Pintilie 0bf4b81d57 [Clang] Add an empty builtins.h file.
On Power PC some legacy compilers included a number of builtins in a
builtins.h header file. While this header file is not required to hold
builtins for clang some legacy code does try to include this file and so
this patch provides an empty version of that file.

Differential Revision: https://reviews.llvm.org/D106065
2021-07-16 12:50:04 -05:00
Dave Airlie de79ba9f9a [OpenCL] opencl-c.h: CL3.0 generic address space
This is one of the easier pieces of adding CL3.0 support.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D105526
2021-07-15 10:51:04 +10:00
Dave Airlie 090f007e34 [OpenCL][NFC] opencl-c.h: reorder atomic operations
This just reorders the atomics, it doesn't change anything except their layout in the header.

This is a prep patch for adding some conditionals around these for CL3.0 but that patch is much easier to review if all the atomic operations are grouped together like this.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D105601
2021-07-15 10:48:44 +10:00
Thomas Lively 4a4229f70f [WebAssembly] Codegen for v128.storeX_lane instructions
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50435.

Differential Revision: https://reviews.llvm.org/D106019
2021-07-14 16:15:25 -07:00
Thomas Lively 970e090010 [WebAssembly] Codegen for v128.loadX_lane instructions
Replace the experimental clang builtin and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50433.

Differential Revision: https://reviews.llvm.org/D105950
2021-07-14 11:31:53 -07:00
Thomas Lively cbabfc63b1 [WebAssembly] Custom combines for f32x4.demote_zero_f64x2
Replace the clang builtin function and LLVM intrinsic for
f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing
combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern.

Differential Revision: https://reviews.llvm.org/D105755
2021-07-12 10:32:18 -07:00
Bardia Mahjour 2071ce9d45 [Altivec] Use signed comparison for vec_all_* and vec_any_* interfaces
We are currently being inconsistent in using signed vs unsigned comparisons for
vec_all_* and vec_any_* interfaces that use vector bool types. For example we
use signed comparison for vec_all_ge(vector signed char, vector bool char) but
unsigned comparison for when the arguments are swapped. GCC and XL use signed
comparison instead. This patch makes clang consistent with itself and with XL
and GCC.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D105666
2021-07-12 11:41:16 -04:00
Nemanja Ivanovic 84e429693f [PowerPC] Fix rounding mode for vec_round in altivec.h
The function is supposed to be the equivalent of rint() (as in
round to nearest, ties to even) rather than round() (round to
nearest, ties away from zero). In fact, the instruction we emit
without VSX is vrfin which is correct. However, with VSX we emit
xvrspi which is the equivalent of round() and therefore incorrect.
Since there is no equivalent VSX instruction, simply use vrfin
regardless of availability of VSX.
2021-07-12 06:11:27 -05:00
Nemanja Ivanovic 41ce5ec5f6 [PowerPC] Remove unnecessary 64-bit guards from altivec.h
A number of functions in the header have guards for 64-bit only
that were presumably added as some of the functions in the blocks
use vector __int128 which is only available in 64-bit mode.
A more appropriate guard (__SIZEOF_INT128__) has been added for
those functions since, making the 64-bit guards redundant.
This patch removes those guards as they inadvertently guard code
that uses vector long long which does not actually require 64-bit
mode.
2021-07-12 04:59:00 -05:00