The goal of this patch is to give distribution builds the flexibility to include only the applicable header files.
Currently, the clang-resource-headers target contains nearly all the files in clang/lib/Headers. Most of these files are platform specific (e.g. immintrin.h is x86 specific). A distribution build has to either include all the headers for all platforms, or not include any headers. For example, if a distribution build for powerpc includes the clang-resource-headers target, it will include all the x86-specific headers, even though those headers cannot be used.
This patch breaks up the clang-resource-headers list into a core list and platform-specific lists. With the patch, a distribution build can now include the ppc-resource-headers target to include only the headers applicable to the powerpc platform.
Specifically, one can now run
cmake ... LLVM_DISTRIBUTION_COMPONENTS="clang;ppc-resource-headers" ... ../llvm
and ninja install-distribution will then install the powerpc headers.
Similarly, one can do
cmake ... LLVM_DISTRIBUTION_COMPONENTS="clang;x86-resource-headers" ... ../llvm
to include headers applicable to the x86 platform in a distribution installation.
To implement this behaviour, the patch does two things:
* It breaks up the long header file list into a core list and platform-specific lists.
* It adds numerous platform-specific installation targets.
Differential Revision: https://reviews.llvm.org/D123498
Align with the `-fdeclare-opencl-builtins` option and other
get_image_* builtins which have the const attribute.
Differential Revision: https://reviews.llvm.org/D122728
This simplifies completeness comparisons against OpenCLBuiltins.td and
also makes the header no longer "claim" the argument name identifiers.
Continues the direction set out in D119560.
This simplifies completeness comparisons against OpenCLBuiltins.td and
also makes the header no longer "claim" any single-letter identifiers.
Continues the direction set out in D119560.
This simplifies completeness comparisons against OpenCLBuiltins.td and
also makes the header no longer "claim" the identifiers "x", "y" and
"z".
Continues the direction set out in D119560.
This simplifies completeness comparisons against OpenCLBuiltins.td and
also makes the header no longer "claim" the identifiers "data" and
"offset".
Continues the direction set out in D119560.
https://reviews.llvm.org/D23944 implemented the #pragma intrinsic from
MSVC. This causes the statement #pragma intrinsic(cpuid) to fail [0]
on Clang because cpuid is currently implemented in intrin.h instead
of as a Clang builtin. Reimplementing cpuid (as well as its related
function, cpuidex) should resolve this.
[0]: https://crbug.com/1279344
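For illustration, a minimal sketch of the kind of code this unblocks (using the MSVC `__cpuid`/`__cpuidex` spellings; the exact code in the linked bug may differ):
```
#include <intrin.h>
#pragma intrinsic(__cpuid) /* previously rejected, since __cpuid lived only in intrin.h */

void vendor_leaf(int regs[4]) {
  __cpuid(regs, 0);        /* leaf 0: CPU vendor string returned in EBX/EDX/ECX */
}
```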
Differential revision: https://reviews.llvm.org/D121653
Extension to 4390c721cb - similar to the vanilla load/store intrinsics, _mm_lddqu_si128/_mm256_lddqu_si256 should take an unaligned pointer, but were using the aligned __m128i/__m256i types, which can cause alignment warnings.
The existing sse3-builtins.c and avx-builtins.c tests in llvm-project\clang\test\CodeGen\X86 should cover this.
Differential Revision: https://reviews.llvm.org/D121815
Fix the instruction names to match the WebAssembly spec:
- `i32x4.trunc_sat_zero_f64x2_{s,u}` => `i32x4.trunc_sat_f64x2_{s,u}_zero`
- `f32x4.demote_zero_f64x2` => `f32x4.demote_f64x2_zero`
Also rename related things like intrinsics, builtins, and test functions to
match.
Reviewed By: aheejin
Differential Revision: https://reviews.llvm.org/D121661
Add the rest of the intrinsics to clang, except intrinsics using vector mask
registers.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D121586
The NVVM IR specification defines them with an i32 return type:
declare i32 @llvm.nvvm.match.any.sync.i64(i32 %membermask, i64 %value)
declare {i32, i1} @llvm.nvvm.match.all.sync.i64(i32 %membermask, i64 %value)
...
The i32 return value is a 32-bit mask where bit position in mask corresponds
to thread’s laneid.
as does the PTX ISA:
9.7.12.8. Parallel Synchronization and Communication Instructions: match.sync
match.any.sync.type d, a, membermask;
match.all.sync.type d[|p], a, membermask;
...
Destination d is a 32-bit mask where bit position in mask corresponds
to thread’s laneid.
Additionally, ptxas doesn't accept the instructions produced by the NVPTX backend.
After this patch, the generated code compiles with no issues.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D120499
This simplifies completeness comparisons against OpenCLBuiltins.td and
also makes the header no longer "claim" the identifiers "image",
"image_array", "coord", "sampler", "sample", "gradientX", "gradientY",
"lod", and "color".
Continues the direction set out in D119560.
Until now, subgroup builtins are available with `opencl-c.h` when at
least one of `cl_intel_subgroups`, `cl_khr_subgroups`, or
`__opencl_c_subgroups` is defined. With `-fdeclare-opencl-builtins`,
subgroup builtins are conditionalized on `cl_khr_subgroups` only.
Align `-fdeclare-opencl-builtins` to `opencl-c.h` by introducing the
internal `__opencl_subgroup_builtins` macro.
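A sketch of the condition the internal macro is meant to capture (assumed form; the actual definition in the headers may differ in detail):
```
// Defined whenever any of the three subgroup-related macros is available,
// so builtin declarations can be guarded in one place.
#if defined(cl_intel_subgroups) || defined(cl_khr_subgroups) || \
    defined(__opencl_c_subgroups)
#define __opencl_subgroup_builtins 1
#endif
```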
Differential Revision: https://reviews.llvm.org/D120254
Most places already seem to use the short spelling instead of
'unsigned int/long', so perform the following substitutions:
s/unsigned int /uint /g
s/unsigned long /ulong /g
This simplifies completeness comparisons against OpenCLBuiltins.td.
Differential Revision: https://reviews.llvm.org/D120032
This simplifies completeness comparisons against OpenCLBuiltins.td and
also makes the header no longer "claim" the identifiers "success",
"failure", "desired", "value".
Differential Revision: https://reviews.llvm.org/D119560
Commit 3c7d2f1b67 ("[OpenCL] opencl-c.h: add CL 3.0 non-generic
address space atomics", 2021-07-30) added some atomic_fetch_add/sub
overloads with uintptr_t arguments twice. Instead, they should have
been atomic_fetch_max overloads with non-generic address spaces.
Post-commit review feedback suggested dropping the deprecation
diagnostic for the 'noreturn' macro (the diagnostic from the header
file suffices and the macro diagnostic could be confusing) and only
issuing the deprecation diagnostic for [[_Noreturn]] when the attribute
identifier is either directly written or not from a system macro.
Amends the commit made in 5029dce492.
This adds support for http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2764.pdf,
which was adopted at the Feb 2022 WG14 meeting. That paper adds
[[noreturn]] and [[_Noreturn]] to the list of supported attributes in
C2x. These attributes have the same semantics as the [[noreturn]]
attribute in C++.
The [[_Noreturn]] attribute was added as a deprecated feature so that
translation units which include <stdnoreturn.h> do not get an error on
use of [[noreturn]] because the macro expands to _Noreturn. Users can
use -Wno-deprecated-attributes to silence the diagnostic.
Use of <stdnoreturn.h> and of the noreturn macro are both deprecated.
Users can define the _CLANG_DISABLE_CRT_DEPRECATION_WARNINGS macro to
suppress the deprecation diagnostics coming from the header file.
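A short illustrative sketch of the resulting behaviour in C2x mode (declarations invented for the example):
```
/* Compile with -std=c2x. */
[[noreturn]] void fatal(const char *msg);      /* preferred spelling, no diagnostic */
[[_Noreturn]] void fatal_alt(const char *msg); /* accepted, but warns as deprecated
                                                  when written directly by the user */
```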
Add the atomic overloads for the `global` and `local` address spaces,
which are new in OpenCL 3.0. Ensure the preexisting `generic`
overloads are guarded by the generic address space feature macro.
Ensure a subset of the atomic builtins are guarded by the
`__opencl_c_atomic_order_seq_cst` and `__opencl_c_atomic_scope_device`
feature macros, and enable those macros for SPIR/SPIR-V targets in
`opencl-c-base.h`.
Also guard the `cl_ext_float_atomics` builtins with the atomic order
and scope feature macros.
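An illustrative sketch of the guard pattern described above; the real declarations in `opencl-c.h` use additional internal macros and many more overloads:
```
#if defined(__opencl_c_atomic_order_seq_cst) && \
    defined(__opencl_c_atomic_scope_device)
int __attribute__((overloadable))
atomic_fetch_add(volatile __global atomic_int *object, int operand);
int __attribute__((overloadable))
atomic_fetch_add(volatile __local atomic_int *object, int operand);
#endif
```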
Differential Revision: https://reviews.llvm.org/D119420
D117898 added the generic __builtin_elementwise_add_sat and __builtin_elementwise_sub_sat with the same integer behaviour as the SSE/AVX instructions
This patch removes the __builtin_ia32_padd/psub saturated intrinsics and just uses the generics - the existing tests see no changes:
```
__m256i test_mm256_adds_epi8(__m256i a, __m256i b) {
// CHECK-LABEL: test_mm256_adds_epi8
// CHECK: call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> %{{.*}}, <32 x i8> %{{.*}})
return _mm256_adds_epi8(a, b);
}
```
This patch drops the exception specifier from the posix_memalign declaration
because it differs between glibc and other libc implementations, and Clang has a hack for it.
Differential Revision: https://reviews.llvm.org/D117972
Part of the _BitInt feature in C2x
(http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2763.pdf) is a new
macro in limits.h named BITINT_MAXWIDTH that can be used to determine
the maximum width of a bit-precise integer type. This macro must expand
to a value that is at least as large as ULLONG_WIDTH.
This adds an implementation-defined macro named __BITINT_MAXWIDTH__ to
specify that value, which is used by limits.h for the standard macro.
This also limits the maximum bit width to 128 bits because backends do
not currently support all mathematical operations (such as division) on
wider types yet. This maximum is expected to be increased in the future.
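A minimal C2x sketch of how the macro can be used (compile with -std=c2x; names here are only for the example):
```
#include <limits.h>

#if defined(BITINT_MAXWIDTH)
_Static_assert(BITINT_MAXWIDTH >= ULLONG_WIDTH,
               "BITINT_MAXWIDTH must be at least ULLONG_WIDTH");
typedef _BitInt(BITINT_MAXWIDTH) widest_int; /* 128 bits with this patch */
#endif
```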
AIX provides additional definitions in the system libc float.h that we
would like to be available to users, so we need to include_next through,
similar to what is done on some other platforms.
We also adjust the guards for some definitions which are restricted
based on language level to also be provided with the _ALL_SOURCE feature
test macro on AIX, similar to what is done by the platform float.h
header, so we don't run into cases where we don't provide the compiler
macro but still have a different definition from the system.
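A hedged sketch of the include_next idea (simplified; the actual guard logic in the header is more involved):
```
/* Provide the compiler-owned macros first, then defer to the AIX system
   header so its additional definitions remain visible to users. */
#if defined(_AIX) && __has_include_next(<float.h>)
#  include_next <float.h>
#endif
```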
Differential Revision: https://reviews.llvm.org/D117935
The named address space overloads of builtins that take a pointer
argument are conditionalized on the `__opencl_c_generic_address_space`
feature macro (in a `#else` body). Introduce an internal feature
macro instead, such that their availability can be controlled in a
single place and independently of the generic address space feature
macro.
This commit does not change the available builtins.
Differential Revision: https://reviews.llvm.org/D118158
This feature requires support of __opencl_c_generic_address_space and
__opencl_c_program_scope_global_variables, so diagnostics for that are provided as well.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D115640
Ensure any use of a `read_write` image is guarded behind the
`__opencl_c_read_write_images` feature macro.
Differential Revision: https://reviews.llvm.org/D117899
None of these have any reordering issues, and they still emit the same reduction intrinsics without any change in the existing test coverage:
llvm-project\clang\test\CodeGen\X86\avx512-reduceIntrin.c
llvm-project\clang\test\CodeGen\X86\avx512-reduceMinMaxIntrin.c
Differential Revision: https://reviews.llvm.org/D117881
D111985 added the generic `__builtin_elementwise_max` and `__builtin_elementwise_min` intrinsics with the same integer behaviour as the SSE/AVX instructions
This patch removes the `__builtin_ia32_pmax/min` intrinsics and just uses `__builtin_elementwise_max/min` - the existing tests see no changes:
```
__m256i test_mm256_max_epu32(__m256i a, __m256i b) {
// CHECK-LABEL: test_mm256_max_epu32
// CHECK: call <8 x i32> @llvm.umax.v8i32(<8 x i32> %{{.*}}, <8 x i32> %{{.*}})
return _mm256_max_epu32(a, b);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).
Sibling patch to D117791
Differential Revision: https://reviews.llvm.org/D117798
D111986 added the generic `__builtin_elementwise_abs()` intrinsic with the same integer absolute behaviour as the SSE/AVX instructions (abs(INT_MIN) == INT_MIN)
This patch removes the `__builtin_ia32_pabs*` intrinsics and just uses `__builtin_elementwise_abs` - the existing tests see no changes:
```
__m256i test_mm256_abs_epi8(__m256i a) {
// CHECK-LABEL: test_mm256_abs_epi8
// CHECK: [[ABS:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %{{.*}}, i1 false)
return _mm256_abs_epi8(a);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).
Differential Revision: https://reviews.llvm.org/D117791
C17 deprecated ATOMIC_VAR_INIT with the resolution of DR 485. C++
followed suit when adopting P0883R2 for C++20, but additionally chose
to deprecate ATOMIC_FLAG_INIT at the same time despite the macro still
being required in C. This patch marks both macros as deprecated when
appropriate to do so.
This completes the implementation of
WG14 N2412 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf),
which standardizes C on a two's complement representation for integer
types. The only work that remained there was to define the correct
macros in the standard headers, which this patch does.
ROCm 4.5 device library introduced __ockl_dm_alloc and __ockl_dm_dealloc
for supporting device side malloc/free.
This patch redefines device malloc/free to use these functions.
It also fixes a bug in the wrapper header which incorrectly defines free
with return type void* instead of void.
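A sketch of the corrected shape of the user-visible declarations (assumed form, not the verbatim wrapper header; the bodies forwarding to __ockl_dm_alloc/__ockl_dm_dealloc are omitted):
```
extern "C" __device__ void *malloc(__SIZE_TYPE__ size);
extern "C" __device__ void free(void *ptr); /* previously mis-declared as returning void* */
```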
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D116967
HVX does not have load/store instructions for vector predicates (i.e. bool
vectors). Because of that, vector predicates need to be converted to another
type before being stored, and the most convenient representation is an HVX
vector.
As a consequence, in C/C++, source-level builtins that either take or
produce vector predicates take or return regular vectors instead. On the
other hand, the corresponding LLVM intrinsics do have boolean types, and
so a conversion of the operand or the return value was necessary.
This conversion would happen inside clang's codegen, but was somewhat
fragile.
This patch changes the strategy: a builtin that takes a vector predicate
now really expects a vector predicate. Since such a predicate cannot be
provided via a variable, this builtin must be composed with other builtins
that either convert vector to a predicate (V6_vandvrt) or predicate to a
vector (V6_vandqrt).
For users using builtins defined in hvx_hexagon_protos.h there is no impact:
the conversions were added to that file. Other users will need to insert
- __builtin_HEXAGON_V6_vandvrt[_128B](V, -1) to convert vector V to a
vector predicate, or
- __builtin_HEXAGON_V6_vandqrt[_128B](Q, -1) to convert vector predicate Q
to a vector.
Builtins __builtin_HEXAGON_V6_vmaskedstore.* are a temporary exception to
that, but they are deprecated and should not be used anyway. In the future
they will either follow the same rule, or be removed.
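To illustrate the composition (a hedged sketch; the vector typedef is only for the example and assumes 64-byte HVX mode, compiled for Hexagon with HVX enabled):
```
typedef int hvx_vec __attribute__((__vector_size__(64))); /* 16 x i32 */

/* vector -> vector predicate -> vector: the predicate produced by V6_vandvrt
   cannot be stored in a C variable, so the two builtins are composed inline. */
hvx_vec roundtrip(hvx_vec v) {
  return __builtin_HEXAGON_V6_vandqrt(__builtin_HEXAGON_V6_vandvrt(v, -1), -1);
}
```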
Use the "pure" attribute (or "readonly") for the vload, vload_half and
vloada_half builtins.
Includes test changes to SemaOpenCL/fdeclare-opencl-builtins.cl to avoid
triggering unused-result warnings.
Reviewed By: svenvh
Differential Revision: https://reviews.llvm.org/D110742
Use the "pure" attribute (or "readonly") for the vload, vload_half and
vloada_half builtins.
Reviewed By: svenvh
Differential Revision: https://reviews.llvm.org/D110742
This patch implements the following:
- Emit PACBTI-M build attributes in libunwind asm files
- Authenticate LR in DWARF32 using PACBTI
Use Armv8.1-M.Main PACBTI extension to authenticate the return address
(stored in the LR register) before moving it to the PC (IP) register.
The AUTG instruction is used with the candidate return address, the CFA,
and the authentication code that is retrieved from the saved
pseudo-register RA_AUTH_CODE.
- Authenticate LR in EHABI using PACBTI
Authenticate the contents of the LR register using Armv8.1-M.Main PACBTI
extension.
A new frame unwinding instruction is introduced (0xb4). This
instruction pops out of the stack the return address authentication
code, which is then used in conjunction with the SP and the next-to-be
instruction pointer to perform authentication.
This authentication code is popped into a new register,
UNW_ARM_PSEUDO_PAC, which is a pseudo-register.
This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Momchil Velikov
- Victor Campos
- Ties Stuij
Reviewed By: #libunwind, danielkiss, mstorsjo
Differential Revision: https://reviews.llvm.org/D112430
The 14.31.30818 MSVC toolset has the following in its `stdatomic.h`:
~~~
#ifndef __cplusplus
#error <stdatomic.h> is not yet supported when compiling as C, but this is planned for a future release.
#endif
~~~
This results in clang failing to build existing code which relied on
`stdatomic.h` in C mode on Windows. Simply fall back to the clang header
until that header is available as a complete implementation.
The clang portion of c933c2eb33 was missed as I made
some kind of mistake squashing the commits with git.
This patch just adds those.
The original review: https://reviews.llvm.org/D114088
The XL implementation of vec_round for vector double uses
"round-to-nearest, ties to even" just as the vector float
version does. However, clang and gcc use "round-to-nearest-away"
for vector double and "round-to-nearest, ties to even"
for vector float.
The XL behaviour is implemented under the __XL_COMPAT_ALTIVEC__
macro similarly to other instances of incompatibility.
Differential revision: https://reviews.llvm.org/D113642
This enables Intel intrinsics support on FreeBSD.
Thanks to @pkubaj who noticed this feature was missing
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D113451
With this,
void f() { __asm__("mov eax, ebx"); }
now compiles with clang with -masm=intel.
This matches gcc.
The flag is not accepted in clang-cl mode. It has no effect on
MSVC-style `__asm {}` blocks, which are unconditionally in intel
mode both before and after this change.
One difference to gcc is that in clang, inline asm strings are
"local" while they're "global" in gcc. Building the following with
-masm=intel works with clang, but not with gcc where the ".att_syntax"
from the 2nd __asm__() is in effect until file end (or until a
".intel_syntax" somewhere later in the file):
__asm__("mov eax, ebx");
__asm__(".att_syntax\nmovl %ebx, %eax");
__asm__("mov eax, ebx");
This also updates clang's intrinsic headers to work both in
-masm=att (the default) and -masm=intel modes.
The official solution for this according to "Multiple assembler dialects in asm
templates" in gcc docs->Extensions->Inline Assembly->Extended Asm
is to write every inline asm snippet twice:
bt{l %[Offset],%[Base] | %[Base],%[Offset]}
This works in LLVM after D113932 and D113894, so use that.
(Just putting `.att_syntax` at the start of the snippet works in some but not
all cases: When LLVM interpolates in parameters like `%0`, it uses at&t or
intel syntax according to the inline asm snippet's flavor, so the `.att_syntax`
within the snippet happens too late: the interpolated-in parameter is already
in intel style, and then won't parse in the switched `.att_syntax`.)
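For illustration, a hypothetical wrapper written once with that dual-dialect syntax (an assumed helper, not code taken verbatim from the updated headers):
```
/* Usable under both -masm=att and -masm=intel: the {att | intel} alternative
   selects the operand order for the active dialect. */
static inline unsigned char my_bittest(const int *base, int offset) {
  unsigned char carry;
  __asm__("bt{l %[Off],%[Base] | %[Base],%[Off]}"
          : "=@ccc"(carry)
          : [Base] "m"(*base), [Off] "r"(offset));
  return carry;
}
```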
It might be nice to invent a `#pragma clang asm_dialect push "att"` /
`#pragma clang asm_dialect pop` to be able to force asm style per snippet,
so that the inline asm string doesn't contain the same code in two variants,
but let's leave that for a follow-up.
Fixes PR21401 and PR20241.
Differential Revision: https://reviews.llvm.org/D113707
This splits out the generated headers and conditionalises them on the
target being enabled.
The motivation here is that the RISCV header alone added 10MB to the
resource directory, which was previously at 10MB, increasing the build
size and time. This header is contributing ~50% of the size of the
resource headers (~10MB).
The ARM generated headers are contributing about ~10% or 1MB.
This could be extended further by adding only the static resource headers
for the targets that the LLVM build supports.
The changes to the tests for ARM mirror what the RISCV target already
did, and which rnk identified as a possible issue.
Testing:
cmake -G Ninja -D LLVM_TARGETS_TO_BUILD=X86 -D LLVM_ENABLE_PROJECTS="clang;lld" ../clang
ninja check-clang
Differential Revision: https://reviews.llvm.org/D112890
Reviewed By: craig.topper
Clang builtin utility `__remove_address_space` now works if generic
address space is not supported in C++ for OpenCL 2021.
Differential Revision: https://reviews.llvm.org/D110155
Add a new triple and target info for 'spirv32' and 'spirv64', thus
enabling clang (LLVM IR) code emission for the SPIR-V target.
The target for SPIR-V is mostly reused from SPIR by derivation
from a common base class, since IR output for SPIR-V is mostly
the same as for SPIR. Some refactoring is done accordingly.
Added and updated tests for parts that are different between
SPIR and SPIR-V.
Patch by linjamaki (Henry Linjamäki)!
Differential Revision: https://reviews.llvm.org/D109144
- Add the missing NVVM predicate builtins for address space checking.
- Redefine them as pure functions so that they can be used in
__builtin_assume.
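A minimal CUDA device-side sketch of why purity matters here (usage assumed; __nvvm_isspacep_global is one of the predicate builtins):
```
__device__ float load_global(const float *p) {
  /* Because the predicate is pure, __builtin_assume does not discard it as a
     side-effecting expression, and the hint survives to the optimizer. */
  __builtin_assume(__nvvm_isspacep_global(p));
  return *p;
}
```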
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D112053
CUDA-11 headers rely on these NVCC builtins.
Despite having the `__nv` prefix, those are *not* provided by libdevice.
Differential Revision: https://reviews.llvm.org/D111665
Add atomic_half types and builtins operating on the types from the
cl_ext_float_atomics extension.
Patch by Haonan Yang.
Differential Revision: https://reviews.llvm.org/D109740
The SSE4 header (smmintrin.h) should include SSSE3 (tmmintrin.h) instead
of SSE2 (emmintrin.h).
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D111482
This patch updates the vec_extract builtins to take a signed int as the second
parameter, as defined by the Power Vector Intrinsics Programming Reference.
This patch is NFC and all existing tests pass.
Differential Revision: https://reviews.llvm.org/D110935
This patch updates the vec_popcnt builtins to return vector unsigned,
as defined by the Power Vector Intrinsics Programming Reference.
This patch is NFC and all existing tests pass.
Differential Revision: https://reviews.llvm.org/D110934
The patch implements header-only support for texture lookups.
The patch has been tested on a source file with all possible combinations of
argument types supported by CUDA headers, compiled and verified that the
generated instructions and their parameters match the code generated by NVCC.
Unfortunately, compiling texture code requires CUDA headers and can't be tested
in clang itself. The test will need to be added to the test-suite later.
While the generated code compiles and seems to match NVCC, I do not have any
code using textures with which I could test the correctness of the
implementation. Hence the experimental status.
Differential Revision: https://reviews.llvm.org/D110089
It's true that docs.microsoft.com says:
"""The _ReadBarrier, _WriteBarrier, and _ReadWriteBarrier compiler
intrinsics and the MemoryBarrier macro are all deprecated and should not
be used. For inter-thread communication, use mechanisms such as
atomic_thread_fence and std::atomic<T>, which are defined in the C++
Standard Library. For hardware access, use the /volatile:iso compiler
option together with the volatile keyword."""
And these attributes have been here since these builtins were added in
r192860.
However:
- cl.exe does not warn on them even with /Wall
- none of the replacements are useful for C code
- we don't add __attribute__((__deprecated__())) to any other
declarations in intrin.h
- intrin0.h in the MSVC headers declares _ReadWriteBarrier() (but
without the deprecation attribute), so you get inconsistent
deprecation warnings depending on if you include intrin.h or intrin0.h
The motivation is that compiling sqlite.h with clang-cl produces a
deprecation warning for _ReadWriteBarrier(), while cl.exe does not.
Differential Revision: https://reviews.llvm.org/D111232
The builtin for vec_orc has support for the following two signatures,
but currently the compiler marks it ambiguous:
vector float vec_orc(vector float, vector float)
vector double vec_orc(vector double, vector double)
This patch implements these two builtins.
Differential revision: https://reviews.llvm.org/D110858
These builtins produce inefficient code for CPU's prior to Power8
due to vcmpequd being unavailable. The predicate forms can actually
leverage the available vcmpequw along with xxlxor to produce a better
sequence.
When a user specifies an out-of-range index for vec_insert, we
just produce IR that has undefined behaviour even though the
documentation states that modulo arithmetic is used. This patch
just truncates the value to a valid index.
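A small sketch of the documented behaviour this restores (illustrative only; requires an AltiVec-enabled PowerPC target):
```
#include <altivec.h>

/* With a 4-element vector int, an out-of-range index such as 5 is now reduced
   modulo 4, so this inserts x at element 1 instead of being undefined. */
vector int set_element(vector int v, int x) {
  return vec_insert(x, v, 5);
}
```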
The instruction has similar semantics to vbpermq but for doublewords.
It was added in Power9 and the ABI documents the builtin.
Differential revision: https://reviews.llvm.org/D107899
This patch adds range checking for some Power10 altivec builtins and
changes the signature of a builtin to match documentation. For `vec_cntm`,
range checking is done via SemaChecking. For `vec_splati_ins`, the second
argument is masked to extract the 0th bit so that we always receive either a `0`
or a `1`.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D109710